Yesterday Google announced a new site to help enrich HTML with semantic information: http://googleblog.blogspot.com/2011/06/introducing-schemaorg-search-engines.html . The idea is to embed clues in a web page about what the words on it actually are, so that software can recognize them automatically. A simple example is a postal address: marked up with this new method, it becomes easy for a web browser to let a person add it to an address book with the click of a button.
The system is related to Microformats and RDFa, and it is fairly simple to implement in HTML and fairly flexible.
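To make the address example concrete, here is a minimal sketch of what a marked-up address could look like and how software might read it back. The address itself is made up, and the `MicrodataText` class and `extract_address` function are names invented for this illustration; the sketch only handles a single flat item, not nested ones, and is not a full microdata parser.

```python
from html.parser import HTMLParser

# Made-up address marked up with schema.org microdata attributes.
SAMPLE = """
<div itemscope itemtype="http://schema.org/PostalAddress">
  <span itemprop="streetAddress">123 Example St</span>
  <span itemprop="addressLocality">Richmond</span>,
  <span itemprop="addressRegion">VA</span>
  <span itemprop="postalCode">23219</span>
</div>
"""


class MicrodataText(HTMLParser):
    """Collects the text of each itemprop in one flat item (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.itemtype = None   # URL naming the kind of thing being described
        self.props = {}        # itemprop name -> text content
        self._current = None   # itemprop currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            self.itemtype = attrs.get("itemtype")
        if "itemprop" in attrs:
            self._current = attrs["itemprop"]
            self.props[self._current] = ""

    def handle_data(self, data):
        # Only keep text that sits inside an element carrying an itemprop.
        if self._current is not None:
            self.props[self._current] += data

    def handle_endtag(self, tag):
        self._current = None


def extract_address(html):
    """Return (itemtype, {property: text}) for the first flat item found."""
    parser = MicrodataText()
    parser.feed(html)
    parser.close()
    return parser.itemtype, {k: v.strip() for k, v in parser.props.items()}
```

Calling `extract_address(SAMPLE)` gives the item type URL plus a dictionary of the address parts, which is roughly the information a browser would need for an "add to address book" button.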
Now for the problems:
First, the site with the information, http://schema.org/ , is itself broken HTML according to the validator at http://validator.w3.org/ . The site claims in its "doctype" to be <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">, yet it does not validate as such. This is a major problem if the site is to be used by software systems to help implement the standards it proposes, which leads to the second, more significant problem. (Please fix the HTML.)
Second, the standards for things (note: "things" is the technical term used by the standard) point to schema.org itself as the authoritative site for the various standards. Take the postal address standard. The schema.org documentation says to use http://schema.org/PostalAddress as the URL for a postal address. While this appears helpful, the proposed standard is flexible enough to allow URLs controlled by other organizations to be used. In the case of postal addresses, that should be the postal service, or the national effort to create a standard address format in which the postal service is now participating:
- http://pe.usps.com/text/pub28/welcome.htm
- http://www.fgdc.gov/standards/projects/FGDC-standards-projects/street-address
Fixing the HTML should be easy. Fixing the tendency to centralize standards away from the authoritative sources is harder. Usually that tendency comes from frustration with trying to get the authoritative organization to participate and contribute. But the short-term benefits may be overwhelmed by fracturing, since schema.org may have to compete with other non-authoritative organizations over standards for postal addresses. The authoritative organization may also change its standards, and schema.org and others may find it difficult to keep up. Finally, the authoritative organization can validate the data in ways that schema.org could never match. For example, the postal system knows all of the allowed road types, and even which street numbers exist on a street. And each country will likely have a better schema than schema.org.
Last of all, there is a much lighter-weight way to reach most of the objectives of this "microdata" effort that can be used in some cases: when possible, just make a link. When mentioning a state, one could spend time coding the schema.org markup into the HTML, or one could just link the state's name to a URL for that state (e.g. Virginia or Virginia). Combining the lightweight link approach with the heavier, harder-to-implement one may be the best of both worlds for some publishers. Take recipes: done right, a recipe might link to FDA or Agriculture nutrition data, allowing for automatic calorie calculations.
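As a rough illustration of how little machinery the link approach needs, here is a small sketch that reads plain links out of a page and maps each linked phrase to its URL. The example.org URL is a placeholder rather than a real identifier for Virginia, and `LinkCollector` and `collect_links` are names made up for this sketch.

```python
from html.parser import HTMLParser

# Placeholder sentence; the URL is hypothetical, standing in for whatever
# site a publisher treats as the reference page for the state.
LINK_SAMPLE = (
    '<p>The bakery ships anywhere in '
    '<a href="http://example.org/places/virginia">Virginia</a>.</p>'
)


class LinkCollector(HTMLParser):
    """Maps the text of each <a> element to its href (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.links = {}     # link text -> URL
        self._href = None   # href of the link currently open, if any
        self._text = []     # text pieces seen inside the open link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links["".join(self._text).strip()] = self._href
            self._href = None


def collect_links(html):
    """Return {link text: URL} for every link with an href in the HTML."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

Software that already knows what lives at the target URL can treat the link itself as the semantic hint, with no extra attributes in the page.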
Schema.org is a great step forward for the web. Just please fix your HTML and don't pretend to be the standard bearer for everything under the sun.