Article from May, 1999.

RDF: Adding Structure to the Web

By Bob DuCharme

Bob DuCharme is a senior software engineer at Moody's Investors Service. He is the author of the brand-new XML: The Annotated Specification, and SGML CD, a tutorial and user's guide to free SGML software. Both books are part of Prentice Hall's Charles F. Goldfarb Series on Open Information Management.

Abstract
The W3C's Resource Description Framework (RDF) is a content labelling schema that permits authentication of page resource properties on the web. Contributing author Bob DuCharme offers readers an overview of the basic RDF standard and illustrates the potential RDF has to enhance client-side visibility of web pages in the future.

An attorney friend of mine once complained that Web search engines weren't as good as those used by services such as Lexis and Westlaw that are aimed specifically at the legal profession. I explained to him that the engines themselves were only a small part of the picture: the legal research data itself was scientifically structured and organized by highly-paid professionals to make the search engines' job much easier. The structure of Web data, I continued, is so vague that Web search engines have little if any way to know what role each piece of information plays, and knowing that role is crucial to easily finding the information you need.

The budding W3C standard RDF (Resource Description Framework) aims to change that. While some see it as one more XML offshoot with big promises, difficult syntax, and few implementations, the scale of the promises is getting impressive: a recent article on CNN's Web site described how Tim Berners-Lee, W3C director and inventor of the World Wide Web, was publicly pushing RDF as the way to let people get serious with searching and e-commerce on the Web. A W3C press release quoted him as saying "RDF provides the necessary foundation and infrastructure to support the description and management of [Web] data. RDF can transform the Web into a more useful and powerful information resource." (http://www.w3.org/Press/1999/RDF-REC)

RDF began as an extension of the W3C PICS (Platform for Internet Content Selection) standard, a system for labeling Internet content that gained fame for its promised ability to help hide pornography from children.
The geeks involved, however, saw wider uses for a system that described content so that automated processes could take action based on those descriptions. Once XML became the obvious way to encode this information, RDF grew into something with enough promise that Tim Berners-Lee was soon telling CNN that it was the future of the Web.

Because RDF is data about data, it's "metadata." Because it's implemented using XML, which already uses metadata (the DTD) to describe a document type's structure, there is some room for confusion, so let's compare what each actually describes. An RDF "resource" can be a Web page, a subset of a Web page, or a collection of Web pages. While a DTD describes the pieces of information comprising a document type (for example, the ingredientList, prepSteps, and cookingTime element types of a recipe document type), resource descriptions (the "RD" in "RDF") describe the resource itself. For example, a resource description might identify the author(s), the date last updated, the URI, the language it was written in, and the URIs of related resources.

The W3C's "Resource Description Framework Schema Specification" document at http://web1.w3.org/TR/PR-rdf-schema includes a nice example in its actual markup. (Because the example is in the markup, you have to select "Source" from the "View" menu of Netscape Navigator or Microsoft Internet Explorer to see it.) The RDF element near the document's top has subelements such as Title, Description, Publisher, Date, and several others.

The RDF Schema Specification, which became a Proposed Recommendation on March 3rd and may be a Recommendation by the time you read this, "does not specify a vocabulary of descriptive elements such as 'author'. Instead, it specifies the mechanisms needed to define such elements, to define the classes of resources they may be used with, to restrict possible combinations of classes and relationships, and to detect violations of those restrictions."
That is, it describes how you specify a vocabulary of descriptive elements such as "author." A given set of such element types and details about them is known as an RDF schema. Just as a library with many different kinds of books, CDs, and videos needs a clearly-defined structure for card catalog entries to keep track of all that media, material spread across the Web may conform to a variety of DTDs, and RDF lets you treat it as an organized, unified collection.

The other important W3C RDF document is the "Resource Description Framework Model and Syntax Specification" (http://web1.w3.org/TR/REC-rdf-syntax), which became a Recommendation in February. It lists the kinds of structures that RDF can describe, taking care to note that the "XML syntax [used to represent RDF] is only one possible syntax for RDF and that alternate ways to represent the same RDF data model may emerge." In other words, the "Framework Model" it covers is more important than its "Syntax Specification."

The "Basic RDF Model" section near the beginning of this W3C document tells us that RDF stores statements that each consist of a resource with a named property and a value for that property. I mentioned earlier that a resource is typically a Web page, if not a subset or superset of one. A property's relationship to a resource resembles an attribute's relationship to an XML or SGML element: the property describes the resource. The XML used to represent such a statement may be as simple as this example from the specification:

    <rdf:RDF>
      <rdf:Description about="http://www.w3.org/Home/Lassila">
        <s:Creator>Ora Lassila</s:Creator>
      </rdf:Description>
    </rdf:RDF>

This essentially says "Ora Lassila is the creator of the resource http://www.w3.org/Home/Lassila." The prefix on each element name identifies the namespace it belongs to; for example, rdf:Description is an element of the Description element type defined in the "rdf" namespace.
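To make the resource/property/value model concrete, here is a minimal Python sketch that reads the statement above and prints it as a triple. It is an illustration under stated assumptions, not part of either W3C document: the namespace declarations are added so the fragment parses on its own, the URI bound to the "s" prefix is a placeholder, and the `statements` helper is hypothetical and handles only properties with plain text values.

```python
import xml.etree.ElementTree as ET

# Namespace URI for RDF syntax elements.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumption: placeholder URI for the schema behind the "s" prefix.
S_NS = "http://description.org/schema/"

# The article's first example, with namespace declarations added
# so that an XML parser can resolve the prefixes.
DOC = f"""
<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:s="{S_NS}">
  <rdf:Description about="http://www.w3.org/Home/Lassila">
    <s:Creator>Ora Lassila</s:Creator>
  </rdf:Description>
</rdf:RDF>
"""


def statements(xml_text):
    """Return (resource, property, value) tuples for literal-valued properties."""
    root = ET.fromstring(xml_text)
    result = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        # The example writes "about" without a prefix; accept rdf:about too.
        resource = desc.get("about") or desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # prop.tag is the property name qualified by its namespace URI.
            result.append((resource, prop.tag, (prop.text or "").strip()))
    return result


for resource, prop, value in statements(DOC):
    print(resource, prop, value)
```

Each tuple pairs the resource URI with a namespace-qualified property name and its value, which is the same information the English sentence "Ora Lassila is the creator of the resource" carries.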
A namespace is a specific collection of element and attribute types; a namespace declaration earlier in the document would identify "rdf" as the shorthand used to refer to the element types declared at a particular URI.

The following slightly more complex example from the "Resource Description Framework Model and Syntax Specification" tells us "The individual referred to by employee id 85740 is named Ora Lassila and has the email address lassila@w3.org. The resource http://www.w3.org/Home/Lassila was created by this individual."

    <rdf:RDF>
      <rdf:Description about="http://www.w3.org/Home/Lassila">
        <s:Creator rdf:resource="http://www.w3.org/staffId/85740"/>
      </rdf:Description>
      <rdf:Description about="http://www.w3.org/staffId/85740">
        <v:Name>Ora Lassila</v:Name>
        <v:Email>lassila@w3.org</v:Email>
      </rdf:Description>
    </rdf:RDF>

The two W3C RDF documents describe the possibility of far more complex structures to add to your Web documents. This complexity has intimidated some people; after all, while design goal 6 of the XML spec mentions the importance of XML documents being "human-legible and reasonably clear," RDF literature often stresses the importance of "machine-understandable" coding. While these two goals don't always conflict, fulfilling them simultaneously is not trivial.

The applications made possible by RDF's greater machine-friendliness have a lot of potential. For example, the Document Content Description schema proposal submitted to the W3C as an alternative to traditional DTD markup syntax calls itself an RDF "vocabulary." The Mozilla/Netscape browser currently under construction uses RDF to implement powerful new features in its bookmarking system. The ability to digitally certify documents using RDF will remove an important impediment to e-commerce.

If you are not implementing one of these fancy new applications, why add all this markup to your Web pages?
To achieve a key goal of publishing Web pages: to make it easier for people who do not have your specific URI to find your pages. Various metadata tricks like the HTML meta element type have been around nearly as long as search engines to encourage hits on certain Web pages. A solid W3C standard that allows both simple and complex expression of such metadata will provide many benefits to both amateur private web developers and commercial developers looking to invent new businesses.

For a good collection of information about RDF, see the RDF section of Robin Cover's OASIS SGML/XML Web page at http://www.oasis-open.org/cover/rdf.html


