Article from March, 1999.

If Not DTDs, Then What?

By Bob DuCharme

Bob DuCharme is a senior software engineer at Moody's Investors Service. He is the author of the brand new XML: The Annotated Specification, as well as SGML CD, a tutorial and user's guide to free SGML software. Both books are part of Prentice Hall's Charles F. Goldfarb Series on Open Information Management.

Abstract
On an XML discussion mailing list, someone once claimed that no one would use DTDs if they were optional. Why bother, he asked, with something that just restricts your freedom when creating documents? XML Specification coeditor Tim Bray replied that the opposite effect had happened: people complained that DTDs did not allow enough restrictions. This article corrects the common misconception that the current schema work at the W3C is about alternatives to the DTD; the proposals are instead different ways of representing a DTD.

On an XML discussion mailing list, someone once claimed that no one would use DTDs if they were optional. Why bother, he asked, with something that just restricts your freedom when creating documents? XML Specification coeditor Tim Bray replied that the opposite effect had happened: people complained that DTDs did not allow enough restrictions.

XML processors that automate the checking of further constraints make system development easier because developers have less error-checking code to write. Database management people in particular, as they add XML to their repertoire of tools, take certain schema constraint features for granted. As XML's role in these systems grows (Oracle and Sybase are already trumpeting XML support), the need for stronger datatyping is becoming clearer. Other valuable innovations of databases and recent programming languages also have advantages to offer XML, such as inheritance of declared classes (or in our case, element types) and more structured documentation of schemas.

Another complaint about DTDs, most commonly from users with no SGML background, is "why all this separate syntax to define a document type's structure? If XML stores structured information so well, why not use it to store a document type's structure?"

Several proposals to the World Wide Web Consortium (W3C) have offered alternatives to traditional XML DTD syntax in order to address these complaints.
(In the SGML world, the "Web SGML Adaptations" to the SGML specification also provide for alternative DTD syntaxes.) In this issue, we'll look at the categories of new features in these proposals; next issue, we'll look at four specific proposals: XML-Data, DCD, SOX, and DDML.

First, a picky point about what "DTD," or "Document Type Definition," really means: in popular use, people refer to the collection of declarations that express a document type definition as the DTD itself, but this is not strictly correct. According to The XML Handbook by Charles Goldfarb and Paul Prescod, "A DTD is a concept; markup declarations are the means of expressing it." The declaration syntax used by XML and SGML to declare element types, attribute lists, entities, and notations provides one way to describe a document type's definition, and the XML-Data, DCD, SOX, and DDML proposals offer four more ways to express a DTD. Strictly speaking, these newcomers are not competitors to DTDs, but new ways to express a DTD that provide alternatives to the traditional notation.

The DTD as an XML Element

Omitting a traditional DTD is not a huge problem with an XML document, because a document that is still well-formed is still a legal XML document. There were two main reasons for storing XML DTDs as a slight variation on SGML DTDs:
Storing schema information in an XML document has three key advantages:
Data Typing

While SGML allowed you to specify certain constraints on attributes' data types when declaring them, XML's simplification of SGML's syntax threw most of these out. The complete inability to constrain character data in element content is one of the most important considerations when deciding whether to store a piece of an element's information as an attribute or as a subelement. This inability led to wishes for greater power when describing the data allowable for a given element type. For example, quantity elements would require much less error-checking code down the line if you could specify right in the DTD that this element type's content must be a positive, non-null integer, so that <quantity>-4</quantity>, <quantity>3.2</quantity>, <quantity>hello world</quantity>, and even <quantity></quantity> are all illegal values. (Some proposals go further by allowing the schema to specify a range of values: for example, that a given element's content must be an integer between 3 and 9.)

In addition to integer values, what other data type choices should XML offer? Those offered by SQL and popular programming languages have given some schema proposals a good starting point: string, boolean, date, time, real, and others.

Inheritance

Software engineers who use object-oriented programming languages are accustomed to designing complex, interrelated data structures by basing new ones on existing ones, thereby "inheriting" the structure (among other things) of the existing ones. Most schema proposals either describe a way to do this with element type declarations or at least acknowledge the usefulness of being able to do so. This would let you declare one element type by saying "this one is just like this other one, but with the following differences."

Structured Documentation

A nice feature of the Java programming language is its ability to store comments about declarations in a format more structured than regular program comments.
This allows an automated process to identify the purpose of each comment and its exact relation to the code it describes. A traditional XML DTD should have descriptive comments inside the <!-- and --> delimiters, but there is no standard that lets an automated program identify which comments refer to which DTD declarations. Most schema proposals address this by defining element types whose elements are included within element type definitions to describe the purpose of the defined element types. (I told you that storing a document's meta-information with its content could be confusing.) This makes it easier to crank out usable documentation for the schema.

Global Attributes

Many of the new schema proposals provide some way to define attributes that can be used for all the element types in a document, such as a uid or revDate attribute. Traditional SGML and XML DTDs let you implement something similar using parameter entities, but the necessity of all those parameter entity references makes this method harder to maintain than an attribute list simply identified as global.

Separation of Logical from Physical Schema

Traditional DTDs specify logical structure (for example, which elements can go in an element of a particular type) using element type and attribute declarations. They also specify physical structure (for example, the actual filename and directory where pieces of a DTD or document may be stored) using entity declarations. This blend of logical and physical schema definition is not the cleanest way to specify a document type's structures, and most of the schema proposals point out the messiness of mixing them up. Unfortunately, most of them, instead of offering an alternative, merely ignore the need to specify the physical structure of a document type. At best, they offer some equivalent feature in order to allow round-trip conversion between traditional DTD syntax and the schema being proposed.
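As a rough illustration of the last two points, a traditional DTD might look something like the sketch below. Only the uid and revDate attribute names come from the discussion above; the element type names, the common.atts parameter entity, and the chapters/chap2.xml path are invented for the example. Because XML permits parameter entity references inside declarations only in the external DTD subset, assume that this fragment lives in a separate DTD file.

```xml
<!-- Hypothetical external DTD fragment. Only uid and revDate are taken
     from the article; every other name here is an assumption. -->

<!-- "Global" attributes emulated with a parameter entity. -->
<!ENTITY % common.atts
  "uid     ID    #IMPLIED
   revDate CDATA #IMPLIED">

<!-- Logical structure: element types and their attribute lists.
     Each ATTLIST must repeat the parameter entity reference. -->
<!ELEMENT chapter (title, para+)>
<!ATTLIST chapter %common.atts;>

<!ELEMENT title (#PCDATA)>
<!ATTLIST title %common.atts;>

<!ELEMENT para (#PCDATA)>
<!ATTLIST para %common.atts;>

<!-- Physical structure: an external entity naming where a piece of
     a document is stored. -->
<!ENTITY chap2 SYSTEM "chapters/chap2.xml">
```

Every attribute list has to carry its own %common.atts; reference, which is the maintenance burden that an attribute list identified as global would remove. The final entity declaration shows a storage location sitting in the same file as the logical declarations.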
The Future

None of the four proposals will ever "win" as the accepted alternative to traditional DTD syntax. Instead, the W3C has assembled an XML Schema Working Group to evaluate the proposals and then construct a new proposal combining their best features, and probably adding some new ones as well. The Working Group's membership includes at least two authors, editors, or contributors involved in the creation of each of the original four proposals. In the next issue, we'll look more closely at what they have to work with.


