Article from June 2001.

XSL Tutorial, Part 1: Background

By Brian E. Travis

Brian Travis is founder and Chief Technical Officer of Architag International Corporation and Managing Editor of <TAG>.

Abstract
This is the first installment of a multipart series on XSLT, an important W3C standard for transforming XML documents from one structure to another. XSLT can create HTML, so your XML documents can be viewed in a Web browser, and it can also transform your XML documents to any other XML structure, or even to non-XML structures. In this first part, you will learn about the background of XSL and why it was broken up into two standards, XSL and XSLT. In following months, I will go over the syntax of XSLT and how to use it in your applications.

Background

In November 1999, the W3C formally adopted XSLT, the Extensible Stylesheet Language Transformations. XSLT was extracted from a W3C effort called XSL. While XSL is still in committee, the XSLT spec has been implemented by several companies, and stands to be one of the most important standards since XML itself.

In 1986, the International Organization for Standardization (ISO) adopted SGML. Like XML, SGML allows implementers to express the structure (but not the formatting) of their information. Soon after the adoption of SGML, some members of the SGML committee formed a new committee to work on a standard language for expressing the formatting characteristics of an SGML document. This language was called the Document Style Semantics and Specification Language (DSSSL).

DSSSL took 10 years to complete because of the difficulty of designing a single syntax to express all formatting information for all documents and all devices. The DSSSL specification was finally adopted as an ISO standard in 1996. Unfortunately for DSSSL, that was about the time XML was being developed, and potential DSSSL implementers took a wait-and-see attitude toward the DSSSL specification. To this day, I don't know of a single serious implementation of DSSSL.
In 1998, some members of the DSSSL committee formed a working group under the auspices of the W3C to create a standard for rendering XML documents (the way DSSSL was supposed to create a standard for SGML documents). Because this group had already worked for 10 years to understand how documents could be rendered, members got off to a quick start and ultimately developed XSL.

XSL Operation

The W3C XSL presentation process has two major parts: transforming XML documents and interpreting formatting objects. To format a document, an XSL processor reads an XML document and applies a set of transformation rules to create another XML document, called a result tree. This result tree adheres to the formatting object namespace, which contains hundreds of elements and attributes that describe the presentation of the XML document. For example, the result tree indicates whether a particular textual object will be bold or italic, red or salmon, inline or block.

The result tree does not have any instructions for a particular typesetting language. Those instructions are produced in the next step of the XSL process: formatting object interpretation. The result tree is read into a formatting object interpreter, which interprets the formatting object elements and attributes and outputs typesetting codes for a particular typesetter. Figure 1 illustrates this process.
Figure 1: The two parts of the W3C XSL presentation process: First the input XML document is transformed into a result tree, and then the result tree is interpreted by a formatting object interpreter optimized for a particular output.
In this example, if a designer wants to display a particular piece of text in green italics, all she needs to do is indicate those requirements in generic terms. The font style and color attributes are in the schema referenced by the formatting object namespace. These attributes are set to italic and green. This declarative way of indicating output transcends any particular output medium, which means that the designer needn't worry about particular typesetting codes.

Let's use HTML to show how the XSL formatting works. In HTML, inline CSS styles indicate font style and color. The formatting object interpreter for HTML renders the green italic object as STYLE="font-style:italic;color:green". This string is readable by an HTML typesetter (a Web browser). The final paragraph tag looks like this:

    <P STYLE="font-style:italic;color:green;">May 5, 2000</P>

Suppose you want a paper (rather than an HTML) document. To render our document on paper, we could use a formatting object interpreter that understands the Rich Text Format, or RTF. (Microsoft created RTF syntax in the mid-1980s as a 7-bit ASCII representation of richly formatted word-processing documents. Because RTF was plain text, it was easy to transmit over e-mail and other early transport protocols.) Our example's formatting object interpreter transforms the font-style="italic" command into \i, which turns the text that follows the command into italics. The color="green" command is transformed into \cf6, indicating that the foreground color is found in the sixth entry of the color table at the top of the RTF document. The resulting RTF document fragment might look something like this:

    {\cf6\i May 5, 2000\par}

The XSL presentation process provides a powerful model because it allows an organization to get any number of outputs from the same XML inputs and style sheets. Of course, this model is advantageous only if a formatting object interpreter exists for each type of output you are considering.
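The two-step process described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any real XSL processor is built: the function names, the source <date> element, and the RTF_COLOR_INDEX table are all hypothetical, and the formatting rule is hard-coded rather than read from a style sheet. The RTF interpreter emits \cf, the control word RTF uses to select a foreground color from the color table.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"  # XSL formatting object namespace


def transform(source: ET.Element) -> ET.Element:
    """Step 1: turn a source element into a generic formatting-object block."""
    # Hypothetical hard-coded rule: every source element becomes green italics.
    block = ET.Element(f"{{{FO_NS}}}block",
                       {"font-style": "italic", "color": "green"})
    block.text = source.text
    return block


def to_html(block: ET.Element) -> str:
    """Step 2 (HTML): express the generic properties as inline CSS."""
    style = "".join(f"{name}:{value};" for name, value in block.attrib.items())
    return f'<P STYLE="{style}">{block.text}</P>'


# Hypothetical color table: position of each color in the RTF color table.
RTF_COLOR_INDEX = {"green": 6}


def to_rtf(block: ET.Element) -> str:
    """Step 2 (RTF): express the same properties as RTF control words."""
    codes = ""
    if "color" in block.attrib:
        codes += f"\\cf{RTF_COLOR_INDEX[block.get('color')]}"
    if block.get("font-style") == "italic":
        codes += "\\i"
    return "{" + codes + " " + block.text + "\\par}"


result_tree = transform(ET.fromstring("<date>May 5, 2000</date>"))
print(to_html(result_tree))
print(to_rtf(result_tree))
```

The point of the sketch is that transform knows nothing about HTML or RTF. Only the two interpreter functions contain output-specific codes, so adding a new output means adding a new interpreter, not changing the rule.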
The problem with this approach is that it is very difficult to pull off. The DSSSL specification took ten years to develop, and there still is not a single commercial implementation of DSSSL. One might suspect that XSL will have the same problems.

Microsoft's Implementation of XSL

As of this writing, the XSL specification is still under development. The formatting object libraries are complex, and many outstanding issues still need resolution. In 1998, Microsoft felt that the transformation piece of XSL was stable enough and implemented only that part of XSL from a working draft of the XSL specification. Microsoft introduced this part of XSL in the MSXML parser, which shipped with Microsoft Internet Explorer 5. This gave developers access to a general-purpose, XML-to-XML transformation language.

Although some people criticized the Microsoft XSL implementation as an incomplete part of a W3C specification, many other people used the implementation to understand the power of a declarative transformation language. As a result of this wide understanding, the W3C XSL Working Group extracted the transformation part from XSL and created a new W3C Recommendation, XSL Transformations (XSLT) Version 1.0. XSLT is illustrated in Figure 2.
Figure 2: The XSLT model contains only the transformation part. An XSLT stylesheet directly contains information about the output device.
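To contrast the one-step model in Figure 2 with the two-step XSL process, here is a rough Python sketch in which the rules themselves contain the output markup. It is not XSLT syntax and not how an XSLT processor works internally. The RULES table, the apply_rules function, and the <memo>, <date>, and <note> element names are all made up for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical rule table: element name -> HTML written directly into the rule.
RULES = {
    "date": '<P STYLE="font-style:italic;color:green;">{content}</P>',
    "note": "<P>{content}</P>",
}


def apply_rules(element: ET.Element) -> str:
    """Transform an element and its descendants straight to HTML text."""
    content = escape(element.text or "")
    for child in element:
        content += apply_rules(child) + escape(child.tail or "")
    # Elements without a rule contribute only their content.
    template = RULES.get(element.tag, "{content}")
    return template.format(content=content)


memo = ET.fromstring(
    "<memo><date>May 5, 2000</date><note>Ship &amp; bill</note></memo>"
)
print(apply_rules(memo))
```

Because the HTML lives inside the rules, producing RTF from the same input would require a second rule table. That is the trade-off the figure describes.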
While XSL now points to XSLT as its transformation engine, XSL still contains all the formatting object support.

XSLT requires a syntax that enables the selection of certain parts of an XML document. For example, to render a chapter title one way and a section title another way, two rules must be able to specify a path to the appropriate objects in a particular context. Because XSLT is not the only W3C standard that requires this syntax, the XML Path Language (XPath) was extracted from the XSLT specification. The W3C adopted both XPath and XSLT as Recommendations in November 1999.

Next Month

In next month's installment, I will explore the XSLT programming model, and how you can create a rules-based, event-driven transformation program.
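As a small illustration of the path-selection idea described above under XPath, the following Python sketch distinguishes chapter titles from section titles by their location in the document. The sample <book> document is invented for the example. Python's standard xml.etree.ElementTree module supports only a limited subset of XPath, which is enough here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: both chapters and sections have a <title>.
BOOK = """
<book>
  <chapter>
    <title>Background</title>
    <section>
      <title>XSL Operation</title>
    </section>
  </chapter>
</book>
"""

root = ET.fromstring(BOOK)

# The same element name is selected differently depending on its path.
chapter_titles = [t.text for t in root.findall("./chapter/title")]
section_titles = [t.text for t in root.findall("./chapter/section/title")]

print("Chapter titles:", chapter_titles)
print("Section titles:", section_titles)
```

A rule that matches the first path could render its titles one way, and a rule that matches the second path could render its titles another way, even though both elements are named title.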


