XSD and Namespaces
By Brian Travis and Mae Ozkan
Brian Travis is founder and Chief Technical Officer of
Architag International Corporation and Managing Editor
of <TAG>.
Mae Ozkan is Chief Architect of Architag International
Corporation.
Their most recent book, Web Services Implementation Guide,
offers a practical and technical explanation
of what web services are and how to make them work. This article
is adapted from the book.
Abstract
An XML document, by itself, is not really good enough. Anyone can create an
XML document that conforms perfectly to the W3C XML specification but is
completely unreadable by anyone else.
That is because the W3C XML specification only defines the syntax for creating
a markup language; it does not specify the tags that make up the markup language.
It is the "schema" that specifies the markup language and enforces it.
This article shows how an XSD document specifies a markup language, and explains
the meaning and role of namespaces.
XML Schema Definition (XSD)
The W3C XML Schema Working Group finished its work on a new
schema syntax designed to be an alternative to the DTD, and in
May of 2001 it was approved by the W3C members as a W3C
Recommendation. This syntax is called XML Schema Definition,
or XSD. Microsoft has implemented XSD in version 4 of its
MSXML tools suite. Most other XML parser vendors have
also implemented XSD, so it is now the preferred schema syntax
for interoperability.
An XSD schema is much more complex than the
DTD or XDR, but it also has a simple personality that makes
it easy to get started. Do not let that simplicity fool you,
however. XSD can also be used to describe very complex data
structures with all of the modern object-oriented features you
can think of.
A simple XSD schema that describes our weather
forecast is shown in
An XSD Schema.
<?xml version="1.0"?>
<schema id="weather" elementFormDefault="qualified"
        targetNamespace="Weather Markup Language"
        xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="Weather">
    <complexType>
      <sequence>
        <element name="city" type="string"/>
        <element name="temperature" type="integer"/>
        <element name="wind" type="integer"/>
        <element name="forecast">
          <complexType>
            <sequence>
              <element name="day" minOccurs="3" maxOccurs="10">
                <complexType>
                  <sequence>
                    <element name="temperature" type="integer"/>
                    <element name="wind" type="integer"/>
                  </sequence>
                  <attribute name="date" type="date"/>
                </complexType>
              </element>
            </sequence>
          </complexType>
        </element>
      </sequence>
      <attribute name="date" type="date"/>
    </complexType>
  </element>
</schema>
An XSD Schema
The XML Schema Definition syntax is accepted
by all members of the World Wide Web Consortium.
Namespaces
Before we go any further, we need to talk
a little bit about namespaces. This XSD schema is a formal specification
for a markup language, as we talked about earlier. That markup
language needs a name, so we give it one on line 3. There, we
indicate targetNamespace and assign it the value
Weather Markup Language.
On our target document, then, we need to put
an XML namespace declaration to indicate that the document
uses the Weather Markup Language. We will do that by
creating an XML namespace declaration:
xmlns="Weather Markup Language"
XML Namespaces probably holds the human record
for the smallest document ever created by a committee. The document
is 11 pages long, but only seven pages contain the meat of the
standard. You can get the specification at
the W3C Web site, but do not expect the specification
to mean much. It is written in "standards speak", and really
needs to be interpreted by a human to be useful. We'll be your
humans today.
You will see lots of namespace declarations
as you look at XML documents. Many, if not most, of them look
like URLs. That is, they have the form "http://...something...".
This is misleading, as you would expect that you could type the
string into your Web browser and go somewhere useful. This is a
natural tendency, as we were all genetically programmed to do such
a thing. Type in a URL and the faithful server responds with
something you can use. You want to get a schema, or a software
specification, or a listing of the tags, or even just a phone
number of the person who can help you with more information.
However, you will be lucky to get anything
at that "address". That's because the XML namespace declaration
is not a URL. It is a URI. "URI" stands for
"Uniform Resource Identifier". If an XML document
has a namespace declaration, then the string that follows is sent to the
application as a namespace. The string means nothing to the
parser. That's right, it's just a string to the parser. The
parser does not go out to any resource to check and see if there
is a schema there. Since it is a string, its only requirement
is that it be well-formed.
When that string is sent to the application,
it is up to the application to do something useful with the
string. You can instruct your application to ignore the string,
or you can have the application look up the string in some kind
of keyed environment or registry to try to resolve it to something
that the application can use for validation. Or, if the
namespace URI looks like a URL, you could have your application
go to that resource and see if there is anything interesting
there.
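To make this concrete, here is a minimal sketch in Python using the standard xml.etree.ElementTree module. It shows the parser handing the namespace string to the application unchanged, and the application looking that string up in a small registry. The registry, the schema path schemas/weather.xsd, and the example.invalid namespace are our own assumptions for illustration; they are not part of any tool discussed in this article.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry: namespace string -> schema location known to this application.
SCHEMA_REGISTRY = {
    "Weather Markup Language": "schemas/weather.xsd",
}

def split_expanded_name(tag):
    """Split ElementTree's '{namespace}local' form into (namespace, local)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return None, tag

def resolve_schema(xml_text):
    """Return the registered schema location for the root's namespace, or None."""
    root = ET.fromstring(xml_text)
    namespace, _ = split_expanded_name(root.tag)
    return SCHEMA_REGISTRY.get(namespace)

known = '<Weather xmlns="Weather Markup Language"/>'
unknown = '<Weather xmlns="http://example.invalid/no-such-schema"/>'

print(resolve_schema(known))    # the registered schema location
print(resolve_schema(unknown))  # None: nothing registered, and nothing fetched
```

The parser never tries to fetch anything. The second document parses just as well as the first; the application simply finds nothing registered for its namespace.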
Let us illustrate by using an example in the
human realm. One of us is named "Brian". That is a string that
his mom chose as a way of identifying him. However, there is
nothing in that string that tells you where he is in the world
right now. There is nothing in that string that tells you what
kind of beer he likes. In order to get such information, you need
to find someone who knows Brian. You might go to a friend of his
and say, "Do you know Brian?" At first, they will deny
any knowledge of him, but if you keep pressing, they will admit
that they do know him.
Then you can ask them for information about him.
If you ask what kind of beer he likes, they will laugh, then say
that he prefers the "A-N-Y" brand of beer. "Any" beer will do.
XML namespaces are like that. In and of themselves,
they do not really have anything to do with a collection of
elements and attributes, except that they identify the collection
with a name. You need to go to someone who knows how to resolve
the namespace to find information about the members.
Having said all of that, you might be comforted
to know that there are conventions that people follow to
help you get information on a document that is identified with
a namespace. Quite often, if a namespace looks like a URL, there
will be something at that endpoint that will give you,
the human, some guidance in how to interpret the namespace.
The important thing to understand, however, is that this is
not official according to the XML Namespaces specification,
and just as often, you might just get a 404 error.
So, an XML namespace is just a string to the
parser. It does nothing to resolve the namespace into anything
meaningful. The parser passes the string to the application.
It is the job of the application to do something useful with
that string.
You can have as many namespace declarations
on an element as you want, but only one of them can be the
default namespace. The rest must each declare a unique namespace
prefix. A default namespace is declared in the form that we have
seen here:
xmlns="this is a namespace URI"
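Any unprefixed element that is within the
scope of the element in which the default namespace is declared
is said to be a member of that namespace, unless it is otherwise
overridden. Unprefixed attributes are the exception: under the
XML Namespaces specification, a default namespace declaration
does not apply to attribute names.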
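The scoping rule can be checked with a short Python sketch using the standard xml.etree.ElementTree module. One detail worth knowing is that, per the Namespaces in XML specification, a default namespace declaration applies to element names but not to unprefixed attribute names; an attribute joins a namespace only when it carries a prefix. The id attribute, the r prefix, and the record ns namespace below are hypothetical additions for illustration.

```python
import xml.etree.ElementTree as ET

# "id", the "r" prefix, and "record ns" are hypothetical, added only for this sketch.
DOC = '<patient xmlns="patient ns" xmlns:r="record ns" id="42" r:status="active"/>'

root = ET.fromstring(DOC)

# The element picks up the default namespace.
print(root.tag)

# The unprefixed attribute stays in no namespace; the prefixed one is qualified.
print(sorted(root.attrib))
```

ElementTree reports names in the form {namespace}local, so an attribute key without braces is one that belongs to no namespace.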
If you want to override the default namespace,
there are two ways. The first is to define another default namespace
at any element that is a descendant of the element. In this
case, that default namespace will be active until its element
is ended, at which time the higher namespace will take over.
Let us illustrate in
Redefining the Default Namespace. There is a default namespace
declaration on line 2. All unprefixed elements within the scope of the
patient element are members of the "patient ns" namespace. So
patient and name are members, but notice that there is another
default namespace declaration on line 4, in the age element.
That namespace overrides the patient ns namespace for the scope
of the age element. So age, base, and years are members of the
age ns namespace. When age ends, the age ns namespace goes away,
and the higher namespace takes over. The health element, then,
is a member of the patient ns namespace.
<?xml version="1.0"?>
<patient xmlns="patient ns">
  <name>Brian Travis</name>
  <age xmlns="age ns">
    <base>16</base>
    <years>29</years>
  </age>
  <health>excellent</health>
</patient>
Redefining the Default Namespace
The default namespace can be defined at
any level of the XML hierarchy. Redefining the default namespace
results in a new namespace for the scope of the element in
which it was declared.
It's really as simple as that.
You can only have one default namespace on
an element.
There is another form of the XML namespace
declaration that can be used if you have several different
namespaces on a single element. In this case, you need to
create namespace declarations and assign each of them a namespace prefix.
This is done using the following syntax:
xmlns:p="patient ns"
Notice the :p attached to xmlns. We are declaring a namespace
prefix called p, which points to this namespace. Now,
whenever we want to refer to elements or attributes that are
members of the patient ns namespace, we need to prefix them
with the p namespace prefix, as in p:patient.
You can have as many prefixed namespace declarations
as you like on any element in your XML document.
The document shown in
Using Namespace Prefixes is exactly equivalent to the document
in Redefining the Default Namespace.
<?xml version="1.0"?>
<p:patient
    xmlns:p="patient ns"
    xmlns:a="age ns">
  <p:name>Brian Travis</p:name>
  <a:age>
    <a:base>16</a:base>
    <a:years>29</a:years>
  </a:age>
  <p:health>excellent</p:health>
</p:patient>
Using Namespace Prefixes
The default namespace can be overridden
by prefixing elements and attributes with a namespace prefix.
You can mix the default namespace with prefixed
namespaces. The document in
Mixing Default Namespace and Namespace Prefix is exactly
equivalent to the other two.
<?xml version="1.0"?>
<patient
    xmlns="patient ns"
    xmlns:a="age ns">
  <name>Brian Travis</name>
  <a:age>
    <a:base>16</a:base>
    <a:years>29</a:years>
  </a:age>
  <health>excellent</health>
</patient>
Mixing Default Namespace and Namespace Prefix
A default namespace declaration can be mixed
with prefixed namespace declarations.
Mixing it the other way is also possible.
Mixing Namespaces Another Way is exactly equivalent, too.
<?xml version="1.0"?>
<patient:patient
    xmlns:patient="patient ns"
    xmlns="age ns">
  <patient:name>Brian Travis</patient:name>
  <age>
    <base>16</base>
    <years>29</years>
  </age>
  <patient:health>excellent</patient:health>
</patient:patient>
Mixing Namespaces Another Way
Mixing the default namespace declaration
can be done several different ways.
This last one might seem kind of strange.
Remember that all unprefixed elements in a document that
has a default namespace declaration are members of that namespace
unless they are overridden. We have seen that there are two
ways to override the default namespace. One is by redefining
the default namespace. The other is by indicating a namespace
prefix. In Mixing Namespaces Another Way, we can see that the
default namespace on the patient element is age ns. Even though
we define the default namespace on the patient element, that
element itself is not even a member. It is overridden using the
patient: namespace prefix. The same goes for name and health.
The age element is not overridden, so it is a member of the
default, age ns.
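The claim that these four documents are equivalent can be checked mechanically. The following is a minimal sketch in Python using the standard xml.etree.ElementTree module, which reports every element name in the expanded form {namespace}local regardless of which prefix, if any, was used in the source. The constant names are ours, and the documents are the four patient listings above with the whitespace condensed.

```python
import xml.etree.ElementTree as ET

REDEFINED_DEFAULT = """<patient xmlns="patient ns">
  <name>Brian Travis</name>
  <age xmlns="age ns"><base>16</base><years>29</years></age>
  <health>excellent</health>
</patient>"""

ALL_PREFIXED = """<p:patient xmlns:p="patient ns" xmlns:a="age ns">
  <p:name>Brian Travis</p:name>
  <a:age><a:base>16</a:base><a:years>29</a:years></a:age>
  <p:health>excellent</p:health>
</p:patient>"""

DEFAULT_PLUS_PREFIX = """<patient xmlns="patient ns" xmlns:a="age ns">
  <name>Brian Travis</name>
  <a:age><a:base>16</a:base><a:years>29</a:years></a:age>
  <health>excellent</health>
</patient>"""

PREFIX_PLUS_DEFAULT = """<patient:patient xmlns:patient="patient ns" xmlns="age ns">
  <patient:name>Brian Travis</patient:name>
  <age><base>16</base><years>29</years></age>
  <patient:health>excellent</patient:health>
</patient:patient>"""

def expanded_names(xml_text):
    """Return each element's '{namespace}local' name in document order."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]

documents = [REDEFINED_DEFAULT, ALL_PREFIXED, DEFAULT_PLUS_PREFIX, PREFIX_PLUS_DEFAULT]
names = [expanded_names(doc) for doc in documents]

print(names[0])
print(all(other == names[0] for other in names[1:]))  # True
```

The prefixes themselves never reach the comparison. Only the namespace string and the local name matter, which is why p:, patient:, and no prefix at all can name the same element.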
Now, let's get back to our weather document.
As we mentioned before, it is up to the application
to resolve the namespace declaration and find the appropriate
markup language. XRay is an application that has namespace support
built in. If you have an XSD schema currently open in XRay,
you will see the targetNamespace indicated in the status line
at the bottom of the window, as shown in
The Target Namespace Indicator.
The Target Namespace Indicator
XRay shows the name of the target namespace
in the status bar.
Now that we have identified an XSD schema
and the associated namespace, we can indicate that namespace
in our XML document by using the xmlns declaration on line 3.
This is illustrated in
Specifying the Weather Markup Language.
Specifying the Weather Markup Language
Associating an XML document with the Weather
Markup Language namespace is done by setting the default namespace
in the root element.
Notice the status line: XRay has found the
Weather Markup Language and is indicating that
it is being applied to this document.
It is important to note that XRay is making
this association using its built-in understanding of the world
in which it lives. In other words, it is the application that
is making sense out of the namespace and applying the associated
schema.
If you use namespaces, it is up to your application
to associate a schema with a document. However, you do not necessarily
need to associate a schema with a document to make sure it is
correct. For example, your application could recognize a particular
namespace, and use that as an indicator that the document
should be processed using a set of functions that validate
the document in ways the XML parser cannot. Your application
might use the value of some element or attribute as a variable
that is placed into a database query.
The point is that a namespace does not always
indicate that a schema is to be applied to the document. In
later chapters, we will see how namespaces are used as a kind
of library inclusion.
Back again to our weather document. Notice
that, by associating the document with an XSD schema, we have
broken it. The error that is shown in
Specifying the Weather Markup Language indicates that the 29th of
February is not a valid date.
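The date check is a datatype check: the string has the right shape, but it names a day that does not exist in that year. As a rough sketch of what such a check involves, the Python function below tests a string against the basic YYYY-MM-DD form and then against the calendar. It is our own approximation, not XRay's implementation; it ignores the optional time zone and the extended year forms that the XSD date type allows, and the sample dates are hypothetical.

```python
import re
from datetime import date

# Basic lexical form only: four-digit year, two-digit month, two-digit day.
DATE_PATTERN = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

def is_valid_date(text):
    """Return True if text is YYYY-MM-DD and names a real calendar day."""
    match = DATE_PATTERN.fullmatch(text)
    if match is None:
        return False
    year, month, day = (int(part) for part in match.groups())
    try:
        date(year, month, day)  # raises ValueError for impossible dates
    except ValueError:
        return False
    return True

print(is_valid_date("2002-02-29"))  # False: 2002 is not a leap year
print(is_valid_date("2004-02-29"))  # True: 2004 is a leap year
print(is_valid_date("02/29/2004"))  # False: wrong lexical form
```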
Fixing the date error, we get another error,
as shown in
Structural Error.
Structural Error
There is a problem with the document because
it does not adhere to the structure specified by the schema.
We can see in
minOccurs and maxOccurs that the XSD schema states that a
forecast has a minimum number of three days and a maximum number
of 10 days. Our document only has two days.
minOccurs and maxOccurs
XSD data boundaries indicated by minOccurs
and maxOccurs.
This is indicated with the minOccurs (minimum
number of occurrences) and maxOccurs (maximum number of
occurrences) attributes on the day element declaration.
There's a good reason for such boundary-setting.
If we are doing a weather forecast, fewer than three days is useless,
and if we get more than 10 days, it will be too inaccurate for
any useful purpose.
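A validating parser enforces these bounds for you, but the rule itself is easy to state in code. The sketch below, in Python with the standard xml.etree.ElementTree module, counts the day elements in a weather document and compares the count with the minOccurs and maxOccurs values from the schema. The helper names and all of the sample values (city, temperatures, dates) are hypothetical and serve only to build test documents of different lengths.

```python
import xml.etree.ElementTree as ET

WEATHER_NS = "Weather Markup Language"
MIN_DAYS, MAX_DAYS = 3, 10  # minOccurs and maxOccurs on the day element

def weather_document(day_count):
    """Build a hypothetical weather document with the given number of days."""
    days = "".join(
        f'<day date="2002-03-{n:02d}"><temperature>50</temperature>'
        "<wind>5</wind></day>"
        for n in range(1, day_count + 1)
    )
    return (
        f'<Weather xmlns="{WEATHER_NS}" date="2002-02-28">'
        "<city>Denver</city><temperature>48</temperature><wind>10</wind>"
        f"<forecast>{days}</forecast></Weather>"
    )

def check_day_count(xml_text):
    """Return None if the day count is within bounds, else an error message."""
    root = ET.fromstring(xml_text)
    # Elements are matched by expanded name: '{namespace}local'.
    count = sum(1 for _ in root.iter(f"{{{WEATHER_NS}}}day"))
    if count < MIN_DAYS:
        return f"too few day elements: {count} < {MIN_DAYS}"
    if count > MAX_DAYS:
        return f"too many day elements: {count} > {MAX_DAYS}"
    return None

print(check_day_count(weather_document(2)))  # too few day elements: 2 < 3
print(check_day_count(weather_document(3)))  # None
```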
Our document has only two days, and so does
not meet the minimum number of occurrences. Fixing the error
by adding a day is shown in
Valid Document.
Valid Document
An XML document is valid according to the
Weather Markup Language when the error area is green.
Three important things are happening in this
document. First, the document is well-formed, as all XML documents
must be. Second, the document is valid according to the markup
language specified by the XSD schema. Lastly, the datatypes
are correct according to the XSD schema.