XML Schema Basics
In this lesson, you will learn:
- The purpose of XML Schema.
- The limitations of DTDs.
- The power of XML Schema.
- How to validate an XML instance with an XML schema.
The Purpose of XML Schema
XML Schema is an XML-based language used to create XML-based languages and data models. An XML schema defines element and attribute names for a class of XML documents. The schema also specifies the structure that those documents must adhere to and the type of content that each element can hold.
XML documents that attempt to adhere to an XML schema are said to be instances of that schema. If they correctly adhere to the schema, then they are valid instances. This is not the same as being well formed. A well-formed XML document follows all the syntax rules of XML, but it does not necessarily adhere to any particular schema. So, an XML document can be well formed without being valid, but it cannot be valid unless it is well formed.
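To make the notion of well-formedness concrete, here is a minimal sketch showing two separate fragments side by side. The Author, FirstName, and LastName names are used purely for illustration at this point. The first fragment breaks a syntax rule of XML itself, so no schema could ever be applied to it; the second follows the syntax rules.

```xml
<!-- Fragment 1: NOT well formed. XML names are case-sensitive,
     so </Firstname> does not close <FirstName>. -->
<Author>
  <FirstName>Mark</Firstname>
  <LastName>Twain</LastName>
</Author>

<!-- Fragment 2: well formed. Every start tag has a matching end tag
     and the elements are properly nested. -->
<Author>
  <FirstName>Mark</FirstName>
  <LastName>Twain</LastName>
</Author>
```

Whether the second fragment is also valid depends entirely on the schema it is checked against; well-formedness alone says nothing about that.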
The Power of XML Schema
You may already have some experience with DTDs. DTDs are similar to XML schemas in that they are used to create classes of XML documents. DTDs were around long before the advent of XML. They were originally created to define languages based on SGML, the parent of XML. Although DTDs are still common, XML Schema is a much more powerful language.
As a means of understanding the power of XML Schema, let's look at the limitations of DTDs.
- DTDs do not have built-in datatypes.
- DTDs do not support user-derived datatypes.
- DTDs allow only limited control over cardinality (the number of occurrences of an element within its parent).
- DTDs do not support Namespaces or any simple way of reusing or importing other schemas.
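As a rough illustration of what XML Schema offers in place of the first three limitations, the sketch below uses built-in datatypes, a user-derived datatype, and explicit occurrence constraints. The Book element, its children, and the RatingType name are all hypothetical and chosen only for this example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- User-derived datatype: an integer restricted to the range 1 to 5 -->
  <xs:simpleType name="RatingType">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="1" />
      <xs:maxInclusive value="5" />
    </xs:restriction>
  </xs:simpleType>

  <xs:element name="Book">
    <xs:complexType>
      <xs:sequence>
        <!-- Built-in datatypes -->
        <xs:element name="Title" type="xs:string" />
        <xs:element name="Published" type="xs:date" />
        <!-- Cardinality: between one and three Author elements -->
        <xs:element name="Author" type="xs:string" minOccurs="1" maxOccurs="3" />
        <!-- Cardinality: Rating is optional (zero or one occurrence) -->
        <xs:element name="Rating" type="RatingType" minOccurs="0" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

A DTD could state that Author appears one or more times, but it has no direct way to cap the count at three or to require that Published holds a date.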
A First Look
An XML schema describes the structure of an XML instance document by defining what each element must or may contain. An element is limited by its type. For example, an element of complex type can contain child elements and attributes, whereas a simple-type element can only contain text. The diagram below gives a first look at the types of XML Schema elements.
Schema authors can define their own types or use the built-in types. Throughout this course, we will refer back to this diagram as we learn to define elements. You may want to bookmark this page, so that you can easily reference it.
The following is a high-level overview of Schema types.
- Elements can be of simple type or complex type.
- Simple-type elements can only contain text. They cannot have child elements or attributes.
- All the built-in types are simple types (e.g., xs:string).
- Schema authors can derive simple types by restricting another simple type. For example, an email type could be derived by limiting a string to a specific pattern.
- Simple types can be atomic (e.g., strings and integers) or non-atomic (e.g., lists).
- Complex-type elements can contain child elements and attributes as well as text.
- By default, complex-type elements have complex content, meaning that they have child elements.
- Complex-type elements can be limited to having simple content, meaning they only contain text. They differ from simple-type elements in that they can have attributes.
- Complex types can be limited to having no content, meaning they are empty, but they may still have attributes.
- Complex types may have mixed content - a combination of text and child elements.
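The categories in this overview can be sketched in a single schema. This is only an illustrative outline: every type and element name in it (EmailType, IntegerListType, Price, Image, Paragraph, Emphasis) is hypothetical, and the email pattern is deliberately loose rather than a complete address check.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Simple type derived by restriction: a string limited to a pattern -->
  <xs:simpleType name="EmailType">
    <xs:restriction base="xs:string">
      <xs:pattern value="[^@\s]+@[^@\s]+\.[^@\s]+" />
    </xs:restriction>
  </xs:simpleType>

  <!-- Non-atomic simple type: a whitespace-separated list of integers -->
  <xs:simpleType name="IntegerListType">
    <xs:list itemType="xs:integer" />
  </xs:simpleType>

  <!-- Complex type with simple content: text only, plus an attribute -->
  <xs:element name="Price">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:decimal">
          <xs:attribute name="currency" type="xs:string" />
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>

  <!-- Complex type with no content: empty, but carries an attribute -->
  <xs:element name="Image">
    <xs:complexType>
      <xs:attribute name="src" type="xs:anyURI" use="required" />
    </xs:complexType>
  </xs:element>

  <!-- Complex type with mixed content: text interleaved with child elements -->
  <xs:element name="Paragraph">
    <xs:complexType mixed="true">
      <xs:sequence>
        <xs:element name="Emphasis" type="xs:string" minOccurs="0" maxOccurs="unbounded" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

Under this sketch, an instance such as a Price element containing 9.99 with a currency attribute of USD would have simple content, while a Paragraph element could mix plain text with any number of Emphasis children.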
A Simple XML Schema
Let's take a look at a simple XML schema, which is made up of one complex-type element with two child simple-type elements.
Code Sample: SchemaBasics/Demos/Author.xsd
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Author">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="FirstName" type="xs:string" />
        <xs:element name="LastName" type="xs:string" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
As you can see, an XML schema is an XML document and must follow all the syntax rules of any other XML document; that is, it must be well formed. XML schemas also have to follow the rules defined in the "Schema of schemas," which defines, among other things, the structure of an XML schema and the element and attribute names it may use.
Although it is not required, it is a common practice to use the xs qualifier (see footnote) to identify Schema elements and types.
The document element of XML schemas is xs:schema. It takes the attribute xmlns:xs with the value of http://www.w3.org/2001/XMLSchema, indicating that the document should follow the rules of XML Schema. This will be clearer after you learn about namespaces.
In the Author.xsd schema, we see an xs:element element within the xs:schema element. xs:element is used to define an element. In this case it defines the element Author as a complex-type element, which contains a sequence of two elements: FirstName and LastName, both of which are of the simple type xs:string.
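To illustrate the earlier point that the xs qualifier is a convention rather than a requirement, here is a sketch of the same Author schema written with a different prefix. The xsd prefix is chosen here only as an example; what matters is that the prefix is bound to the http://www.w3.org/2001/XMLSchema namespace.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Same schema as Author.xsd, with the prefix xsd instead of xs -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Author">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="FirstName" type="xsd:string" />
        <xsd:element name="LastName" type="xsd:string" />
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

Note that the prefix must be used consistently, including in type references such as xsd:string.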
Validating an XML Instance Document
In the last section, you saw an example of a simple XML schema, which defined the structure of an Author element. The code sample below shows a valid XML instance of this XML schema.
Code Sample: SchemaBasics/Demos/MarkTwain.xml
<?xml version="1.0"?>
<Author xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="Author.xsd">
  <FirstName>Mark</FirstName>
  <LastName>Twain</LastName>
</Author>
This is a simple XML document. Its document element is Author, which contains two child elements: FirstName and LastName, just as the associated XML schema requires.
The xmlns:xsi attribute of the document element indicates that this XML document is an instance of an XML schema. The document is tied to a specific XML schema with the xsi:noNamespaceSchemaLocation attribute.
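For contrast, the following sketch shows an instance that is well formed but not valid against Author.xsd. It is a hypothetical variation on MarkTwain.xml, not one of the course files. Because the schema declares the children inside xs:sequence, they must appear in the declared order, and here they are reversed.

```xml
<?xml version="1.0"?>
<!-- Well formed, but not valid against Author.xsd:
     xs:sequence requires FirstName to come before LastName. -->
<Author xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="Author.xsd">
  <LastName>Twain</LastName>
  <FirstName>Mark</FirstName>
</Author>
```

A validating tool should report an error for this document, whereas a parser that only checks well-formedness would accept it.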
There are many ways to validate the XML instance. If you are using an XML authoring tool, it can very likely perform the validation for you. Alternatively, simple online XML Schema validator tools are available.
XML Schema Basics Conclusion
In this lesson of the XML tutorial, you have learned to create a very simple XML schema and to use it to validate an XML instance document. You are now ready to learn more advanced features of XML Schema.
Qualifiers are used to distinguish between elements and attributes from different namespaces or XML classes.