Well-Formedness
 | The W3C calls for XML documents to be "well-formed". |
 | This differs from HTML, where rules are often stretched or broken (e.g.,
no closing tags, misspelled tag names, etc.) but the document may still
be displayed (as much as possible). |
 | A well-formed XML document must satisfy these rules:
 | It has a single root element |
 | Tag names are case sensitive |
 | Every opening tag has a matching closing tag |
 | Elements are properly nested |
 | All attribute values are quoted |
 | Empty elements are closed. |
|
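The rules above can be checked mechanically by handing a document to a conforming XML parser. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample snippets are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if `text` parses as well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# One illustrative snippet per rule; each breaks the rule named in its key.
broken = {
    "two root elements": "<a></a><b></b>",
    "case mismatch": "<Message>text</message>",
    "missing closing tag": "<p>This is a paragraph",
    "improper nesting": "<b><i>text</b></i>",
    "unquoted attribute": "<letter date=3/27/01></letter>",
    "unclosed empty element": "<letter><br></letter>",
}

for rule, snippet in broken.items():
    print(rule, "->", is_well_formed(snippet))

# A document that follows all of the rules is accepted.
print(is_well_formed("<letter date='3/27/01'><br/></letter>"))
```

Every entry in the dictionary should be rejected, while the last document should be accepted.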
Consider the following example...
<?xml version="1.0"?>
<letter>
<to>John</to>
<from>Bob</from>
<subject>Something important</subject>
<body>See me in the morning for the next assignment...</body>
</letter> |
The first line in the document defines the XML version of the document.
This document
conforms to the 1.0 specification of XML...
<?xml version="1.0"?>
The next line defines the first element of the document called the root
element...
<letter>
The next lines define four child elements of the root...
<to>John</to>
<from>Bob</from>
<subject>Something important</subject>
<body>See me in the morning for the next
assignment...</body>
The last line defines the end of the root element.
</letter>
|
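To see the same structure from a program's point of view, the letter can be parsed and its root and child elements listed. This is a sketch only, assuming Python's standard xml.etree.ElementTree module; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

letter_xml = """<?xml version="1.0"?>
<letter>
<to>John</to>
<from>Bob</from>
<subject>Something important</subject>
<body>See me in the morning for the next assignment...</body>
</letter>"""

# fromstring returns the root element of the parsed document.
root = ET.fromstring(letter_xml)
print("root element:", root.tag)

# Iterating over an element yields its direct children in document order.
children = [(child.tag, child.text) for child in root]
for tag, text in children:
    print(tag, "=", text)
```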
The Structure of XML...
All XML documents must have a root tag
 | All XML documents must contain a single tag pair to define the root element. |
 | All other elements must be nested within the root element. |
 | All elements can have
sub (child) elements. |
 | Sub elements must have matching opening and closing tags and be correctly nested
within their parent element: |
<root>
<child>
<subchild>
</subchild>
</child>
</root>
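The nesting shown above forms a tree, which can be walked recursively. The sketch below assumes Python's standard xml.etree.ElementTree module; the function name outline is hypothetical.

```python
import xml.etree.ElementTree as ET


def outline(element, depth=0):
    """Return (depth, tag) pairs for an element and all of its descendants."""
    rows = [(depth, element.tag)]
    for child in element:
        rows.extend(outline(child, depth + 1))
    return rows


doc = "<root><child><subchild></subchild></child></root>"
rows = outline(ET.fromstring(doc))

# Print each tag indented by its nesting depth.
for depth, tag in rows:
    print("  " * depth + tag)
```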
All XML elements must have a closing tag
 | In HTML some elements do not have to have a closing tag. |
 | Therefore, the following code
is legal in HTML: |
<p>This is a paragraph
<p>This is another paragraph
 | In XML all elements must have a closing tag like this: |
<p>This is a paragraph</p>
<p>This is another paragraph</p>
XML tags are case sensitive
 | XML tags are case sensitive. |
 | For example, the tag <Letter> is different from the tag
<letter>. |
 | Opening and closing tags must therefore be written with the same case: |
<Message>This is incorrect</message>
<message>This is correct</message>
Predefined Entities in XML
 | There are five predefined character entities in XML: |
| &lt; (less than) | < |
| &gt; (greater than) | > |
| &amp; (ampersand) | & |
| &apos; (apostrophe) | ' |
| &quot; (double quote) | " |
 | In element content, only < and & must always be written as entities. The
apostrophe and double quote can usually appear literally; they need an entity
only inside an attribute value delimited by the same quote character. |
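When XML is generated by a program, escaping is usually left to a library rather than done by hand. The following sketch assumes Python's standard xml.sax.saxutils.escape function and xml.etree.ElementTree parser; the sample text and the element name note are made up.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = 'Tom & Jerry say "1 < 2"'

# escape() replaces &, < and > with their entities; quotes are left alone.
escaped = escape(raw)
print(escaped)

# A parser turns the entities back into the original characters.
round_trip = ET.fromstring("<note>" + escaped + "</note>").text
print(round_trip)

# All five predefined entities are understood without any declaration.
all_five = ET.fromstring("<t>&lt;&gt;&amp;&apos;&quot;</t>").text
print(all_five)
```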
Other numeric character references
 | Any Unicode character code can be used, which includes the ASCII character codes
(e.g., "A" is ASCII code 65 or &#65;) |
 | This also includes characters not on the keyboard:
 | © - &#169; |
 | £ - &#163; |
 | ® - &#174; |
|
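A numeric character reference is simply the character's Unicode code point written in decimal between &# and ;. As a small sketch, assuming Python's built-in ord function and the standard xml.etree.ElementTree parser (the helper name char_ref is hypothetical):

```python
import xml.etree.ElementTree as ET


def char_ref(ch):
    """Build the decimal numeric character reference for one character."""
    return "&#{};".format(ord(ch))


# Map each sample character to its reference.
refs = {ch: char_ref(ch) for ch in "A©£®"}
for ch, ref in refs.items():
    print(ch, "-", ref)

# Parsing the references gives back the original characters.
decoded = ET.fromstring("<t>" + "".join(refs.values()) + "</t>").text
print(decoded)
```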
All XML elements must be properly nested
 | In HTML some elements can be improperly nested within each other like this: |
<b><i>This text is bold and italic</b></i>
 | In XML all elements must be properly nested within each other like this: |
<b><i>This text is bold and italic</i></b>
Comments in an XML document
XML uses the same comment syntax as HTML...
<!-- comment goes here -->
or multi-line
<!--
a comment can go here
or here
-->
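Comments are not element content, but a parser can still report them. The sketch below assumes Python's standard xml.dom.minidom module, which keeps comment nodes in the tree; the sample document is made up.

```python
from xml.dom import Node, minidom

doc = minidom.parseString(
    "<letter><!-- comment goes here --><to>John</to></letter>"
)
children = doc.documentElement.childNodes

# Separate comment nodes from element nodes by their node type.
comments = [n.data for n in children if n.nodeType == Node.COMMENT_NODE]
elements = [n.tagName for n in children if n.nodeType == Node.ELEMENT_NODE]

print("comments:", comments)
print("elements:", elements)
```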
Attribute values must always be quoted
 | XML elements can have attributes in name/value pairs just like in HTML. |
 | The attribute value must always be quoted. |
 | Consider the two XML documents
below. |
This one is incorrect:
<?xml version="1.0"?>
<letter date=3/27/01>
<to>John</to>
<from>Bob</from>
<subject>Something important</subject>
<body>See me in the morning for the next
assignment...</body>
</letter>
|
This one is correct:
<?xml version="1.0"?>
<letter date="3/27/01">
<to>John</to>
<from>Bob</from>
<subject>Something important</subject>
<body>See me in the morning for the next
assignment...</body>
</letter>
|
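The difference between the two documents shows up as soon as they are parsed. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree module, with shortened versions of the two letters:

```python
import xml.etree.ElementTree as ET

quoted = '<letter date="3/27/01"><to>John</to></letter>'
unquoted = '<letter date=3/27/01><to>John</to></letter>'

# The quoted attribute parses, and its value can be read back.
date = ET.fromstring(quoted).get("date")
print("date:", date)

# The unquoted attribute makes the whole document fail to parse.
try:
    ET.fromstring(unquoted)
    unquoted_ok = True
except ET.ParseError as err:
    unquoted_ok = False
    print("rejected:", err)
```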
Non-Standard Text
 | There are situations where content should not be parsed as markup when
the document is processed. |
 | Text that is not parsed can be included in a document using a CDATA
section, with the following syntax: |
<?xml version="1.0"?>
<letter>
<to>John</to>
<from>Bob</from>
<subject>Something important</subject>
<![CDATA[
<head><title>Title for this Document Goes Here</title></head>
<body bgcolor="blue">
<h1>This is a Header 1 in HTML</h1>
<p>this is text in the document
</body>
]]>
</letter> |
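A parser hands the contents of a CDATA section to the application as plain characters, not as elements. The sketch below assumes Python's standard xml.etree.ElementTree module; for easier access it wraps the CDATA section in a body element, which is a variation on the example above and not part of it.

```python
import xml.etree.ElementTree as ET

doc = """<letter>
<subject>Something important</subject>
<body><![CDATA[<h1>This is a Header 1 in HTML</h1>
<p>this is text in the document]]></body>
</letter>"""

root = ET.fromstring(doc)
body = root.find("body")

# The markup inside CDATA arrives as literal text of the body element.
print(body.text)

# No h1 or p child elements were created.
print("child elements of body:", len(body))
```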
Namespaces
 | Namespaces are used to minimize confusion with common element names
(e.g., <name>). |
 | Namespaces typically reference a unique Uniform Resource Identifier (URI)
-- these references are sometimes "made up" references or documents. |
 | The point is to create a unique and separating reference. |
 | The namespace prefix must start with a letter or underscore and contain only letters,
underscores, digits, hyphens, and periods. |
 | The syntax for creating a namespace follows: |
<namespace:elementname xmlns:namespace="globallyUniqueURI">
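To illustrate how a namespace separates otherwise identical names, the sketch below parses a small document with Python's standard xml.etree.ElementTree module. The prefix mail and the URI http://example.com/mail are made up for this example.

```python
import xml.etree.ElementTree as ET

MAIL_NS = "http://example.com/mail"  # hypothetical namespace URI

doc = (
    '<mail:letter xmlns:mail="http://example.com/mail">'
    "<mail:to>John</mail:to>"
    "<name>plain</name>"
    "</mail:letter>"
)
root = ET.fromstring(doc)

# ElementTree stores a namespaced tag as {URI}localname.
print(root.tag)

# Searches take a prefix-to-URI mapping; the prefix itself is arbitrary.
to = root.find("mail:to", {"mail": MAIL_NS})
print(to.text)

# An unprefixed search does not match the namespaced element.
print(root.find("to"))
print(root.find("name").text)
```

The program identifies elements by the URI, not by the prefix, which is why the URI only needs to be unique and does not have to point at a real document.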
XML Attributes
XML attributes are used to describe XML elements, or to provide
additional information about elements.
 | Consider the following HTML: <IMG SRC="davepic.gif">. |
 | SRC is
an attribute to the IMG element. |
 | The SRC attribute provides additional
information about the element. |
Attributes are always contained within the start tag of an element.
Use of Elements vs. Attributes
Take a look at these examples:
Using an Attribute for sex:
<person sex="female">
<firstname>Anna</firstname>
<lastname>Smith</lastname>
</person>
Using an Element for sex:
<person>
<sex>female</sex>
<firstname>Anna</firstname>
<lastname>Smith</lastname>
</person>
 | In the first example sex is an attribute. In the last example sex is an
element. Both examples provide the same information to the reader. |
 | Consider:
 | There are no fixed rules about when to use attributes to describe data, and
when to use elements. |
 | A good general rule is: in XML, try to avoid attributes as long as the same information
can be expressed using elements. |
|
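From a program's point of view the two person examples differ only in how the value is read. A minimal sketch, assuming Python's standard xml.etree.ElementTree module:

```python
import xml.etree.ElementTree as ET

as_attribute = ET.fromstring(
    '<person sex="female">'
    "<firstname>Anna</firstname><lastname>Smith</lastname>"
    "</person>"
)
as_element = ET.fromstring(
    "<person><sex>female</sex>"
    "<firstname>Anna</firstname><lastname>Smith</lastname>"
    "</person>"
)

# An attribute is read from the element that carries it.
sex_from_attribute = as_attribute.get("sex")

# A child element is found by tag name and its text is read.
sex_from_element = as_element.findtext("sex")

print(sex_from_attribute, sex_from_element)
```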
 | Here is another example, demonstrating how elements can be used instead of
attributes. The following three XML documents contain exactly the same
information: |
|
A date attribute is used...
<?xml version="1.0"?>
<letter date="3/27/01">
<to>John</to>
<from>Bob</from>
<subject>Something important</subject>
<body>See me in the morning...</body>
</letter>
|
A date element is used...
<?xml version="1.0"?>
<letter>
<date>3/27/01</date>
<to>John</to>
<from>Bob</from>
<subject>Something important</subject>
<body>See me in the morning...</body>
</letter>
|
An expanded date element is used...
<?xml version="1.0"?>
<letter>
<date>
<day>27</day>
<month>3</month>
<year>01</year>
</date>
<to>John</to>
<from>Bob</from>
<subject>Something important</subject>
<body>See me in the morning...</body>
</letter>
|
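The practical difference between the three layouts is how much work the reading program has to do. The sketch below, assuming Python's standard xml.etree.ElementTree module and a hypothetical helper named letter_date, reads the date from shortened versions of each layout; it treats 3/27/01 as month/day/year.

```python
import xml.etree.ElementTree as ET


def letter_date(xml_text):
    """Return (month, day, year) as strings from any of the three layouts."""
    letter = ET.fromstring(xml_text)
    date = letter.find("date")
    if date is None:
        # Layout 1: the date is an attribute and must be split by hand.
        return tuple(letter.get("date").split("/"))
    if len(date) == 0:
        # Layout 2: a flat date element, still split by hand.
        return tuple(date.text.split("/"))
    # Layout 3: each part already has its own element.
    return (date.findtext("month"), date.findtext("day"), date.findtext("year"))


with_attribute = '<letter date="3/27/01"><to>John</to></letter>'
with_element = "<letter><date>3/27/01</date><to>John</to></letter>"
with_expanded = (
    "<letter><date><day>27</day><month>3</month><year>01</year></date>"
    "<to>John</to></letter>"
)

for doc in (with_attribute, with_element, with_expanded):
    print(letter_date(doc))
```

Only the expanded layout avoids string splitting, which is one reason elements are easier to extend later.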
Reasons to avoid attributes:
 | Attributes can not contain multiple values (elements can) |
 | Attributes are not expandable (for future changes) |
 | Attributes can not describe structures (like child elements can) |
 | Attributes are more difficult to manipulate by program code |
 | Attribute values are not easy to test against a DTD |
Like everything, there are exceptions to the attribute rules.
 | You may want to assign ID references to elements in your XML documents. |
 | ID
references can be used to access XML elements in much the same way as the NAME or
ID attributes in HTML. |
 | This example demonstrates this: |
A ref attribute is used...
<?xml version="1.0"?>
<letter ref="222">
<to>John</to>
<from>Bob</from>
<subject>Something important</subject>
<body>See me in the morning for the next assignment...</body>
</letter>
|
<?xml version="1.0"?>
<letter ref="223">
<to>George</to>
<from>Bob</from>
<subject>Something else important</subject>
<body>Yadda, yadda...</body>
</letter>
 | The "ref" in these examples is just a counter, or a unique identifier, to identify
the different letters in the XML file. |
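An identifier attribute like this makes it easy to pick out one letter from many. The sketch below assumes Python's standard xml.etree.ElementTree module. Because a single XML document can have only one root, it places shortened versions of both letters inside a wrapper element named letters, which is made up for this example.

```python
import xml.etree.ElementTree as ET

doc = """<letters>
<letter ref="222"><to>John</to><subject>Something important</subject></letter>
<letter ref="223"><to>George</to><subject>Something else important</subject></letter>
</letters>"""

root = ET.fromstring(doc)

# Build a lookup table from ref value to letter element.
by_ref = {letter.get("ref"): letter for letter in root.iter("letter")}
print(by_ref["222"].findtext("to"))

# The same lookup expressed as a path with an attribute predicate.
match = root.find("letter[@ref='223']")
print(match.findtext("to"))
```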
Some Examples
|