In this unit, we will cover the basics of XML before comparing XML languages to
HTML and seeing how XHTML fits into the picture in the next unit.
Upon completion of this unit, you will be able to:
- understand, at an introductory level, how XML is used to share data more
effectively between software applications
- describe XML's role in relation to the web-based technologies you are
already familiar with
- explain how XML is used as document-based middleware
- explain how XML is used as a graphical user interface format
- define elements and attributes, and describe how they
structure an XML document
Understanding XHTML 1.0 p.9-22
XHTML Foundations p.23-34
XHTML in the Real World p.35-48
Download IE6 and start opening up XML files. Use the plus and arrows to
open and close the different elements of the XML files.
XML File in Internet Explorer 6.0
1.1 The Power of XML
The Web is an interwoven "network" comprised of stored data and software
applications. Today's web sites must provide data to an increasingly wide
variety of software applications. Literally everything that takes place over the
Web involves some form of data exchange, for example:
- providing a username and password to a remote server when logging on.
- sending and receiving e-mail.
- uploading a document to the Web.
- accessing a web page.
Of course, we are not really exchanging data directly. Software programs
continuously send and receive data on our behalf. Although we can often
understand information in an unfamiliar or unexpected format, machines cannot
start processing information until they receive the data in the exact format
they are "expecting."
XML's primary objective is to provide a standardized data
storage format capable of being "understood" and processed by all software
applications.
Traditionally, converting between proprietary file formats without losing
data is an uphill battle for a developer that must be fought again and again
every time software is replaced or upgraded. The Web's software development
cycle is continually creating new and better kinds of software. The primary
challenge for a web developer usually isn't installing and configuring the new
software, but trying to convert data from the old proprietary file storage
format into the new one. If there were a single, standardized data storage
format that could be implemented by all software applications, these conversion
“headaches” would be the exception and not the rule.
XML markup provides a standardized key for data sharing between
software applications.
The requirements of software applications are in constant flux, so it only
makes sense to store your data in a format that is easiest for software to
access and process. This way, software may come and go, but your data will live
on forever. Since it will be a few years before all software is XML-enabled, the
current implementation strategy is to integrate XML documents into existing
application systems in application "layers" that enable XML data to be imported
and exported “on the way in” and “on the way out” of software applications.
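The import and export "layers" described above can be pictured with a small sketch. It is only an illustration, written in Python with the standard library's xml.etree.ElementTree module. The "customer" record, its fields, and the function names export_record and import_record are made up for this example and do not come from any particular application.

```python
import xml.etree.ElementTree as ET


def export_record(record: dict) -> str:
    """'On the way out': turn an application's internal record into XML text."""
    root = ET.Element("customer")  # hypothetical element name
    for field, value in record.items():
        child = ET.SubElement(root, field)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


def import_record(xml_text: str) -> dict:
    """'On the way in': read XML text back into an internal record."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


original = {"name": "Tom Jones", "phone": "555-1234"}
xml_text = export_record(original)
restored = import_record(xml_text)

print(xml_text)
print(restored == original)  # True: nothing was lost in the round trip
```

Any other application that can read XML could take the place of import_record here, which is the point of agreeing on a shared format.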
Storing your content in XML means:
- You will be able to share data more easily within your own organization.
- You will be able to share data more easily with outside systems and
- Your data will never be tied to one particular OS, programming language,
development platform or software application.
- Your data can be easily integrated into the "next big thing."
XML documents can be exchanged between applications using the Web's
existing messaging framework and used in a variety of ways on the client,
server, or some combination of the two. Since XML documents are just plain
text documents (like HTML documents), transporting them over the Web to
another application is a breeze.
XML documents can also be:
- stored on an HTTP server and accessed via a URL (by an end user or an
application).
- attached to an email.
- transported over FTP or telnet.
- accessed over a file directory system at the OS level.
- accessed over any protocol able to transport text documents.
XML is more than just a structured data storage format. XML application
development involves new ways of constructing “virtual” software applications by
connecting together existing software applications over the Web.
XML as Both a Data Storage/Exchange Format and a GUI Format
XML makes it easier to continually expand, scale, and add new
elements to your web site, while still keeping your future options open.
In the same way that documents can have a structure, our web applications
have a structure (or "infrastructure") that's also sometimes referred to as an
"application architecture." A web site needs to continually expand and add new
features and services in order to fulfill the needs of its audience. An XML
development strategy is based on the notion of integrating "new" features into
your site by simply tapping into the existing applications of other
You want your web site's applications to be able to interface easily with
other applications that may be larger or smaller in comparison. By thinking
about software applications in terms of their smaller parts, it becomes easier
to understand the individual pieces you have to work with. Smaller software
applications can function seamlessly within a larger application's existing
infrastructure. (An HTML parser is an example of a self-contained software
application that also works as a component within the larger browser
application.)
XML documents can provide a machine-readable description of the
goods and services of a web site that simplifies the process of interfacing with
the applications and services of others.
HTML documents are used as the graphical user interface (GUI or "goo-ey")
for web-based applications to provide a "front-end" for the end-user to interact
with. This HTML "front end" is a given for any web-based application, but what
you decide to connect together on the back end is up to you. XML can be used to
connect together a host of features and services on the "back-end."
Instead of re-inventing the wheel every time you want to add a service or a
feature to your site, you can take advantage of the work and experience of
others by connecting directly to their service from your own front end. In other
words, your HTML-based front-end might connect to a back-end service that
actually "lives" on a server other than your own. There's no way for an end user
to know where the features of your website actually "live." This technique is
useful because it enables you to build upon the work of others by incorporating
existing services into your own front end, rather than writing all of your own
features from scratch.
1.2 Web Development Then and Now
The Rise and Fall of Dynamic HTML
In the early days of the Web, developers were at the mercy of the browser
companies to incorporate features that could be used on their web pages. What
started as browser "innovations" often led to some kind of non-standard HTML
"feature creep." We learned many lessons during this formative period, but most
of the little lessons were part of the same big one: browser-specific
applications are unpredictable and ultimately useless. We don't want to
develop browser-specific applications anymore. We want the “mom and pop” that
just dialed up using that old AOL version 3.0 disk to still be able to buy stuff
on our web site. One of the important “givens” of the Web is that it is
impossible to predict what brand and version of browser your web-site visitors
will be using.
"Browser detection scripts" are often employed that use a server-based
scripting language to detect which browser version is in use and then serve the
appropriate HTML page to the end user. However, even if you can detect a
browser type successfully, maintaining two or more versions of your HTML content
Another shortcoming of this method is that there is a greater potential for
errors to be introduced, since both the “content” people and the “programming”
people will be editing the files, and there is more than one version of each
file. The content people, in the course of their work, could easily erase a
piece of code, and the programming people could accidentally erase a sentence
and disembody the content.
At this point, there are so many different versions of the three major
browsers (IE, Netscape, and AOL) that even a browser-detection script provides
an application with only an educated guess about the brand and version of the
end-user's browser. Most all brands and versions of browser are able to process
Separation of Code from Data
As an XML developer, it is important to develop the ability to isolate your
data from your code whenever possible and keep this separation in mind when you
are designing the structure of your applications. When the presentation and
processing of information are embedded within the same document it becomes more
difficult to manipulate that data on its own.
The first step to developing useful XML software applications is
to modularize your software application into its smaller, logical
components.
The outcome of the process of breaking every software process into its
smaller components, or modules, is that you will isolate your data from
the code used to access and process it.
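To make the idea of isolating data from code more concrete, here is a minimal sketch in Python. The book record, the module boundaries, and the function names (load_book, render_plain, render_html) are assumptions made for illustration only. The thing to notice is that the stored XML never changes, even though two different pieces of presentation code use it.

```python
import xml.etree.ElementTree as ET

# The data lives on its own, with no presentation or processing mixed in.
BOOK_DATA = """
<book>
  <title>XML Basics</title>
  <summary>An introduction to elements and attributes.</summary>
</book>
"""


def load_book(xml_text: str) -> dict:
    """Access module: knows how to read the data, and nothing else."""
    root = ET.fromstring(xml_text)
    return {"title": root.findtext("title"), "summary": root.findtext("summary")}


def render_plain(book: dict) -> str:
    """Presentation module: one possible way to display the data."""
    return f"{book['title']}: {book['summary']}"


def render_html(book: dict) -> str:
    """A second presentation module; the stored data did not have to change."""
    return f"<h1>{book['title']}</h1><p>{book['summary']}</p>"


book = load_book(BOOK_DATA)
print(render_plain(book))
print(render_html(book))
```

If a new display format is needed later, only a new render function has to be written. The data and the access code stay as they are.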
The methods used to store, access, and process your data don't necessarily
have to depend on each other. This is a far cry from only a few years ago when
processing data intelligently meant depending on expensive proprietary software
and storage formats, and requiring your business partners to do the same. Before
you go on, review the following table, which summarizes what we've learned so
far about how web sites and applications used to be designed and how we want to
design them now, keeping our new XML perspective in mind.
Web Development Then and Now
| Then | Now |
| Browser-specific web sites | No browser-specific web sites |
| Dynamic HTML (non-accessible web sites) | No Dynamic HTML |
| Code (content and presentation information) intermingled with data (stored content) | Data kept separate from its presentation and processing code (externalized into "modules") |
| Dedicated software on the client/server | No dedicated client/server applications: develop only for a "web-based" HTML front-end |
Now that we've learned what XML can do, let's start to learn how to use it.
1.3 Introduction to XML
What is XML
- All XML markup actually "does" is provide a simple format for naming and
structuring text-based data.
- XML documents are made up of character data (content) and markup (code).
- There are five "special characters" (<, >, &, ' and ") that an
XML parser will interpret to be "markup" rather than character data.
- The angle brackets inform the parser which characters constitute
structural markup (code) and which constitute character data that should be
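The effect of the special characters can be seen in a short experiment. The sketch below is one way to try it, assuming Python's standard xml.etree.ElementTree parser and the escape helper from xml.sax.saxutils. The "note" element and its text are invented for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw_text = "Profit & Loss: revenue < cost"

# Placing the raw text directly between tags confuses the parser,
# because & and < are read as the start of markup.
try:
    ET.fromstring(f"<note>{raw_text}</note>")
    parsed_raw = True
except ET.ParseError:
    parsed_raw = False

# escape() replaces &, < and > with the references &amp;, &lt; and &gt;.
safe_text = escape(raw_text)
note = ET.fromstring(f"<note>{safe_text}</note>")

print(parsed_raw)             # False
print(safe_text)              # Profit &amp; Loss: revenue &lt; cost
print(note.text == raw_text)  # True: the parser hands back the original text
```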
XML documents can be used to represent any type of text
Let's think for a minute about the kinds of documents we are used to seeing
every day: written letters, books, pamphlets, newspapers, magazines, etc.
Conceptually, many of these paper-based documents transfer very easily to their
digital counterparts, and any digital text document can be generated using XML.
1.4 Parts of an XML Document
Every XML document has a data model comprised of the elements
and attributes that are required or allowed to structure its content
(character data). We'll take a look at those elements and attributes in this
lecture topic. In the same vein as the data model, each element has a content
model made up of the elements and attributes that a particular element is
allowed to contain. Don't worry if you don't understand this structure yet; read
on.
Elements are the logical components of XML documents. When all of our
documents are abstracted into smaller parts, we can manipulate their content
from whichever perspective we require. The smaller parts of our larger documents
can be represented in XML using "elements." A "header" element, for example,
could be used to group together the "to", "from" and "subject" elements of an
Elements are one of the most commonly used types of markup: the bracketed
items that are often referred to as "tags." Elements consist of words that serve
as the "names" for your element "tags" and are surrounded on either side by
"less than" (<) and "greater than" (>) characters. These start and end
tags may be used to encapsulate character data (text), as in the following
example:
<summary>Text goes in here</summary>
Besides character data, an element may also be made up of
subelements. In the graphic below, the "book" element's content model
consists of the "summary" subelement, while the "summary" element's model
contains no subelements, only character data.
Element content vs. character data
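The difference between element content and character data can also be checked with a parser. The following is a small sketch using Python's standard xml.etree.ElementTree module. The summary text is a placeholder invented for the example.

```python
import xml.etree.ElementTree as ET

doc = "<book><summary>A short description of the book.</summary></book>"
book = ET.fromstring(doc)
summary = book.find("summary")

# "book" contains a subelement (element content).
book_children = [child.tag for child in book]
# "summary" contains no subelements, only character data.
summary_children = [child.tag for child in summary]

print(book_children)     # ['summary']
print(summary_children)  # []
print(summary.text)      # A short description of the book.
```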
Attributes provide a means of assigning "extra" information to elements in
order to further describe properties of those elements.
Attribute-value pairs can be associated with elements by including them
inside of an element's start tag.
An attribute-value pair used within
the "book" element's start-tag.
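As a rough illustration, the sketch below places an attribute-value pair inside the start tag of a "book" element and reads it back with Python's xml.etree.ElementTree module. The attribute name isbn and its value are hypothetical; they were chosen only to have something to show.

```python
import xml.etree.ElementTree as ET

# The attribute-value pair sits inside the start tag, after the element name.
doc = '<book isbn="0-000-00000-0"><summary>A short description.</summary></book>'
book = ET.fromstring(doc)

print(book.attrib)       # {'isbn': '0-000-00000-0'}
print(book.get("isbn"))  # 0-000-00000-0
```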
If an element contains no subelements or character data, that element is
said to be "empty." In most cases, an empty element will contain an
attribute-value pair inside of a single tag that is "terminated" by a forward
slash before its closing bracket. The slash before the ending bracket serves the
same function as an end tag's forward slash.
An element containing nothing more than an attribute is still considered
"empty" and "without content" because attribute values count as markup, not
character data.
Technically, an empty element can
also be expressed using element start and end tags.
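Both ways of writing an empty element can be compared directly. This sketch assumes Python's xml.etree.ElementTree module. The isbn attribute and the helper name is_empty are made up for the example.

```python
import xml.etree.ElementTree as ET

# Short form: a single tag "terminated" by a slash before the closing bracket.
short_form = ET.fromstring('<book isbn="0-000-00000-0"/>')
# Long form: a start tag immediately followed by an end tag.
long_form = ET.fromstring('<book isbn="0-000-00000-0"></book>')


def is_empty(element: ET.Element) -> bool:
    """An element is empty when it has no subelements and no character data."""
    return len(element) == 0 and not element.text


print(is_empty(short_form), is_empty(long_form))  # True True
print(short_form.attrib == long_form.attrib)      # True
```

To the parser the two forms are the same element, and the attribute is still there even though the element is "empty."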
1.5 More on XML
Unlike HTML, elements in XML must have both start and end tags, and
their markup must be nested properly. Nesting refers to placing the
contents of an element inside another element ("nested" subelements). This means
that a subelement's end tag must occur before that of its parent element.
"Child" subelement tags cannot "overlap" with those of their parent
elements. The example below would produce an error because the <title>
subelement's closing tag occurs after the closing tag of its <book>
parent element.
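Here is one way to watch a parser reject overlapping tags. The sketch uses Python's xml.etree.ElementTree module. The helper name parses and the book title are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def parses(xml_text: str) -> bool:
    """Return True if the parser accepts the text, False if it reports an error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# <title> is closed after its parent <book>: the tags overlap.
overlapping = "<book><title>XML Basics</book></title>"
# <title> is closed before its parent <book>: properly nested.
nested = "<book><title>XML Basics</title></book>"

print(parses(overlapping))  # False
print(parses(nested))       # True
```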
Caution! XML syntax is case-sensitive. Tag names that are merely case
variations on the string "n-a-m-e" (such as "Name" and "name") are interpreted
as different names by an XML parser, so a start tag and an end tag that differ
only in case trigger an error.
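Case sensitivity is easy to test for yourself. The sketch below assumes Python's xml.etree.ElementTree module. The "person" wrapper element and the sample names are invented for the example.

```python
import xml.etree.ElementTree as ET

# The start tag and end tag differ only in case, so they do not match.
try:
    ET.fromstring("<Name>Tom Jones</name>")
    mismatch_accepted = True
except ET.ParseError:
    mismatch_accepted = False

# "name" and "NAME" are two different element names as far as the parser goes.
person = ET.fromstring("<person><name>Tom</name><NAME>Jones</NAME></person>")
tags = [child.tag for child in person]

print(mismatch_accepted)  # False
print(tags)               # ['name', 'NAME']
```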
- The very first element of a document is known as the root element.
- The root element is the top-level element in the XML document hierarchy.
- The root element contains all other elements. Each document can have only
one root and all other elements must be nested within it.
- For instance, the code below would produce a parsing error because an XML
parser would think that the first <book> element was the root element -
it was the first element it came across.
- When the parser recognized a second occurrence of a <book> element,
it would be able to determine that the document's elements were not nested
properly.
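The single-root rule can be sketched as follows, again assuming Python's xml.etree.ElementTree module. The wrapper element name "library" is hypothetical. Any single root element that contains both books would do.

```python
import xml.etree.ElementTree as ET

# Two top-level <book> elements: the parser treats the first as the root
# and reports an error when it meets the second.
two_roots = "<book>First</book><book>Second</book>"
try:
    ET.fromstring(two_roots)
    two_roots_ok = True
except ET.ParseError:
    two_roots_ok = False

# Nesting both books inside one root element fixes the problem.
one_root = "<library><book>First</book><book>Second</book></library>"
library = ET.fromstring(one_root)

print(two_roots_ok)   # False
print(library.tag)    # library
print(len(library))   # 2
```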
Non-root elements (or subelements) may appear as many times as desired, as
long as they are properly nested within the document's structure. Subelements
are said to be "children" of the "parent" elements they are nested within. The
logical structure of a document's components can also be represented by a tree:
- the tree trunk is the root element
- the branches are subelements that contain other subelements
- the leaves are subelements that don't contain other subelements
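The tree picture can be checked against a small document. This is a sketch only, using Python's xml.etree.ElementTree module. The element names (header, title, author, summary) and their contents are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

doc = """
<book>
  <header>
    <title>XML Basics</title>
    <author>A. Writer</author>
  </header>
  <summary>A short description.</summary>
</book>
"""
root = ET.fromstring(doc)

# Branches: subelements (not the root) that contain other subelements.
branches = [el.tag for el in root.iter() if el is not root and len(el) > 0]
# Leaves: elements that contain no subelements at all.
leaves = [el.tag for el in root.iter() if len(el) == 0]

print(root.tag)  # book
print(branches)  # ['header']
print(leaves)    # ['title', 'author', 'summary']
```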
In XML, white space is not automatically collapsed into a single space as
is the case in HTML. In some cases, white space is permitted but may cause
confusion. For example, a white space outside the tags might be interpreted as a
XML only allows white space in specific locations within a start or
end tag. White space is not allowed before element names in either the
start or end tags.
Example 1 below would produce an error (because white space is not allowed at
the beginning of an element name), but Example 2 would not (because an unlimited
amount of white space is allowed between an element name and an attribute).
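The two white space cases can be tried out with a parser. The sketch below assumes Python's xml.etree.ElementTree module. The isbn attribute and the helper name parses are made up for the example, and the two strings are only stand-ins for the kind of markup described above.

```python
import xml.etree.ElementTree as ET


def parses(xml_text: str) -> bool:
    """Return True if the parser accepts the text, False if it reports an error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# White space before the element name in the start tag: not allowed.
space_before_name = '< book isbn="0-000-00000-0"></book>'
# Extra white space between the element name and the attribute: allowed.
space_after_name = '<book      isbn="0-000-00000-0"></book>'

print(parses(space_before_name))  # False
print(parses(space_after_name))   # True

# The same rule applies to end tags.
print(parses("<book></ book>"))   # False
print(parses("<book></book >"))   # True
```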
If you are just starting out with XML, it's best not to get too fancy with
your use of white space.
1.6 Requirements of Well-formed Documents
XML documents are structured specifically to be
reshaped and re-purposed on-demand. For this reason, it is very important for
the beginning and ending of each piece of data contained within an XML document
to be clearly-defined.
An XML document's syntax must be well-formed, so its separate
pieces can be easily recognized by an XML parser.
XML's "well-formedness" requirements are:
- All start tags must have matching end tags.
- All elements must be "nested" properly.
- Attribute values must be properly quoted.
- Empty elements must be properly "terminated."
- There must be only one root element.
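Each requirement in the list above can be tested by handing a parser a document that breaks it. The sketch below is one way to do that with Python's xml.etree.ElementTree module. The sample documents and the helper name is_well_formed are invented for the example, and the check only reports whether the parser accepted the text, not which rule was broken.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


examples = {
    "missing end tag": "<person><name>Tom Jones</person>",
    "misnested elements": "<person><name>Tom<phone>555-1234</name></phone></person>",
    "unquoted attribute value": "<person><name type=customer>Tom</name></person>",
    "unterminated empty element": '<person><phone number="555-1234"></person>',
    "two root elements": "<person/><person/>",
    "well-formed": '<person><name type="customer">Tom</name><phone>555-1234</phone></person>',
}

results = {label: is_well_formed(text) for label, text in examples.items()}
for label, ok in results.items():
    print(f"{label}: {ok}")
```

Only the last document passes. Each of the others breaks exactly one of the rules.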
The code fragment below provides examples of two of XML's "typical"
well-formedness violations: misnested elements (in red and purple) and misquoted
attribute values (in red).
|<person><name type="customer"> Tom Jones
<phone> 555-1234</name> </phone></person>|
Properly-nested elements/Properly-quoted attribute values
|<person><name type="customer"> Tom Jones