Abstract: This article is a basic introduction to the new web markup
language XML and the transformation language XSL. Here I show
how the Apache web server can be configured, using the servlet engine
JServ, to do server side XML/XSL transformation using Apache's Cocoon servlet.
Future updates for this article will be located at http://www.inconn.ie/article/cocoon.htm
(The domain name is currently non-functional but is expected soon.)
The eXtensible Markup Language
(XML) is a new web markup language (a W3C Recommendation since February 1998). It is a powerful way of separating web content and
style. A lot has been written about XML, but to be used effectively in web design the technologies behind it must be understood. To this end I have added my own two pence worth to the already
vast amount of literature out there on the subject. This article is not
a place to learn XML, nor is it a place where the capabilities of XML are
explored to their fullest, but it is a place where the technologies behind XML can be
put into practice immediately.
Before I go any further, I should recommend the two sites where
definitive information on XML can be obtained. The first is the World
Wide Web Consortium (W3C) site http://www.w3.org/. The W3C are responsible for the XML specification. The second site is the
XML frequently asked questions site (http://www.ucc.ie/xml/)
which will answer any other questions. I also recommend the XML
pages hosted by IBM,
where you will find a wide range of excellent tutorials and articles on XML.
The original web language, SGML (around since 1986), is the mother of all mark-up
languages. SGML can be used to document any
conceivable system, from complex aeronautical design to ancient Chinese
dialects. However, it
suffers from being overly complex and unwieldy for routine web
applications. HTML is basically a very cut down version
of SGML, originally designed with the scientific publishing community
in mind. It is a
simple mark-up language (it has been said "anyone with a pulse
can learn it") and with the explosion of the web it is clear that the people with pulses have spoken. Since its foundation the web has
grown in complexity and it has long outgrown its lowly beginnings.
Today web pages need to be dynamic, interactive,
back-ended with databases, secure and eye catching to compete in an ever
more crowded cyberspace. Enter XML, a new mark-up language to deal
with the complexities of modern web design. XML is only 20 percent as
complex as SGML and can handle 80 percent of SGML situations (believe me,
when you are talking about coding ancient Chinese dialects, 80 percent
is plenty). In the following section I will briefly compare two markup examples, one in HTML and the second in XML, demonstrating the benefits of an XML approach. In the final section I will show you
how to set up an Apache web server to serve an XML document so
that you may start using XML in your web design immediately.
The following example is a very simple HTML document that everyone will be familiar with:
<title>This is my article</title>
<h1 align="center">This is my article</h1>
Two important points can be made about this document:
- The content and style are tied together in the document.
- It would be very difficult for a search program to search
this document and extract the mail address of Eoin Lane.
XML addresses these two issues.
The XML equivalent is as follows:
<title>This is my article</title>
The first thing to note is that this document, along with all
other valid XML documents, is well formed. To be a well formed
document every opening tag must have a matching closing tag. A program
searching for the mail address then has only to locate the text in between
the opening and closing tags of mail.
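The search described above can be sketched in a few lines of Python using the standard library's xml.etree.ElementTree module. The document used here is a hypothetical stand-in for the article's XML: the element names (article, author, name, mail) and the address are assumptions made for illustration only. The second helper shows how a parser rejects text that is not well formed.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document; the element names and the address are
# illustrative assumptions, not the article's actual source.
ARTICLE_XML = """<?xml version="1.0"?>
<article>
  <title>This is my article</title>
  <author>
    <name>Eoin Lane</name>
    <mail>eoin@example.org</mail>
  </author>
</article>"""


def find_mail(xml_text):
    """Return the text between the opening and closing mail tags."""
    root = ET.fromstring(xml_text)
    mail = root.find(".//mail")
    return mail.text if mail is not None else None


def is_well_formed(xml_text):
    """Report whether the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(find_mail(ARTICLE_XML))
    # A title tag that is opened but never closed is rejected.
    print(is_well_formed("<article><title>Broken</article>"))
```

Because the data carries no presentation markup, the lookup needs no knowledge of how the page will eventually be displayed.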
The second and crucial point is that this XML document contains just data. There is nothing in this document
that dictates how to display the author's name or his mail address. In practice it is easier to
think about web design in terms of data and presentation separately. In
the design of medium to large web sites, where all the pages have the
same look and only the data is changing from page to page, this is
clearly a better solution. It also allows a division of labour, where style and content can be
handled by two different departments working independently, and it allows one set of data to be presented in a number of different ways.
An XML document can be presented using two different methods. One is using a Cascading Style Sheet (CSS) (see http://www.w3.org/style/css/) to mark up the text in
HTML. The second is using a transformation language called XSL, which
converts the XML document into HTML, XML, PDF, PostScript, or
LaTeX. As to which one to use, the W3C (the people responsible for these specifications) has this to say:
"Use CSS when
you can, use XSL when you must." They go on to say:
The reason is that CSS is much easier to use, easier to
learn, thus easier to maintain and cheaper. There are WYSIWYG editors
for CSS and in general there are more tools for CSS than for XSL. But
CSS's simplicity means it has its limitations. Some things you cannot
do with CSS, or with CSS alone. Then you need XSL, or at least the
transformation part of XSL.
So what are the things you cannot do with CSS? In general everything that needs transformations. For example, if
you have a list and want it displayed in lexicographical order, or if
words have to be replaced by other words, or if empty elements have to
be replaced by text. CSS can do some text generation, but only for generating small things, such as numbers of section headers.
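To make the idea of a transformation concrete, here is a minimal Python sketch of two of the cases just mentioned: putting a list into lexicographical order and replacing empty elements with text. It is not XSLT and not how any particular processor works; the list and item element names and the "(none)" placeholder are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: an unsorted list containing one empty element.
SOURCE = "<list><item>pear</item><item/><item>apple</item></list>"


def transform(xml_text, placeholder="(none)"):
    """Fill empty items with placeholder text, then sort the items."""
    root = ET.fromstring(xml_text)
    items = list(root)
    for item in items:
        # An empty element has no text (or only whitespace).
        if not (item.text or "").strip():
            item.text = placeholder
    items.sort(key=lambda item: item.text)
    # Replace the children of the root with the sorted sequence.
    root[:] = items
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(transform(SOURCE))
```

Both steps change the structure of the document itself, which is exactly the kind of work a style sheet that only attaches presentation rules cannot do.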
XSL (eXtensible Stylesheet Language) is
the language used to transform and display XML documents. It is not yet finished so
beware! It is a complex document formatting
language whose style sheets are themselves XML documents. It can be further subdivided
into two parts: transformation (XSLT) and formatting objects (sometimes
referred to as FO, XSL:FO or simply XSL). For the sake of simplicity I
will only deal with XSLT here.
XSL Transformations (XSLT)
On the 16th of November 1999 the World Wide Web Consortium
announced the publication of XSLT as a W3C Recommendation. This
basically means that XSLT is stable and will not change in the
future. The above XML document can be transformed into an HTML document and
subsequently displayed on any browser using the following XSLT file.
To learn more about XSLT, I recommend the XSLINFO site
as a good starting point. Also I found the revised Chapter 14 from the
to be very good. This revision is based on the specifications that
eventually became the recommendation.
With the arrival of the next generation of browsers,
i.e. Netscape 5 (currently under construction, see http://www.mozilla.org/),
this transformation will be done client side. When an XML
file is requested the
corresponding XSL file will be sent along with it, and the transformation will be done by
the browser. Currently there are a lot of browsers only capable of
displaying HTML, and until that changes the transformation must be done server
side. This can be accomplished by using Java
servlets (Java server side programs).
The Cocoon servlet is such a servlet, written by some very clever people at Apache (http://www.apache.org/). It basically takes
an XML document and transforms it using an XSL document. An example of
such a transformation would be to convert the XML document into HTML
so that the browser can display it. So if your web
server is configured to run servlets, and you include the Cocoon servlet, then you can start designing your web pages using XML. The rest of this article will show exactly how to do this.
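The principle of a server side transformation can be illustrated with a small Python sketch: the server holds a data-only XML document, and a separate piece of code decides how it is presented as HTML before anything is sent to the browser. This is a plain Python stand-in, not XSLT and not Cocoon's implementation, and the element names and mail address are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical data-only document; element names are assumptions.
ARTICLE_XML = """<article>
  <title>This is my article</title>
  <author><name>Eoin Lane</name><mail>eoin@example.org</mail></author>
</article>"""


def render_html(xml_text):
    """Turn the data-only XML into an HTML page (presentation lives here)."""
    root = ET.fromstring(xml_text)
    # Escape the extracted text so it is safe to place inside HTML.
    title = escape(root.findtext("title", default=""))
    name = escape(root.findtext("author/name", default=""))
    mail = escape(root.findtext("author/mail", default=""))
    return (
        "<html><head><title>{0}</title></head><body>"
        '<h1 align="center">{0}</h1>'
        '<p>by <a href="mailto:{2}">{1}</a></p>'
        "</body></html>"
    ).format(title, name, mail)


if __name__ == "__main__":
    print(render_html(ARTICLE_XML))
```

Changing the look of every page then means changing only the rendering code, while the XML data stays untouched. In the setup described in this article, an XSL style sheet plays the role of this rendering code.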
How do I do it?
I have tested the following instructions on a fresh installation of Red Hat 6.0, so I know they work.
Apache Web Server
First set up the Apache web server. On Red Hat this comes
pre-installed, but I want you to blow it away using:
rpm -e --nodeps apache
and do not worry about the error
messages. Next get hold of the most recent Apache (http://www.apache.org/) (currently version 1.3.9) and copy it somewhere handy. I put mine in
/usr/local/src. Unzip and untar the file using:
tar zxvf apache_1.3.9.tar.gz
This will
expand the installation into the directory
/usr/local/src/apache_1.3.9. Change into this directory
and configure, build and install the application using the following commands:
./configure --prefix=/usr/local/apache --mandir=/usr/local/man --enable-shared=max
make
make install
This will install Apache into the directory
/usr/local/apache and the important file to note here is
httpd.conf which can be found in the directory
/usr/local/apache/conf. This file contains most of the
important information necessary to run Apache correctly. It contains
information on where to serve the web documents from, virtual web
servers and folder aliases. We will be returning to this file shortly so become familiar with its general
layout. At this stage I had to reboot Linux and then start Apache using the following command:
/usr/local/apache/bin/apachectl start
To test it, point your web browser to http://localhost/ and
you're in business, hopefully!
For good web design and planning I would refer you to an article that
I found invaluable in setting up my own web site: Better Web Site
Java and JSDK
As of October, IBM have released the Java Development Kit 1.1.8 for
Linux. It claims to be faster than the corresponding Blackdown
and Sun JDKs. Download the IBM JDK.
Again unzip and untar this into the
/usr/local/src/jdk118 directory. Next, download
JavaSoft's JSDK2.0, the Solaris version (not JSDK2.1 or any other flavours you might be
tempted to get) and unzip and untar it; again I put it in
/usr/local/src/JSDK2.0. Add the following or equivalent
to /etc/profile to make them available to your system:
export PATH CLASSPATH JAVA_HOME JSDK_HOME
To test them, type
java -version
at the command prompt, and you should get back:
java version "1.1.8"
To test the servlet development kit, run:
servletrunner
and if all goes well you
should get back the following:
servletrunner starting with settings:
port = 8080
backlog = 50
max handlers = 100
timeout = 5000
servlet dir = ./examples
document dir = ./examples
servlet propfile =
We are now ready to install Apache's servlet engine, ApacheJServ.
Again, download the latest ApacheJServ (version 1.0 at this time,
although version 1.1 is in its final beta stage) from Apache's Java Site
and expand it into /usr/local/src/ApacheJServ-1.0/. Configure, make and
install it using the following instructions:
When this has successfully completed, add the following line to the end
of the httpd.conf file that I referred to earlier during the Apache web
server installation:
Include /usr/local/src/ApacheJServ-1.0/example/jserv.conf
and restart the web server using:
/usr/local/apache/bin/apachectl restart
Now
comes the moment of truth: point your web browser to the JServ example servlet
and if you get back the following two lines:
Example Apache JServ Servlet
Congratulations, Apache JServ is working!
then you are almost home.
Finally, download the latest version of Cocoon (version 1.5 at this time) from Apache's Java Site.
Cocoon is distributed as a Java jar file and can be extracted using the command
jar. First, create the directory
/usr/local/src/cocoon and then expand the Cocoon jar file into it using:
jar -xvf Cocoon_1.5.jar
Now comes the tricky part:
configuring the JServ engine to recognise a file with a
.xml extension and to use the Cocoon servlet to process it.
Locate the file jserv.properties, which you will find in the
directory /usr/local/src/ApacheJServ-1.0/example/, and at
the end of the section that begins:
# CLASSPATH environment value passed to the JVM
add the following:
In the case of Cocoon 1.5 this means adding the following three lines,
although these files will change with different versions. The next file to locate is the example.properties file,
again found in the /usr/local/src/ApacheJServ-1.0/example/
directory and add the following line:
In my example.properties file it meant changing the line:
to the following:
Also add the following line to the end of the
The JServ engine is now properly configured and all that is left
for us to do is to tell Apache to direct any call to an XML file (or
any other file you want Cocoon to process) to the Cocoon servlet. For
this we need the JServ configuration file,
jserv.conf mentioned earlier (again in the same directory). Include the following line:
In order to
access the Cocoon documentation and examples, add the following lines to
the alias section of
your httpd.conf file:
Alias /xml/ "/usr/local/src/cocoon/"
Allow from all
Alias /xml/example/ "/usr/local/src/cocoon/example/"
Allow from all
Restart the web server for this to take effect:
/usr/local/apache/bin/apachectl restart
Now point your browser to http://localhost/xml/
to browse the documentation and
to try out the examples. If Cocoon complains about exceeding a memory limit, then open the file cocoon.properties found in the /usr/local/src/cocoon/ directory. Find the line
store.memory = 150000
and change it to something lower like 15000. To try out the PDF examples, which I think
are very cool, you have to have Acrobat Reader installed as a
Netscape plug-in, but it is worth the extra effort to get this
working.
The Cocoon 1.x series has basically been a work in progress.
What started out as a simple servlet for static XSL transformation has grown into
something much more. With this ongoing development, design
decisions taken at the beginning of the project are now hampering future
developments as the scale and the scope of the project become
apparent. To add to this, XSL is also a work in progress,
although the current version of XSLT has become a W3C Recommendation (as of 16 November 1999).
Cocoon 2 intends to address these issues and provide us with a
servlet for XML transformations that is scalable to handle large quantities
of web traffic. Web design of medium to large sites in the
future will be based entirely around XML, as its benefits become apparent, and the Cocoon 2 servlet will hopefully provide us with a way to use it effectively.
Even as I have
been writing this article, Apache have opened a new site dedicated exclusively to
The Cocoon project has obviously grown beyond all expectations, and with
the coming of Cocoon 2 it will be a commercially viable servlet that
makes the design of web sites in XML a reality. The people at
Apache deserve a lot of credit for this, so write to them and thank them,
join the mailing list and generally lend your support. After
all, this is open source code and this is what Linux is all about.
- An XML version of this article.
- Its XSL style sheet.
- A text version of the XML source.
- A text version of the XSL source.
Copyright © 1999, Eoin Lane
Published in Issue 48 of Linux Gazette, December 1999