An Introduction To HTML

v1.0.0 / 01 jun 03 / greg goebel / public domain

* The "World-Wide Web (WWW)" revolutionized the Internet by creating a much easier way to obtain information over the network, linking vast numbers of "web sites" and "web pages" that could be easily searched and inspected.

The basis of the Web is the formatting language used to create web pages, the "Hypertext Markup Language (HTML)". HTML is not only used to format documents, but also to control their links and operation. This document provides a short introduction to HTML.

[1] INTRODUCTION / ELEMENTARY TAGS
[2] GRAPHICS
[3] HYPERLINKS
[4] FILE SYSTEMS
[5] MARKERS / CLICKABLE BITMAPS
[6] BELLS & WHISTLES / META TAGS
[7] PUTTING IT TOGETHER
[8] ADVANCED FEATURES
[9] COMMENTS, SOURCES, & REVISION HISTORY

[1] INTRODUCTION / ELEMENTARY TAGS

* HTML provides a set of "tags" that are embedded in documents to be displayed by a Web browser. These tags provide instructions on how a Web browser will format and otherwise deal with a document displayed in the browser. Typical HTML code might have the form:

   <TITLE>This Is The Document Title</TITLE>

   <H1>[1.0] Introductory Topics</H1>

   <PRE>
   'Twas brillig, and the slithy toves
   Did gyre and gimble in the wabe.
   All mimsy were the borogoves,
   And the mome raths outgrabe.
   </PRE>

   <HR>

The items in the angle brackets ("<>") are the tags. HTML includes a large number of tags, but we can start with a simple subset, listing them and then explaining them in detail:

   <HTML>...</HTML>       Defines an HTML document (optional).
   <HEAD>...</HEAD>       Defines head of document (optional).
   <BODY>...</BODY>       Defines body of document (optional).

   <TITLE>...</TITLE>     Title.
   <H1>...</H1>           First-level header.
   <H2>...</H2>           Second-level header.
   <P>                    Paragraph break.
   <PRE>...</PRE>         Preformatted text block.

   <UL>...</UL>           Unnumbered list.
   <LI>                   List item.

   <EM>...</EM>           Emphasis (usually italicized).
   <STRONG>...</STRONG>   Strong emphasis (usually bold).
   <CITE>...</CITE>       Citations.

   <BR>                   Forced line break.
   <HR>                   Horizontal rule.

The first three tags actually have no visible effect:

   <HTML> ... </HTML>
   <HEAD> ... </HEAD>
   <BODY> ... </BODY>

The "<HTML>" tag designates an HTML-formatted document, while the "<HEAD>" tag designates the heading information in the text and the "<BODY>" tag designates the actual formatted text:

   <HTML>
   <HEAD>
      <TITLE>Document Title</TITLE>
   </HEAD>
   <BODY> 
      Text to format and display.
   </BODY>
   </HTML>

Web browsers can actually often handle HTML code that doesn't use these tags, but their use is recommended to avoid unpleasant surprises.

The remaining tags in the list are more interesting. First, there is the "<TITLE>" tag, which simply declares the title of the document:

   <TITLE>Coyote's Website</TITLE>

The text defining the title, "Coyote's Website", doesn't actually show up in the web page display itself, but it's shown in the browser's title bar and, if a surfer makes a "bookmark" to that page, will be the text defining the bookmark.

The actual title of the web page that the web browser displays can be generated by the "<H1>" tag, which defines a first-level header:

   <H1>Welcome To Coyote's Website</H1>

It is easy to confuse the title and first-level header. The difference is that the title doesn't show up in the document, just in the browser's title bar, while the first-level header only shows up in the text. It is easy just to make the title and the first-level header the same.

If there are any sub-headings within the text of the document to be displayed, they can be generated with the "<H2>" tag, which defines a second-level header:

   <H2>Just Who Is This Coyote Guy, Anyway?</H2>

These headers will be displayed in appropriate bold text (with "appropriate" determined by the web browser being used to display the HTML document and the browser's settings). Lower-level headers can be defined with the tags "<H3>", "<H4>", and so on, but it is not generally necessary to go to lower levels.

Text in the body of the document will be "filled" to the margins of the web browser display. Spacing and blank lines between paragraphs will be ignored, and so paragraphs need to be broken apart with the tag:

   <P>

Notice that this tag doesn't require a matching "</P>" tag. Only one tag is needed to define the start of a new paragraph.

If the author wants text to be printed "as is", without filling, the "<PRE>" (preformatted block) tag must be used instead. For example, if the following text were to be filled, it would all be scrunched up on one line and would look terrible, so it is marked as preformatted text:

   <PRE>
   Name:            Wyle Coyote
   Occupation:      Scavenger/Predator
   Marital Status:  Single
   Income:          Negligible
   </PRE>

* A simple formatted web page can be built with these tags, though one warning should be heeded: there are certain characters that are recognized as parts of commands and so on by HTML, and if these characters exist in the original text, they will confuse the web browser and mangle the display of the source file.

This problem can be avoided by replacing these characters with "escape" codes. The characters of concern and their escapes are as follows:

   "<" =  &lt;
   ">" =  &gt;
   "&" =  &amp;
   '"' =  &quot;
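
As a brief illustration of these escapes, a line of source text that uses all four special characters could be written as follows. The sentence itself is invented for the example:

```html
The C statement &quot;if (a &lt; b &amp;&amp; c &gt; d)&quot; compares two pairs of values.
```

A browser should display this line as:

   The C statement "if (a < b && c > d)" compares two pairs of values.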

It is simple to write a small C program to replace these characters with their escapes, but any reasonable editor should provide search-and-replace facilities to do the job. Anyway, having given this warning, a simple formatted web page can be built:

   <HTML>
   <HEAD>
   <TITLE>COYOTE'S PAGE</TITLE>
   </HEAD>
   <BODY>
   <H1>Welcome To Coyote's Web Page</H1>

   Welcome to Wyle Coyote's web page!  Despite my busy career as a
   predator and certified Genius, I have had time to establish a 
   presence on the Internet, and am now on-line for Coyote lovers everywhere!

   <H2>Just Who Is This Coyote Guy, Anyway?</H2>

   Allow me to introduce myself.  I am Wyle Coyote, Esquire, IQ 200,
   graduate class of 1950 from the Southwest Technical Institute.  I have
   a degree in Advanced Predation and specialize in trap technologies.
   <P>
   Being a predator is a challenging career.  Attempting to capture 
   clever and fleet-footed game is by no means simple and in fact can 
   present significant hazards to the unwary.  However, being a Genius I
   find that the difficulties involved only make the game more interesting.

   <H2>Personal Statistics</H2>

   And now for the dry personal statistics, for those of you inclined to
   worry about minor details:
   <PRE>
      Name:            Wyle Coyote
      Occupation:      Scavenger/Predator
      Marital Status:  Single
      Income:          Negligible
   </PRE>
   If you have questions, please feel free to email me at:
   <PRE>
      coyote@acme.com
   </PRE>
   </BODY>
   </HTML>

This document would be saved in a file named, say, "test.html", which could then be opened in a web browser for display. The ".html" extension tells the web browser that the file should be displayed as HTML-formatted text. The extension ".htm" is also valid.

* Given the ability to make elementary web pages, the next step is to add other formatting capabilities. The first new formatting capability is the "list".

HTML defines various list formats, but the simplest is the unnumbered list, defined by the "<UL>" tag. Each item in the list is defined by the "<LI>" tag. For example:

   <UL>
      <LI> black
      <LI> white
      <LI> green
      <LI> red
      <LI> blue
   </UL>

-- results in:

   o black
   o white
   o green
   o red
   o blue

Note that the indentation shown in the source file is just to make it easier to read. In practice, the browser displays the list by its own rules. Lists can be nested within each other, though more than three levels of nesting is clumsy. For example:

   <UL>
       <LI> Animal:
           <UL>
               <LI> hummingbird
               <LI> housecat
               <LI> whale
           </UL>
       <LI> Mineral:
           <UL>
               <LI> coal
               <LI> diamond
               <LI> steel
           </UL>
       <LI> Vegetable:
           <UL>
               <LI> strawberries
               <LI> apples
               <LI> corn
           </UL> 
   </UL>

This might be displayed something like:

  o Animal:
    * hummingbird
    * housecat
    * whale
  o Mineral:
    * coal
    * diamond
    * steel
  o Vegetable:
    * strawberries
    * apples
    * corn

There are other types of lists defined in HTML. For example, a numbered list can be defined by replacing the "<UL>" tag with the "<OL>" (ordered list) tag. However, the unnumbered list is adequate for most purposes.
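
A minimal sketch of the ordered list mentioned above, with list items invented for illustration:

```html
<OL>
   <LI> Unpack the catapult.
   <LI> Load the boulder.
   <LI> Stand well clear.
</OL>
```

A browser would typically supply the numbers itself, displaying something like:

   1. Unpack the catapult.
   2. Load the boulder.
   3. Stand well clear.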

Another common need in HTML documents is text styling, such as bold, italic, underlined, and so on. There are a lot of different tags to define text styles, but three are sufficient for most needs: "<EM>", for emphatic text (usually italic); "<STRONG>", for strong emphasis (normally bold); and "<CITE>", for citations. For example:

   This example demonstrates <EM>emphatic text</EM>.
   This example demonstrates <STRONG>bold text</STRONG>. 
   A citation:  <CITE>WAR AND PEACE</CITE>

Finally, if additional space between paragraphs is desired, the "<BR>" (line break) tag can be used. If a horizontal line is wanted between paragraphs or other elements, the "<HR>" (horizontal rule) tag can be used.
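
As a small sketch of these two tags, a few short lines can be kept on separate lines with "<BR>" and then set off from the following text with "<HR>". The wording of the lines is arbitrary:

```html
Wyle Coyote, Esquire<BR>
Scavenger/Predator<BR>
coyote@acme.com
<HR>
The text after the rule continues here.
```

Each "<BR>" forces the following text onto a new line without starting a new paragraph, and the "<HR>" draws a line across the page before the last sentence.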

* For an example that puts these items together:

   <HTML>
   <HEAD>
   <TITLE>ACME INC</TITLE>
   </HEAD>
   <BODY>
   <H1>Acme Corporation</H1>

   * The Acme Corporation of Albuquerque, New Mexico, is a for-profit concern
   that focuses on a wide range of <EM>reliable, safe, and well-built</EM>
   products.  Acme is a people-oriented concern that provides an excellent
   work environment and prospects for advancement.

   <H2>Acme Products</H2>

   * Acme offers an extensive list of useful products:
   <UL>
      <LI> Jet Skates
      <LI> Anvils
      <LI> Catapults
      <LI> Bat-Man Suits
      <LI> Explosive Devices:
         <UL>
            <LI> Grenades
            <LI> Land Mines
            <LI> Dynamite
         </UL>
   </UL>

   <H2>For Further Information</H2>

   * Please contact Acme at:
   <PRE>
      Acme INC
      1948 Roadrunner Drive
      Albuquerque, NEW MEXICO, USA 87109
      Phone:  505-743-9710
      Fax:    505-743-9921
   </PRE>
   <HR>
   For product support information:  <STRONG>support@acme.com</STRONG>
   <HR>
   </BODY>
   </HTML>

* A caution: don't scramble the order of tags. For example, something like:

   <TITLE> ... <PRE> ... </TITLE> ... </PRE>

-- isn't likely to work very well. This is an absurd example, but most such attempts to interlace tags are equally absurd. Also, tags can't in general be nested, with the exception of lists (as shown above), text-formatting tags, and embedded graphics and anchor tags (see below).

[2] GRAPHICS

* It is easy to insert bitmap image files in a web page. These files should generally be in the popular .GIF or .JPEG format, and can be specified with the "<IMG>" image tag:

   <IMG SRC="somegfx.gif">

By default, the bottom of an image is aligned with the text. The "ALIGN=TOP" option aligns adjacent text with the top of the image:

   <IMG ALIGN=TOP SRC="another.gif">

"ALIGN=MIDDLE" aligns the text with the center of the image. If you want to center an image in a page, you can do it with:

   <CENTER><IMG SRC="somegfx.gif"></CENTER>

A text string can be associated with the image using the ALT keyword:

   <IMG SRC="somegfx.gif" ALT="Some Funny Graphics">

This text string will be displayed if a web browser has graphics loading turned off or otherwise can't get to the bitmap. Modern browsers also pop up a little window containing the text if the mouse cursor is left over the graphics for a moment.

The image tag can be embedded in a line of text:

   I <IMG SRC="heart.gif"> My German Shepherd

[3] HYPERLINKS

* So far all the tags discussed have focused on document cosmetics. However, anyone who uses the web knows that the real usefulness of a web page is obtained through "hyperlinking", in which pointers are set up on a web page to specify other web pages that the surfer can load with a mouse click.

HTML defines a hyperlink with the "anchor" tag:

   <A HREF="http://www.acme.com/tutor.html">Tutorial</A>

The URL can also point to an FTP site or to a local file:

   ftp://www.joker.org/ 
   file:///users/joker/index.html

For the anchor tag above, the web browser will display the highlighted field:

   *Tutorial*

Click on this with a mouse and the web browser will access the web page with the URL:

   http://www.acme.com/tutor.html

The hyperlinks of course do not have to be to other websites. They often refer to files residing on the same machine as the current web page. This allows the author to weave a set of files into a hyperlinked website. For example:

   <A HREF="biodata.html">Biography</A>

This hyperlink points to an HTML file named "biodata.html" that is in the "current directory", that is, the directory on the website where the browser found the current page to display.

[4] FILE SYSTEMS

* This leads to a discussion of relevant features of a computer file system. Personal computers all use a "hierarchical" file system (as do, with minor differences, UN*X workstations) and it is important to understand this concept to build a website.

Early personal computers featured a "flat" file system, meaning that each disk contained a single directory and a set of files. However, a modern PC's file system contains a directory that lists both files and lower-level "subdirectories". Each subdirectory may list files of its own, as well as its own subdirectories, and so on (until the disk runs out of space). This scheme defines an upside-down "tree" (a "hierarchy") of directories.

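For example, consider the following hierarchical file system (only the entries named in this section are shown):

   C:\
    |-- filelist.txt
    |-- Utils\
    |    |-- paint.exe
    |-- Web\
    |    |-- Priv\
    |    |    |-- bdays.html
    |    |-- Pub\
    |-- Tmp\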

This is a very simple example, but it illustrates the basic ideas of a hierarchical file system.

At the "top" of this upside-down "tree" is a directory named "C:\", or (ignoring the drive specifier), simply "\". This is always the name of the topmost directory on a PC disk; since it is the directory from which the rest of the directory "tree" grows, it is called the "root directory", or simply "root".

The root directory in this example contains three files, as well as three subdirectories: "Utils", "Web", and "Tmp".

The three subdirectories in this example store different sets of files: "Utils" stores various utility programs, "Web" stores HTML files, and "Tmp" stores temporary files. Furthermore, the "Web" subdirectory has two subdirectories of its own: "Priv" and "Pub", to store HTML files of personal and public interest respectively.

Please remember, this is only an arbitrary example. Any convenient organization can be defined; subdirectories may have any name; files can generally be stored wherever they are convenient. A hierarchical file system allows a user to create a neat organization for the files on his or her machine.

* Having a hierarchical file system implies a need to be able to describe the location of files in it, which leads to the idea of a "pathname".

Suppose a file (like, say, filelist.txt) is stored in the root directory of the "C:" drive; then it can be located by prefixing the file name with the drive ID and root directory name:

   C:\filelist.txt

One of root's subdirectories can be referred to in the same way; for example, the pathname of the "Utils" directory is:

   C:\Utils

A user can specify the "paint.exe" file in that subdirectory by "splicing" the file name to the subdirectory name, with a "\" as follows:

   C:\Utils\paint.exe

"C:\Utils\paint.exe" is the "full pathname" of the file. A full pathname will always locate the file. Such full pathnames can be used to describe all the files and directories in the system in the same way:

   C:\Web
   C:\Web\Priv
   C:\Web\Priv\bdays.html

-- and so on.

* Writing out full pathnames for files can be tiresome, but, fortunately, it is also possible to refer to a file by its "relative pathname". The full pathname describes a file from any directory on the PC. The relative pathname describes a file "relative" to the directory the user is "in" at the time (the "current" directory).

As the simplest example, if the current directory is the root directory of the "C:" drive, there is no need to specify "C:\" in the path, since it's assumed:

   filelist.txt

Extending this idea, to refer to a file in a subdirectory of the current directory, all the user has to do is specify the file name spliced to the subdirectory name. For example, if "Web" is the current directory, then the "bdays.html" file in the "Priv" subdirectory can be identified as:

   Priv\bdays.html

This pathname only works if "Web" is the current directory. If it's not, the file won't be found.

Relative pathnames make life easier, if at the cost of occasional confusion. Just to add to this confusion, let's add another wrinkle: the current directory can be referred to simply as ".", and (more important), the "parent" directory of the current directory can be referred to as "..".

The "parent directory" is the opposite of a "subdirectory". That is, since "Utils" is a subdirectory of the root directory, then the root directory is the parent directory of the "Utils" directory. For example, if "Utils" is the current directory, then "filelist.txt" can be given by:

   ..\filelist.txt

* The preceding examples assume a PC file system, but a UN*X workstation uses a similar scheme -- with two primary differences. First, there are no drive identifiers (like "C:" or "A:"); under UN*X, different disk drives are "mounted" to build up one big hierarchical file system.

Second, the "path separator" character on UN*X is "/", while it is "\" on a PC:

   C:\Web\Priv\bdays.html   # PC format
   /Web/Priv/bdays.html     # UN*X format

* Understanding the hierarchical file system is important because, if a computer is used as a web server, the file system that is available to a web surfer accessing the server is often a subset of the file system available to a person with direct access to the server. Given the directory tree above, for example, all the surfer will have access to is the subtree starting with "Web". This will appear as the "root" directory to the surfer.

So, if a website is designed with hyperlinks using full ("absolute") pathnames, it will work for a user with complete access privileges to the server, and completely break when any outside surfer tries to access the pages. This means that the web pages on the site should in general be linked using relative pathnames. This also makes it easier to modify the website.

For example, a link from the main page to a file in a subdirectory might have the form:

   <A HREF="Priv/bdays.html">Birthdays</A>

This document in turn will usually link back to the page that called it with the hyperlink:

   <A HREF="../index.html">Return To Main Page</A>
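
As a further sketch of relative links, a page in the "Priv" subdirectory could point to a page in its sibling "Pub" subdirectory by going up one level and then down again. The file name "news.html" is invented for this example:

```html
<!-- Placed in Web/Priv/bdays.html; "news.html" is a hypothetical file. -->
<A HREF="../Pub/news.html">Public News</A>
```

Note that hyperlinks use the "/" separator, as in UN*X pathnames, even when the server is a PC.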

One related comment before continuing: it is also important when building a website that the surfer be granted permissions to access the pages. The author should work with the server system administrator to ensure free access to the site.

* Note the use of the filename "index.html". This is a conventional default file name: if a URL specifies no particular file, most web servers will automatically deliver "index.html" (over HTTP at least). This is the file that is accessed when surfing to a website where no target file is specified, just the server:

   http://www.acme.com/

[5] MARKERS / CLICKABLE BITMAPS

* The hyperlinks shown so far allow linking to other websites or to other files on the same website, but it is also possible to "mark" places within a web page so that they can be linked to. Such "markers" are defined by another variation on the "anchor" tag:

   <H2><A NAME="m3">SECTION 3</A></H2>

Notice that this anchor tag is embedded inside second-level header tags. As mentioned, anchors can be embedded, and it's common with markers, since headers are often hyperlink targets. This allows a hyperlink to be made to this specific location, as follows:

   <A HREF="myfile.html#m3">Section 3 -- An Overview</A>

The link can be from within the same document:

   <A HREF="#m3">Go To Section 3</A>

The marker name "m3" is arbitrary. The author could use "markIII" or "markC" or "foobar" or whatever, for that matter. The "#" is not part of the name; it separates the marker from the file name in the link.
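
Putting markers and links together, a long page might begin with a short table of contents that jumps to marked headers further down the same page. The marker names and headings in this sketch are invented for illustration:

```html
<UL>
   <LI><A HREF="#m1">Section 1 -- Background</A>
   <LI><A HREF="#m2">Section 2 -- Methods</A>
</UL>

<H2><A NAME="m1">SECTION 1</A></H2>
Text of the first section.

<H2><A NAME="m2">SECTION 2</A></H2>
Text of the second section.
```

Clicking on either list entry should scroll the browser to the matching header.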

* While all the hyperlinks shown so far use text to define the label used, bitmap images can be used as well:

   <A HREF="../index.html"><IMG SRC="prev.gif"></A>

This loads a little arrow that says "Prev" in the web page. Clicking on the arrow causes the browser to jump back to the index page in the parent directory. It is common practice to use "prev.gif" and "next.gif" bitmaps to navigate through a website.

A somewhat more advanced technique allows you to use selected regions of an image to link to files. Creating such a "clickable image" is done with the "MAP" and "AREA" tags.

For an example, let's say we have a bitmap named "testmap.gif" that has four button-like square regions on it, as follows:

   +---------------------+
   |                     |
   |   [B11]     [B12]   |
   |                     |
   |   [B21]     [B22]   |
   |                     |
   +---------------------+

This bitmap is 175 pixels (picture dots) on a side. Each square region is 50 pixels wide and high, and the margins and spacing between the squares are all 25 pixels. We want to link the region defined by "B11" to a web page named "file11.html", and similarly link "B12" to "file12.html", "B21" to "file21.html", and "B22" to "file22.html". Clicking on one of the squares should bring up the appropriate file, but to prevent confusion clicking on the margins or spacing between the squares should do nothing.

The first thing that needs to be done to work with MAP and AREA is define the coordinates of the squares within the bitmap. The bitmap is regarded as an X,Y grid, with the 0,0 coordinate at the top left corner. Dimensions of elements within the bitmap are defined as offsets from that corner as follows:

   Xmin,Ymin,Xmax,Ymax

This means that the coordinates of the four squares are:

   B11:  25,25,75,75   
   B12:  100,25,150,75   
   B21:  25,100,75,150
   B22:  100,100,150,150

Now we can set up the tags, beginning with a simple IMG tag:

   <IMG BORDER=0 SRC="testmap.gif" USEMAP="#maplist">

This looks like a conventional IMG tag using a bitmap to set up a link, except that instead of a target file it specifies a "map" named "maplist". The name "maplist" is arbitrary, by the way; you could call it "sqcoords" or "xyzzy" or whatever you want.

The map is set up by the MAP and AREA tags as follows:

   <MAP NAME="maplist">
   <AREA SHAPE="rect" COORDS="25,25,75,75"     HREF="file11.html">
   <AREA SHAPE="rect" COORDS="100,25,150,75"   HREF="file12.html">
   <AREA SHAPE="rect" COORDS="25,100,75,150"   HREF="file21.html">
   <AREA SHAPE="rect" COORDS="100,100,150,150" HREF="file22.html">
   </MAP>

The MAP code can be placed anywhere in the HTML file relative to the IMG tag that uses it.

In this example, we're setting up a list of rectangles to be used as clickable elements, but the AREA tag can also be used to set up circles, with the coordinates specified as "X,Y,radius" as follows:

   <AREA SHAPE="circle" COORDS="25,25,15" HREF="circtest.html">

Similarly, an irregular polygon can be set up by defining a list of X,Y pairs:

   <AREA SHAPE="poly" COORDS="77,44,119,44,98,3,77,44" HREF="polytest.html">

[6] BELLS & WHISTLES / META TAGS

* HTML provides a large number of bells and whistles for the website author. A few handy ones are discussed here.

The previous discussion of hyperlinks focused on linking from one HTML file to another, but the hyperlinks can point to any type of file, and the web browser will use the file extension to determine what should be done with the file. Typical file types include:

   .html:    HTML document
   .txt:     plain text
   .gif:     GIF bitmap image
   .jpg:     JPEG bitmap image 
   .jpeg:    JPEG bitmap image

   .wav:     Windows audio file
   .ps:      PostScript file 
   .mov:     QuickTime movie 
   .mpeg:    MPEG movie 
   .mpg:     MPEG movie
   .mp3:     MP3 audio file

A text file is displayed "as is", with no formatting, and GIF and JPEG graphics files are displayed by web browsers automatically. However, in general to handle audio or video files, the web browser must have been configured by its user to access external "viewers".

One nice thing to add to a web page is a background bitmap using the tag:

   <BODY BACKGROUND="hp.gif">

Variations on this tag allow setting colors of various elements of the display:

   <BODY BGCOLOR="#ff0000">       Background color.
   <BODY TEXT="#00ff00">          Text color.
   <BODY LINK="#0000ff">          Link color.
   <BODY VLINK="#ffffff">         Visited link color.

The colors are expressed as red-green-blue values in hexadecimal (base-16, in which values are counted 0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f). For example:

   ff0000:     red
   00ff00:     green
   0000ff:     blue
   000000:     black
   ffffff:     white
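
The color settings are all attributes of the one "<BODY>" tag, so in practice they are combined in a single tag rather than written as separate tags. A sketch with arbitrarily chosen colors:

```html
<!-- White text on a black background; unvisited links green, visited links red. -->
<BODY BGCOLOR="#000000" TEXT="#ffffff" LINK="#00ff00" VLINK="#ff0000">
   Text to format and display.
</BODY>
```

Intermediate values mix the three primaries. For instance, "ffff00" (full red plus full green) gives yellow, and "808080" gives a medium gray.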

As a final detail, there is a specialized form of URL that causes a web browser to bring up an email utility, to allow a surfer to send email to a specific address:

   <A HREF="mailto:coyote@acme.com">Wyle Coyote</A>

This is a handy way to get feedback through a web page, and is a useful thing to add to the end of a page.

* While most HTML tags deal with the formatting and display of information on a web page, there is also a tag, the META tag, that is used to provide information about the web page, as well as provide control instructions to a web browser.

The standard META tag has the syntax:

   <META name="META_tag_name" content="META_tag_value">

There is a predefined set of "META tag names" that can be assigned "META tag values". One of the simplest META tags is "author", which is used to define the author of the web page:

   <META name="author" content="gvgoebel">

Similarly, the "description" META tag is used to provide a description of what the web page contains:

   <META name="description" content="This page provides a survey of
   superconductive physics, materials, and technology">

The "keywords" META tag complements the "description" META tag to provide keywords for Web search engines:

   <META name="keywords" content="Superconductivity, BCS theory, YBCO">

The "robots" tag provides instructions to tell web-indexing robots how to handle the web page. For example:

   <META name="robots" content="INDEX,FOLLOW">

-- tells a robot that it should index the page and follow the links on it. Content labels also include NOINDEX and NOFOLLOW.

* The META options that control a web browser have an alternate format:

   <META http-equiv="META_tag_name" content="META_tag_value">

There are a number of these META tags, but most are rather obscure. One of the more important is the "refresh" tag, which sends the reader to another website after a specified delay. For example:

   <META http-equiv="refresh" content="10; URL=http://www.newsite.com/">

This tag tells the user's browser to "wait ten seconds and then jump to www.newsite.com". Another useful META tag is "expires", which is used to tell the user's browser when the page is out of date and should be reloaded. To tell the browser that the page should be reloaded on every new access, the date should be specified as "0":

   <META http-equiv="expires" content="0">

To specify a particular expiration date, it should be spelled out in Greenwich Mean Time:

   <META http-equiv="expires" content="Tue Feb 13 12:00:00 GMT 2001">
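
META tags belong in the head of the document, alongside the title. The sketch below gathers several of the tags above into one "<HEAD>" block, reusing values from the examples in this section. The title is invented for illustration:

```html
<HEAD>
<TITLE>Superconductivity Survey</TITLE>
<META name="author" content="gvgoebel">
<META name="description" content="This page provides a survey of
superconductive physics, materials, and technology">
<META name="keywords" content="Superconductivity, BCS theory, YBCO">
<META name="robots" content="INDEX,FOLLOW">
<META http-equiv="expires" content="0">
</HEAD>
```

None of these tags produces anything visible in the page itself. They only inform search engines and the browser.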

[7] PUTTING IT TOGETHER

* The tools discussed so far can be used to build a reasonable web page. These tools are very simple. Once they are understood, the real problem becomes one of organizing the effort.

Planning a website involves three considerations:

   o Content.
   o Cosmetics.
   o Organization.

* The "Content" issue is common sense: The topic, scope, and target audience of the web page need to be clearly defined and understood. One secondary issue is maintenance: it is very common for people to put together websites and then forget about them. Such dead sites are known as "cobwebsites". This is a waste of time for all concerned, so a website should not be built unless there is the intent to keep it up to date.

* The "Cosmetics" issue is a little more troublesome. Novice website builders will tend to clutter their pages with graphics and bells and whistles. In reality, there is a tradeoff between complexity and utility. Graphics-intensive websites, for example, may look pretty, but they will take a long time to load into a user's browser and may simply irritate users.

Sometimes "features" can be counterproductive. Some people, for example, find blinking text extremely obnoxious, and many web authors set up colors or backgrounds that make web pages painfully hard to read. Pop-up windows are fine for warnings but are otherwise an intrusive irritant. Another irritant is a website that brings up a separate browser window for each new page accessed. As a surfer can easily bring up a second window using the alternate mouse button if it's desired, there's usually no good reason to do it for the surfer.

As a rule: don't get too cute. At best, people ignore most of the bells and whistles after the third time they see them; at worst, they get increasingly annoyed with them.

As a related issue, it is important not to make too many assumptions about how a web page will be displayed. Different web browsers will handle HTML according to their own settings, and a web page that assumes a specific browser configuration will give bad results on another browser. It is useful to access your own pages from somebody else's machine to see how they look and work.

* The "Organization" issue is the hardest one to deal with. The information to be presented by the web page needs to be structured in a way that makes it clear and easy to access. This is entirely a matter of writing ability and style, but some hints can be provided:

   o Divide the materials into well-organized separate modules, and then logically divide each module into subsections. It's usually not necessary (and can be quite confusing to the surfer) to divide materials into lower granularity, and sites that provide lots of small files can be irritating.

   o Each module corresponds to a separate web page in the website, and the subsections correspond to different headings in the same web page. Headings for each page and subsection should be clear and self-explanatory.

   o Once the materials are organized, the hyperlinking scheme should flow out of it. A typical organization might have an index page with links to a number of subordinate pages, with each of the subordinate pages linking back to the index page. Relative hyperlinks should be used whenever possible.

   o It is very important to have a clean and simple hyperlinking scheme, organized hierarchically or in some other structured fashion. This makes hyperlinking easier to test, and also prevents all the hyperlinks from being broken if a minor change is made in the website. Such breakage is very easy to cause if the hyperlinks have been set up in a haphazard fashion.

   o Given a rational hyperlinking scheme, it is also useful to present the hyperlink labels to the surfer in a way that they can be clearly understood. Sets of hyperlink labels should be organized logically in groups and clearly described. Scattering hyperlinks all over a web page makes them hard to find and confuses the surfer.

   o For a website with a large number of files, it is usually appropriate to store the files in a hierarchical file system that matches the organization of the web pages. Each directory contains the relevant files for each page, including bitmap files.

* Now for some specific details. As far as hyperlinking schemes go, a strict hierarchy is one of the simplest approaches:

In this case, each page links to its children and to its parent. There is no jumping over levels in the hierarchy. This means that rearranging the pages or otherwise modifying the site will break the minimum number of hyperlinks. Adding hyperlinks back to the index page is workable, though it increases the complexity somewhat.
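
As a rough sketch of what this looks like in practice, assume a second-level page stored as "aircraft/index.html", with its parent one directory up and two child pages in subdirectories below it. The file names, directory names, and labels here are invented for illustration:

   <H1>Aircraft</H1>

   <P>
   <A HREF="../index.html">Up To Home Page</A>

   <UL>
      <LI><A HREF="fighters/index.html">Fighters</A>
      <LI><A HREF="bombers/index.html">Bombers</A>
   </UL>

Since all the links are relative and reach only one level up or down, renaming the "aircraft" directory means fixing just the one link in its parent page.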

Related chapters of a large document that are organized as web pages can be chained to each other in a ring running through the table-of-contents page:

These are only representative organizations. Any rational scheme may work just as well, but they do illustrate the need to keep hyperlinks well organized.

If these schemes are hyperlinked using graphics labels in the form of arrows, it can be a bit confusing for a novice to figure out the scheme. One useful trick is to use the "ALT" feature in the bitmap tag code to give a string of text explaining the function of the bitmap: "BACK", "GO TO INDEX", "NEXT CHAPTER", and "PREVIOUS CHAPTER". In browsers that display the "ALT" text as a pop-up hint, the user can then find out what the bitmap does by resting the mouse over it for a moment.
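
For example, the third chapter in such a ring might carry a navigation bar along the following lines. This is only a sketch: the chapter file names and arrow bitmaps are hypothetical.

   <A HREF="chap2.html"><IMG SRC="gfx/left.gif" ALT="PREVIOUS CHAPTER"></A>
   <A HREF="index.html"><IMG SRC="gfx/up.gif" ALT="GO TO INDEX"></A>
   <A HREF="chap4.html"><IMG SRC="gfx/right.gif" ALT="NEXT CHAPTER"></A>

The "ALT" text also stands in for the arrows if the surfer has graphics turned off, so the links remain usable.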

* Once the hyperlink organization has been selected, then there is the question of organizing the hyperlink labels. They should be grouped in fairly small groups to make them easy to inspect. It is useful to organize them as lists:

   <UL>
      <LI><A HREF="Priv/bdays.html">Birthdays</A>
      <LI><A HREF="Priv/address.html">Addresses</A>
      <LI><A HREF="Priv/etc.html">Other Data</A>
   </UL>

This would be displayed as:

  o *Birthdays*
  o *Addresses*
  o *Other Data*

A popular scheme for displaying hyperlink labels involves placing them on the same line, separated by the "|" character. This is easy to do:

   Fun bitmaps:
   <A HREF="gfx/moon.gif">MoonScene</A> |
   <A HREF="gfx/mars.gif">MarScape</A> |
   <A HREF="gfx/earth.gif">EarthInSpace</A>

Since the web browser fills text, this is displayed as:

   Fun bitmaps:  *MoonScene* | *MarScape* | *EarthInSpace*

Note that the labels are defined as one word, since "MoonScene" seems a little less confusing than "Moon Scene" or "Moon_Scene" when clustered together on the same line with other labels. Sets of these clustered labels can also be provided in a list.
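
A list of clustered labels might be put together roughly as follows, reusing the links from the earlier examples. The "Private data:" heading and the "OtherData" label are made up for this sketch:

   <UL>
      <LI>Fun bitmaps:
         <A HREF="gfx/moon.gif">MoonScene</A> |
         <A HREF="gfx/mars.gif">MarScape</A> |
         <A HREF="gfx/earth.gif">EarthInSpace</A>
      <LI>Private data:
         <A HREF="Priv/bdays.html">Birthdays</A> |
         <A HREF="Priv/address.html">Addresses</A> |
         <A HREF="Priv/etc.html">OtherData</A>
   </UL>

Each list item then shows up as one bulleted line of labels.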

* Simple tables can be easily provided for a website using preformatted text blocks. One trick is that the underscore ("_") character is better for building lines than the dash ("-") character, since underscores tend to be displayed as continuous lines:

   <PRE>
                                     _____________________________
    ------------------------
     this-table-uses-dashes           this_table_uses_underscores
    ------------------------         _____________________________
     value 1:          3.11
     value 2:          2.05           value 1:              2.37
     value 3:          3.06           value 2:              3.44
     value 4:          1.05           value 3:              1.03
    ------------------------          value 4:              0.21
                                     _____________________________
    </PRE>

* Graphics files are something of a two-edged sword in websites: they can enhance a site's appearance, but they can also add clutter and take a painful amount of time to load over slow connections.

Modest-size bitmaps are generally preferable in this respect to large ones, and simple GIF images involving a few solid colors and simple patterns are much faster to load than, say, JPEG photographic images.

In the case of web archives of graphics images, one clean little trick is to provide a set of shrunk-down "thumbnail" images of the archive's contents as hyperlinks to the full images.
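
A thumbnail index could be sketched as below. The thumbnail file names ("moon_th.gif" and so on) and the 80 x 60 pixel size are assumptions for the example, not a convention:

   <A HREF="gfx/moon.gif"><IMG SRC="gfx/moon_th.gif" ALT="MoonScene"
      WIDTH=80 HEIGHT=60></A>
   <A HREF="gfx/mars.gif"><IMG SRC="gfx/mars_th.gif" ALT="MarScape"
      WIDTH=80 HEIGHT=60></A>

The thumbnails need to be separate, genuinely shrunk-down files. Pointing the "IMG" tag at the full image and scaling it down with "WIDTH" and "HEIGHT" still makes the browser load the whole file, which defeats the purpose.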

One useful graphic is a "banner". This is a bitmap with title information for the website or a specific document that normally measures 468 x 60 pixels. There are "banner exchanges" available on the web that allow people to submit banners so they will circulate to other users along with a link to the site advertised by the banner.

Some surfers will turn off graphics to get greater speed in accessing websites. Please make sure that the site works even when graphics are not loaded.

* One peculiarity of web pages is that they are not read like books. They are read like scrolls. While it is useful in both books and web pages to have high-priority information first in the text and lower-priority information last, one eccentricity of web pages is that if you have a list of materials in chronological order, it may be better to put the most recent materials first, rather than last, since they are most often accessed.

* A few final comments on website design.

A product is not finished until it is tested. The author should at least inspect the website for cosmetics, and check all the hyperlinks. It is preferable that the author have a test subject who is not familiar with the website to validate that it can be understood and accessed easily. If your website is being built inside an organizational network that may have barriers to the outside world, the website should be accessed from the outside world to ensure that permissions are valid.
It is also appropriate to include in a website information that allows the author to be located, along with the date the website was last modified. On the other hand, if you hand out your email address you may find it invites spam -- obnoxious Internet advertising -- so you might take that into consideration.
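
One possible way to provide this information is a short footer at the bottom of each page. The sketch below uses the "ADDRESS" tag, and the email address is a placeholder:

   <HR>
   <ADDRESS>
   Comments to <A HREF="mailto:webmaster@example.com">webmaster@example.com</A>
   <BR>
   Last modified 01 jun 03
   </ADDRESS>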

[8] ADVANCED FEATURES

* The previous sections in this document have described how to construct elementary web pages, but there are more sophisticated tools available for building fancier web pages. These features will be mentioned briefly, with no discussion of details, as they are beyond the scope of this short document:

Tables: While preformatted text blocks or embedded graphics can be used to build reasonable tables for display in a web browser, there are also HTML commands to allow you to construct formatted tables directly.
Frames: Modern web browsers have the capability to display data in sets of independent "tiled" windows known as "frames".
CGI Scripts & Forms: HTML also provides the ability to allow a surfer to interact with the web page -- to make database queries, for example, or fill out a form. This is an advanced capability that requires assistance from a system administrator.
Java: Java is a programming language that can be used to write programs that run in web browsers. Java programs can be used to build animations or simple applications.
Java programs have to be compiled and then linked to a web page. There is also "JavaScript", which despite the name is a separate and simpler scripting language whose commands can be written as part of the page, and the similar "VBScript", developed by Microsoft and based on their Visual Basic language.
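
Just to give the flavor of the first of these features, a formatted table holding a couple of the values from the earlier preformatted example might look something like this. It is a bare-bones sketch, and the column headings are invented:

   <TABLE BORDER=1>
      <TR><TH>Item</TH><TH>Value</TH></TR>
      <TR><TD>value 1</TD><TD>2.37</TD></TR>
      <TR><TD>value 2</TD><TD>3.44</TD></TR>
   </TABLE>

Each "TR" defines a row, with "TH" for heading cells and "TD" for data cells.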

There are many books available on advanced HTML for those who wish to pursue these features. Various tools are now available to make building web pages automatic. However, it still remains useful to understand the basics of HTML, since a web page can be a complicated "machine" that will function much better if the builder knows how it works.

[9] COMMENTS, SOURCES, & REVISION HISTORY

* I originally wrote these materials during the 1990s, but I eliminated them from my website in 2001 as I didn't feel they were getting any attention. I later realized they had some value and it made no sense just to keep them in my archives gathering dust, so I restored them to the site in 2003.

As I had lost the revision history I restored it as "v1.0.0", meaning it was a first-release document. This is not actually true, but as any earlier version would have had a two-part revcode (for example "v1.2") instead of a three-part revcode it should allow any earlier version to be detected, on the unlikely chance there's one floating around on the Internet.

As I wrote this stuff somewhat informally as notes for my own use, I never kept a source list on it. It doesn't really matter in this case, since if the examples in this document work as advertised there's no reason to question their validity.

* Revision history:

   v1.0.0 / 01 jun 03 / gvg
