I thought I had posted this, but here it is again.
Sorry for the extra noise, but I guess it's worth
making sure people have access to everything relevant.
Oscar
--*** www:25
From: Thomas A. Fine <fine@cis.ohio-state.edu>
Originator: fine@cis.ohio-state.edu
To: Oscar Nierstrasz <oscar@cui.unige.ch>
Send-Date: Tue, 9 Nov 1993 11:34:31 UTC-0500
Delivery-Date: Wed, 10 Nov 1993 10:09:54 UTC+0100
Subject: Re: FAQ format document?
RFC-822-HEADERS: X-Mailer: Perl Mail System v2.0
>
>Hi! I like very much your hypertext FAQ facility. I have been
>converting some FAQs (OO, software eng & WWW) to HTML by writing
>a front-end to ms2html. In the long run I think your approach is
>better, though.
>
>Can you give me a pointer to a plain text version of your FAQ format FAQ?
>I can only find the hypertext version. I would like to forward it
>to other FAQ writers. Particularly the comp.object FAQ is not in the
>format you require, and it would be nice to convert it.
There are two possible formats that I can deal with. The first is the hypertext format that I originally designed for this project, and the second is the Digest format that is recommended on the faq-maintainers mailing list.
I've included descriptions of both, although I don't think they are totally up to date. I will continue to support both formats indefinitely, but you should keep in mind that the digest format is the one which is most often used, because of benefits not related to WWW.
Also, check out the technical notes at the top of the Usenet FAQs list.
tom
---------------------------------------------------------------------------
The first is a sample posting in my format:
---------------------------------------------------------------------------
Path: cis.ohio-state.edu
From: fine@cis.ohio-state.edu (Thomas A Fine)
Newsgroups: news.misc,comp.infosystems,comp.answers,news.answers
Subject: Generic FAQ format for World Wide Web
Followup-To: comp.infosystems
Date: 11 Feb 1993 15:08:54 -0500
Expires: 11 Mar 1993 00:00:00 GMT
Summary: Information on the conversion of FAQs to WWW hypertext documents
Organization: The Ohio State University Dept. of Computer and Info. Science
Lines: 240
Message-ID: <asdfINN123@soccer.cis.ohio-state.edu>
NNTP-Posting-Host: soccer.cis.ohio-state.edu
Content-Type: text/x-usenet-FAQ; version=1.0; title="Hypertext FAQs"

Archive-name: faq-format/www
Last-modified: 1993/02/11

Statement of Intent
-------------------
FAQs are a wonderful resource, but hard to work through. This project is an attempt to unite the volume of information found in news.answers and other newsgroups with The World Wide Web, a system for networked information retrieval.
World Wide Web uses hypertext documents and a network transport protocol to build a huge web of information all over the world. Accessing documents is as easy as clicking on a mouse. (There are tty-based interfaces available too.) WWW (as we like to call it, for obvious reasons) also knows how to talk to other services including WAIS and Gopher. There are plans to extend the document type to be a MIME document, of which WWW's hypertext (called HTML, for HyperText Markup Language) will be one part.
Since we don't expect everyone to learn HTML (although it is fairly straightforward), we have designed a format that can be used for news.answers (et al.) documents and FAQs, which will allow us to automatically convert them to hypertext. The format has been designed to allow FAQ maintainers to provide conforming news articles with minimal changes. Note that while this is currently the only format, it is possible to support multiple formats.
The format is described in the following sections. Note that this article itself conforms to the format, and a formatted version is available thru the Web (See "Getting WWW Software").
Getting WWW Software
--------------------
To see what this document looks like after it's been formatted, grab yourself some software and give it a try. The software is available via anon ftp from various places. There are several different packages available:
ftp.ncsa.uiuc.edu in /Web/xmosaic
    xmosaic-0.8.tar.Z        X11 browser - fairly new, very nice.
                             (binary available in the dir binaries-0.7)

info.cern.ch in /pub/www/src
    tkWWW-0.5.tar.Z          X11 browser - Tcl/Tk implementation
    viola920730.tar.Z        X11 browser - a bit out of date
    midaswww-1.0.tar.Z       X11 browser - new version expected soon
    WWWLineMode_1.3b.tar.Z   dumb terminal browser
    WWWNextStep_0.15.tar.Z   A NextStep browser and editor
    www_and_frame-0.2.tar.Z  A package for editing HTML with FrameMaker
    [my tty-based browser and editor will hopefully be included soon]
To find the FAQ stuff in the Web, you will need the following Universal Resource Locator, which can be typed into your browser in an application specific way (the XMosaic author promised to include a built-in link to the information):
"http://www.cis.ohio-state.edu:80/hypertext/faq/usenet/FAQ-List.html"
The hypertext documents produced from conforming FAQs use few of the features of HTML, and so will look rather plain. If you would like more information, start by getting the software and rummaging around the Web. You can also get on the www-talk mailing list by sending mail to www-talk-request@nxoc01.cern.ch. Lastly, you could bug me with mail if you were really desperate. If, after seeing what this system can do, you decide you want to provide your documentation directly as hypertext, contact me (fine@cis.ohio-state.edu) and we'll work out the details. Send questions about this format to the same address.
The Header Format
-----------------
[Note that when "article" is used, a single article is being referred to. When "posting" is used, the entire set of articles is being referred to.]
In order to be recognized as a conforming article, it must use MIME headers as follows:
For single-article postings, the header must include:
Content-type: text/x-usenet-FAQ
In addition, two fields can be added to this line:
version=1.0
This indicates the version number to process the file with. If absent, version 1.0 will be assumed. The other field:
title="The title of the article"
This will be used in various places in the conversion to hypertext. If not present, the subject line (in its entirety) of the first article of the posting will be used as the title of that posting. The posting title must be unique.
Note that when attributes are used, semicolons should also be used after the Content-type, and after each attribute except for the last:
Content-type: text/x-usenet-FAQ; version=1.0; title="Blarg"
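For anyone who wants to check a header before posting, here is a minimal sketch in Python of how the Content-type line and its two attributes could be read. It is not the converter's actual code. The function name `read_faq_header` is made up for illustration, and it leans on the standard `email` module, which reports the content type in lower case.

```python
from email import message_from_string


def read_faq_header(raw_header: str, subject: str) -> dict:
    """Pull the FAQ version and title out of raw header text.

    Hypothetical helper: 'subject' stands in for the subject line of the
    first article, which is the fallback title described above.
    """
    msg = message_from_string(raw_header + "\n\n")
    # get_content_type() returns the type lowercased.
    if msg.get_content_type() != "text/x-usenet-faq":
        raise ValueError("not a conforming FAQ article")
    return {
        "version": msg.get_param("version", "1.0"),  # absent: 1.0 is assumed
        "title": msg.get_param("title", subject),    # absent: subject line
    }


info = read_faq_header(
    'Content-type: text/x-usenet-FAQ; version=1.0; title="Blarg"',
    subject="Generic FAQ format for World Wide Web",
)
print(info)
```

The sketch does not check that the title is unique across postings; that would need a list of the titles already in use.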
For multiple-article postings, the Content-type information as described above should be the first thing found in the BODY of the first article of the posting (it can be included in the secondary header with Archive-name and Version lines found in many FAQs). In addition, each article header must contain the MIME multipart information:
Content-type: message/partial; number=1; total=3; id="totally-unique-id-string"
The "total" attribute is only required on the final part of the document. The id will be the same for each article in the posting, but is supposed to be guaranteed unique among postings. A format similar to the typical news message id is recommended, e.g. something including the poster and the posting host along with an id unique to that host (the time).
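As a rough illustration of how the parts of a multiple-article posting could be put back in order, here is a Python sketch that reads the number, total, and id attributes with a small regular expression. The names `partial_params` and `assemble` are hypothetical, and the input shape (pairs of header line and body) is an assumption made to keep the example short.

```python
import re

# Matches '; name=value' or '; name="quoted value"' after the content type.
PARAM = re.compile(r';\s*([\w-]+)=("[^"]*"|[^;\s]+)')


def partial_params(content_type_line: str) -> dict:
    """Return the attributes of a 'Content-type: message/partial; ...' line."""
    return {k.lower(): v.strip('"') for k, v in PARAM.findall(content_type_line)}


def assemble(articles) -> str:
    """Join article bodies in part order.

    'articles' is an iterable of (content_type_line, body) pairs that are
    believed to belong to one posting.
    """
    parsed = [(partial_params(line), body) for line, body in articles]
    ids = {params["id"] for params, _ in parsed}
    if len(ids) != 1:
        raise ValueError("articles carry different ids")
    parsed.sort(key=lambda item: int(item[0]["number"]))
    # Only the final part is required to carry "total".
    total = parsed[-1][0].get("total")
    if total is not None and int(total) != len(parsed):
        raise ValueError("some parts are missing")
    return "".join(body for _, body in parsed)


articles = [
    ('Content-type: message/partial; number=2; total=2; id="x@host"', "second\n"),
    ('Content-type: message/partial; number=1; id="x@host"', "first\n"),
]
print(assemble(articles))
```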
The Body Format
---------------
The content is fairly free-form. It can contain any of the following "sections":

    Documents
    Ignored text
    Questions/Answers
The Ignored text is stripped out first, then the articles are appended together. The converter then expects to see a series of "Documents" and/or "Questions/Answers" sections. These are all described below if you are in text, or are links at the top level, if you are in hypertext.
These will be formatted into a top level hypertext document with links to all the other documents. The software will attempt to handle simple subsets of this format accordingly; for instance if a posting consists entirely of a single questions/answers section, the conversion will show the list of questions as the top level of the hypertext (this hasn't been implemented yet).
Each of the Documents or Questions/Answers sections must be started with a blank line, a left-justified title line, and a left-justified line of dashes to underscore the title. The only exception is a posting which is either a single document, or a single set of questions/answers, in which case no such title is required anywhere. Also, the first section does not require such a title; if it is left out, the title "Introduction" will be used.
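The title rule can be sketched in Python as follows. This is one reading of the rule, not the converter itself: a title is a non-indented line that follows a blank line (or starts the body) and is followed by a line made only of dashes. The function name `split_sections` is an assumption.

```python
import re

DASHES = re.compile(r"^-+$")


def split_sections(body: str) -> list:
    """Split a body into (title, lines) pairs at underlined title lines."""
    lines = body.split("\n")
    sections = [("Introduction", [])]  # default title for an untitled start
    i = 0
    while i < len(lines):
        prev_blank = i == 0 or lines[i - 1].strip() == ""
        is_title = (
            prev_blank
            and i + 1 < len(lines)
            and lines[i] != ""
            and not lines[i][0].isspace()   # title must be left-justified
            and DASHES.match(lines[i + 1])  # underline of dashes
        )
        if is_title:
            sections.append((lines[i], []))
            i += 2  # skip the title and its underline
        else:
            sections[-1][1].append(lines[i])
            i += 1
    # Drop the implicit introduction when nothing came before the first title.
    if not any(line.strip() for line in sections[0][1]):
        sections.pop(0)
    return sections


body = (
    "Some intro text.\n\nDocuments\n---------\nA document.\n\n"
    "Ignored Text\n------------\nMore text.\n"
)
for title, _ in split_sections(body):
    print(title)
```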
Documents
---------
A document section is just any section of text, separated by the section title described previously, and not matching the Questions/Answers form. This means you must make sure no line in a document section starts with a number followed by a right parenthesis.
This section on "Documents" you are reading now will be an entire hypertext document after conversion (this may not be the best choice for the hypertext layout, but makes a good example.)
Ignored Text
------------
Some text can be ignored in the posting. Typically in news articles, you may need to include some redundant text in every article that won't be needed in the hypertext. Also, some information, like how to unpack the articles, might be unneeded. Lists of questions are another example, since the conversion software builds this list from the questions themselves.
To mark text as unneeded, it should be surrounded by lines consisting only of "--".
An example starts here:
--
This text won't show up in the hypertext.
--

That was the example. Note that if you are looking at the hypertext, there was nothing between the start and the end, because it was ignored.

Important note: Text will stop being ignored at the end of each ARTICLE, even if there is no ending "--". (This can be used to eliminate signatures, since lots of people use the "--" there anyway.)
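A small Python sketch of the ignore rule, under the assumption that each article body is handled on its own before the articles are appended together. The function names are hypothetical, and tolerating trailing blanks on the marker (as in the usual "-- " signature line) is a guess, not something the format promises.

```python
def strip_ignored(article_body: str) -> str:
    """Remove text between '--' marker lines within one article."""
    kept = []
    ignoring = False
    for line in article_body.split("\n"):
        # Assumption: trailing whitespace on the marker is tolerated.
        if line.rstrip() == "--":
            ignoring = not ignoring  # the marker lines themselves are dropped
            continue
        if not ignoring:
            kept.append(line)
    # 'ignoring' is local, so an unclosed '--' ends with the article.
    return "\n".join(kept)


def join_articles(bodies) -> str:
    """Strip each article separately, then append the results."""
    return "\n".join(strip_ignored(body) for body in bodies)


article = "keep\n--\ndrop\n--\nalso keep\n--\nsignature"
print(join_articles([article, "next article"]))
```

Because the state is reset for every article, a signature after a final "--" disappears without swallowing the start of the next article.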
Questions/Answers
-----------------
Any section which contains a left justified number followed by a right parenthesis and then white space will be treated as a Questions/Answers section. The number can contain several decimal points, so "1.4.11) " is an acceptable starting string for a question.
The requirements for this section can be summed up as:
  * There must be a blank line before and after each question.
  * All answer text must be indented from the left margin.
  * All questions must have no indentation on the first line.
  * All questions start with a number, a right parenthesis, and
    some whitespace.
There can be a section title for some portion of the questions. It must have no indentation, and must be preceded by blank line, and followed by a blank line and then a question. It cannot start with a number!
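The question rules above might be checked with something like the Python sketch below. The pattern and the function name `find_questions` are guesses at a reasonable reading (digits, optional dotted parts, a right parenthesis, then whitespace), not the expression the converter really uses.

```python
import re

# A number such as 1, 2.3 or 1.4.11, then ')' and whitespace, at the margin.
QUESTION = re.compile(r"^(\d+(?:\.\d+)*)\)\s+(.*)")


def find_questions(section_text: str) -> list:
    """Return (number, question text) pairs; a question runs to a blank line."""
    questions = []
    lines = section_text.split("\n")
    i = 0
    while i < len(lines):
        match = QUESTION.match(lines[i])
        if match:
            text = [match.group(2)]
            i += 1
            # A long question continues until the next blank line.
            while i < len(lines) and lines[i].strip():
                text.append(lines[i].strip())
                i += 1
            questions.append((match.group(1), " ".join(text)))
        else:
            i += 1
    return questions


sample = (
    "Section 1. The Documents\n\n"
    "1.1) What is the documents\nsection for?\n\n"
    "    For introductions.\n\n"
    "1.4.11) Deep?\n\n"
    "    Yes.\n"
)
print(find_questions(sample))
```

The section title in the sample is skipped because it does not start with a number, which is the reason section titles are not allowed to.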
Creating Additional Links
-------------------------
Anywhere where the string ``(See "document title")'' occurs, a link will be created to that document if it exists. This link is just an example: (See "Questions/Answers").
If you are going to do this, make sure you refer to a document title that is unique. All non-unique references will be ignored.
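A Python sketch of how such references could be found and filtered, assuming the list of titles in the posting is already known. The name `link_targets` and the exact pattern are illustrative only.

```python
import re

# Matches (See "some title") and captures the title.
SEE = re.compile(r'\(See "([^"]+)"\)')


def link_targets(text: str, titles: list) -> list:
    """Return referenced titles that exist exactly once in the posting."""
    unique = {title for title in titles if titles.count(title) == 1}
    return [name for name in SEE.findall(text) if name in unique]


titles = ["Documents", "Questions/Answers", "Notes", "Notes"]
text = 'Try (See "Questions/Answers"), (See "Notes") and (See "Missing").'
print(link_targets(text, titles))
```

In the sample, "Notes" is dropped because two sections share that title, and "Missing" is dropped because no such section exists.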
Questions and Answers
---------------------
Section 1. The Documents
1.1) What is the documents section for?
For introductions to newsgroups, introductions to FAQs, and other postings of interest for a newsgroup that don't fall into the question and answer scheme.
1.2) What if I have only a single document (and no questions)?
Then make sure there are zero or one document titles (a title line followed by a line of dashes), and nothing that looks like the start of a Question/Answer section.
Section 2. The Question/Answer portion of the FAQ
2.1) What if my question is too long to fit on one line?
Just make sure you don't put a blank line in the question. The software will handle it correctly.
2.2) How do you distinguish section titles from questions, if both start at the beginning of the line?
Section titles cannot start with numbers. Questions have to start with a number, followed by a right parenthesis and then whitespace.
2.3) Where, exactly, should I put blank lines?
A blank line has to occur before and after each question. Blank lines must also occur before each section title. All other blank lines will be preserved.
Section 3. Other questions
3.1) Can I create additional links?
Yeah, just put in text that looks like this:
(See "Creating Additional Links")
It must refer to a Document title or Q/A title somewhere in the posting.
3.2) Can I do tree structuring?
The format is converted into a three level tree, but you can't impose any other structure on it, unless you want to provide your documentation directly as hypertext.
3.3) What if I don't like your format?
Come up with your own. We may even help you write the software to convert it. Since the format requires a version number, it's easy for us to support multiple versions at the same time.
If you do decide to create your own format, it would be best if it would be general enough for more than your own postings, as we think it would be a little cumbersome having a separate piece of software for every different posting.
3.4) Since this uses MIME headers, will there be an application to read such documents with a MIME news/mail reader?
Eventually, but not now.
3.5) Why didn't you use SeText format?
Because we wanted to get this done quickly, and SeText isn't a fully realized standard yet. Also, it's not clear that SeText will handle all the functionality we needed. It is possible that future versions will be based on SeText, or at least include some of its features.
BTW SeText is a pseudo-markup language for text documents. All the markups are chosen as items which won't interfere with the reading of the document in an unformatted state. Neat stuff, but not ripe yet.
About this posting
------------------
This is posted monthly to news.answers, comp.answers, comp.infosystems, and news.misc. If changes are made between posts, no differences will be posted as the formatted and unformatted versions available through the web are assumed to be the latest.
Send comments and corrections about this posting to fine@cis.ohio-state.edu. Please send a context diff, or an entire modified version of the file.
--
------------------------------------------------------------------------------
| Thomas A. Fine  | fine@cis.ohio-state.edu | 2036 Neil Avenue Mall |
| CIS Staff       | (614) 292-7325          | Columbus, Ohio 43210  |
| The Ohio State University - Department of Computer and Information Science |

---------------------------------------------------------------------------
This is a sample posting in digest format:
---------------------------------------------------------------------------
Path: cis.ohio-state.edu
From: fine@cis.ohio-state.edu (Thomas A Fine)
Newsgroups: news.misc,comp.infosystems,comp.answers,news.answers
Subject: Digest format for FAQs in World Wide Web
Followup-To: comp.infosystems
Date: 11 Feb 1993 15:08:54 -0500
Expires: 11 Mar 1993 00:00:00 GMT
Summary: Information on the conversion of FAQs to WWW hypertext documents
Organization: The Ohio State University Dept. of Computer and Info. Science
Lines: 240
Message-ID: <asdfINN123@soccer.cis.ohio-state.edu>
NNTP-Posting-Host: soccer.cis.ohio-state.edu

Archive-name: faq-format/digest
Last-modified: 1993/02/11
This is a description of the FAQ digest format that I'm supporting. It is partially derived from RFC 1153. But it is mostly based on what people seem to be doing.
------------------------------
Subject: Overview
The basic idea behind the digest format is that it is a digest of other articles, or messages. Each message is separated with a line containing dashes, then a blank line, and then a message header which must include a Subject line, and may include a few other header lines.
The RFC (RFC 1153, that is) calls for each message being separated by a blank line, a line containing exactly and only 30 dashes, and another blank line. Then the message header, a blank line, and the message body. The RFC specifies several header lines which can be included, but doesn't require any of them specifically, only that there must be a header of some sort. The RFC has several other details which are even more widely ignored.
In practice, the digest format frequently amounts to an article with a lot of Subject lines. Almost no one uses exactly 30 dashes, or any of the other particulars specified in the RFC. Many people forgo the dashed lines altogether, or add extra lines with dashes, or leave out blank lines, or worse. It was a real challenge coming up with software that will deal with most of it.
The following sections detail my interpretation of the digest format -- what my software expects, and can deal with.
------------------------------
Subject: How is a posting identified as being digest format?
A point system has been developed, and each posting is scored.

    Subject lines by themselves are worth zero points
    A Subject line after a line of only dashes is worth one point
    A Subject line after a blank line is worth one point
    A Subject line after a line of only dashes and a blank line
      (in either order) is worth two points
An article needs six points to be targeted for conversion.
If one of the following header lines is found prior to a Subject line, it is ignored when determining whether the Subject line came after a blank or dashed line: From, To, Cc, Date, Message-ID, Keywords, Summary.
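To make the scoring concrete, here is one possible reading of the point system as a Python sketch. It is an interpretation, not the real software: it looks only at the two lines directly before each Subject line (after skipping the listed header lines), it does not handle header continuation lines, and the names `kind`, `digest_score`, and `is_digest` are invented for the example.

```python
import re

SUBJECT = re.compile(r"^Subject: .+")
DASHES = re.compile(r"^-+$")
SKIPPED = re.compile(r"^(From|To|Cc|Date|Message-ID|Keywords|Summary):")


def kind(line: str):
    """Classify a line as 'blank', 'dashes', or None for anything else."""
    stripped = line.strip()
    if stripped == "":
        return "blank"
    if DASHES.match(stripped):
        return "dashes"
    return None


def digest_score(body: str) -> int:
    """Add up the points earned by every Subject line in the body."""
    lines = body.split("\n")
    score = 0
    for i, line in enumerate(lines):
        if not SUBJECT.match(line):
            continue
        j = i - 1
        while j >= 0 and SKIPPED.match(lines[j]):  # look past header lines
            j -= 1
        first = kind(lines[j]) if j >= 0 else None
        if first is None:
            continue  # a Subject line by itself is worth zero points
        second = kind(lines[j - 1]) if j >= 1 else None
        # Dashes plus a blank line, in either order, earn two points.
        score += 2 if second not in (None, first) else 1
    return score


def is_digest(body: str) -> bool:
    """Six points are needed to be targeted for conversion."""
    return digest_score(body) >= 6


separator = "------------------------------\n\n"
body = "intro\n\n" + "".join(
    separator + f"Subject: Part {n}\n\ntext\n\n" for n in range(3)
)
print(digest_score(body), is_digest(body))
```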
------------------------------
Subject: How do you split up the article?
Once I've selected an article, it gets split up based on Subject lines. The content of the Subject line becomes the title of the following section. The following header lines are moved from the previous section to the next section if they come before the Subject line: From, To, Cc, Date, Message-ID, Keywords, Summary (including continuation lines, I think).
The only time Subject lines are not used to split up the article is if they are adjacent to another subject line. Often people include a list of subjects at the beginning of the article, and without this check, you end up with a lot of empty sub documents.
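The splitting step could look roughly like the Python sketch below. It assumes "adjacent" means the line directly before or after is also a Subject line, and it leaves such lines in place as ordinary text. Continuation lines of moved headers are not handled, and `split_digest` is a made-up name.

```python
import re

SUBJECT = re.compile(r"^Subject: (.+)")
MOVABLE = re.compile(r"^(From|To|Cc|Date|Message-ID|Keywords|Summary):")


def split_digest(body: str) -> list:
    """Split a digest body into (title, lines) pairs at Subject lines."""
    lines = body.split("\n")

    def is_subject(k: int) -> bool:
        return 0 <= k < len(lines) and SUBJECT.match(lines[k]) is not None

    sections = [("", [])]  # text before the first usable Subject line
    for i, line in enumerate(lines):
        match = SUBJECT.match(line)
        if match and not (is_subject(i - 1) or is_subject(i + 1)):
            # Move trailing header lines from the previous section forward.
            carried = []
            previous = sections[-1][1]
            while previous and MOVABLE.match(previous[-1]):
                carried.insert(0, previous.pop())
            sections.append((match.group(1), carried))
        else:
            sections[-1][1].append(line)
    return sections


body = (
    "Subject: One\nSubject: Two\n\n-----\n\n"
    "From: a@b\nSubject: One\n\nfirst\n\n-----\n\n"
    "Subject: Two\n\nsecond\n"
)
for title, content in split_digest(body):
    print(repr(title), len(content))
```

In the sample, the two Subject lines at the top form a list of subjects and so do not start sections of their own.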
------------------------------
Subject: What exactly is a subject line?
It is exactly the word "Subject" with a capital "S", at the beginning of a line, followed by a colon, and then a space, and then at least one more character. The perl regexp I use is:
/^Subject: .+/
If you think I need to be more general, or more specific, or both, let me know.
------------------------------
Subject: Why is the digest format so popular?
Because there are already many different methods of dealing with this format electronically. rn, nn, and GNUS all have mechanisms for dealing with these formats. In addition I've been told you can coax Emacs outline mode into dealing with them.
© Copyright The Landfield Group, 1997
All rights reserved