Re: Keywords

---------

Denis McKeon (galway@chtm.eece.unm.edu)
Sat, 3 Dec 1994 15:31:10 -0700


Nancy McGough <nancym@ii.com> writes:

>What format should I use for keywords in the Keywords header -
>comma separated, space separated, quotes?
>
>What newsreaders allow you to select articles based on the
>Keywords header?
>
>Thanks for any info,

Good news, bad news - a clear description, but nobody uses it.

Henry Spencer's son-of-1036 draft says:

>2 June 1994 - 1 - expires 15 July 1994
>...
>5.2. From
>
>The From header contains the electronic address, and possi-
>bly the full name, of the article's author:
>
> From-content = address [ space "(" paren-phrase ")" ]
> / [ plain-phrase space ] "<" address ">"
> paren-phrase = 1*( paren-char / space / encoded-word )
> paren-char = <ASCII printable character except ()<>\>
> plain-phrase = plain-word *( space plain-word )
> plain-word = unquoted-word / quoted-word / encoded-word
>...
>
>6.8. Keywords
>
>The Keywords header content is one or more phrases intended
>to describe some aspect of the content of the article:
>
> Keywords-content = plain-phrase *( "," [ space ] plain-phrase )
>
>Keywords, separated by commas, each follow the <plain-
>phrase> syntax defined in section 5.2. Encoded words in
>keywords MUST not contain characters other than letters (of
>either case), digits, and the characters "!", "*", "+", "-",
>"/", "=", and "_".
>
> NOTE: Posters and posting agents are asked to take
> note that keywords are separated by commas, not by
> white space. The following Keywords header con-
> tains only one keyword (a rather unlikely and
> improbable one):
>
> Keywords: Thompson Ritchie Multics Linux
>
> and should probably have been written:
>
> Keywords: Thompson, Ritchie, Multics, Linux
>
> This particular error is unfortunately rather
> widespread.
>
>
> NOTE: Reading agents and archivers preparing
> indexes of articles should bear in mind that user-
> chosen keywords are notoriously poor for indexing
> purposes unless the keywords are picked from a
> predefined set (which they are not in this case).
> Also, some followup agents unwisely propagate the
> Keywords header from the precursor into the fol-
> lowup by default. At least one news-based experi-
> ment has found the contents of Keywords headers to
> be completely valueless for indexing.

For details, see
> ftp://ftp.zoo.toronto.edu/pub/news.txt.Z (also news.ps.Z)

The news.answers guideline post mentions the Summary: header and
encourages its use, but neither the guideline post nor the introduction
post mentions the Keywords: header.

A number of newsreaders allow selection based on strings in some header
lines, or in any specified header line, or anywhere in the header block.
Given Spencer's description, I doubt that any of them display the
Keywords: header, or depend solely on it for selection.
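
As a rough illustration of that kind of header-based selection (not how
any particular newsreader does it), here is a minimal Python sketch. The
function name select_by_header and the sample articles are hypothetical;
it assumes each article is available as raw text with RFC 822-style
headers, and it does a case-insensitive substring match on one header.

```python
from email import message_from_string

def select_by_header(articles, header, needle):
    """Return the raw articles whose given header contains `needle`.

    Hypothetical helper: `articles` is an iterable of raw article texts.
    Matching is a case-insensitive substring test; a missing header
    never matches.
    """
    needle = needle.lower()
    selected = []
    for raw in articles:
        value = message_from_string(raw).get(header, "")
        if needle in str(value).lower():
            selected.append(raw)
    return selected

# Made-up sample articles, for illustration only.
a1 = "Subject: Re: Keywords\nKeywords: FAQ maintain, automatic post\n\nbody one\n"
a2 = "Subject: Something else\n\nbody two\n"

hits = select_by_header([a1, a2], "Keywords", "faq")
print(len(hits))
```

Passing "Subject" or "Summary" instead of "Keywords" selects on those
headers in the same way, which is probably closer to what people
actually use.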

Note that the description seems to allow key-phrases, e.g.:

Keywords: FAQ maintain, automatic post, news.answers

but hides that possibility in the mixed usage of (key-)word and
(plain-)phrase. The description should: s/keyword/keyphrase/g
throughout - too bad we can't do that with the Keywords: header itself.
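
To make the comma rule concrete, here is a small Python sketch that
splits a Keywords: header content into key-phrases. The name
split_keywords is hypothetical, and the sketch is a simplification of
the draft's grammar: it splits on commas outside double quotes, does not
handle backslash escapes inside quoted words, and does not validate each
phrase against the <plain-phrase> syntax.

```python
def split_keywords(content):
    """Split a Keywords header content into phrases.

    Simplified sketch: commas separate phrases unless they fall inside
    double quotes. Backslash escapes and <plain-phrase> validation are
    deliberately ignored.
    """
    phrases, current, in_quotes = [], [], False
    for ch in content:
        if ch == '"':
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == "," and not in_quotes:
            phrases.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    phrases.append("".join(current).strip())
    # Drop empty entries left by stray or trailing commas.
    return [p for p in phrases if p]

print(split_keywords("Thompson Ritchie Multics Linux"))
# ['Thompson Ritchie Multics Linux']
print(split_keywords("FAQ maintain, automatic post, news.answers"))
# ['FAQ maintain', 'automatic post', 'news.answers']
```

The first call shows the draft's point: without commas, the whole line
is a single (unlikely) key-phrase.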

On the bright side, on-the-fly indexing of text is supposed to be an
easy task for parallel systems - give each processor a few Kb of text to
index and combine the results (he said, waving his hands blithely) - maybe
the successor to NOV will be auto-magic Keywords: lines.
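
The hand-waving above can be sketched in Python along these lines. This
is only an illustration under stated assumptions: the names split_chunks,
index_chunk and build_index are hypothetical, "indexing" is reduced to
counting lowercase alphanumeric words, and the default mapper is the
built-in map, so nothing actually runs in parallel unless a pool's map
method is passed in.

```python
import re
from collections import Counter

def split_chunks(text, size=4096):
    """Yield chunks of roughly `size` characters, split on whitespace
    so that no word is cut in half."""
    chunk, length = [], 0
    for word in text.split():
        chunk.append(word)
        length += len(word) + 1
        if length >= size:
            yield " ".join(chunk)
            chunk, length = [], 0
    if chunk:
        yield " ".join(chunk)

def index_chunk(chunk):
    """Index one chunk: count its lowercase alphanumeric words."""
    return Counter(re.findall(r"[a-z0-9]+", chunk.lower()))

def build_index(text, size=4096, mapper=map):
    """Index each chunk independently, then combine the partial counts.

    `mapper` defaults to the built-in map; the map method of a
    multiprocessing.Pool could be substituted to spread chunks over
    several processors.
    """
    total = Counter()
    for partial in mapper(index_chunk, split_chunks(text, size)):
        total.update(partial)
    return total

index = build_index("news news faq", size=5)
print(index.most_common(1))
```

Picking the most frequent words this way is a crude stand-in for real
keyword extraction; it only shows the split, index and combine steps.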

-- 
Denis McKeon   
galway@chtm.eece.unm.edu



---------

faq-admin@landfield.com

© Copyright The Landfield Group, 1997
All rights reserved