fixing broken URLs in auto-converted FAQs

David Alex Lamb (dalamb@qucis.queensu.ca)
Tue, 3 Oct 1995 14:08:03 -0400

Next message: Bernhard Muenzer: "Re: "From" line"
Previous message: Nancy McGough: "Re: How do I stop the automated conversion of a posted FAQ to HTML? (fwd)"

The following may be useful to people who maintain their own Web versions of
files mentioned in FAQs and have a friendly system administrator to install
CGI scripts for them. For everyone else it's probably useless technobabble.

Some of the auto-converters from text to HTML for FAQs don't parse the
<URL:...> convention, which I had thought was a standard and which I also
thought was required by yet other converters. In particular, the Ohio State
archive was translating
<URL:http://blah/foo>
into a link that included the trailing >, i.e.
<a href="http://blah/foo>">...</a>
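For comparison, a converter that honours the convention only has to treat
the closing > as a delimiter rather than as part of the URL. The following is
a minimal Perl sketch of that idea, not code from any actual converter; the
helper name linkify_urls is hypothetical, and the sketch ignores HTML
escaping of the surrounding text and of characters such as & inside the URL.

```perl
use strict;
use warnings;

# Hypothetical helper: turn each <URL:...> marker in plain text into an
# HTML anchor. The closing '>' ends the marker and is not part of the URL.
sub linkify_urls {
    my ($text) = @_;
    $text =~ s{<URL:([^>\s]+)>}{<a href="$1">$1</a>}g;
    return $text;
}

print linkify_urls("See <URL:http://blah/foo> for details.\n");
```

Because the character class [^>\s] stops at the first > or whitespace, the
generated href contains http://blah/foo and nothing more.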

I worked around this for my own site (which is the only one I usually
reference in this way) by having a student write a Perl CGI script that is
invoked whenever the server would return a "404 Not Found" error. If the
requested URL has a trailing > (or &), the script generates a link without
that character; otherwise it just gives the standard "not found" message. In
HTTPD 1.4, which we're running, you set ErrorDocument to /cgi-bin/theScript;
there ought to be equivalents in other servers.

Before I did this, the operations staff created symbolic links for the most
commonly occurring references, e.g. "college.html>" as a symbolic link to
"college.html". My hack is slightly more general, but still not a complete
fix, since it helps only my own site. The best solution is smarter
converters, of course, but that's not within my control.

Here it is:
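#!/usr/local/bin/perl
# CGI handler for "404 Not Found" errors: if the requested URL ends in a
# stray '>' or '&', suggest the URL without that character.
use strict;
use warnings;

# Escape HTML metacharacters so request data can be echoed safely.
sub html_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# The server describes the failed request in REDIRECT_* variables.
my $redirect_request = $ENV{'REDIRECT_REQUEST'} || '';
my ($redirect_method, $request_url, $redirect_protocol) = split(' ', $redirect_request);
$request_url = '' unless defined $request_url;
my $redirect_status = html_escape($ENV{'REDIRECT_STATUS'} || '');

print "Content-type: text/html\n\n";
print "<TITLE>$redirect_status</TITLE>\n";
print "<H1>$redirect_status</H1>\n";
print "The requested URL ", html_escape($request_url), " was not found on this server.<p>\n";
if (my ($possible_url, $offending_character) = ($request_url =~ /(.*)([&>])$/)) {
    my $safe_url = html_escape($possible_url);
    print "However, I noticed that the URL you requested ended in ";
    print "the character ", html_escape($offending_character), ", which can sometimes ";
    print "happen if the link you followed was generated by certain scripts ";
    print "that attempt to find URLs in ASCII text. Therefore, ";
    print "you might want to try <a href=\"$safe_url\">$safe_url</a> instead.<p>\n";
}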
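
The pattern match at the heart of the script can be tried outside the web
server. Here is a small sketch of that; the wrapper name suggest_url and the
sample paths are hypothetical and chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the script's pattern: returns the URL with a
# single trailing '>' or '&' removed, or undef if there is nothing to strip.
sub suggest_url {
    my ($request_url) = @_;
    if ($request_url =~ /^(.*)[&>]$/) {
        return $1;
    }
    return;
}

for my $url ('/college.html>', '/college.html') {
    my $fix = suggest_url($url);
    print "$url -> ", (defined $fix ? $fix : '(no suggestion)'), "\n";
}
```

Only one trailing character is removed per request, so a URL ending in ">>"
would need two round trips before it resolves.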
