Re: Problems -- 8-bit characters triggering MIME

---------

era eriksson (era@iki.fi)
Mon, 11 Aug 1997 04:24:53 +0300 (EET DST)


On Sun, 10 Aug 1997 17:38:05 -0700 (PDT),
Katie Schwarz <katie@physics.berkeley.edu> wrote:
> I maintain my FAQ in HTML, and use the "Save as text" function of the
> browser to make the plain text version. Does anyone know a way to make it
> not use 8-bit characters in the text?

Depends on the browser, obviously, and on the HTML. If you use raw
8-bit ("high ASCII") characters in the source HTML, it will probably
look weird on other browsers/platforms anyway. Hopefully you use
character entities such as &eacute; rather than raw 8-bit characters.

I maintain dual copies of my FAQ, and generate the text-only
rendering from a slightly different HTML source file than the one web
visitors see [all versions ultimately come from the same source file,
of course], but this is probably not an option unless you can build
your own tools for this. (The pieces are all there if you're on Unix.)

On Unix, you can at least simply pipe it to tr to get rid of, or
replace, the offending characters automatically. Upgrade that to a
Perl script for more versatile post-processing and, well, you've taken
the first steps on a long journey towards The Perfect FAQ Maintenance
Toolbox For Your Personal Needs (tm).
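
A minimal sketch of the tr approach, assuming the text dump is in a
single-byte encoding such as ISO 8859-1. The function names strip_8bit
and mask_8bit are made up for illustration; LC_ALL=C is there so that
tr treats its input as raw bytes.

```shell
# Hypothetical helpers; both read stdin and write stdout.

# Delete every byte in the range 0x80-0xFF (octal 200-377).
strip_8bit() {
    LC_ALL=C tr -d '\200-\377'
}

# Replace each such byte with a question mark instead.
# '[?*]' repeats '?' as often as needed to match the first set.
mask_8bit() {
    LC_ALL=C tr '\200-\377' '[?*]'
}
```

Something like "lynx -dump path/to/file.html | mask_8bit" would then
show where the 8-bit characters were, which is handy for deciding what
to replace them with.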

/* era */

lynx -dump path/to/file.html | perl -pe 's/\xa9/(c)/g'  # and so forth

-- 
Defin-i-t-e-ly. Sep-a-r-a-te. Gram-m-a-r.  <http://www.iki.fi/~era/>
 * Enjoy receiving spam? Register at <http://www.iki.fi/~era/spam.html>