mhonarc-users

Re: Unrecognized character set

1999-10-04 22:32:37
At 01:57 PM 10/2/99 -0700, Earl Hood wrote:
On October 2, 1999 at 02:17, "SysAdmin, dte.net" wrote:

Sorry if this has come up before folks, but when running MHonArc
I just noticed the following warning:

Warning: Unrecognized character set: windows-1252

The source of the email is Microsoft Outlook Express. Everything
is working great regardless of this warning... Is there anything
I could do to stop this warning, or does it even really matter?

It probably does not matter.  If MHonArc (or more specifically, the
text/plain filter) gets a character set it does not recognize, it just
passes that data through as-is with HTML special characters converted
into entity references.  This technically goes against MIME conformance
criteria (see the MIME Conformance section of the documentation), but
is the best behavior since in most cases, treating the data as the
local charset works.
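
The pass-through behavior described above can be sketched roughly as
follows. This is a Python illustration only, not MHonArc's actual Perl
code; the function name htmlize_bytes is made up for the example.

```python
def htmlize_bytes(data: bytes) -> bytes:
    """Escape HTML special characters, leaving all other bytes untouched."""
    # "&" must be handled first so the entities added below are not re-escaped.
    return (
        data.replace(b"&", b"&amp;")
            .replace(b"<", b"&lt;")
            .replace(b">", b"&gt;")
    )


# Bytes 0x93/0x94 (Windows curly quotes) pass through unchanged.
sample = b"a < b & \x93c\x94"
print(htmlize_bytes(sample))
```

Because the high bytes are left alone, how they display depends entirely
on the charset the reader's client assumes.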

As for the non-standard "windows-1252" character set, the only
potential gotcha is when characters in the range 128-159 appear.
This range is not defined by ISO-8859 charsets, and Windows
historically has used the range for Windows-specific characters.
Therefore, non-Windows clients may not render the characters, or they
will get rendered as client/OS-specific values.
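
A small Python sketch of the 128-159 problem, assuming Python's "cp1252"
codec is a fair stand-in for windows-1252 and "latin-1" for iso-8859-1.
The sample bytes are invented for the example.

```python
import unicodedata

# "Hi" wrapped in Windows curly quotes (bytes 0x93 and 0x94).
raw = bytes([0x93, 0x48, 0x69, 0x94])

as_cp1252 = raw.decode("cp1252")    # Windows interpretation
as_latin1 = raw.decode("latin-1")   # ISO-8859-1 interpretation

for label, text in (("cp1252", as_cp1252), ("latin-1", as_latin1)):
    first = text[0]
    # Show the code point and Unicode general category of the first character.
    print(label, hex(ord(first)), unicodedata.category(first))
```

Under the Windows reading the first byte is a left double quotation mark
(U+201C); under the ISO-8859-1 reading it lands on U+0093, which Unicode
classes as a control character, so there is nothing printable to show.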

What is the origin of the prohibition against using these code points?
Isn't it that if you strip the 8th bit they yield control codes?
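
The arithmetic behind that question is easy to check. A quick Python
sketch (it only shows what stripping the 8th bit does, not whether that
is the historical reason for the rule):

```python
# Clear the 8th (high) bit of every byte value in the range 128-159.
stripped = [b & 0x7F for b in range(128, 160)]

# ASCII control codes occupy 0-31.
all_control = all(value < 32 for value in stripped)
print(stripped[0], stripped[-1], all_control)
```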

Anyhow, that is AFAIK the source of the sometimes-voiced allegation that
files using this character set are "not ready for Internet."

For a Perl implementation of a filter to render such texts internet ready,
you might look at

demoronizer

http://language.perl.com/misc/div-www.html
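
For a feel of what such a filter does, here is a minimal Python sketch.
It is not the demoronizer's code; the table is a small illustrative
subset of windows-1252 punctuation and the ASCII stand-ins are my own
choice.

```python
# Illustrative subset: windows-1252 byte -> plain ASCII replacement.
REPLACEMENTS = {
    0x85: b"...",  # horizontal ellipsis
    0x91: b"'",    # left single quotation mark
    0x92: b"'",    # right single quotation mark
    0x93: b'"',    # left double quotation mark
    0x94: b'"',    # right double quotation mark
    0x96: b"-",    # en dash
    0x97: b"--",   # em dash
}


def to_ascii_punctuation(data: bytes) -> bytes:
    """Replace the listed windows-1252 bytes; copy everything else as-is."""
    out = bytearray()
    for byte in data:
        out += REPLACEMENTS.get(byte, bytes([byte]))
    return bytes(out)


print(to_ascii_punctuation(b"\x93It\x92s done\x94 \x97 ok\x85"))
```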

Al


I assume that characters within 160-255 match the iso-8859-1 character
set, but someone else will have to confirm that.  I have not seen a
document listing the specifics of windows-1252.  If anyone has any
pointers, pass them along.
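
One way to test the 160-255 assumption, sketched in Python. It relies on
Python's bundled "cp1252" and "latin-1" codecs reflecting windows-1252
and iso-8859-1, so it is a check against those tables rather than an
authoritative listing.

```python
# Collect every byte in 160-255 that the two codecs decode differently.
mismatches = [
    b for b in range(160, 256)
    if bytes([b]).decode("cp1252") != bytes([b]).decode("latin-1")
]
print(mismatches)
```

An empty list means the two codecs agree on every byte in that range.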

You can shut up the warnings if you register windows-1252 with
CHARSETCONVERTERS.  Using an existing converter may work, or if you
have information on the windows-1252 charset, a specific converter can
be created.
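
A resource-file entry along these lines might do it. This is a sketch
that assumes the "charset; routine; file" entry format and the built-in
mhonarc::htmlize routine; check the CHARSETCONVERTERS page of the
documentation for your version before relying on it.

```
<CHARSETCONVERTERS>
windows-1252; mhonarc::htmlize;
</CHARSETCONVERTERS>
```

With the built-in routine no source file is named in the third field.
The data is still passed through with only HTML special characters
escaped, so the 128-159 caveat remains.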

      --ewh

