procmail

Re: Weird characters in email

2003-03-01 10:56:36
Jeff Orrok writes:
"Tyler F. Creelan" <creelan(_at_)engr(_dot_)orst(_dot_)edu> writes:

Another example I've seen is:

"It^Os completely free and easy to use... There^Os also a classifieds
section for roommate requests, textbook sales..."

You're talking about 8-bit character sets. Many languages besides English
have characters or accents which can't be rendered in 7-bit US ASCII.
Since people don't need parity any more, the 8th bit has been taken over
for extended or multinational character sets. Some examples include
ISO-8859-1 (a.k.a. ISO Latin 1), the DEC MCS, MacRoman, the US Windows
character set, and various other flavors of ISO-8859 for other parts of
the planet.
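
To illustrate why the same 8-bit byte shows up differently from one reader to
the next, here is a minimal Python sketch. It assumes the stray byte in the
quoted message is 0x92 (a guess based on the '~R' display mentioned in this
thread, not something confirmed in the message itself), and the function name
is made up for the example.

```python
# Decode one 8-bit byte under several single-byte character sets.
# ASSUMPTION: 0x92 is only a plausible candidate for the byte in question.

def decode_in_charsets(byte_value, charsets=("cp1252", "latin-1", "mac_roman")):
    """Return a dict mapping charset name to the character it assigns."""
    raw = bytes([byte_value])
    return {name: raw.decode(name) for name in charsets}

if __name__ == "__main__":
    for name, char in decode_in_charsets(0x92).items():
        # Print the code point so control characters stay visible.
        print(f"{name:10s} U+{ord(char):04X}")
```

Under the US Windows character set (cp1252) that byte is a curly apostrophe,
while ISO-8859-1 maps it to a C1 control character, which would explain a
reader showing control-style junk in "It's".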

The latter renders as something like '^O' in pine, but '~R' if you
examine it with vi -b.

I vaguely recall that the ~R has the msb set -- you can verify this with od -c.

Yes, I would expect so.
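
For anyone who would rather check the high bit in a script than read od
output, a rough Python equivalent might look like this. The sample bytes and
the function name are illustrative assumptions, not taken from the original
message.

```python
# List every byte whose most significant bit (0x80) is set.

def high_bit_bytes(data):
    """Return (offset, byte value) pairs for bytes outside 7-bit ASCII."""
    return [(offset, value) for offset, value in enumerate(data) if value & 0x80]

if __name__ == "__main__":
    sample = b"It\x92s completely free"  # hypothetical raw bytes
    for offset, value in high_bit_bytes(sample):
        print(f"offset {offset}: 0x{value:02X}")
```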

So, technically, it is still text/plain, at least in M$ land....

Apple and DEC and most of Europe as well. Basically anybody who isn't
limiting themselves to US ASCII and isn't using Unicode, which is a whole
'nother can o' worms. (It's just a darned good thing people don't send
EBCDIC!)

charset is an optional attribute of Content-Type and defaults to us-ascii
for text/plain. People using other character sets SHOULD be specifying the
charset attribute, but I wouldn't count on it, or on your mail reader
honoring it, or even on having the appropriate character set mapping
present on your machine.
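
As a sketch of what honoring the charset attribute could look like, the
following uses Python's standard email package. The us-ascii default comes
from the paragraph above; the cp1252 fallback is purely an assumption for the
example (a guess at what an undeclared 8-bit message might be using), and
body_text is a hypothetical helper name.

```python
from email import message_from_bytes

def body_text(raw_message, fallback="cp1252"):
    """Return (declared_charset, decoded_body) for a single-part message."""
    msg = message_from_bytes(raw_message)
    # A missing charset attribute is treated as us-ascii.
    charset = msg.get_content_charset("us-ascii")
    payload = msg.get_payload(decode=True)  # raw body bytes
    try:
        return charset, payload.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown charset name, or bytes that do not fit the declared one.
        # ASSUMPTION: guess a fallback instead of failing outright.
        return charset, payload.decode(fallback, errors="replace")

if __name__ == "__main__":
    raw = b"Content-Type: text/plain\n\nIt\x92s completely free\n"
    print(body_text(raw))
```

The fallback is a guess by design: a message that declares nothing gives the
reader no reliable way to know which 8-bit character set was intended.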

I also note that there is an optional negotiation (the 8BITMIME ESMTP
extension) which allows transport of 8-bit content via SMTP without
encoding it.
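
The negotiation presumably meant here is the 8BITMIME ESMTP extension: the
server lists the keyword in its EHLO reply, and the client then adds
BODY=8BITMIME to MAIL FROM. A minimal sketch of the client-side check,
working on reply text only (no network); the helper names and the sample
reply are made up for illustration.

```python
def advertises_8bitmime(ehlo_reply):
    """True if a multi-line EHLO reply lists the 8BITMIME keyword."""
    for line in ehlo_reply.splitlines():
        # Reply lines look like "250-KEYWORD params" or "250 KEYWORD params".
        if len(line) < 4 or not line.startswith("250"):
            continue
        fields = line[4:].split()
        if fields and fields[0].upper() == "8BITMIME":
            return True
    return False

def mail_from_command(sender, ehlo_reply):
    """Build MAIL FROM, requesting 8-bit transport only when it is offered."""
    command = f"MAIL FROM:<{sender}>"
    if advertises_8bitmime(ehlo_reply):
        command += " BODY=8BITMIME"
    return command

if __name__ == "__main__":
    reply = "250-mail.example.org\r\n250-8BITMIME\r\n250 SIZE 1000000"
    print(mail_from_command("user@example.org", reply))
```

If the server does not advertise the keyword, the sender is expected to fall
back to a 7-bit encoding such as quoted-printable or base64.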

Jeff

--

Fred Morris
m3047(_at_)inwa(_dot_)net



_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
