ietf-822

Re: internationalization of mail

2004-08-27 01:26:12

Tex Texin <tex(_at_)i18nguy(_dot_)com> writes:

Question one is trying to get at the risk of attempting to transcode every mail
to utf-8 for storage, search and forwarding to recipients. If the mail is
mislabeled then transcoding may corrupt the message. Since not all clients
support utf-8, I may need to transcode the utf-8 to another charset so they can
accept and render it. Hence question two.

My home-grown perl/Tk MUA does that.
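
A minimal Python sketch of the "transcode to whatever the client accepts"
step, for illustration only (this is not the perl/Tk code mentioned above).
The function name encode_for_client and the idea of an ordered list of
accepted charsets are assumptions made for the example; it falls back to
utf-8 when none of the listed charsets can represent the text.

```python
from typing import List, Tuple


def encode_for_client(text: str, accepted: List[str]) -> Tuple[bytes, str]:
    """Encode text in the first accepted charset that can represent it.

    `accepted` is a hypothetical, ordered list of charset names the
    receiving client is believed to handle. Returns (bytes, charset used).
    """
    for charset in accepted:
        try:
            return text.encode(charset), charset
        except UnicodeEncodeError:
            # This charset cannot represent some character; try the next one.
            continue
        except LookupError:
            # Python does not know this charset name; skip it.
            continue
    # Nothing suitable: send utf-8 rather than lose characters.
    return text.encode("utf-8"), "utf-8"
```

For example, encode_for_client(thai_text, ["iso-8859-1", "tis-620"]) would
skip iso-8859-1 and pick tis-620, so nothing is lost on the way out.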


1) Are there statistics anywhere for the number of e-mails or message parts
that have their charset mislabeled?
(i.e. incorrectly identified as some encoding when the content is actually in
a different character encoding)

There is a lot of mail that is mislabeled, typically claiming
iso-8859-1 when it is really Windows cp1252.
You also get things labelled as ASCII but which have high-bit chars.
However, most of this junk is spam (but then so is most email these days).
I have a hacky workaround for the cp1252 case.
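
One possible shape for such a workaround, as a Python sketch (not the
actual code, which is not shown here). It relies on the fact that bytes
0x80-0x9F are C1 control codes in iso-8859-1 but printable characters
(curly quotes, dashes, the euro sign) in cp1252. The function name
decode_part, and the guess that "ASCII" mail with high-bit bytes is utf-8
or else cp1252, are assumptions made for the example.

```python
import re

# 0x80-0x9F: C1 controls in iso-8859-1, printable characters in cp1252.
C1_RANGE = re.compile(rb"[\x80-\x9f]")


def decode_part(raw: bytes, declared: str) -> str:
    """Decode a message part, second-guessing two common mislabelings."""
    label = declared.strip().lower()

    if label in ("iso-8859-1", "latin-1", "latin1") and C1_RANGE.search(raw):
        # Assumption: genuine C1 controls are rare in mail, so this is
        # really cp1252. Five cp1252 bytes are undefined; replace those.
        return raw.decode("cp1252", errors="replace")

    if label in ("us-ascii", "ascii") and any(b > 0x7F for b in raw):
        # Labelled ASCII but has high-bit bytes. Assumption: try utf-8
        # first (it rarely validates by accident), then cp1252.
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            return raw.decode("cp1252", errors="replace")

    try:
        return raw.decode(label, errors="replace")
    except LookupError:
        # Unknown charset name: latin-1 maps every byte, so it never fails.
        return raw.decode("latin-1")
```

The check is deliberately narrow: a part labelled iso-8859-1 with no bytes
in the 0x80-0x9F range is decoded exactly as labelled.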


3) How well supported is utf-8? I know Eudora does not support it. Pretty much
everything else does, right? Or at least supports some subset of Unicode. I am
a little concerned that if I send out utf-8, then perhaps a Thai user, for
example, will find his client doesn't support Unicode Thai characters, but
would have let him display a native encoding...

That is possible, but typically UTF-8 aware applications "have" to fall
back to native encoded fonts (good Unicode fonts are still rare) so 
in practice this will work.