ietf-822

Re: internationalization of mail

2004-08-26 19:01:59

Hi Kat!
It's been a while...

Thanks for the mail, it was helpful.

Correct me if I am wrong, but in the Mozilla environment the mail is treated as
a bucket of bytes that just gets passed around.
So, for example, if one Subject header is in ISO-2022-JP and another is in
Shift_JIS, you can change the display to interpret and show one or the other
correctly, but not both at the same time. For purposes of search and filtering,
I was thinking of converting to Unicode, so that all subjects would display
correctly and also be searchable.
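
To make the conversion idea concrete, here is a minimal sketch in Python (not
how Mozilla does it). It assumes the subjects arrive as RFC 2047 encoded-words
with correct charset labels; the helper names encoded_word and
subject_to_unicode are made up for this illustration.

```python
import base64
from email.header import decode_header


def encoded_word(text: str, charset: str) -> str:
    """Hypothetical helper: build a base64 RFC 2047 encoded-word for testing."""
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return f"=?{charset}?B?{payload}?="


def subject_to_unicode(raw_subject: str) -> str:
    """Decode every part of a Subject header into one Unicode string."""
    parts = []
    for data, charset in decode_header(raw_subject):
        if isinstance(data, bytes):
            # Unlabeled byte parts are treated as ASCII; bad bytes become U+FFFD.
            parts.append(data.decode(charset or "ascii", errors="replace"))
        else:
            parts.append(data)
    return "".join(parts)


# Two subjects with the same text but different wire encodings.
subjects = [
    encoded_word("日本語のテスト", "iso-2022-jp"),
    encoded_word("日本語のテスト", "shift_jis"),
]

# After conversion both can be displayed and searched side by side.
matches = [s for s in subjects if "テスト" in subject_to_unicode(s)]
```

Once everything is Unicode, one search term matches both subjects, regardless
of which charset each message used on the wire.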

But it may not be worthwhile if many of the mails carry incorrect charset
labels, so that transcoding to UTF-8 generates muck or errors out.
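
One way to gauge that risk is to decode strictly and see how often the declared
label fails. The sketch below is only an assumption about how such a check
might look: the function name transcode_to_unicode and the fallback list are
hypothetical, and a wrong label can still decode "successfully" into muck, so
this catches only the outright errors.

```python
def transcode_to_unicode(raw: bytes, declared: str,
                         fallbacks=("utf-8", "shift_jis", "euc_jp")):
    """Try the declared charset first, then a few guesses.

    Returns (text, charset_used), or (None, None) if nothing decodes cleanly.
    """
    for charset in (declared, *fallbacks):
        try:
            # Strict decoding raises on bytes that are illegal in this charset.
            return raw.decode(charset), charset
        except (UnicodeDecodeError, LookupError):
            # LookupError covers charset names Python does not know.
            continue
    return None, None


# Shift_JIS bytes mislabeled as ISO-2022-JP: the label fails, a fallback works.
mislabeled = "日本語".encode("shift_jis")
text, used = transcode_to_unicode(mislabeled, "iso-2022-jp")
```

Counting how many messages in a sample need a fallback (or fail entirely) would
give a rough idea of whether the conversion is worth doing.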

tex

Katsuhiko Momoi wrote:

Tex Texin wrote:

Hi,

(snipped)

2) Although I know the character encodings used in many regions, I note that
mail clients sometimes prefer a different encoding than what other applications
traditionally use. For example, ISO-2022-JP is used more often for mail in
Japan than it is used, percentage-wise, for other kinds of Japanese software (I
believe).

What is the preferred encoding(s) for mail in each country market? (Although we
might prefer to recommend UTF-8, I am looking to understand what the market
practice actually is.)


Tex,

It would have been great if we had such a list when we worked on the
Mozilla mail. We tried to restrict the number of outgoing msg encodings
to the ones we can recommend (ISO types mostly unless we heard loud user
complaints). The list was based on known RFCs and our knowledge of the
market since 1995. The results of this rather tortured attempt can be
found in the Mozilla menu:

Edit > Preferences > Mail & Newsgroups > Composition > Character Encoding

For reading msgs, we did not impose such a restriction and so the user
can choose to correct a msg encoding with all the encodings available
for browsing. For sending msgs, however, the user will have to customize
the above limited list by specifically opening the Customize... menu --
and then the user can add any encoding he/she chooses.

As far as we knew at the time (a few years back), some RFCs defined
encodings for certain languages (e.g. Japanese) but most languages
had no declaration from any authoritative body on this topic. This meant
for us the use of the greatest common denominator encoding(s) for each
language/language group (mostly ISO types) unless users of some
language/language groups complained -- Hebrew and Russian come to mind.

If this situation has changed, I would love to know also.

- Kat

--
Katsuhiko Momoi
e-mail: katmomoi(_at_)pacbell(_dot_)net

-- 
-------------------------------------------------------------
Tex Texin   cell: +1 781 789 1898   mailto:Tex(_at_)XenCraft(_dot_)com
Xen Master                          http://www.i18nGuy.com
                         
XenCraft                            http://www.XenCraft.com
Making e-Business Work Around the World
-------------------------------------------------------------