Re: SWEDISH CHARACTERS IN EMAIL: THE SUNET INITIATIVE

Maybe I've been talking to Masataka by e-mail too much and my mind is
becoming corrupt, but I think I can explain what he's saying. I think I can
also explain, clearly, why what he is proposing is impractical and not
terribly useful:

On 11/16/94 at 12:38 PM, Masataka Ohta wrote:

Supporting a single localization is easy.

[...]

Certainly. MIME charset mechanism is good to identify multiple
localizations. But, if one decides to use 8bit Latin-1 only, a
single localization, he does not need charset specification.

[...]

The best way of flagging is to use ISO 2022 escape sequences.

By assuming the initial designation of ASCII only, there is no chaos.


So the points are:

1. If you (as a user) are only interested in one character set, it doesn't
matter to you what other character sets exist.
2. If you (as a user) are only going to receive mail which contains
characters from that character set, then you (again, as the user) don't
need any label in the message to tell what it is.
3. ISO 2022 contains all of the different kinds of characters that you (as
a user) want to use, whatever character set you are interested in, so ISO
2022 is all you really need.
4. If you (as a MUA) are only expecting ISO 2022, then you don't need any
labeling to tell you what character sets are in the message.

Therefore, according to Ohta, if all you are using is ISO 2022, you (as
either a user or a MUA) don't need MIME to label character sets.

This is trivially true, but awfully boring and pragmatically useless. The
fact is that point 4 is unreasonable: The MIME standard already exists and
is written to accommodate lots of character sets. Though it is true that we
could write a version of Eudora that only supported ISO 2022 and didn't
have any MIME headers, we would still have to write another version that
*did* support MIME headers because there are lots of MIME mailers out there
and there are people who *do* want to use more than one character set in
their messages. It doesn't help Ohta's position to claim that they don't
*have* to use more than one character set. The fact is that they already
do. So Ohta can try to put the toothpaste back in the tube, but we're not
going to get anywhere down that line.
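
To make the problem with points 2 and 4 concrete, here is a minimal Python
sketch of what happens to unlabeled 8-bit text once more than one character
set is in circulation. It is an illustration only: decode_body and its
site_default parameter are hypothetical names, not part of Eudora or of any
MIME library, and the fallback to Latin-1 stands in for the "single
localization" assumption.

```python
from typing import Optional

# Three 8-bit bytes as they might arrive in a message body.
raw = b"\xe5\xe4\xf6"

def decode_body(body: bytes, charset: Optional[str],
                site_default: str = "latin-1") -> str:
    """Decode with the MIME charset label if there is one; otherwise
    fall back to the local assumption (hypothetical helper)."""
    return body.decode(charset or site_default)

# No label: the receiver has to assume its own localization.
assumed = decode_body(raw, None)

# Same bytes, but labeled by the sender as ISO 8859-5 (Cyrillic).
labeled = decode_body(raw, "iso-8859-5")

print(assumed)             # Swedish letters under the Latin-1 assumption
print(labeled)             # different (Cyrillic) letters under the label
print(assumed == labeled)  # the bytes alone cannot say which is right
```

The unlabeled path is only correct while every sender shares the receiver's
single character set, which is exactly the assumption that no longer holds.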

But more importantly, there are good solid reasons for wanting to use
different character sets labeled in MIME instead of ISO 2022 (and Ohta
knows at least the first reason very well; I saw it in a draft of a paper
he wrote):

(a) ISO 2022 is computationally a pain compared to having explicit charset
labels. You need context to figure out what the escape sequences mean. Ohta
agrees that 2022 is not the "one true way", though he is apparently loath
to say so here. It's not clear to me why.

(b) The resource load on an MUA to display different glyphs is a great deal
lighter if you have a label which says "everything to follow is in ISO
8859-1". Then you only have to load up fonts for 8859-1. If you are told
"everything that follows is ISO 2022", lord knows what resources you need.
It makes the coding job that much harder.
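
Both (a) and (b) can be sketched in a few lines of Python. This is an
illustration under stated assumptions, not a full ISO 2022 parser:
escape_sequences is a hypothetical helper that only recognizes the basic
"ESC, intermediate bytes, final byte" shape, and the sample uses the
ISO-2022-JP profile because Python ships a codec for it.

```python
import email

ESC = 0x1B

def escape_sequences(body: bytes) -> list:
    """Collect ESC sequences (ESC, bytes 0x20-0x2F, one final byte)
    in order of appearance. Hypothetical helper, simplified."""
    found = []
    i = 0
    while i < len(body):
        if body[i] == ESC:
            j = i + 1
            while j < len(body) and 0x20 <= body[j] <= 0x2F:
                j += 1
            found.append(body[i:j + 1])
            i = j + 1
        else:
            i += 1
    return found

# (a) Context: the same two bytes mean different things depending on
# which escape sequence came before them.
same_bytes = b"$3"
in_ascii = same_bytes.decode("iso2022_jp")
in_jis = (b"\x1b$B" + same_bytes + b"\x1b(B").decode("iso2022_jp")

# (b) Resources: with a MIME label, one header lookup says which
# character set (and so which fonts) the whole part needs.
msg = email.message_from_string(
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=ISO-8859-1\n"
    "\n"
    "body text\n"
)
needed = msg.get_content_charset()

# Without a label, the MUA has to walk the entire body to discover
# which sets get designated along the way.
designations = escape_sequences(b"abc\x1b$B$3\x1b(Bdef")

print(in_ascii, in_jis, needed, designations)
```

The header lookup is constant work before a single body byte is read; the
scan is proportional to the message and can turn up a new designation on
the last line.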

Given the state of affairs now, it is impractical and silly to recommend to
SUNET that they should not go to MIME. Some day, when we have a single
character set that is the "one true way" (which might be something like
UCS-4, according to Ohta) and mail transports that can handle such things,
perhaps we can define "Mime-Version: 2.0" in the headers and say that all
text parts are in that one true character set. Until then, MIME charset
labels are really the only way to go, and SUNET should not be at all
dissuaded from using them. I see no pragmatic reason to tell them
otherwise, Masataka.

pr

--
Pete Resnick - presnick(_at_)qualcomm(_dot_)com
QUALCOMM Incorporated
Home:(217)337-1905 / Fax: (217)337-1980