SWEDISH CHARACTERS IN EMAIL: THE SUNET INITIATIVE

Summary
        - I like ISO 2022,
        - I like MIME charset,
        - I don't see where the conflict is,
        - ISO 2022 and MIME charset support each other.

From: presnick(_at_)qualcomm(_dot_)com (Pete Resnick)
Yes. We currently live in a world where different countries use all sorts
of different schemes for encoding characters...


ISO 2022 itself is not a character set; it can be seen as a "character
set labeling" (designation/invocation) standard. ISO 2022 only defines
how you inform the application which codes are to be taken from which
character set.
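
As a rough illustration of designation, here is a small Python sketch
that walks an ISO-2022-JP octet stream and reports which character set
each run of codes is taken from. The table covers only the four G0
designations used by ISO-2022-JP, and the names G0_DESIGNATIONS and
split_by_designation are made up for this example; it is not a full
ISO 2022 parser.

```python
# Sketch only: tracks G0 designations for the ISO-2022-JP subset of ISO 2022.
ESC = b"\x1b"

# Escape sequence -> name of the set it designates to G0 (ISO-2022-JP subset).
G0_DESIGNATIONS = {
    b"\x1b(B": "ASCII",
    b"\x1b(J": "JIS X 0201 Roman",
    b"\x1b$@": "JIS X 0208-1978",
    b"\x1b$B": "JIS X 0208-1983",
}


def split_by_designation(stream, initial="ASCII"):
    """Return a list of (set name, code bytes) runs found in the stream."""
    runs = []
    current = initial          # set currently designated to G0
    pos = 0
    while pos < len(stream):
        if stream[pos:pos + 1] == ESC:
            seq = stream[pos:pos + 3]
            if seq not in G0_DESIGNATIONS:
                raise ValueError("unsupported escape sequence: %r" % seq)
            current = G0_DESIGNATIONS[seq]
            pos += 3
            continue
        # Collect everything up to the next ESC (or the end of the stream).
        end = stream.find(ESC, pos)
        if end == -1:
            end = len(stream)
        runs.append((current, stream[pos:end]))
        pos = end
    return runs


if __name__ == "__main__":
    data = "abc \u65e5\u672c\u8a9e xyz".encode("iso2022_jp")
    for name, codes in split_by_designation(data):
        print(name, codes)
```

The codes themselves never say which set they belong to; only the
preceding ESC sequence (or the assumed initial state) does.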

A text format that contains mixed languages/character sets has to
provide a syntax for tagging segments anyway. Some MIME proposals use
a multipart/mixed structure. Which is *really* simpler and faster to
parse and process: a multipart/mixed MIME message containing a part
for *each* segment from a different character set, or an ISO 2022
conformant octet stream separating the parts with short ESC sequences?

To me both carry roughly the same information content, and each can be
transformed into the other without loss of information. Which is
better really depends on the content.

        For example, I think multipart/mixed for an English/Hebrew
        dictionary that includes phonetic guides would be rather
        horribly inefficient. Or, one could try to do an ISO-2022-JP
        stream as multipart/mixed (charset=iso-8859-1 and
        charset=jisx0208). It would work, but would not be pretty...
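
To put a rough number on that, the following Python sketch builds the
same short mixed text both ways with the standard email package and
compares the sizes. The sample segments and the function names are
invented for this example, and the Japanese parts are labeled
iso-2022-jp rather than jisx0208, because that is a charset the
Python library knows about.

```python
# Sketch only: compares the wire size of the two representations.
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Hypothetical mixed text, already cut into (charset, text) segments.
SEGMENTS = [
    ("us-ascii", "sun: "),
    ("iso-2022-jp", "\u65e5"),
    ("us-ascii", ", book: "),
    ("iso-2022-jp", "\u672c"),
]


def as_single_stream(segments):
    """One text/plain part; segment changes become short ESC sequences."""
    text = "".join(text for _, text in segments)
    return MIMEText(text, "plain", "iso-2022-jp").as_bytes()


def as_multipart(segments):
    """One MIME body part per segment, each with its own charset label."""
    message = MIMEMultipart("mixed")
    for charset, text in segments:
        message.attach(MIMEText(text, "plain", charset))
    return message.as_bytes()


if __name__ == "__main__":
    print("single stream:", len(as_single_stream(SEGMENTS)), "octets")
    print("multipart/mixed:", len(as_multipart(SEGMENTS)), "octets")
```

Every segment in the multipart form pays for a boundary line and its
own headers, while the single stream pays three octets per switch.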

In parts of Europe, the ISO 8859 ... etc ...

But if people *do* use it for localizations, they will end up using
it to send messages outside of their locale, and then being
2022-centric will be useless.


But, we *are* currently ISO 2022-centric. People just don't seem to
realize it :-).  All ISO 8859-n, ISO 646 variants, EUC etc *are* ISO
2022. Not all ISO 2022 streams need to contain ESC sequences to switch
state.

In my view, the MIME charset attribute just specifies the *initial*
ISO-2022 state for the text content. I believe the charset labeling in
MIME is good. Simple MUAs can just take the charset as a font, while
ISO-2022 aware clients will set the initial state from it, and can deal
with the more complex mixed streams like ISO-2022-JP and ISO-2022-INT.
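
A minimal Python sketch of that reading of the charset attribute
follows. The table entries, the set descriptions, and the function
names are assumptions made for illustration, not part of MIME or of
any existing mail client.

```python
# Sketch only: charset labels mapped to an assumed initial ISO 2022 state.
# Each entry is (set designated to G0, set designated to G1 or None).
INITIAL_STATE = {
    "us-ascii": ("ASCII", None),
    "iso-8859-1": ("ASCII", "Latin alphabet No. 1, right half"),
    "iso-8859-5": ("ASCII", "Latin/Cyrillic, right half"),
    "iso-2022-jp": ("ASCII", None),
}


def initial_state(charset):
    """Look up the starting (G0, G1) pair for a MIME charset label."""
    return INITIAL_STATE[charset.lower()]


def needs_state_tracking(body):
    """True if the octet stream contains ESC and may switch sets later."""
    return b"\x1b" in body


def describe(charset, body):
    """Summarize how a client would have to treat this body."""
    g0, g1 = initial_state(charset)
    if needs_state_tracking(body):
        mode = "tracks ESC sequences"
    else:
        mode = "fixed state"
    return "G0=%s, G1=%s, %s" % (g0, g1, mode)


if __name__ == "__main__":
    print(describe("iso-8859-1", b"sm\xf6rg\xe5s"))
    print(describe("iso-2022-jp", "\u65e5\u672c".encode("iso2022_jp")))
```

A simple client only ever needs the "fixed state" case, which is the
same as picking a font from the charset name.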

For the Macintosh, I may need all sorts of script resources to be
able to handle Japanese or Arabic, even if I have the fonts. I would
like to know this before I enter my display routines.


Assuming that a particular mail must contain Japanese and/or Arabic,
you have to solve the problem anyway, whether you use ISO 2022,
multipart/mixed or UNICODE. (Of course, if your OS supports ISO 2022 or
UNICODE directly, your task is easier, and that will affect which
version you consider easy.)

ps.     Those who don't have an ISO 2022 parser might take a look at
        the ccfilter.c module in my Xew widget set:

                ftp://ftp.x.org/contrib/widgets/Xew-2.2.tar.gz

        ccfilter is not really ISO 2022; it is just a skeleton ANSI
        X3.64-1979 ESC sequence muncher that includes some ISO 2022
        awareness. The application using the filter provides the real
        semantics.
--
Markku Savela (msa(_at_)hemuli(_dot_)tte(_dot_)vtt(_dot_)fi), Technical 
Research Centre of Finland
Multimedia Systems, P.O.Box 1203, FIN-02044 VTT, http://www.funet.fi/~savela