There is a lot of talk about using universal character sets. I
think this stems from concern about how to handle the large
number of character sets in use today (from 7-bit Swedish to
Japanesified ISO2022). I wonder whether universal character sets
really solve the problem or just move it to a different level
where it might be harder to control.
The theory is that the sender should be allowed to compose the
message he wants the user to see and let the recipient work out how
to display it.
I hate to get into these discussions (mail protocols are a bit
outside my field), but Unicode, one of the two major competing
universal codesets, has a few
characteristics that make it ideal for text interchange between
systems that aren't guaranteed to speak anything like the same language:
1. Unicode includes the entire repertoires of virtually every major
standard in use now anywhere in the world, including ASCII, the 8859
series, JIS, Chinese & Korean standards, etc., etc.
2. It's a flat 16 bits, so it avoids all the weirdo decoding problems
that other schemes might need.
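To illustrate point 2, here is a minimal Python sketch. It assumes
every character fits in a single 16-bit unit (the flat design
described above) and that the text is stored big-endian. The helper
name nth_char is made up for this example; the codec names are
Python's own.

```python
import struct

def nth_char(buf: bytes, n: int) -> str:
    """Return character n of a flat 16-bit, big-endian buffer.

    Assumes one 16-bit unit per character, so character n always
    sits at byte offset 2 * n and no scanning is needed.
    """
    (code,) = struct.unpack_from(">H", buf, 2 * n)
    return chr(code)

# Three characters from different repertoires, two bytes each.
buf = "Å日本".encode("utf-16-be")
second = nth_char(buf, 1)  # direct index, no state to track

# For contrast, ISO 2022 style text is stateful: the Japanese
# characters are bracketed by escape sequences that a decoder has
# to track before it can interpret any given byte.
stateful = "日本".encode("iso2022_jp")
```

With the flat layout, finding a character is plain arithmetic; with
the escape-sequence scheme you have to read from the start to know
which character set is currently in effect.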
Here's another method of handling a bunch of codesets simultaneously:
1. map your stuff into Unicode by using a mapping table (it's just
about guaranteed that any codeset you ever heard of is included in Unicode)
2. compress it with some easily available compression protocol
3. uuencode it or whatever you do, with an easily available protocol
4. ship it however you ship it (7 or 8-bit protocols, whatever)
Then, as Bob says,
5. let the receiver worry about how to map from Unicode to whatever
the local jargon is.
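The five steps above can be sketched roughly as follows. This is
only an illustration under some assumptions: Python's built-in codec
tables stand in for the mapping tables, zlib stands in for "some
easily available compression protocol", and the function names pack
and unpack are invented for the example. Step 4 (the actual
shipping) is left out.

```python
import binascii
import zlib

def pack(local_bytes: bytes, source_codec: str) -> list[bytes]:
    """Sender side: local codeset -> Unicode -> compress -> uuencode."""
    # Step 1: map into Unicode via the codec's mapping table, then
    # serialize as 16-bit big-endian units.
    unicode_bytes = local_bytes.decode(source_codec).encode("utf-16-be")
    # Step 2: compress (zlib is an assumption, any scheme would do).
    compressed = zlib.compress(unicode_bytes)
    # Step 3: uuencode; b2a_uu accepts at most 45 bytes per line.
    return [binascii.b2a_uu(compressed[i:i + 45])
            for i in range(0, len(compressed), 45)]

def unpack(lines: list[bytes], target_codec: str) -> bytes:
    """Receiver side: undo steps 3 to 1, then map to the local codeset."""
    compressed = b"".join(binascii.a2b_uu(line) for line in lines)
    text = zlib.decompress(compressed).decode("utf-16-be")
    # Step 5: the receiver decides how to handle characters its
    # local codeset lacks; here they become a replacement mark.
    return text.encode(target_codec, errors="replace")

# Sender writes ISO 8859-1; receiver happens to use code page 437.
lines = pack("Räksmörgås".encode("latin-1"), "latin-1")
received = unpack(lines, "cp437")
```

The lines produced by pack contain only 7-bit characters, so they
survive either kind of transport mentioned in step 4, and the sender
never needs to know what the receiver's codeset is.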
I find it distressing that there's so much intense discussion of how
to deal with zillions of these codesets simultaneously using all
kinds of baroque header information and scheming when a wonderful
answer is lying around waiting to be picked up. Just map your codes
into something, like Unicode, that already includes the repertoire of
any other standard you'd possibly want to use. This is one of the
major wonderful features of Unicode: it includes more existing
standards than any competing universal set. All you need to
do is define the wrapper to put around text that's encoded that way.