Re: ISO 2022

Neil Katin writes:

I guess I really don't have the same model of how (for example)
10646 is intended to be used.  I suspect that everyone will want
their own personal character set to be represented.  Specifically,
people in Japan will want the Japanese glyphs, people in China
will want Chinese encodings, Taiwan has a different set of encodings,
Korea yet another set, and I haven't even touched on all the ISO 8859
variants.

As I read the spec, ISO 10646 has many duplicate encodings for the
same screen glyph; that's why I think of it as a character set registry
framework rather than a character set itself.


That is true for 10646, as it deals with characters, not glyphs.
For the distinction between characters and glyphs, see the earlier
discussion on this list.
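To illustrate the character/glyph distinction, here is a small sketch in modern Python (an anachronism for this thread, not anything from the standard itself). Python's `unicodedata` module uses the Unicode Character Database; the three code points below are the same in the published ISO/IEC 10646. They usually render with one identical glyph, yet they are three distinct characters with distinct names.

```python
import unicodedata

# Three characters that typically share one visual glyph ("A")
# but are distinct characters with distinct code points.
lookalikes = ["A", "\u0391", "\u0410"]

def describe(chars):
    """Return (code point, character name) pairs for each character."""
    return [(f"U+{ord(c):04X}", unicodedata.name(c)) for c in chars]

for code_point, name in describe(lookalikes):
    print(code_point, name)
# U+0041 LATIN CAPITAL LETTER A
# U+0391 GREEK CAPITAL LETTER ALPHA
# U+0410 CYRILLIC CAPITAL LETTER A
```

A renderer is free to use one glyph for all three; the character set still keeps them apart.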

How on earth are you planning on subsetting this problem?  It is simply
not acceptable to anyone in country X to be told "Sorry, you can't
represent your characters in mail"; they will simply go off and do
something ad hoc.  Just look at what Europe is doing with 8-bit characters
over SMTP today for a fine example of this principle in action.


I think we should try to do subsetting. This is
already done in RFC-XXXX, where a limited set of character
sets is chosen.

Subsetting is easily done in the ISO 2022 framework,
and it is done in the OSI world all the time.
You just specify the ECMA registry numbers (the ISO-IR numbers
of the registered character sets).
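A rough sketch of what "specify the registry numbers" can amount to in practice: a profile is a set of allowed ISO-IR numbers, and a message conforms if every ISO 2022 designation escape sequence in it selects one of them. The table covers only a few well-known G0 designations (ESC ( B = ASCII, ISO-IR 6; ESC ( J = JIS X 0201 Roman, ISO-IR 14; ESC $ @ = JIS C 6226-1978, ISO-IR 42; ESC $ B = JIS X 0208-1983, ISO-IR 87). The function `conforms` and the profile `PROFILE` are hypothetical. A real checker would also have to handle 96-character sets, G1-G3 designations, and the longer escape forms.

```python
import re

# Hypothetical, deliberately tiny table: escape sequence -> ISO-IR number.
# Only G0 designations of 94-character sets are covered here.
DESIGNATIONS = {
    b"\x1b(B": 6,    # ASCII
    b"\x1b(J": 14,   # JIS X 0201 Roman
    b"\x1b$@": 42,   # JIS C 6226-1978
    b"\x1b$B": 87,   # JIS X 0208-1983
}

# ESC, then one or more intermediate bytes (0x20-0x2F), then a final byte (0x30-0x7E).
ESCAPE = re.compile(rb"\x1b[\x20-\x2f]+[\x30-\x7e]")

def conforms(message: bytes, allowed: set) -> bool:
    """True if every designation in `message` selects an allowed registry number."""
    for match in ESCAPE.finditer(message):
        reg = DESIGNATIONS.get(match.group())
        if reg is None or reg not in allowed:
            return False  # unknown or disallowed character set
    return True

# Hypothetical profile: ASCII plus the two JIS sets used by ISO-2022-JP mail.
PROFILE = {6, 14, 87}

modern = "日本".encode("iso2022_jp")     # designates with ESC $ B ... ESC ( B
old_jis = b"\x1b$@F|K\\\x1b(B"           # same text, but designated with ESC $ @
print(conforms(modern, PROFILE))   # True
print(conforms(old_jis, PROFILE))  # False: ISO-IR 42 is not in the profile
```

The point of the design is that the profile itself is nothing more than a list of numbers; the escape sequences already carry the identification.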

If you really think you can specify subsets/common sets/etc. of these
characters, then looking at 10646 is probably a mistake.  Instead,
we should pick a single unified character set (such as Unicode) and
then work on specifying how to translate all these national character
sets to and from this unified set.


Oh, why is 10646 a mistake? 10646 is the international standard (if
everything works out, it will be approved tomorrow), with all the
characters in the world (some are still missing, pending validation
of the right way to encode them).

Unicode, on the other hand, is a big problem. It is not an international
standard, and not even a de facto standard. It has great problems
with backward compatibility with other character sets and is not
designed for communications. Conversion to and from Unicode
is a real pain in the arse, due to non-spacing "characters" and
the non-uniqueness of a character's coding (the same character can be
coded in more than one way). If you convert Unicode to another
character set and back again, you cannot be guaranteed that the
result is identical to the source.
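The round-trip problem can be reproduced with a short sketch in modern Python (again anachronistic, and not code from this thread). The letter "é" can be coded either as one precomposed character (U+00E9) or as "e" followed by the non-spacing COMBINING ACUTE ACCENT (U+0301). ISO 8859-1 has only the precomposed form, so the decomposed sequence must be composed before encoding; using NFC normalization for that step is an assumption about how a converter might behave. Decoding back then yields the precomposed form, not the original sequence.

```python
import unicodedata

# Two Unicode codings of the same character "é".
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" + COMBINING ACUTE ACCENT (non-spacing)

def round_trip_latin1(s: str) -> str:
    """Convert to ISO 8859-1 and back, composing first so the accent survives."""
    composed = unicodedata.normalize("NFC", s)
    return composed.encode("latin-1").decode("latin-1")

print(precomposed == decomposed)                      # False: different code sequences
print(round_trip_latin1(precomposed) == precomposed)  # True
print(round_trip_latin1(decomposed) == decomposed)    # False: it came back precomposed
```

Without the normalization step, `decomposed.encode("latin-1")` would simply raise `UnicodeEncodeError`, since U+0301 has no Latin-1 equivalent.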

Keld