Re: ISO 2022 (Was: Re: The Swedish Initiative)

Old soldiers never admit their faults. So, let them just disappear.

Well, I think MIME compliant applications are required to support
all the ISO-8859-[1-9] charsets too. How they support it is not
further specified.


I also thought they are standardized. But if just decoding Q and B
encoding counts as support (doesn't it?), what is the point of
standard charsets?

And I think the IAB has made a statement that they are not going
to standardize on character set and internationalization issues
for Internet specs for a while.


I don't think the IAB made such a statement.

I think that this means that we are just going to experiment
for some years, and then see what the outcome is.


Obviously, people outside the US have collected some experience.

On the other hand, if people in the US keep using ASCII only, an
additional 10 years means no experience for them.

(taken in the order of publishing, according to my aging memory)
I see another candidate, namely SGML/HTML, but their proponents have not
argued this on this list.


Considering that SGML/HTML must be readable as semi-plain text,
internationalized plain text is the basic requirement for
SGML/HTML.

A few observations: 

1. What we have in MIME today, the ASCII and ISO-8859-? (not -10)
support is creating enclaves of localized areas where a certain charset
is spoken, and communications between these areas are cumbersome.
For example Western and Eastern Europe cannot communicate efficiently
although many of the characters are the same in iso-8859-1 and
iso-8859-2. The same applies to Turkey using iso-8859-9 and the
rest of Europe. This is also a major problem in the Nordic countries,
with schools opting for iso-8859-10 and the rest for iso-8859-1.


That is, the charset label serves to distinguish localizations, not
to provide internationalization.

2. MIME *is* capable of handling universal schemes, viz. mnemonic,
ISO-2022-INT and UTF-7.


The difference between ISO-2022-INT and the other two is that
ISO-2022-INT is US-ASCII friendly. So, unlike mnemonic and UTF-7,
which consume some UNIMPORTANT characters (as quoted-printable does
with '='), ISO-2022-INT may be used without MIME labelling.
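To make "consuming a character" concrete, here is a minimal Python
sketch. It is only an illustration under assumptions: Python has no
ISO-2022-INT or mnemonic codec, so its iso2022_jp codec is used as a
stand-in for an ISO 2022 based scheme, and quoted-printable is shown
next to UTF-7 for comparison. The sample line is made up.

```python
import quopri

# Plain US-ASCII text that happens to contain '+' and '='.
line = "1+1=2"
ascii_bytes = line.encode("ascii")

# UTF-7 reserves '+' as its shift character, so a literal '+' is rewritten.
as_utf7 = line.encode("utf-7")

# Quoted-printable reserves '=' as its escape character.
as_qp = quopri.encodestring(ascii_bytes)

# Assumption: iso2022_jp stands in for an ISO 2022 based scheme here.
# Pure ASCII text needs no escape sequence at all.
as_iso2022 = line.encode("iso2022_jp")

for name, raw in [("UTF-7", as_utf7),
                  ("quoted-printable", as_qp),
                  ("ISO 2022 (stand-in)", as_iso2022)]:
    print(f"{name:20} {raw!r:14} unchanged={raw == ascii_bytes}")
```

Only the ISO 2022 style encoding leaves the ASCII line byte-for-byte
identical, which is why an unlabelled reader still sees the original
text.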

3. Standard MIME creates a hostile environment when downgrading 
from 8-bit charsets to 7-bit ASCII - rfc-822 mail, as both
BASE64 and Quoted-Printable are considered unreadable in many
environments in their raw forms (read as plain ASCII).


So are mnemonic and UTF-7.
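A small Python sketch of what a reader sees when these encodings are
displayed raw, as plain ASCII. The sample word is an assumption chosen
for illustration, and mnemonic is left out because Python ships no
codec for it.

```python
import base64
import quopri

text = "Malmö"                   # sample word with one ISO-8859-1 character
latin1 = text.encode("latin-1")  # the 8-bit form: b'Malm\xf6'

# The 7-bit forms that a non-MIME reader would see verbatim.
raw_forms = {
    "base64": base64.b64encode(latin1),
    "quoted-printable": quopri.encodestring(latin1),
    "utf-7": text.encode("utf-7"),
}

for name, raw in raw_forms.items():
    # Every form is pure ASCII, but none of them shows the letter itself.
    print(f"{name:17} {raw.decode('ascii')}")
```

BASE64 hides the whole word, while quoted-printable and UTF-7 keep the
ASCII letters and replace only the 8-bit character with an escape.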

A recommended goal for the universal character encoding scheme
for Internet mail would be that a feasible model for downgrading
to existing Internet mail practices, with the best possible
usability for users, be available and mandatory in the specification.


Why downgrade, if you don't have to?

Going from
the current goal of iso-8859-1 in my environment to a universal
scheme could create further problems, e.g. 10646 encodings will
give problems when just read as raw iso-8859-1, and that would also
be the case for ISO-2022-INT, while mnemonics and SGML/HTML
would be understandable as raw iso-8859-1.
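The problem with reading a 10646 encoding as raw iso-8859-1 can be
sketched in Python. The two encodings below (UTF-8 and big-endian
UTF-16, the latter standing in for a 16-bit form) are assumptions
picked for illustration, as is the sample word.

```python
text = "Malmö"  # sample word with one non-ASCII letter

# Two ways of serializing the same 10646 characters.
utf8_bytes = text.encode("utf-8")
ucs2_bytes = text.encode("utf-16-be")  # two octets per character here

# What a reader sees if the octets are interpreted as iso-8859-1 instead.
seen_from_utf8 = utf8_bytes.decode("latin-1")
seen_from_ucs2 = ucs2_bytes.decode("latin-1")

print(repr(seen_from_utf8))  # the 'ö' turns into two unrelated characters
print(repr(seen_from_ucs2))  # every character is preceded by a NUL
```

In both cases the ASCII letters survive in some form, but the one
character that iso-8859-1 could have shown directly is lost.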


Not necessarily. ISO-2022-INT is designed to be not only US-ASCII
friendly but also ISO-8859-1 friendly.

So, you can use a mixed ISO-2022-INT / ISO-8859-1 environment
with charset=ISO-8859-1, in which ISO-8859-1 specific characters
are represented with the 8th bit set (and will be displayed as is on
existing terminals supporting 8-bit 8859/1) and characters alien to
ISO-8859-1 may be represented with escape sequences. The MIME charset
labelling the 8859/1 localization will be useful until the full
transition to ISO-2022-INT.
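A rough Python sketch of such a mixed stream follows. It is not the
ISO-2022-INT definition: the escape sequences are borrowed from
ISO-2022-JP (ESC $ B to enter a two-byte set, ESC ( B to leave it)
purely as a stand-in, and the helper names encode_mixed and
decode_mixed are hypothetical.

```python
import re

# Assumption: ISO-2022-JP designations stand in for ISO-2022-INT's.
TWO_BYTE_RUN = re.compile(rb"\x1b\$B.*?\x1b\(B", re.DOTALL)

def encode_mixed(text):
    """8859-1 characters as single octets, everything else escaped."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode("latin-1")     # 8th bit set where needed
        except UnicodeEncodeError:
            out += ch.encode("iso2022_jp")  # ESC $ B .. ESC ( B
    return bytes(out)

def decode_mixed(data):
    """Decode escaped runs as two-byte text, the rest as 8859-1."""
    parts, pos = [], 0
    for m in TWO_BYTE_RUN.finditer(data):
        parts.append(data[pos:m.start()].decode("latin-1"))
        parts.append(m.group().decode("iso2022_jp"))
        pos = m.end()
    parts.append(data[pos:].decode("latin-1"))
    return "".join(parts)

data = encode_mixed("Malmö 日本")
print(repr(data))                    # octets on the wire
print(repr(data.decode("latin-1")))  # view on a plain 8859-1 terminal
print(decode_mixed(data))            # view with escape sequence support
```

An existing 8859-1 terminal still shows "Malmö" correctly and sees
only the escaped run as noise, while an aware reader recovers the
full text.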

Our new scheme should be capable of handling intra-MIME charset
compatibilities, so that we do not go through the same painful
transition as we do between 7- and 8-bit, when we go to
16 bit, and at some time again from 16 to 32 bit.


I don't think we will go to 16 or 32 bit. It's too painful to abandon
US-ASCII.

                                                Masataka Ohta