Re: ISO 2022

> I guess I really don't have the same model of how (for example)
> 10646 is intended to be used.


I don't have the same model for 10646 either. If you take another look
at my message, you will notice that I was talking about 2022, which is
a very different animal.

Also, we should be very careful when we talk about 10646, because it
is currently undergoing a series of changes that could have serious
consequences for our email standards discussions.

> As I read the spec, ISO10646 has many duplicate encodings for the
> same screen glyph -- that's why I think of it as a character set registry
> framework, rather than a character set itself.


Although DIS 10646 has duplicate encodings for the same glyph, it is
definitely not a character set registry framework. ISO 2022 fits that
description better.
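To make the "registry framework" point concrete: in ISO 2022, a text switches between registered character sets by means of escape sequences (ESC, then optional intermediate bytes in the range 0x20-0x2F, then one final byte in 0x30-0x7E). The Python sketch below uses the standard iso2022_jp codec and lists the designations found in an encoded Japanese string. The helper name designations() is made up for illustration, and it only recognizes plain ESC sequences, not every ISO 2022 control function.

```python
def designations(data: bytes) -> list:
    """Return the ISO 2022 escape sequences found in an encoded byte string."""
    found = []
    i = 0
    while i < len(data):
        if data[i] == 0x1B:  # ESC starts an escape sequence
            j = i + 1
            # Skip intermediate bytes (0x20-0x2F); the next byte is the final byte.
            while j < len(data) and 0x20 <= data[j] <= 0x2F:
                j += 1
            found.append(data[i : j + 1])
            i = j + 1
        else:
            i += 1
    return found

encoded = "Mail 日本語 text".encode("iso2022_jp")
print(designations(encoded))  # [b'\x1b$B', b'\x1b(B']
```

ESC $ B designates JIS X 0208 and ESC ( B switches back to ASCII; the same framework can designate any other registered set in the same way.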

> How on earth are you planning on subsetting this problem?  It is simply
> not acceptable to anyone in country X to say "Sorry, you can't represent
> your characters in Mail"


Oh, I agree. But it is probably equally unacceptable to mandate
support for all of the world's characters in the RFC's conformance
clause. Would any American vendors take such an RFC seriously?


I guess I should say something constructive now...


Let me try to explain how I see the pieces of the character set puzzle
falling into place.

We need to think of a multilayer model, because we are constrained by
certain facts of life, and it is these constraints that determine the
characteristics of the low-level layer. On top of this, we can build
another layer with fewer constraints. The OSI 7-layer model may be
good, but to keep this discussion simple, I wish to talk about just
two layers.

Now to elaborate on these so-called facts of life.

It is a fact of life that many of the world's SMTP sites are 7-bit. It
will take a long time for all of these to be upgraded to 8-bit. Some
of them may never upgrade. While one could argue that 8-bit SMTP can
be used quite effectively when restricted to local email, I think our
goal should be to be able to send email anywhere, even to a site on
the other side of a 7-bit MTA.
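As a rough illustration of what "7-bit" means for a message body, here is a Python sketch of a check a sender might run before handing text to an unknown SMTP path. It assumes the body uses CRLF line endings and applies only two RFC 821 rules (every octet below 128, and lines of at most 1000 octets including the CRLF). The function name fits_7bit_smtp is hypothetical, and real MTAs differ in how strictly they enforce either rule.

```python
def fits_7bit_smtp(body: bytes) -> bool:
    """Rough check that a CRLF-delimited body can cross a 7-bit SMTP hop."""
    for line in body.split(b"\r\n"):
        if len(line) + 2 > 1000:  # RFC 821 line limit counts the CRLF too
            return False
        if any(octet >= 0x80 for octet in line):  # 8-bit data would be mangled
            return False
        if b"\r" in line or b"\n" in line:  # bare CR or LF inside a line
            return False
    return True

print(fits_7bit_smtp(b"Hello\r\nworld"))           # True
print(fits_7bit_smtp("café".encode("latin-1")))    # False
```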

It is also a fact of life that many users' mailers display whatever
arrives over the wire as is. No conversion is performed between the
wire protocol and the terminal text, apart from simple conversions
such as ASCII->EBCDIC and CRLF->EOL.
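A sketch of such a mailer in Python, assuming a CRLF-delimited body and using the standard cp037 codec as the EBCDIC variant (any other EBCDIC code page would serve the argument equally well). The function old_mailer_display is hypothetical. It shows why the wire form matters: whatever survives this path is exactly what the user sees, and a raw 8-bit e-acute does not survive it at all.

```python
def old_mailer_display(wire: bytes) -> list:
    """Mimic a mailer that only maps CRLF to records and ASCII to EBCDIC."""
    records = wire.split(b"\r\n")  # CRLF -> local end-of-line (one record per line)
    # ASCII -> EBCDIC; octets outside 7-bit ASCII have no mapping and raise an error.
    return [line.decode("ascii").encode("cp037") for line in records]

screen = old_mailer_display(b"caf&e'\r\nbye")
print([rec.decode("cp037") for rec in screen])  # ["caf&e'", 'bye']
```

Passing "café".encode("latin-1") to the same function raises UnicodeDecodeError, which stands in for the garbage a real mailer of this kind would put on the screen.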

Another fact of life is that there exist converters that juggle the
national variant characters in ASCII and EBCDIC, in such a way that
information is sometimes lost. In much the same way that it will take
some time for SMTP sites to upgrade to 8 bits, it will also take a
long time for these converters to be provided with some intelligence.
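A small Python illustration of how such a converter loses information, assuming the German ISO 646 variant (DIN 66003), in which the positions of @ [ \ ] { | } ~ carry § Ä Ö Ü ä ö ü ß. The two conversion functions are hypothetical stand-ins for a gateway. The point is only that once a brace and an a-umlaut share a code position, no later converter can tell them apart.

```python
# German ISO 646 variant (DIN 66003): these ASCII positions carry other letters.
GERMAN_646 = {"@": "§", "[": "Ä", "\\": "Ö", "]": "Ü",
              "{": "ä", "|": "ö", "}": "ü", "~": "ß"}
TO_7BIT = {v: k for k, v in GERMAN_646.items()}

def latin1_to_german_646(text: str) -> str:
    """Squeeze Latin-1 text into 7 bits by reusing the variant positions."""
    return "".join(TO_7BIT.get(ch, ch) for ch in text)

def german_646_to_latin1(text: str) -> str:
    """Naive reverse conversion: treat every variant position as a letter."""
    return "".join(GERMAN_646.get(ch, ch) for ch in text)

original = "if (x) { grüße }"
roundtrip = german_646_to_latin1(latin1_to_german_646(original))
print(roundtrip)  # if (x) ä grüße ü
```

The umlauts come back correctly, but the braces that were in the original text do not, and nothing in the 7-bit form records which was which.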

These three facts of life effectively dictate the solution. (Or at
least, the first step of the solution.)

The first step is the so-called quoted-readable transformation. Instead
of sending an 8-bit Latin-1 e-acute down the wire, we send the three
ASCII characters &e'. This satisfies all three facts of life above: it
is 7-bit, it is readable, and it contains no national variant
characters.
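Here is a minimal Python sketch of the encoding side. The mnemonic table is a hypothetical sample covering only a handful of letters. Its marks are drawn from characters that are invariant across the ISO 646 national variants (' for acute, ! for grave since ` is itself a variant position, > for circumflex, : for diaeresis, , for cedilla, ? for tilde). Doubling & to && is an assumed escape convention, not something the proposal specifies. ASCII characters already in the input, including any national variant characters, pass through unchanged; a fuller scheme would need mnemonics for those as well.

```python
# Hypothetical mnemonic table: Latin-1 letter -> two invariant ASCII characters.
MNEMONICS = {
    "é": "e'", "è": "e!", "ê": "e>", "ë": "e:",
    "á": "a'", "à": "a!", "â": "a>", "ä": "a:",
    "ö": "o:", "ü": "u:", "ç": "c,", "ñ": "n?",
    "É": "E'", "Ä": "A:", "Ö": "O:", "Ü": "U:",
}

# ASCII positions that ISO 646 national variants may redefine.
NATIONAL_VARIANTS = set("#$@[\\]^`{|}~")

def encode_readable(text: str) -> str:
    """Replace each known Latin-1 letter with '&' plus its mnemonic."""
    out = []
    for ch in text:
        if ch == "&":
            out.append("&&")  # assumed escape for a literal introducer
        elif ch in MNEMONICS:
            out.append("&" + MNEMONICS[ch])
        elif ord(ch) < 128:
            out.append(ch)
        else:
            raise ValueError(f"no mnemonic for {ch!r}")
    return "".join(out)

def is_wire_safe(s: str) -> bool:
    """True if s is 7-bit and avoids the national variant positions."""
    return all(ord(c) < 128 and c not in NATIONAL_VARIANTS for c in s)

wire = encode_readable("café & crème")
print(wire)                # caf&e' && cr&e!me
print(is_wire_safe(wire))  # True
```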

People with new mailers, which conform to the new RFCs, will be able
to see the e-acute directly, since these mailers will make good use of
the two layers. They will automatically convert the &e' to e-acute.
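The decoding step in a new mailer might look like this Python sketch. It uses a hypothetical mnemonic table (the same small sample of letters, inverted) and an assumed && escape for a literal &, and it leaves any unrecognized &-sequence alone so that ordinary text such as "R&D", written by someone with an old mailer, still reads correctly.

```python
# Hypothetical inverse table: two-character mnemonic -> Latin-1 letter.
MNEMONIC_TO_CHAR = {
    "e'": "é", "e!": "è", "e>": "ê", "e:": "ë",
    "a'": "á", "a!": "à", "a>": "â", "a:": "ä",
    "o:": "ö", "u:": "ü", "c,": "ç", "n?": "ñ",
    "E'": "É", "A:": "Ä", "O:": "Ö", "U:": "Ü",
}

def decode_readable(wire: str) -> str:
    """Turn &-mnemonics back into letters for display."""
    out = []
    i = 0
    while i < len(wire):
        ch = wire[i]
        if ch != "&":
            out.append(ch)
            i += 1
        elif wire[i + 1 : i + 2] == "&":  # assumed escape: && means a literal &
            out.append("&")
            i += 2
        elif wire[i + 1 : i + 3] in MNEMONIC_TO_CHAR:
            out.append(MNEMONIC_TO_CHAR[wire[i + 1 : i + 3]])
            i += 3
        else:
            out.append(ch)  # unknown sequence: show it as is
            i += 1
    return "".join(out)

print(decode_readable("caf&e' && cr&e!me"))  # café & crème
```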

Now some of you may comment that this is rather biased towards the
West. I cannot deny this. Non-alphabetic languages such as Japanese
cannot be encoded in 7-bit, nationally-invariant ASCII and still be
"readable" on current "terminals". However, I do not think there is an
immediate need to solve the Japanese problem. The European character
problem needs to be addressed now, if not yesterday.

So what are the next steps of the solution?

After a while, many users will have new mailers, and many gateways
will have new converters. If the new RFC mandates something like
Base64, conformant systems will be able to send ANYTHING, ANYWHERE.
Even character encodings whose byte streams may contain the octets CR
and LF can be carried safely and transparently in Base64. Since many
people will by then have new mailers, Base64 messages will be decoded
automatically. So even Japanese 2022 can then be sent through any
gateway, and if the receiver understands Japanese 2022, all is well.
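The transparency argument can be checked with Python's standard base64 module. The sketch below wraps the encoded text at 76 characters per line, the limit MIME uses for its base64 encoding; the helper names to_wire and from_wire are hypothetical. The base64 alphabet (A-Z, a-z, 0-9, +, / and =) contains no ISO 646 national variant characters, so converters that only remap those positions leave the encoded form intact.

```python
import base64

def to_wire(payload: bytes, width: int = 76) -> str:
    """Base64-encode arbitrary octets and break the result into CRLF-separated lines."""
    encoded = base64.b64encode(payload).decode("ascii")
    lines = [encoded[i : i + width] for i in range(0, len(encoded), width)]
    return "\r\n".join(lines)

def from_wire(wire: str) -> bytes:
    """Reverse to_wire: drop the line breaks and decode."""
    return base64.b64decode("".join(wire.split()))

# ISO-2022-JP text followed by octets that would upset a 7-bit text path.
raw = "日本語".encode("iso2022_jp") + b"\r\n\x00\xff" * 30
wire = to_wire(raw)
print(from_wire(wire) == raw)  # True
```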

At about the same time, or perhaps somewhat sooner or later, many
users will have software that understands a multilingual character
encoding. (I won't name it. :-) That would be the time to draft an RFC
that mandates that encoding as the standard character encoding.


Now I haven't talked about the possibility that many of us may move to
a different transport, such as 8-bit SMTP, or even something
completely different. But I think that most of what I said above will
be true anyway, and it should be possible to migrate to a different
transport in parallel with the above sequence of steps.


Thanks for reading this far. I would be grateful if people could
comment on the rough outline I have given in this message.


Cheers,
Erik