ietf-822

Re: IDN (was Did anyone tell Microsoft yet?)

2002-05-06 19:12:17

In <20020505092959(_dot_)15471(_dot_)qmail(_at_)cr(_dot_)yp(_dot_)to>
"D. J. Bernstein" <djb(_at_)cr(_dot_)yp(_dot_)to> writes:

The time that programmers waste dealing with today's horrifying mess of
character encodings is time taken away from providing real features for
the users. It's so much time, in fact, that programmers invariably cut
corners, producing occasional failures for the users.

The obvious solution to both of these problems is to settle on a single
character encoding. We've chosen the UTF-8 encoding of Unicode, because
it's compatible with ASCII, it's self-synchronizing, it includes all the
characters we need, etc. The transition plan is straightforward:

  (1) add support for that encoding to all readers; then
  (2) switch all writers to that encoding; then
  (3) convert all stored data to that encoding; then
  (4) stop worrying about other encodings.

Not quite, because it is hard to persuade implementors to add a feature to
*all* readers that will not result in some immediately visible benefit.
But there are other ways to do it.

We start from two basic situations:

1. UTF-8 is simple to generate, but current transports don't transport it,
and readers do not display it. Otherwise it is fine :-) .

2. Codecs such as RFC 2047 are difficult to implement (evidence: so many
implementations do it wrong) but at least they sort-of work at present and
sort-of comply with current standards. Adding IDNA as an additional codec
will surely give us more of the same.
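
To make the contrast in the two points above concrete, here is a minimal
sketch using Python's standard email.header module. The sample subject
line is an assumption chosen for illustration; it is not taken from any
real message.

```python
from email.header import Header, decode_header

# Illustrative subject containing non-ASCII characters (an assumption).
subject = "Grüße aus Köln"

# RFC 2047: the writer wraps the text in an "encoded-word" so that the
# header stays within 7-bit ASCII.
encoded = Header(subject, charset="utf-8").encode()

# The reader must find each encoded-word, undo the base64 or
# quoted-printable layer, and then decode the bytes with the charset
# named inside the word.
decoded = "".join(
    part.decode(charset or "ascii") if isinstance(part, bytes) else part
    for part, charset in decode_header(encoded)
)

# Raw UTF-8: the writer's only step is to encode the string to bytes,
# and the reader's only step is to decode them.
raw = subject.encode("utf-8")
```

The RFC 2047 path has more places to go wrong (word boundaries, two
transfer encodings, per-word charsets), which is the sense in which such
codecs are "difficult to implement".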


So you define an SMTP extension (let's call it "I18N") which accepts and
transports UTF-8, and which does the "right thing" as regards DNS (such as
doing the IDNA conversion if it was not already done as received).
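
As a rough sketch of the "right thing" for DNS, the fragment below
converts the domain part of an address to its ASCII form before lookup.
It relies on Python's built-in "idna" codec, and the helper names
to_ascii_domain and prepare_address are hypothetical, introduced only
for this illustration.

```python
def to_ascii_domain(domain: str) -> str:
    """Return the IDNA (ASCII) form of a domain name.

    A domain that is already ASCII, including one that was converted
    earlier, passes through unchanged.
    """
    return domain.encode("idna").decode("ascii")


def prepare_address(address: str) -> str:
    """Convert only the domain part; the local part is left as received."""
    local, _, domain = address.rpartition("@")
    return f"{local}@{to_ascii_domain(domain)}"


converted = prepare_address("user@bücher.example")
# Applying the conversion a second time changes nothing.
again = prepare_address(converted)
```

Because the conversion is idempotent for names that are already in ASCII
form, a transport can apply it without knowing whether an earlier hop
has done so.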

At the same time, you define a UTF-8 extension to the mail format, to be
used as an alternative to the existing RFC 2047 etc. Clearly, UTF-8
extended messages can only be used with MTAs which support I18N.
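
One way a sending program might act on that restriction is sketched
below: it emits a raw UTF-8 header only when the next hop has advertised
the extension, and falls back to RFC 2047 otherwise. The keyword "I18N"
is the hypothetical name used above, and supports_i18n and
subject_header are likewise hypothetical helpers, not part of any
existing library.

```python
from email.header import Header

# Hypothetical extension keyword, as named in the proposal above.
I18N_KEYWORD = "I18N"


def supports_i18n(ehlo_keywords) -> bool:
    """True if the keywords from the server's EHLO reply include I18N."""
    return I18N_KEYWORD in {keyword.upper() for keyword in ehlo_keywords}


def subject_header(subject: str, ehlo_keywords) -> bytes:
    """Build a Subject line suited to what the next hop can carry."""
    if supports_i18n(ehlo_keywords):
        # Extended format: the text goes out as plain UTF-8 bytes.
        value = subject.encode("utf-8")
    else:
        # Existing format: RFC 2047 encoded-word, pure ASCII on the wire.
        value = Header(subject, charset="utf-8").encode().encode("ascii")
    return b"Subject: " + value


new_style = subject_header("Grüße", ["8BITMIME", "I18N"])
old_style = subject_header("Grüße", ["8BITMIME"])
```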

So what happens? Initially, nothing. But gradually I18N transports begin
to appear, and a few brave people start sending UTF-8 messages. That
encourages the appearance of more I18N transports.

Of course, use of RFC 2047 continues in parallel, but there comes a time
when it becomes apparent that UTF-8 messages are being transported more
reliably than the RFC 2047 ones (simply because they are more easily
implemented and therefore their implementations are more robust). At that
point, there will be a rush to switch over.

And finally, there will be a long and decreasing tail of old-style usage,
which will give worse and worse service as less and less effort is
expended on supporting those old formats. Eventually, after many years,
the old formats will disappear from the standards.

-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133   Web: http://www.cs.man.ac.uk/~chl
Email: chl(_at_)clw(_dot_)cs(_dot_)man(_dot_)ac(_dot_)uk
Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5