Re: IDN (was Did anyone tell Microsoft yet?)


In <200205021328(_dot_)g42DSxe18830(_at_)astro(_dot_)cs(_dot_)utk(_dot_)edu> 
Keith Moore <moore(_at_)cs(_dot_)utk(_dot_)edu> writes:

Such a change inherently causes widespread disruption.


Again, you are assuming what you are trying to prove, which is no way to
argue.

This is a decision that has to be based on engineering judgement.  
There's no way to "prove" whether any particular choice will be 
successful or not because you can't adequately model the real-world 
conditions.

Ned said, right at the start of this thread, that introducing UTF-8 was
not inherently hard or difficult, but that there were a certain number of
problems that would need to be tackled, amongst which would likely be a
suitable means of falling back to a 7-bit encoding.

It's not difficult to define how it would need to work, and it's not
particularly difficult to write the code.  Introducing the code without
significant disruption is extremely difficult, and the benefit doesn't
justify the cost.


On the contrary, defining how it would work is the difficult part, because
the definition has to include provision for a smooth transition from the
present system. That will likely include fallbacks to the present
notation, and the hard part is deciding just where those fallbacks should
take place.
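
To make the fallback question concrete, here is a minimal sketch in Python of one possible decision point. It assumes the sender somehow knows whether the next hop accepts UTF-8; the `peer_accepts_utf8` flag and the function name are hypothetical, and nothing in this thread specifies how such knowledge would be negotiated. Only the domain is handled, using the IDNA codec in Python's standard library; non-ASCII local parts are deliberately rejected rather than guessed at.

```python
def address_on_the_wire(address: str, peer_accepts_utf8: bool) -> str:
    """Return the form of `address` to send to a given peer.

    `peer_accepts_utf8` is a hypothetical flag standing in for whatever
    negotiation would tell the sender that the peer understands UTF-8.
    """
    # Nothing to do if the peer takes UTF-8, or the address is already 7-bit.
    if peer_accepts_utf8 or address.isascii():
        return address

    local, sep, domain = address.rpartition("@")
    if not local.isascii():
        # No 7-bit fallback for local parts is assumed in this sketch.
        raise ValueError("no 7-bit fallback defined for non-ASCII local parts")

    # Fall back to the "present notation": IDNA-encode the domain labels.
    return local + sep + domain.encode("idna").decode("ascii")


print(address_on_the_wire("chl@bücher.example", peer_accepts_utf8=False))
print(address_on_the_wire("chl@bücher.example", peer_accepts_utf8=True))
```

The hard part discussed above is not this function but deciding which agent calls it, and at which hop.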

So yes, it would have to be introduced with care, and tradeoffs between
various approaches would have to be looked into.

But the way forward would be to examine the problems to find ways to work
around them, rather than to declare in advance, without proof, that it is
impossible.

No, that's not a way forward, it's a way backward, because by trying to 
force a transition to pure utf-8 you end up increasing the complexity not 
only of MUAs but also of the MTAs that have to negotiate and translate.


But for that reason the utf-8 may turn out to be not so "pure" as you seem
to suppose. Clearly, existing MUAs are going to continue in use for some
considerable time amongst the large number of people who have no
particular need to correspond regularly in Chinese. These people may
occasionally see unpronounceable addresses (i.e. post-IDNA ASCII) and will
be well advised to put them in their address books, because typing them by
hand will surely be error-prone :-( .

But people with a regular need to correspond with people in China will
likely be willing to spend money to buy an "internationalized" MUA. Such
agents do not exist yet, so the opportunity now exists to define how they
are to work. That includes deciding whether it is safe to let them do the
IDNA encoding themselves (clearly doing so will be one of the options),
but it also allows the possibility of bringing in utf-8 at the same time.

The interesting question then is whether it will be necessary, useful or
desirable for the MTA to be aware that it is speaking to an
internationalized MUA.

And by putting this translation at additional places in the signal path
you decrease the likelihood that the message will arrive intact.  
Many implementations today can't even encode RFC 2047 correctly.  So how 
can we expect tomorrow's implementations to translate correctly between 
UTF-8 and IDNA+2047?


The agents that do RFC 2047 incorrectly are almost invariably MUAs,
since MTAs have almost no reason to look inside that encoding. But if MUAs
cannot even get RFC 2047 right, why do you insist on recommending that
they be given the more demanding (and critical) job of doing IDNA right?
It just ain't going to happen.
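
For readers who have not looked inside RFC 2047, the following sketch shows the round trip an MUA is expected to get right, using Python's standard `email.header` module. The helper names `encode_2047` and `decode_2047` are made up for illustration, and the choice of UTF-8 as the charset is an assumption; RFC 2047 permits other charsets.

```python
from email.header import Header, decode_header, make_header


def encode_2047(text: str) -> str:
    """Encode a header value as RFC 2047 encoded-words (charset UTF-8)."""
    return Header(text, charset="utf-8").encode()


def decode_2047(value: str) -> str:
    """Decode any RFC 2047 encoded-words in a header value back to text."""
    return str(make_header(decode_header(value)))


wire = encode_2047("Grüße")
print(wire)               # 7-bit form, of the shape =?charset?encoding?data?=
print(decode_2047(wire))  # original text recovered by the receiving MUA
```

An MTA that merely relays the message passes the 7-bit form through untouched, which is why errors in this encoding tend to originate in MUAs.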

-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133   Web: http://www.cs.man.ac.uk/~chl
Email: chl(_at_)clw(_dot_)cs(_dot_)man(_dot_)ac(_dot_)uk      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5