ietf-822

IDN (was Did anyone tell Microsoft yet?)

2002-04-25 09:13:58

In <iluy9fc65gb(_dot_)fsf(_at_)extundo(_dot_)com> Simon Josefsson 
<simon+ietf-822(_at_)josefsson(_dot_)org> writes:

Agreed.  OTOH, such considerations should go into the original 
design of protocols.  Complex solutions have complex errors 
leading to interop failures.  If the solution is simple, it is 
more likely that only simple errors occur.  I'm afraid we'll get 
our share of complex errors when IDN starts to make its way into 
mail and other infrastructure... The horrors, the horrors.

Maybe, maybe, but it is surely going to happen, so this looks like a good
excuse to ask what the current thinking is.

Currently, the IDNA draft (draft-ietf-idn-idna-07.txt) tells you how to
encode a non-ASCII domain name so that it looks like an ASCII domain name,
which you can then offer to gethostbyname and friends. For sure, people
will want to start using non-ASCII in domain names, and they will want to
use them in local-parts too.

So we can expect to see headers like

To: Jürgen Schmidt <jürgen@tu-münchen.de>

(I have stuck to characters displayable in ISO-8859-1; the real customers,
of course, will be the Chinese and the Koreans).

The IDNA recommendation is that applications should display everything in
character sets understandable by the users, and users should not have to
type anything other than those character sets. So the user will see (or
type) exactly the header I showed above. IDNA also recommends keeping
stuff in local character sets as long as possible, only downgrading to
Punycode when about to call gethostbyname.
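As a rough illustration of that "downgrade at the last moment" rule, here is a minimal Python sketch. It assumes Python's built-in "punycode" codec, uses 'zz--' purely as a made-up placeholder for the ACE prefix (the real one is not yet decided), and skips the nameprep step entirely; the function names are hypothetical.

```python
# Placeholder ACE prefix (an assumption for illustration only; the real
# prefix had not been chosen when this was written).
ACE_PREFIX = "zz--"


def to_ace_label(label: str) -> str:
    """Return an ASCII-compatible form of one domain label.

    Pure-ASCII labels pass through unchanged; anything else is
    Punycode-encoded and given the placeholder prefix. Nameprep
    (case folding, normalisation) is deliberately omitted.
    """
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def to_ace_domain(domain: str) -> str:
    """Convert each dot-separated label of a domain independently."""
    return ".".join(to_ace_label(part) for part in domain.split("."))


# The application keeps the readable form for display and typing, and
# converts only just before handing the name to the resolver.
display_name = "tu-münchen.de"
lookup_name = to_ace_domain(display_name)
print(lookup_name)  # zz--tu-mnchen-t9a.de
```

The point of the split is that only lookup_name ever reaches gethostbyname; everything the user sees stays in display_name.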

From the mail POV, there are two separate questions:

1. What to do about that header, when passing it as a header to SMTP?

2. What to do about the addr-spec part of it when constructing the
envelope (i.e. RCPT)?

The answer to the two questions may or may not be the same.


1. What to do about that header, when passing it as a header to SMTP?

Current wisdom is that 'Jürgen Schmidt' should be encoded using RFC 2047
(but don't try putting it in a quoted-string as '"Jürgen Schmidt"').
'jürgen' and 'tu-münchen' are currently illegal, of course. Possible
future extensions might allow:

A. Extending RFC 2047 for use in those cases.
B. Encoding as per IDNA, giving
        <zz--jrgen-kva@zz--tu-mnchen-t9a.de>
   where 'zz--' is the "ACE prefix" (they have not decided yet exactly
   what it will be).
C. Invent YAEFUTA. I think we want that like we want a hole in the head.
   We already have two such encodings (RFC 2047 and RFC 2231, and some
   people think that is already two too many :-( ).
D. Translate to UTF-8, and allow that in the transport protocol. That is
   the ideal solution, but it is not clear exactly how we get to that
   state from the state we are in. Transition could be painful, but maybe
   the pain is worth it in view of the benefit.

BUT, whatever we do, please let us do the SAME THING for both 'jürgen' and
'tu-münchen'.
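To make "the same thing for both" concrete, here is a hedged Python sketch of what option B might look like for the example header: the display name goes through RFC 2047 as it does today (via the standard email.header module), and one and the same Punycode step is applied to the local-part and to each domain label. The 'zz--' prefix is still only a placeholder, nameprep is omitted, and the helper names are invented for the illustration.

```python
from email.header import Header, decode_header

ACE_PREFIX = "zz--"  # placeholder prefix, an assumption as in option B


def ace(token: str) -> str:
    """Punycode a non-ASCII token and add the placeholder prefix."""
    if token.isascii():
        return token
    return ACE_PREFIX + token.encode("punycode").decode("ascii")


def encode_addr_spec(addr: str) -> str:
    """Apply the same encoding to the local-part and every domain label."""
    local, _, domain = addr.rpartition("@")
    host = ".".join(ace(label) for label in domain.split("."))
    return f"{ace(local)}@{host}"


def to_header(display: str, addr: str) -> str:
    """Build a To: header that is pure ASCII on the wire."""
    # RFC 2047 encoded-word for the display name (current practice).
    phrase = Header(display, "iso-8859-1").encode()
    return f"To: {phrase} <{encode_addr_spec(addr)}>"


line = to_header("Jürgen Schmidt", "jürgen@tu-münchen.de")
print(line)

# Round-trip check: a conforming reader recovers the original name.
raw, charset = decode_header(Header("Jürgen Schmidt", "iso-8859-1").encode())[0]
print(raw.decode(charset))  # Jürgen Schmidt
```

Note that the header now carries two different encodings side by side (RFC 2047 for the phrase, ACE for the address), which is part of why option A or D may look tidier.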

2. What to do about the addr-spec part of it when constructing the
envelope (i.e. RCPT)?

Again, the same four solutions are possible, but solution B makes
better sense than in case 1, because the envelope is not (normally) for
human reading, and at least the translation to 'zz--tu-mnchen-t9a' is
going to be needed before the transport can determine the IP address to
send it to; also, it should work immediately with existing transports, so
long as an upgraded gethostbyname is available. (But I would still prefer
a UTF-8 solution).
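The difference between B and D for the envelope can be sketched in a few lines of Python. This is only an illustration under the same assumptions as before ('zz--' as a placeholder prefix, no nameprep, hypothetical function names): option B yields a 7-bit RCPT line that an existing transport would accept unchanged, whereas option D puts 8-bit octets on the wire.

```python
ACE_PREFIX = "zz--"  # placeholder prefix, an assumption


def ace(token: str) -> str:
    """Punycode a non-ASCII token and add the placeholder prefix."""
    if token.isascii():
        return token
    return ACE_PREFIX + token.encode("punycode").decode("ascii")


def rcpt_option_b(addr: str) -> bytes:
    """RCPT line with local-part and domain labels both ACE-encoded."""
    local, _, domain = addr.rpartition("@")
    host = ".".join(ace(label) for label in domain.split("."))
    return f"RCPT TO:<{ace(local)}@{host}>\r\n".encode("ascii")


def rcpt_option_d(addr: str) -> bytes:
    """RCPT line carrying the address as raw UTF-8 (needs new transports)."""
    return f"RCPT TO:<{addr}>\r\n".encode("utf-8")


def is_seven_bit(data: bytes) -> bool:
    """True if every octet fits in 7 bits."""
    return all(byte < 0x80 for byte in data)


addr = "jürgen@tu-münchen.de"
print(rcpt_option_b(addr), is_seven_bit(rcpt_option_b(addr)))
print(rcpt_option_d(addr), is_seven_bit(rcpt_option_d(addr)))
```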

So, which way do people think we are likely to jump?


Note that my particular concern is with the local-part, because we have a
problem in Usefor when sending mail to moderators. Suppose there were to
be a newsgroup
        dk.test.utf8-æøå.moderated
(the unmoderated group already exists, for experimental purposes). Then
you would submit articles to the moderator at
        dk-test-utf8-æøå-moderated@moderators.isc.org

But, of course, you can't do that at the moment. So the present Usefor
draft contains some weasel words to the effect that there is a problem
here, and that its proper solution will have to await the invention of a
suitable extension to the Email protocols. But if somebody can give some
hint as to what that extension might eventually be, I might be able to
write some less-weaselish words.
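For concreteness, the mapping from group name to moderator address used in the example above can be sketched as follows. It is inferred from that one example only (dots become hyphens, and the result is used as a local-part at moderators.isc.org); the function name is hypothetical.

```python
MODERATOR_HOST = "moderators.isc.org"


def moderator_address(newsgroup: str) -> str:
    """Derive the submission address for a moderated group.

    Assumed rule, taken from the example: replace each '.' in the
    group name with '-' and use the result as the local-part.
    """
    return newsgroup.replace(".", "-") + "@" + MODERATOR_HOST


addr = moderator_address("dk.test.utf8-æøå.moderated")
print(addr)            # dk-test-utf8-æøå-moderated@moderators.isc.org
print(addr.isascii())  # False: the local-part is not legal today
```

The domain here is plain ASCII, so IDNA on its own does nothing for this case; whatever extension is chosen has to cover the local-part.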

-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133   Web: http://www.cs.man.ac.uk/~chl
Email: chl(_at_)clw(_dot_)cs(_dot_)man(_dot_)ac(_dot_)uk      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5