Re: IDN (was Did anyone tell Microsoft yet?)


In <Gv6BME(_dot_)KB2(_at_)clw(_dot_)cs(_dot_)man(_dot_)ac(_dot_)uk>
"Charles Lindsey" <chl(_at_)clw(_dot_)cs(_dot_)man(_dot_)ac(_dot_)uk> writes:

In <200204251938(_dot_)g3PJc6e10910(_at_)astro(_dot_)cs(_dot_)utk(_dot_)edu> 
Keith Moore <moore(_at_)cs(_dot_)utk(_dot_)edu> writes:

If case-folding is done before encoding, as in nameprep, then the comparison
will succeed.  If the case-folding is not done, then the comparison will
fail unless the two local-parts being compared are spelled with the same
casing, essentially forcing case-sensitivity.
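A minimal sketch of the two comparisons described above. It uses Python's `str.casefold()` (Unicode full case folding) as a stand-in for the mapping step; nameprep itself uses its own table B.2, and the function name `local_parts_match` is hypothetical.

```python
def local_parts_match(a: str, b: str, fold: bool = True) -> bool:
    """Compare two local-parts, optionally case-folding both sides first."""
    if fold:
        # Fold both sides before comparing, as a nameprep-style step would.
        return a.casefold() == b.casefold()
    # Without folding, the comparison is an exact code-point match.
    return a == b

print(local_parts_match("Keith", "KEITH"))              # True
print(local_parts_match("Keith", "KEITH", fold=False))  # False
print(local_parts_match("STRASSE", "straße"))           # True: casefold maps ß to "ss"
```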

Case-folding is easily done in ASCII, and even in ISO-8859-1, but there is
no way a mailer can do case folding in some strange part of Unicode that
it has never heard of (or which had not even been defined at the time that
mailer was compiled).
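To make the point concrete, the hypothetical `fold_latin1` below models a mailer whose case tables stop at ISO-8859-1: it folds ASCII and Latin-1 capitals correctly but passes every other character through untouched. This is an illustration under that assumption, not any real mailer's code.

```python
def fold_latin1(s: str) -> str:
    """Lowercase only the letters an ISO-8859-1 mailer knows; pass all else through."""
    out = []
    for ch in s:
        cp = ord(ch)
        # A-Z, and the Latin-1 capitals U+00C0..U+00DE except U+00D7 (multiplication
        # sign), sit exactly 0x20 below their lowercase forms.
        if 0x41 <= cp <= 0x5A or (0xC0 <= cp <= 0xDE and cp != 0xD7):
            out.append(chr(cp + 0x20))
        else:
            out.append(ch)
    return "".join(out)

print(fold_latin1("JOSÉ") == fold_latin1("josé"))    # True
print(fold_latin1("ΣΟΦΙΑ") == fold_latin1("σοφια"))  # False: Greek is outside the table
print("ΣΟΦΙΑ".casefold() == "σοφια".casefold())      # True with full Unicode tables
```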

We had this problem with newsgroup-names in Usefor, and the best we could
do was to say that they MUST be in lower-case (to be enforced by hierarchy
administrators who issue newgroup messages). Anyone who types a name with
an upper-case letter MIGHT be lucky and have it arrive in the correct
newsgroup, but more likely it would result in posting to a non-existent
group.
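A small sketch of that rule and its consequence, assuming a server that does no case folding at lookup time (the "unlucky" case). The group list and both function names are hypothetical.

```python
# Hypothetical active-groups list on a news server; every name was created in lowercase.
ACTIVE_GROUPS = {"comp.lang.c", "uk.telecom"}

def accept_newgroup(name: str) -> bool:
    """Administrator-side rule: refuse to create a group whose name is not all lowercase."""
    return name == name.lower()

def find_group(name: str) -> bool:
    """Server-side lookup: an exact, case-sensitive match with no folding."""
    return name in ACTIVE_GROUPS

print(accept_newgroup("uk.telecom"))  # True
print(accept_newgroup("UK.Telecom"))  # False
print(find_group("comp.lang.c"))      # True
print(find_group("Comp.Lang.C"))      # False: the article goes to a non-existent group
```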

BTW, how does NAMEPREP cope with this problem? Looks like another draft I
need to look at :-( .


Right, I have looked at NAMEPREP now, and am somewhat horrified.

Firstly, any implementation of it MUST contain a huge set of tables of
characters to be translated and characters to be disallowed.
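To give a sense of what those tables involve, here is a rough sketch of the Nameprep steps built on Python's standard `stringprep` module, which exposes the stringprep tables by name (A.1 for unassigned code points, B.1/B.2 for mapping, C.x for prohibited output). Assumptions: Python implements the published RFC 3454/3491 tables, which are based on Unicode 3.2 rather than the 3.1 draft discussed here; the bidirectional check is omitted; `nameprep_sketch` is a hypothetical name.

```python
import stringprep
import unicodedata

# Tables that RFC 3491 (Nameprep) selects from RFC 3454 for prohibited output.
PROHIBITED_TABLES = (
    stringprep.in_table_c12, stringprep.in_table_c22, stringprep.in_table_c3,
    stringprep.in_table_c4, stringprep.in_table_c5, stringprep.in_table_c6,
    stringprep.in_table_c7, stringprep.in_table_c8, stringprep.in_table_c9,
)

def nameprep_sketch(label: str, allow_unassigned: bool = False) -> str:
    """Rough Nameprep: unassigned check, map, normalize, prohibit (no bidi check)."""
    # Stored strings may not contain code points unassigned in Unicode 3.2 (table A.1).
    if not allow_unassigned:
        for c in label:
            if stringprep.in_table_a1(c):
                raise ValueError(f"U+{ord(c):04X} is unassigned in Unicode 3.2")
    # Map: drop characters listed in table B.1, case-fold the rest with table B.2.
    mapped = "".join(stringprep.map_table_b2(c) for c in label
                     if not stringprep.in_table_b1(c))
    # Normalize to NFKC with the Unicode 3.2 database, not the interpreter's current one.
    normalized = unicodedata.ucd_3_2_0.normalize("NFKC", mapped)
    # Prohibit: reject anything that appears in one of the selected C tables.
    for c in normalized:
        if any(in_table(c) for in_table in PROHIBITED_TABLES):
            raise ValueError(f"U+{ord(c):04X} is prohibited")
    return normalized

print(nameprep_sketch("Bücher"))       # bücher
print(nameprep_sketch("ＥＸＡＭＰＬＥ"))  # example (fullwidth letters fold and normalize)
```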

Secondly, the tables are based on Unicode 3.1, and there is no provision
for updating them for future versions of Unicode; that will need a new
RFC. This seems contrary to the philosophy of Unicode, under which the
Unicode Consortium has agreed that future versions of Unicode will always
be upwards compatible. Essentially, this means that all that will ever
happen is that the table of Unassigned Codepoints will shrink, and some of
the formerly unassigned codepoints may then appear in the other tables.
But NAMEPREP seems not to envisage this possibility, so it is a disaster
waiting to happen.
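The drift described here can be observed with the same `stringprep` module, whose table A.1 is frozen at Unicode 3.2, by comparing it against the interpreter's current `unicodedata` database. U+1E9E LATIN CAPITAL LETTER SHARP S, added in Unicode 5.1, is a convenient example: the frozen table still lists it as unassigned, while a current database knows it is an uppercase letter whose lowercase form is ß. The helper name `stale_in_frozen_table` is hypothetical.

```python
import stringprep
import unicodedata

def stale_in_frozen_table(ch: str) -> bool:
    """True if table A.1 (Unicode 3.2) calls ch unassigned but the current database assigns it."""
    return stringprep.in_table_a1(ch) and unicodedata.category(ch) != "Cn"

CAPITAL_SHARP_S = "\u1e9e"
print(unicodedata.unidata_version)             # the running interpreter's Unicode version
print(stale_in_frozen_table(CAPITAL_SHARP_S))  # True on any Python with Unicode 5.1 or later
print(CAPITAL_SHARP_S.lower())                 # ß

# Count BMP code points that have moved out of "unassigned" since Unicode 3.2.
stale = sum(stale_in_frozen_table(chr(cp)) for cp in range(0x10000))
print(stale)
```

Every such code point is one that a Nameprep implementation carrying only the frozen tables would refuse (or pass through unmapped), however current the rest of the software is.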

So my view is that it is too much to expect these tables to be correctly
incorporated in the user's mail agent on every PC worldwide. However,
there are far fewer Mail Transport Agents in the world than there are PCs,
so there is some hope that their tables will be up to date, and will stay
so.

So I think that makes a strong case for doing any IDNA conversions at the
front end of MTAs, rather than in User Agents, which is contrary to what
Keith was suggesting a couple of days ago.
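As a rough illustration of conversion at the MTA front end, the sketch below uses Python's built-in `idna` codec, which implements the later IDNA 2003 rules (RFC 3490, with Nameprep from RFC 3491), to convert only the domain of an address to its ASCII-compatible form. The function name `mta_convert_address` is hypothetical, and leaving the local-part untouched is an assumption, since IDNA covers domain names only.

```python
def mta_convert_address(address: str) -> str:
    """Hypothetical MTA step: ACE-encode the domain, leave the local-part unchanged."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError("address has no '@'")
    # The codec applies Nameprep and Punycode label by label (IDNA 2003 rules).
    ascii_domain = domain.encode("idna").decode("ascii")
    return f"{local}@{ascii_domain}"

print(mta_convert_address("hans@Bücher.example"))  # hans@xn--bcher-kva.example
```

Note that the codec is only as current as the interpreter's copy of the tables, which is exactly the maintenance burden being moved from every PC to the smaller population of MTAs.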


Note that the approach we took in Usefor for newsgroup-names was somewhat
more draconian. New newsgroups MUST NOT be created except in the proper
normalized form (so those creating new groups need to understand their
Unicode). Then, so far as User Agents are concerned, they MUST use the
canonical form (which is lowercase, NFKC in fact). If they get it wrong,
their articles may not appear where expected, which is Tough. Posting
agents MAY use something like Nameprep, but there is no obligation to do
so.
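A minimal sketch of checking for, and producing, a canonical name along those lines. Assumption: "canonical" is approximated here as NFKC normalization followed by Python's default lowercasing; the Usefor draft's exact definition may differ, and `canonical_group_name` and `is_canonical` are hypothetical names. A posting agent could optionally apply something like `canonical_group_name` before sending.

```python
import unicodedata

def canonical_group_name(name: str) -> str:
    """Approximate canonical form: NFKC-normalize, then lowercase."""
    return unicodedata.normalize("NFKC", name).lower()

def is_canonical(name: str) -> bool:
    """A name is acceptable for a newgroup only if it is already in canonical form."""
    return name == canonical_group_name(name)

print(is_canonical("comp.lang.c"))             # True
print(is_canonical("Comp.Lang.C"))             # False
print(canonical_group_name("ｃｏｍｐ.lang.c"))  # comp.lang.c (fullwidth letters normalized)
```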

But I can quite well see that you cannot be as draconian as that in DNS
applications or in Email.

-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133   Web: http://www.cs.man.ac.uk/~chl
Email: chl(_at_)clw(_dot_)cs(_dot_)man(_dot_)ac(_dot_)uk      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5