Re: IDN (was Did anyone tell Microsoft yet?)

This problem isn't specific to IDNA. It exists with any use of Unicode
to represent strings that need to be compared for equivalence.


Not necessarily. We are dealing with two forms of normalization in
Nameprep: one to Normalization Form C or KC, and one from upper case to
lower case. Generally speaking, there are two schools of thought as to
when normalization should be done: Early and Late.
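As a rough illustration of those two normalizations (not Nameprep itself), the Python sketch below applies a case mapping followed by Unicode normalization to NFKC, the form RFC 3491 eventually settled on. The helper name approx_nameprep_normalize is hypothetical, and str.casefold() is only an approximation of Nameprep's own case-mapping table (B.2), so treat this as a sketch under those assumptions.

```python
import unicodedata


def approx_nameprep_normalize(label: str) -> str:
    # Hypothetical helper approximating two Nameprep steps, in Nameprep's order.
    # Step 1, case mapping: str.casefold() stands in for Nameprep's table B.2 (assumption).
    mapped = label.casefold()
    # Step 2, Unicode normalization: NFKC composes u + combining diaeresis into a
    # single code point and folds compatibility characters such as fullwidth letters.
    return unicodedata.normalize("NFKC", mapped)


a = "\uFF22\u00fccher"   # fullwidth capital B + precomposed u-umlaut
b = "bu\u0308cher"       # plain b + u + combining diaeresis

print(a == b)                                                        # False
print(approx_nameprep_normalize(a) == approx_nameprep_normalize(b))  # True
```

The two inputs differ code point by code point, yet compare equal once both steps are applied, which is the sense in which either step can be done "early" or "late" as long as it is done before comparison.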

The official advice of the Unicode people is to do it Early (see
draft-duerst-i18n-norm-04.txt). My view is that that should be as early as
the keyboard driver in the operating system, so that applications can
assume, without further testing, that textual data is already normalized.
However, normalizing that early is not appropriate for upper-to-lower case
normalization.

The view of the IDNA people seems to be to do it Late, i.e. just before
the call to gethostbyname, or eventually even inside gethostbyname itself.


I don't think that's a fair representation.  You certainly have to do 
the normalization before encoding in ASCII, so if the normalization wasn't 
done before calling gethostbyname, it had better be done there.  It
won't hurt for gethostbyname to normalize a string that's already
normalized.  This doesn't preclude the app normalizing the string earlier
than that, and in many cases there are good reasons for doing so.
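To make the idempotence point concrete, here is a sketch using the IDNA 2003 helpers in Python's standard library, encodings.idna.nameprep and encodings.idna.ToASCII. Normalizing early in the application and again late in the resolver yields the same string, and the ASCII form comes out identical whether or not the input was normalized first. The choice of Python and the example label are assumptions made only for illustration.

```python
from encodings import idna  # stdlib IDNA 2003 helpers (Nameprep + ToASCII)

raw = "Bu\u0308cher"          # unnormalized input: uppercase B, decomposed u-umlaut

early = idna.nameprep(raw)    # the application normalizes early
twice = idna.nameprep(early)  # the resolver normalizes again, late

print(early == twice)         # True: re-normalizing an already-normalized label is harmless
print(idna.ToASCII(raw))      # b'xn--bcher-kva'
print(idna.ToASCII(early))    # b'xn--bcher-kva'
```

Because ToASCII runs Nameprep itself on any non-ASCII label, an application that normalizes early loses nothing, and one that does not is still covered at encoding time.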

The point is that "ordinary applications" are not supposed to care.


Another point is that apps that aren't IDN-aware are not supposed to care.

The view I have been trying to put across is that the point where 
normalization occurs is the weakest link in the system, and that 
therefore such points should be concentrated so that they are few 
in number and easily fixed. From that point of view, User Agents fail on
both counts: there are many of them, and they are not easily fixed.


The view is over-simplistic for at least two reasons. First, you're
comparing systems of very different total complexity: you will not rid
UAs of the burden of supporting Unicode (including normalization), but
you are adding complexity to parts of the system that don't need to
support Unicode, thus adding more opportunities for failure. In other
words, your proposal adds more weak links without strengthening the
weak links that you're worried about.

Second, you're failing to consider that there's essentially no incentive 
for a significant percentage of the world's mail users or MTA operators
to upgrade. An approach that relies on MUA upgrades to provide the new
functionality lets those who benefit from it, i.e. those who have an
incentive, do the upgrade and get immediate benefit. This gets it
deployed more quickly for those users who need it.

Keith