Re: UTF8 vs. Punycode


At 11:30 AM +0200 8/14/07, Simon Josefsson wrote:

> One risk is that the specification cannot use Unicode code points from a
> newer Unicode version than IDNA ToASCII supports; right now that means
> Unicode 3.2.

That is not necessarily true. The current version of IDNA supports Unicode version 3.2. A future version of IDNA may support later versions of Unicode.
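The Unicode 3.2 limitation can be made concrete with a minimal Python sketch. Python is an assumption here, since the thread contains no code. It uses the standard-library `stringprep.in_table_a1`, which reports whether a character is in RFC 3454 table A.1, i.e. unassigned in Unicode 3.2. IDNA 2003 ToASCII rejects such characters unless the AllowUnassigned flag is set. The helper `unassigned_in_unicode_3_2` is hypothetical, and U+07CA (NKO LETTER A, from the N'Ko script added in Unicode 5.0) is just one example of a post-3.2 character.

```python
import stringprep
import unicodedata


def unassigned_in_unicode_3_2(label: str) -> list[str]:
    """Hypothetical helper: list the code points in `label` that are
    unassigned in Unicode 3.2 (RFC 3454 table A.1).

    stringprep evaluates this against Python's bundled Unicode 3.2 data,
    regardless of the newer Unicode version the interpreter otherwise uses.
    """
    return [f"U+{ord(ch):04X}" for ch in label if stringprep.in_table_a1(ch)]


# "bücher" uses only characters that already existed in Unicode 3.2.
print(unassigned_in_unicode_3_2("bücher"))

# U+07CA NKO LETTER A was added in Unicode 5.0, so it is unassigned in 3.2.
nko = "\u07ca"
print(unassigned_in_unicode_3_2(nko))
# General category in the Unicode 3.2 database vs. the current one.
print(unicodedata.ucd_3_2_0.category(nko), unicodedata.category(nko))
```

Any label containing such a character would fail ToASCII (with AllowUnassigned unset) until IDNA itself moves to a newer Unicode version, which is the update Paul says is the IETF's responsibility.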

> For some time now, we have had Unicode 5.0, which includes many
> important code points for a variety of languages.

Not to get into a flame-war here, but I think that "important" is a gross overstatement. There are a few minor scripts and personal name characters that are not included in Unicode 3.2, but there has been essentially no public pressure on the IETF to update IDNA to include them.

> Newer versions of
> Unicode will be released in the future. Having to update this
> specification for every IDNA/ToUnicode release seems sub-optimal to me.

Fully agree. It is the responsibility of the IETF to make sure that is not needed.

> I believe it is better to teach protocols how to deal with non-ASCII
> data, rather than relying on IDNA idiosyncrasies in every IETF protocol.


We disagree here, particularly for security protocols.

> The choice to remain with ASCII has been made for the DNS protocol,
> where it makes some sense due to backwards compatibility reasons, but
> that does not mean we have to make the same choice in every IETF
> protocol.

This makes no sense here. The protocol in question is representing email addresses. The right side of an email address is a domain name.
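Because the right side of the address is an ordinary domain name, an implementation can apply IDNA ToASCII to just that part. The following is a minimal Python sketch, assuming Python's built-in `idna` codec, which implements IDNA 2003 (RFC 3490) with Nameprep based on Unicode 3.2. The function `to_ascii_address` is hypothetical. It deliberately leaves the local part alone, because IDNA covers only domain names.

```python
def to_ascii_address(address: str) -> str:
    """Hypothetical helper: convert the domain (right-hand side) of an
    email address to its IDNA ASCII form; the local part is unchanged."""
    # Split at the last "@", since the domain never contains one.
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError(f"no '@' in {address!r}")
    # The "idna" codec applies ToASCII label by label (Punycode + "xn--").
    return f"{local}@{domain.encode('idna').decode('ascii')}"


print(to_ascii_address("user@bücher.example"))  # user@xn--bcher-kva.example
```

The resulting ASCII string can be compared and stored exactly like any other domain name, which is the property at stake in this thread.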

> Some IETF protocols can easily negotiate support for UTF-8 on
> both sides, and using UTF-8 rather than Punycode seems more robust and
> like better engineering to me.

Un-normalized UTF-8 fails miserably when exact matching is needed, as it is in IBE (identity-based encryption).
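The exact-matching problem can be shown with two spellings of the same address. In IBE the identity string itself determines the public key. If two parties encode the "same" address as different byte sequences, they end up with different keys. The sketch below is illustrative only and performs no IBE. It compares the UTF-8 bytes of a precomposed and a decomposed "é", then repeats the comparison after normalizing with Python's `unicodedata.normalize`. NFC is assumed here as one possible normalization form, and the helper `nfc_utf8` is hypothetical.

```python
import unicodedata

# Two spellings of the same visible address: precomposed U+00E9 versus
# "e" followed by U+0301 COMBINING ACUTE ACCENT.
composed = "caf\u00e9@example.com"
decomposed = "cafe\u0301@example.com"

# Exact (byte-for-byte) comparison of the raw UTF-8 encodings fails.
print(composed.encode("utf-8") == decomposed.encode("utf-8"))  # False


def nfc_utf8(s: str) -> bytes:
    """Hypothetical helper: normalize to NFC, then encode as UTF-8."""
    return unicodedata.normalize("NFC", s).encode("utf-8")


# After both sides apply the same normalization, the bytes match.
print(nfc_utf8(composed) == nfc_utf8(decomposed))  # True
```

Exact matching only works if every party applies the same normalization before comparing. IDNA's Nameprep step supplies that for domain names, but raw UTF-8 by itself does not.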


--Paul Hoffman, Director
--VPN Consortium