
Re: [idn] Re: 7 bits forever!

2002-04-04 00:08:02

--On 2002-04-03 13.59 -0500 Keith Moore <moore(_at_)cs(_dot_)utk(_dot_)edu> wrote:

If we are saying that ACE <=> IDN(UTF8), why are the two headers
not in sync?

because there are things which will know about the old headers and change
those, without changing the new utf8 headers.

By the way, ACE != IDN(UTF8).

UTF-8 and ACE are two different ways of representing a Unicode String.
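As a concrete illustration of "two different ways of representing a Unicode string", here is a small Python sketch using the standard-library codecs (the example label "bücher" is mine, not from the thread; Python's "idna" codec produces the "xn--" Punycode-based ACE form):

```python
# Two representations of the same Unicode label: UTF-8 bytes vs. the
# ASCII-Compatible Encoding (ACE) used for IDN, via Python's built-in
# "idna" codec.
label = "bücher"

utf8_bytes = label.encode("utf-8")   # 8-bit representation
ace_bytes = label.encode("idna")     # ASCII-only representation

print(utf8_bytes)  # b'b\xc3\xbccher'
print(ace_bytes)   # b'xn--bcher-kva'

# Both decode back to the identical Unicode string:
assert utf8_bytes.decode("utf-8") == ace_bytes.decode("idna") == label
```

Same string, two wire forms: which one to use is exactly the per-protocol, per-parameter question raised below.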

The best solution for this whole mess would be, for every protocol and
every protocol parameter, to find the most efficient encoding of Unicode
for that parameter.
That implies that for ASCII protocol parameters, we use ACE.

For 8bit-text-fields, UTF-8 is what can be used.

For binary fields (ASN.1 and DNS labels) we can use some binary form (like
Punycode without the last "base-32").

That people discuss whether generic use of UTF-8 all over the place is
better than ACE all over the place, or some new binary form all over the
place, amazes me.
The problem is not in the encoding (and by that I mean we can just as well
be conservative and use something (ACE) which works all over the place).
People seem to think that "if we use UTF-8 encoding, things are fine,
people will get nice display, we will not have copy-paste-leakage" etc etc.

That is wrong.

The problem is the use of Unicode (or having multiple charsets). Try
matching two Unicode strings, and see how easy that is. If you manage to
get your brain around that problem, think about sorting and things like
paged result sets in LDAP. And if that is not enough, think about what
"network byte order" means in a situation where the local part of an email
address contains mixed right-to-left and left-to-right text.
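The matching problem alone is visible with a few lines of Python's standard library (the example strings are mine, chosen only to show the effect):

```python
import unicodedata

# Two ways of writing "café": precomposed é (U+00E9) vs. plain "e"
# followed by a combining acute accent (U+0301). They differ
# codepoint-for-codepoint, yet users consider them the same string.
precomposed = "caf\u00e9"
decomposed = "cafe\u0301"

assert precomposed != decomposed   # naive comparison says "different"

# Matching requires normalizing both sides first (here to form NFC):
def nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)

assert nfc(precomposed) == nfc(decomposed)   # now they match
```

And normalization only solves matching; sorting additionally depends on locale-specific collation rules, which is why paged result sets get hard.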

If you have a simple solution for that as well which makes people happy,
let me know.

So, I think this discussion should first of all be about the use of Unicode
in an email address.

Secondly, we can talk about the encoding. But the encoding is not the
problem.
