
Re: [idn] Re: 7 bits forever!

2002-04-04 10:21:10

Patrik Fältström wrote:

> Why would you do that?

To get as many characters as possible into the field.

> A different solution is path [B] and if that is chosen, UTF-8 is
> definitely not the most efficient encoding of Unicode in a binary
> field which has a length specifier, like a label in the DNS protocol.

It seems that you are mixing two things here. On the one hand, we are
talking about a binary representation, which needs some kind of tagging.
But if you remove the ACE prefix, then you need some other tag, which
could easily be an EDNS label identifier. At that point, the length tag
in the legacy STD13 label is not particularly relevant. To wit, one of
the changes I have to make to DM-IDNS is to allow longer labels for
UTF-8 equivalents, so that we can get around the ACE length limits in
cases where the UTF-8 form of a name is the longer one.

Second, there has been much discussion showing that UTF-8 and IDNA each
have cases where one is shorter than the other; I don't think either has
an overwhelming win as far as length is concerned. A binary version of
IDNA would have the single advantage of relatively synchronized lengths,
but in some situations it would still be longer than UTF-8, which
undercuts the argument unless you can show that binary IDNA is
consistently and provably shorter for the overwhelming majority of
names. Even that may not be enough, however.
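As a rough illustration of that length trade-off (my own sketch, using Python's stdlib `idna` codec as a stand-in for the ACE form under discussion):

```python
# Hypothetical comparison: byte length of a single label in UTF-8 versus
# its ACE ("xn--"-prefixed) form. Neither encoding is uniformly shorter.
for label in ["bücher", "例え"]:
    utf8 = label.encode("utf-8")
    ace = label.encode("idna")  # ACE form via the stdlib codec
    print(f"{label!r}: UTF-8 = {len(utf8)} octets, ACE = {len(ace)} octets ({ace!r})")
```

For a short Latin-plus-diacritic label like "bücher", UTF-8 wins (7 octets versus 13 for `xn--bcher-kva`); for long runs of CJK or other high-plane characters the balance shifts, which is exactly why neither side has an overwhelming win.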

As for normalization and comparison problems, all encodings suffer from
those equally (assuming a one-pass conversion, which multiple encodings
by definition do not allow), so I certainly don't see any wins for using
a different encoding there.
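To make the normalization point concrete (again my own sketch, not from the thread): the same displayed name can be represented by different Unicode code-point sequences, and that ambiguity exists before any transfer encoding is chosen, so the normalization cost is identical whichever encoding carries the label.

```python
import unicodedata

a = "caf\u00e9"    # "café" with precomposed é (U+00E9)
b = "cafe\u0301"   # "café" as e + combining acute accent (U+0301)

# The code-point sequences differ, so every encoding of them differs too,
# even though the two strings display identically.
assert a != b
assert a.encode("utf-8") != b.encode("utf-8")

# A normalization pass (NFC here) is required before comparison, and that
# pass costs the same no matter which encoding is used on the wire.
assert unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
print("equal after NFC")
```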

For existing ASCII-bound protocols and data-types, certainly the use of
ACE makes sense. But for the applications and protocols that are using
UTF-8 data in conformance with IAB guidelines, letting them extract and
resolve data in that form seems to make the most sense. The only way
that I could see another encoding even being a valid consideration would
be if the IAB guidelines themselves changed.

With all of that in mind, I can't see how introducing yet another
encoding helps anything; it would only address something that isn't
actually a problem in the first place.

To bring this back on-topic, I'm really beginning to believe that the
ultimate solution to the SMTP problem will be a new message format,
where the headers are defined as UTF-8 from the start, in conformance
with the IAB guidelines. Rather than having the two header forms
co-exist, the new message format would be downgraded whenever legacy
systems and mailboxes are encountered.

Eric A. Hall                              
Internet Core Protocols
