ietf-822

Re: Mail addresses and extended character sets

2001-06-28 06:30:41

2. That there should only be ONE way to encode a given domain name (or
local-part, for that matter).

it's very likely that things will end up this way, for the simple reason
that folks don't want to break existing DNS servers and caches.   so
even though there might be multiple ways to encode a non-ascii domain name
in Unicode (say, using combining and non-combining characters), after
canonicalization and ascii-encoding these should all fold to the same
representation.
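
A small sketch of that folding, for illustration only. It assumes Python's built-in "idna" codec as a stand-in; that codec implements the nameprep-plus-ASCII-encoding scheme IDN published later (RFC 3490/3491), not anything settled at the time of this thread. The domain name is made up.

```python
# Two spellings of the same visible name: precomposed u-umlaut versus
# "u" followed by COMBINING DIAERESIS (U+0308).
composed = "b\u00fccher.example"
decomposed = "bu\u0308cher.example"

# As raw Unicode strings the two spellings are different.
differ_as_unicode = composed != decomposed

# The "idna" codec normalizes each label (nameprep) and then produces
# an ASCII-only form, so both spellings fold to one representation.
ascii_composed = composed.encode("idna")
ascii_decomposed = decomposed.encode("idna")
fold_to_same = ascii_composed == ascii_decomposed

print(differ_as_unicode, fold_to_same)
print(ascii_composed)
```

The point is only that the comparison happens after canonicalization and ASCII encoding, so existing DNS servers and caches never see the variant spellings.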

Let's please not repeat the whole IDN requirements discussion here, OK?

Yes, but what I had in mind was that, even after you had normalized the
domain name according to the Unicode rules plus the IDN rules, you could
still land up with a domain name like

      #@$*#$Q*$Q.foo.ch

(where #@$*#$Q*$Q is some Chinese characters, but the rest is in ASCII,
and my apologies if 'ch' is not china).

China is .cn. and .ch is Switzerland.

Now that has to be encoded to fit
within an RFC 2822 addr-spec. The question is: Do you encode just the
non-ASCII bits of it, or do you encode the whole lot, or can you choose
several ways, for example (using a fairly obvious hex encoding just to
illustrate the point):

      =2340242a2324512a2451=.foo.ch
or    =2340242a2324512a24512e666f6f=.ch
or    =2340242a2324512a24512e666f6f2e6368=
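
For what it's worth, the three forms above can be reproduced mechanically. This is a minimal sketch of the illustrative hex scheme only (the helper name hex_encode is hypothetical, and this is not a proposed encoding); the input name is the placeholder the hex decodes to, #@$*#$Q*$Q.foo.ch.

```python
def hex_encode(text: str) -> str:
    """Wrap the hex of the ASCII bytes in '=' signs (illustrative scheme only)."""
    return "=" + text.encode("ascii").hex() + "="

name = "#@$*#$Q*$Q.foo.ch"
labels = name.split(".")

# Encode only the first (stand-in for non-ASCII) label.
first_label_only = hex_encode(labels[0]) + "." + ".".join(labels[1:])
# Encode the first two labels together, leaving the top-level label alone.
first_two_labels = hex_encode(".".join(labels[:2])) + "." + labels[2]
# Encode the whole domain name.
whole_name = hex_encode(name)

for form in (first_label_only, first_two_labels, whole_name):
    print(form)
```

All three strings name the same domain but are different byte sequences, which is the whole concern.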

And the answer is that there will be only one way to do this. Again, if you
intend to get into these issues you *really* need to follow and possibly
participate in the IDN discussion.

Those might all pass through mailing systems (possibly morphing along the
way) and arrive at the correct destination, but any digital signature
which included them would break, unless one of them is declared to be the
canonical form.

This won't happen, but not because of any need to preserve digital signatures.

Another long-term solution to all this, of course, is to use UTF-8 in mail
headers, and then the whole problem goes away (so far as mail is concerned
- the DNS might still need to be fixed).

On the contrary, using UTF-8 does NOT address this issue. See Keith's point
above about combining and non-combining characters. Because of this a "just
send UTF-8" solution doesn't work. You need another mechanism -- IDN's nameprep --
to deal with this problem.
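
To make the combining-character point concrete, here is a short sketch using Python's standard unicodedata module. NFC is used purely as an example normalization form; it is an assumption for illustration and not a statement of what nameprep specifies.

```python
import unicodedata

composed = "\u00e9"      # e-acute as a single precomposed code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# Both render identically, yet their UTF-8 byte sequences differ,
# so sending raw UTF-8 does not give one representation per name.
utf8_composed = composed.encode("utf-8")
utf8_decomposed = decomposed.encode("utf-8")
utf8_differs = utf8_composed != utf8_decomposed

# Only after a normalization step do the two spellings compare equal.
same_after_nfc = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)

print(utf8_composed.hex(), utf8_decomposed.hex())
print(utf8_differs, same_after_nfc)
```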

But that solution is not
immediately available (sendmail would break, for a start). However, Ned
did say some while back that he would work on UTF-8 in mail headers,
presumably as an extension to RFC 2821. Is that still his intention?

I said that before the IDN work started. Now that IDN is well underway and it
has become quite clear that any work before IDN settles on a solution is a
total waste of time, it is my intention to wait for IDN to figure out what they
are doing with domain names. That choice will then drive all sorts of other
choices, including how (and whether or not) we enhance message headers.

Look, this whole discussion is silly. You're simply repeating arguments that
have already been made and covering ground that has already been covered by IDN.
Either trust that the IDN folks know what they are doing and will address your
issues or don't trust them and get involved in the work so as to ensure that
your concerns are addressed.

This will be my final mail on this topic on this list.

                                Ned