Re: Message-IDs - Another Fine Mess



In <01KF8JCEOCBS0045PS@mauve.mrochek.com>
ned+ietf-822(_at_)mrochek(_dot_)com writes:

I am still sending this to both lists, with Reply-To to both.

3. Now I have just found another feature/bug.

I can't speak to news, but this is an issue that email software has had to
deal with for almost two decades now.

Consider the following three msg-ids, all syntactically correct in RFC 2822:

A.   <Joe_Doe@[127.0.0.1]>
B.   <"Joe_Doe"@[127.0.0.1]>
C.   <"Joe\_Doe"@[127\.0\.0\.1]>

Question. Are those three semantically the same in RFC 2822?

Yes they are.


So are you saying, for example, that mail readers which do threading based
on the References line are programmed to take this into account? Seems a
lot of unnecessary work to me.

OK, here is an experiment. The Message-ID of your mail was
<01KF8JCEOCBS0045PS@mauve.mrochek.com>. I have made the References line in
this message to be <"01KF8JCE\OCBS0045PS"@mauve.mrochek.com>.

Hands up anybody with a threading mail reader that threaded my reply as a
followup to yours (and hands down if it didn't).

Yes, so A, B, and C are all semantically equivalent. The clear implication,
then, is that normalization is necessary if you want to perform proper
semantic comparisons.
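
To make the normalization step concrete, here is a minimal Python sketch of
what a comparing agent would have to do before matching msg-ids. The function
name normalize_msg_id and its helpers are hypothetical, and the sketch assumes
only the RFC 2822 rules under discussion: a quoted-pair "\x" stands for "x",
and the quotes around a local part are redundant whenever the content is
already a valid dot-atom. It ignores comments, folding white space and the
obsolete syntax.

```python
import re

# atext runs separated by single dots (RFC 2822 dot-atom-text)
_ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+"
_DOT_ATOM = re.compile(rf"{_ATEXT}(?:\.{_ATEXT})*")


def _unescape(text):
    """Drop quoted-pair backslashes: '\\x' becomes 'x'."""
    return re.sub(r"\\(.)", r"\1", text)


def _split(body):
    """Split 'left@right', allowing for a quoted left part that contains '@'."""
    if body.startswith('"'):
        i = 1
        while i < len(body):
            if body[i] == "\\":      # skip the escaped character
                i += 2
                continue
            if body[i] == '"':       # closing quote found
                break
            i += 1
        else:
            raise ValueError("unterminated quoted string")
        left, rest = body[: i + 1], body[i + 1:]
        if not rest.startswith("@"):
            raise ValueError("missing '@' after quoted string")
        return left, rest[1:]
    left, sep, right = body.partition("@")
    if not sep:
        raise ValueError("missing '@'")
    return left, right


def normalize_msg_id(msg_id):
    """Return the msg-id with all non-essential quoting removed."""
    msg_id = msg_id.strip()
    if not (msg_id.startswith("<") and msg_id.endswith(">")):
        raise ValueError("msg-id must be enclosed in angle brackets")
    left, right = _split(msg_id[1:-1])

    if left.startswith('"') and left.endswith('"') and len(left) >= 2:
        content = _unescape(left[1:-1])
        if _DOT_ATOM.fullmatch(content):
            left = content           # quotes were not needed
        else:                        # keep quotes, escape only '"' and '\'
            left = '"' + re.sub(r'(["\\])', r"\\\1", content) + '"'

    if right.startswith("[") and right.endswith("]"):
        content = _unescape(right[1:-1])
        right = "[" + re.sub(r"([\[\]\\])", r"\\\1", content) + "]"

    return f"<{left}@{right}>"


a = normalize_msg_id('<Joe_Doe@[127.0.0.1]>')
b = normalize_msg_id('<"Joe_Doe"@[127.0.0.1]>')
c = normalize_msg_id(r'<"Joe\_Doe"@[127\.0\.0\.1]>')
print(a == b == c)
```

Under these assumptions A, B and C all reduce to form A, and the References
value used in the experiment above reduces to the original Message-ID.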

Now I suspect this is a Bad Thing even in Email (though I am not sure
that any of the Email Standards makes any official use of the msg-id).

I fail to see what's Bad about it. Sure, normalization is a pain, but the clear
trend is to do more and more of it, not less. Normalization forms for Unicode
are such a joy...


Indeed so, but the Unicode people have an excellent principle known as
"Early Uniform Normalization", which I find described in
draft-duerst-i18n-norm-04.txt (though there may be a later draft, or even
an RFC, by now).

Essentially, Early Uniform Normalization means that you normalize it
(whatever "it" is) *before* it goes out on the wire, so that all sites
that receive and process it can assume it is already normalized, and so
can skip any re-normalization-before-comparison process.

And indeed, this is exactly what we have done in Usefor in the case of
newsgroup-names. They had better be in Unicode NFKC normal form when they
leave the posting agent, and if they are not they will not get very far.
Indeed, they had better be in that form at newsgroup-creation time.
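
As an illustration of the principle, the following Python sketch shows the
division of labour that Early Uniform Normalization implies. The function
names and the sample group name are hypothetical; only
unicodedata.normalize() is a real library call. The sender normalizes once,
the receiver merely checks, and comparison is then a plain string comparison.

```python
import unicodedata


def prepare_newsgroup_name(name):
    """Posting agent: normalize to NFKC before the name goes on the wire."""
    return unicodedata.normalize("NFKC", name)


def accept_newsgroup_name(name):
    """Receiving site: accept only names that are already in NFKC."""
    return unicodedata.normalize("NFKC", name) == name


def same_newsgroup(a, b):
    """Once both names are known to be normalized, compare them directly."""
    return a == b


composed = "fr.caf\u00e9"       # e-acute as a single code point
decomposed = "fr.cafe\u0301"    # 'e' followed by a combining acute accent

wire_name = prepare_newsgroup_name(decomposed)
print(accept_newsgroup_name(wire_name), same_newsgroup(wire_name, composed))
```

The unnormalized form is simply refused by accept_newsgroup_name(), which is
the sense in which such a name "will not get very far".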

So, applying that principle to the present case, one would say that
posting/mailing agents MUST only ever put out that msg-id in the form
        <Joe_Doe@[127.0.0.1]>

Well, it is not for me to say what is to happen in mailing agents, but in
the case of posting agents that is essentially the second (syntactic)
solution that I proposed. Essentially, the non-normalized versions are
just not allowed by the syntax.
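
A syntactic restriction of this kind can be sketched as a strict acceptance
check. The Python below is only an illustration, not the Usefor grammar: the
name is_strict_msg_id is hypothetical, and as a simplification it rejects all
quoting outright, including the rare cases where quoting would be essential.

```python
import re

# Left side: unquoted dot-atom only. Right side: dot-atom, or a
# domain-literal containing no quoted-pairs, brackets or white space.
_ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+"
_DOT_ATOM = rf"{_ATEXT}(?:\.{_ATEXT})*"
_DOMAIN_LITERAL = r"\[[^\[\]\\\s]*\]"
_STRICT_MSG_ID = re.compile(
    rf"<{_DOT_ATOM}@(?:{_DOT_ATOM}|{_DOMAIN_LITERAL})>"
)


def is_strict_msg_id(msg_id):
    """True only for msg-ids written without any quoting at all."""
    return _STRICT_MSG_ID.fullmatch(msg_id) is not None


for candidate in ('<Joe_Doe@[127.0.0.1]>',
                  '<"Joe_Doe"@[127.0.0.1]>',
                  r'<"Joe\_Doe"@[127\.0\.0\.1]>'):
    print(candidate, is_strict_msg_id(candidate))
```

With such a check, only form A of the three examples is accepted, so agents
that follow the restricted syntax never need to normalize before comparing.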

But in Netnews it would lead to GROSS interoperability problems.

The rest of this message is concerned with how it might be fixed in
Usefor (RFC 2822 now being cast in concrete). The ietf-822 people may
stop reading now, but are welcome to continue and comment if they wish
:-) .

I see two solutions. One is Brute Force (and involves sophistry to
boot). The other is syntactic (it just excludes all quoting that is not
strictly essential). I am not particularly impressed by either solution,
so would welcome suggestions.

Assuming you feel it's necessary to "solve" this "problem", I think a
syntactic solution is preferable.


In that case, I think we are in agreement.


-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133   Web: http://www.cs.man.ac.uk/~chl
Email: chl(_at_)clw(_dot_)cs(_dot_)man(_dot_)ac(_dot_)uk      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5