ietf-822

Re: Message-IDs (Was Re: Kohn draft (was RE: USEFOR: Current situation and next steps))

2003-02-24 20:12:53

In <3E586B3F.6010103@Sonietta.blilly.com> Bruce Lilly <blilly@erols.com> writes:


>> 2822 did specifically make the message-id syntax more restrictive than
>> 822 (822 allowed whitespace in quoted strings; 2822 only allows it as a
>> quoted-pair, i.e. preceded by \).

> If you look carefully, you'll see that not only must the whitespace
> be backslash-escaped, but the escaped pair must in turn be within a
> quoted-string. I.e.
>   <foo bar@baz.com>     illegal
>   <foo\ bar@baz.com>    illegal
>   <"foo bar"@baz.com>   illegal
>   <"foo\ bar"@baz.com>  legal

Sure, but that is guaranteed to be misrecognized by any conforming NNTP
implementation, since NNTP separates command arguments with whitespace
and has no quoting mechanism.
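To make the clash concrete, here is a minimal Python sketch, assuming only the RFC 2822 section 3.6.4 generation rules (the obsolete obs-id-left/obs-id-right forms are ignored). The names is_legal_msg_id and nntp_args are illustrative, not from any existing library, and nntp_args models nothing but the whitespace splitting of an NNTP command line.

```python
import re

# RFC 2822 msg-id, generation rules only (obs- forms deliberately ignored).
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
DOT_ATOM_TEXT = rf"{ATEXT}+(?:\.{ATEXT}+)*"
NO_WS_CTL = r"\x01-\x08\x0b\x0c\x0e-\x1f\x7f"
QTEXT = rf"[{NO_WS_CTL}\x21\x23-\x5b\x5d-\x7e]"   # no space, DQUOTE or backslash
DTEXT = rf"[{NO_WS_CTL}\x21-\x5a\x5e-\x7e]"        # no "[", "\" or "]"
QUOTED_PAIR = r"\\[\x01-\x09\x0b\x0c\x0e-\x7f]"    # "\" followed by any text char, space included
NO_FOLD_QUOTE = rf'"(?:{QTEXT}|{QUOTED_PAIR})*"'
NO_FOLD_LITERAL = rf"\[(?:{DTEXT}|{QUOTED_PAIR})*\]"
MSG_ID = re.compile(
    rf"<(?:{DOT_ATOM_TEXT}|{NO_FOLD_QUOTE})@(?:{DOT_ATOM_TEXT}|{NO_FOLD_LITERAL})>"
)


def is_legal_msg_id(s: str) -> bool:
    """True if s matches the RFC 2822 msg-id generation grammar (sketch)."""
    return MSG_ID.fullmatch(s) is not None


def nntp_args(command_line: str) -> list[str]:
    """Arguments of an NNTP command line: split on whitespace, no quoting."""
    return command_line.rstrip("\r\n").split()[1:]


candidate = r'<"foo\ bar"@baz.com>'
print(is_legal_msg_id(candidate))              # legal under RFC 2822
print(nntp_args(f"ARTICLE {candidate}\r\n"))   # yet NNTP sees two arguments
```

The regular expression accepts exactly the one "legal" example in the table above, while the NNTP split cuts that same id in two at the escaped space.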

There is a long history to this matter. In the DRUMS days, we spotted a
number of problems, which we persuaded them to fix. That is how the
special id-left and id-right syntax got into RFC 2822. But as time went
on, we kept finding more and more quirky cases. The "\ " problem was
spotted just about as DRUMS was going to press, but it was too late to
stop it.

But the real nasty only came to light well after RFC 2822 was published.
Consider the following three cases:

    <"foo\bar"@baz.com>
    <"foobar"@baz.com>
    <foobar@baz.com>

Now read RFC 2822 VERY carefully. All three are semantically the same.
If you ever have to ask the question "is this msg-id the same as that
msg-id?" you MUST answer yes for any pair of those three. Of course, that
is exactly the question that is asked on Usenet millions of times every
second and, as I said to Dan once, if that equivalence had to be
implemented on the existing network, it would bring it to its knees
within minutes.
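Under RFC 2822, the content of a quoted-string is what remains after removing the enclosing quotes and the backslash of each quoted-pair. The Python sketch below (comparison_key and _unescape are hypothetical names, not taken from any real mail or news software) reduces each msg-id to a key under which the three ids above compare equal, whereas a plain string comparison keeps them distinct. It assumes its input is already legal and, for brevity, leaves escapes inside a domain-literal id-right alone.

```python
def _unescape(body: str) -> str:
    """Drop the backslash of each quoted-pair, keeping the escaped character."""
    out, i = [], 0
    while i < len(body):
        if body[i] == "\\":
            i += 1                  # skip the backslash; take the next char literally
        out.append(body[i])
        i += 1
    return "".join(out)


def comparison_key(msg_id: str) -> str:
    """Key under which RFC 2822-equivalent msg-ids compare equal (sketch)."""
    inner = msg_id[1:-1]                          # strip "<" and ">"
    if inner.startswith('"'):
        i = 1
        while inner[i] != '"':                    # find the unescaped closing quote
            i += 2 if inner[i] == "\\" else 1
        left, right = _unescape(inner[1:i]), inner[i + 2:]   # inner[i + 1] is "@"
    else:
        left, _, right = inner.partition("@")     # dot-atom-text cannot contain "@"
    return f"{left}@{right}"


ids = [r'<"foo\bar"@baz.com>', '<"foobar"@baz.com>', "<foobar@baz.com>"]
print(len(set(ids)))                              # plain string comparison
print(len({comparison_key(m) for m in ids}))      # RFC 2822 semantics
```

The first line prints 3 and the second prints 1. Every comparison on the news path would have to pay for this normalization step, which is the cost the paragraph above objects to.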

However, I was assured that there did exist software in the mail world (in
threading readers, presumably) that did indeed follow those semantics to
the letter.

And of course, that situation is never going to arise in the Real World
(TM), so there is absolutely nothing to be gained by allowing it.


> I have a small collection (22937 messages). I checked the msg-ids in
> top-level header fields Message-ID, Received, In-Reply-To, References,
> Resent-Message-ID, Content-ID, and Supersedes. There were none that
> were simultaneously
> 1. legal under RFC 2822 generation rules
> and
> 2a. longer than 250 octets including angle brackets
>  or
> 2b. had backslash-escaped space or control characters in a quoted string.

And how many backslashed characters did you find at all (redundant or
not)? And how many quoted strings, for that matter?
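Both of Bruce's criteria 2a and 2b and the tallies asked about here can be gathered in one pass over each msg-id. A rough Python sketch follows; survey and MsgIdStats are hypothetical names, and criterion 1 (legality) is assumed to be checked separately.

```python
from dataclasses import dataclass


@dataclass
class MsgIdStats:
    too_long: bool           # criterion 2a: over 250 octets, angle brackets included
    escaped_ws_or_ctl: bool  # criterion 2b: "\" + space or control char inside a quoted string
    backslashes: int         # every backslash escape, redundant or not
    quoted_strings: int      # number of quoted strings opened


def survey(msg_id: str) -> MsgIdStats:
    in_quote = escaped_ws_or_ctl = False
    backslashes = quoted_strings = 0
    i = 0
    while i < len(msg_id):
        c = msg_id[i]
        if c == '"':
            in_quote = not in_quote
            if in_quote:
                quoted_strings += 1         # count opening quotes only
        elif c == "\\" and i + 1 < len(msg_id):
            backslashes += 1
            code = ord(msg_id[i + 1])
            if in_quote and (code <= 32 or code == 127):
                escaped_ws_or_ctl = True
            i += 1                          # the escaped char is consumed with its backslash
        i += 1
    return MsgIdStats(
        too_long=len(msg_id.encode("utf-8")) > 250,
        escaped_ws_or_ctl=escaped_ws_or_ctl,
        backslashes=backslashes,
        quoted_strings=quoted_strings,
    )


print(survey(r'<"foo\ bar"@baz.com>'))
```

Run over every msg-id in the header fields Bruce lists, the two boolean fields answer his question and the two counters answer the one above.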

Anyway, if you look in section 5.3 of the Usefor draft, you will find a
syntax that fixes it. Yes, it is ugly, but it is the minimal one that
solves the problem. I could live with a simpler one, even if it ruled out
a few more obscure, but otherwise harmless, cases.
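The Usefor section 5.3 syntax is not reproduced here. As a purely hypothetical illustration of the kind of simpler rule meant, the Python sketch below accepts only dot-atom-text on both sides of the "@" and at most 250 octets. With no quotes or backslashes allowed, every msg-id is its own canonical form, so plain string comparison and RFC 2822 equivalence agree; the price is rejecting quoted id-left and domain-literal id-right forms. is_simple_msg_id is an illustrative name only.

```python
import re

ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
DOT_ATOM_TEXT = rf"{ATEXT}+(?:\.{ATEXT}+)*"
# Hypothetical restricted rule: dot-atom-text on both sides, nothing else.
SIMPLE_MSG_ID = re.compile(rf"<{DOT_ATOM_TEXT}@{DOT_ATOM_TEXT}>")


def is_simple_msg_id(s: str) -> bool:
    """Accept only unquoted, unescaped msg-ids of at most 250 octets (sketch)."""
    return len(s.encode("utf-8")) <= 250 and SIMPLE_MSG_ID.fullmatch(s) is not None


for candidate in ["<foobar@baz.com>", '<"foobar"@baz.com>', r'<"foo\ bar"@baz.com>']:
    print(candidate, is_simple_msg_id(candidate))
```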

-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133   Web: http://www.cs.man.ac.uk/~chl
Email: chl@clw.cs.man.ac.uk      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5
