ietf-822

Message-IDs - Another Fine Mess

2002-03-11 13:32:16

This message is being sent to both the usenet-format and ietf-822 lists,
with Reply-To set to both. Please try to keep it on both lists at least
until we understand the problem. Ultimately, the solution is for Usefor,
but I would first of all like to hear confirmation from the ietf-822
people that the problem really exists.

The Story So Far:

1. In the old days, a msg-id was just an angle-addr. Then it was pointed
out that this permitted folding, comments and whitespace within a
msg-id, which was clearly a Bad Thing. After some discussion on DRUMS,
therefore, a special syntax for msg-ids was written, with no CFWS in
sight and special syntax rules for no-fold-quote and no-fold-literal.

2. Then someone in Usefor spotted that the syntax still allowed
whitespace, because you could sneak in a SP or an HTAB by using it in
a quoted-pair, which is itself allowed inside a no-fold-quote or a
no-fold-literal. This is still a Bad Thing (indeed quite intolerable for
Usefor, since most existing software would break on it). However, this
was only spotted in the closing days of DRUMS, so it did not get fixed
for RFC 2822. In Usefor, we cured the problem by Brute Force:

   "A msg-id MUST NOT contain any WSP within any strict-quoted-pair."

3. Now I have just found another feature/bug.

Consider the following three msg-ids, all syntactically correct in RFC 2822:

A.   <Joe_Doe@[127.0.0.1]>
B.   <"Joe_Doe"@[127.0.0.1]>
C.   <"Joe\_Doe"@[127\.0\.0\.1]>

Question. Are those three semantically the same in RFC 2822?

Read 3.2.5:

   Semantically, neither the optional CFWS outside of the quote
   characters nor the quote characters themselves are part of the
   quoted-string; the quoted-string is what is contained between the two
   quote characters.

And that clearly makes A and B semantically equivalent (well, you
_might_ just argue that the syntax of msg-id does not actually mention
quoted-string, but that is sophistry).

And now read 3.2.2:

   Where any quoted-pair appears, it is to be interpreted as the text
   character alone.  That is to say, the "\" character that appears as
   part of a quoted-pair is semantically "invisible".

And that clearly makes B and C semantically equivalent.
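
To make the chain of reasoning concrete, here is a minimal Python sketch that applies the two quoted readings (3.2.5 and 3.2.2) to the three example msg-ids. The function names and the deliberately loose regular expression are illustrative assumptions, not anything defined by RFC 2822 or Usefor; the pattern only splits id-left from id-right and is not a full msg-id parser.

```python
import re

# Loose splitter: id-left is either a quoted form or a run without DQUOTE/"@";
# id-right is either a bracketed literal or a run without brackets/"@".
# (Illustrative only; this is not the RFC 2822 grammar.)
MSG_ID = re.compile(
    r'<("(?:[^"\\]|\\.)*"|[^"@]+)@(\[(?:[^\[\]\\]|\\.)*\]|[^\[\]@]+)>'
)

def unquote_pairs(text: str) -> str:
    """3.2.2 reading: the backslash of a quoted-pair is semantically invisible."""
    return re.sub(r'\\(.)', r'\1', text)

def semantic_value(msg_id: str) -> str:
    """Return the msg-id as it would read after applying 3.2.5 and 3.2.2."""
    m = MSG_ID.fullmatch(msg_id)
    if m is None:
        raise ValueError(f"not recognised by this sketch: {msg_id!r}")
    left, right = m.groups()
    if left.startswith('"'):
        left = left[1:-1]  # 3.2.5 reading: the quote characters are not part of the value
    return f"<{unquote_pairs(left)}@{unquote_pairs(right)}>"

A = '<Joe_Doe@[127.0.0.1]>'
B = '<"Joe_Doe"@[127.0.0.1]>'
C = r'<"Joe\_Doe"@[127\.0\.0\.1]>'

for name, value in (("A", A), ("B", B), ("C", C)):
    print(name, value, "->", semantic_value(value))
```

Under these assumptions all three written forms reduce to the same value, while a plain octet comparison sees three different strings. That mismatch is the problem described here.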

Now I suspect this is a Bad Thing even in Email (though I am not sure
that any of the Email Standards makes any official use of the msg-id).

But in Netnews it would lead to GROSS interoperability problems.

So there is the problem. First of all, could the ietf-822 people please
confirm that the problem is genuine, even in Email (or else explain why
it isn't)?

------------------------------------------------------------------------------

The rest of this message is concerned with how it might be fixed in
Usefor (RFC 2822 now being cast in concrete). The ietf-822 people may
stop reading now, but are welcome to continue and comment if they wish
:-) .

Note first of all that the two bits of semantics quoted above from
RFC 2822 apply also within Usefor. That would have been true in any
case but, for the removal of all doubt, I have now explicitly written
them in, mainly because I need to rely on them for the semantics of
parameters.

I see two solutions. One is Brute Force (and involves sophistry to
boot). The other is syntactic (it just excludes all quoting that is not
strictly essential). I am not particularly impressed by either solution,
so would welcome suggestions.

Here is the complete section on Message-ID as it now stands in Usefor:


5.3.  Message-ID
 
   The Message-ID-header contains the article's message identifier, a
   unique identifier distinguishing the article from every other
   article. The content syntax makes use of syntax defined in [RFC
   2822], subject to the following revised definition of no-fold-quote
   and no-fold-literal.
 
      header             =/ Message-ID-header
      Message-ID-header  = "Message-ID" ":" SP Message-ID-content
                              *( ";" other-parameter )
      Message-ID-content = msg-id
      id-left            = dot-atom-text / no-fold-quote
      id-right           = dot-atom-text / no-fold-literal
      no-fold-quote      = DQUOTE *( strict-qtext / strict-quoted-pair )
                              DQUOTE
      no-fold-literal    = "[" *( dtext / strict-quoted-pair ) "]"

   A msg-id MUST NOT contain any WSP within any strict-quoted-pair.  The
   msg-id MUST NOT be more than 250 octets in length.

        NOTE: The syntax ensures that a msg-id is restricted to pure
        US-ASCII, and is thus a strict subset of that defined by [RFC
        2822]. Moreover, the syntax does not involve any quoted-string
        or quoted-pair, and hence the semantic interpretations set out
        in 2.4.2 do not apply. Rather, the semantic value of a msg-id is
        exactly as it is written, so that two msg-ids are different if
        their written forms are different (a property that does not
        apply to the msg-ids defined in [RFC 2822]).

        The exclusion of WSP is to ensure compatibility with existing
        software.  The length restriction ensures that systems which
        accept message identifiers as a parameter when retrieving an
        article (e.g. [NNTP]) can rely on a bounded length. Observe that
        msg-id includes the '<' and '>'.
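
As an illustration of the two MUST NOT rules above, the following Python sketch checks only the non-grammatical constraints: pure US-ASCII, no WSP anywhere, and at most 250 octets counting the '<' and '>'. The function name and the wording of the returned messages are hypothetical, and the check does not validate the ABNF itself.

```python
MAX_MSG_ID_OCTETS = 250  # limit stated in the draft text; includes "<" and ">"

def basic_msg_id_problems(msg_id: str) -> list:
    """Return a list of violated constraints (empty if none were found).

    Only the length, US-ASCII and WSP rules are checked here; the
    grammar of id-left and id-right is out of scope for this sketch.
    """
    try:
        octets = msg_id.encode("ascii")
    except UnicodeEncodeError:
        return ["not pure US-ASCII"]

    problems = []
    if not (octets.startswith(b"<") and octets.endswith(b">")):
        problems.append("missing angle brackets")
    if len(octets) > MAX_MSG_ID_OCTETS:
        problems.append("longer than 250 octets")
    if b" " in octets or b"\t" in octets:
        # Covers WSP hidden inside a quoted-pair as well as bare WSP.
        problems.append("contains WSP")
    return problems

print(basic_msg_id_problems('<abc@example.com>'))
print(basic_msg_id_problems('<"a\\ b"@example.com>'))
```

Rejecting every SP and HTAB outright is simpler than looking for them inside quoted-pairs, since the syntax offers no other place for WSP to occur.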

[Alternative text]

      no-fold-quote      = DQUOTE *( strict-qtext / "\" "\" / "\" DQUOTE )
                              qspecial
                              *( strict-qtext / "\" "\" / "\" DQUOTE ) DQUOTE
      qspecial           = "(" / ")" /        ; same as specials except
                           "<" / ">" /        ; "\" and DQUOTE quoted
                           "[" / "]" /
                           ":" / ";" /
                           "@" / "\" "\" /
                           "," / "." /
                           "\" DQUOTE
      no-fold-literal    = "[" *( dtext / "\" "[" / "\" "]" / "\" "\" ) "]"

   The msg-id MUST NOT be more than 250 octets in length.

        NOTE: The syntax ensures that a msg-id is restricted to pure
        US-ASCII, that no string of characters is quoted unless strictly
        necessary (it must contain at least one qspecial) and no single
        character is prefixed by a "\" in the form of a quoted-pair
        unless strictly necessary, and moreover there is no possibility
        for WSP to occur, whether quoted or not.  These exclusions are
        to ensure compatibility with existing software.  The length
        restriction ensures that systems which accept message
        identifiers as a parameter when retrieving an article (e.g.
        [NNTP]) can rely on a bounded length. Observe that msg-id
        includes the '<' and '>'.

        Consequently, two msg-ids are different if their written forms
        are different (irrespective of whether the semantic
        interpretations of quoted-string and quoted-pair (2.4.2) are
        applied). Msg-ids as defined by [RFC 2822] do not share
        this property. Msg-ids as defined here are a strict subset of
        those defined by [RFC 2822].
[end of Alternative text]
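
The alternative no-fold-quote rule can be sketched as a regular expression. This is a rough Python rendering under one loudly stated assumption: strict-qtext is not defined in the text quoted here, so the sketch takes it to be printable US-ASCII (%d33-126) other than DQUOTE and "\". The names `_ELEMENT`, `_QSPECIAL` and `is_alt_no_fold_quote` are hypothetical.

```python
import re

# ASSUMPTION: strict-qtext = printable US-ASCII except DQUOTE and "\".
# The draft's real definition is not shown in this message.
# An element is one strict-qtext character, or "\" "\", or "\" DQUOTE.
_ELEMENT = r'(?:[!#-\[\]-~]|\\\\|\\")'

# qspecial as listed in the alternative text: the specials, with "\" and
# DQUOTE appearing only in their quoted forms.
_QSPECIAL = r'(?:[()<>\[\]:;@,.]|\\\\|\\")'

# DQUOTE *element qspecial *element DQUOTE
NO_FOLD_QUOTE = re.compile(f'"{_ELEMENT}*{_QSPECIAL}{_ELEMENT}*"')

def is_alt_no_fold_quote(text: str) -> bool:
    """True if text matches this sketch of the alternative no-fold-quote."""
    return NO_FOLD_QUOTE.fullmatch(text) is not None

for candidate in ('"Joe_Doe"', '"Joe@Doe"', r'"Joe\_Doe"', r'"a\"b"', '"a b"'):
    print(candidate, is_alt_no_fold_quote(candidate))
```

With this reading, `"Joe_Doe"` is rejected because nothing in it needs quoting, `"Joe\_Doe"` is rejected because the "\" is unnecessary, and `"a b"` is rejected because WSP cannot occur at all.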

   Following the provisions of [RFC 2822], an agent generating an
   article's message identifier MUST ensure that it is unique and that
   it is NEVER reused (either in Netnews or Email). Moreover, even
   though commonly derived from the domain name of the originating site
   (and domain names are case-insensitive), a message identifier MUST
   NOT be altered in any way during transport, or when copied (as into a
   References-header), and thus a simple (case-sensitive) comparison of
   octets will always suffice to recognize that same message identifier
   wherever it subsequently reappears.

        NOTE: some old software may treat message identifiers that
        differ only in case within their id-right part as equivalent,
        and implementors of agents that generate message identifiers
        should be aware of this.
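
The comparison rule and the caveat in the NOTE can be illustrated with a short Python sketch. Both function names are hypothetical. The split on the last "@" is a simplification that assumes id-right itself contains no "@", which the syntax does not guarantee for a no-fold-literal.

```python
def same_message_id(a: bytes, b: bytes) -> bool:
    """The rule in the text: a simple case-sensitive comparison of octets."""
    return a == b

def differs_only_in_id_right_case(a: bytes, b: bytes) -> bool:
    """Heuristic for the NOTE: would old software that ignores case in
    id-right treat these two distinct msg-ids as the same?

    Simplification: splits on the last "@", assuming id-right has none.
    """
    if a == b:
        return False
    left_a, _, right_a = a.rpartition(b"@")
    left_b, _, right_b = b.rpartition(b"@")
    return left_a == left_b and right_a.lower() == right_b.lower()

x = b"<abc123@News.Example.COM>"
y = b"<abc123@news.example.com>"
print(same_message_id(x, y))                # distinct under the octet rule
print(differs_only_in_id_right_case(x, y))  # but at risk with old software
```

A generating agent could use a check of the second kind to avoid issuing identifiers that differ from its earlier ones only in the case of id-right.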

Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133   Web: http://www.cs.man.ac.uk/~chl
Email: chl(_at_)clw(_dot_)cs(_dot_)man(_dot_)ac(_dot_)uk      Snail: 5 
Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5

