This message is being sent to both the usenet-format and ietf-822 lists,
with Reply-To to both. Please try to keep it on both lists until at
least we understand the problem. Ultimately, the solution is for Usefor,
but I would first of all like to hear confirmation from the ietf-822
people that the problem really exists.
The Story So Far:
1. In the old days, a msg-id was just an angle-addr. Then it was pointed
out that this permitted folding, comments and whitespace within a
msg-id, which was clearly a Bad Thing. After some discussion on DRUMS,
therefore, a special syntax for msg-ids was written, with no CFWS in
sight and special syntax rules for no-fold-quote and no-fold-literal.
2. Then someone in Usefor spotted that the syntax still allowed
whitespace, because you could sneak in a SP or an HTAB by using it in
a quoted-pair, which is itself allowed inside a no-fold-quote or a
no-fold-literal. This is still a Bad Thing (indeed quite intolerable for
Usefor, since most existing software would break on it). However, this
was only spotted in the closing days of DRUMS, so it did not get fixed
for RFC 2822. In Usefor, we cured the problem by Brute Force:
"A msg-id MUST NOT contain any WSP within any strict-quoted-pair."
3. Now I have just found another feature/bug.
Consider the following three msg-ids, all syntactically correct in RFC 2822:
A. <Joe_Doe@[127.0.0.1]>
B. <"Joe_Doe"@[127.0.0.1]>
C. <"Joe\_Doe"@[127\.0\.0\.1]>
Question. Are those three semantically the same in RFC 2822?
Read 3.2.5:
Semantically, neither the optional CFWS outside of the quote
characters nor the quote characters themselves are part of the
quoted-string; the quoted-string is what is contained between the two
quote characters.
And that clearly makes A and B semantically equivalent (well, you
_might_ just argue that the syntax of msg-id does not actually mention
quoted-string, but that is sophistry).
And now read 3.2.2:
Where any quoted-pair appears, it is to be interpreted as the text
character alone. That is to say, the "\" character that appears as
part of a quoted-pair is semantically "invisible".
And that clearly makes B and C semantically equivalent.
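To make this concrete, here is a rough Python sketch of what a literal reading of 3.2.5 and 3.2.2 does to the three examples. It strips the quote characters around id-left and reduces every quoted-pair to its second character. The helper `semantic_value` and its regex are my own illustration, not anything defined in RFC 2822. It assumes its input is already a syntactically valid msg-id and does no validation of its own.

```python
import re

# Hypothetical helper illustrating a literal reading of RFC 2822
# sections 3.2.5 and 3.2.2. Assumes the input is already a valid msg-id:
# id-left is either a quoted string (possibly with quoted-pairs) or
# contains no '@' or '"'.
MSG_ID = re.compile(r'<("(?:[^"\\]|\\.)*"|[^"@<>]+)@(.+)>')
QUOTED_PAIR = re.compile(r'\\(.)')


def semantic_value(msg_id: str) -> str:
    m = MSG_ID.fullmatch(msg_id)
    if m is None:
        raise ValueError("not a msg-id: " + msg_id)
    left, right = m.group(1), m.group(2)
    # 3.2.5: the quote characters are not part of the quoted-string.
    if left.startswith('"'):
        left = left[1:-1]
    # 3.2.2: a quoted-pair stands for the text character alone.
    left = QUOTED_PAIR.sub(r'\1', left)
    right = QUOTED_PAIR.sub(r'\1', right)
    return "<" + left + "@" + right + ">"


A = "<Joe_Doe@[127.0.0.1]>"
B = '<"Joe_Doe"@[127.0.0.1]>'
C = r'<"Joe\_Doe"@[127\.0\.0\.1]>'

for written in (A, B, C):
    print(written, "->", semantic_value(written))
```

Three different written forms go in, and all three lines print the same semantic value, <Joe_Doe@[127.0.0.1]>.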
Now I suspect this is a Bad Thing even in Email (though I am not sure
that any of the Email Standards makes any official use of the msg-id).
But in Netnews it would lead to GROSS interoperability problems.
So there is the problem. First of all, could the ietf-822 people please
confirm that the problem is genuine, even in Email (or else explain why
it isn't)?
------------------------------------------------------------------------------
The rest of this message is concerned with how it might be fixed in
Usefor (RFC 2822 now being cast in concrete). The ietf-822 people may
stop reading now, but are welcome to continue and comment if they wish
:-).
Note first of all that the two bits of semantics quoted above from
RFC 2822 apply also within Usefor. That would have been true in any
case but, for the removal of all doubt, I have now explicitly written
them in, mainly because I need to rely on them for the semantics of
parameters.
I see two solutions. One is Brute Force (and involves sophistry to
boot). The other is syntactic (it just excludes all quoting that is not
strictly essential). I am not particularly impressed by either solution,
so I would welcome suggestions.
Here is the complete section on Message-ID as it currently stands in Usefor:
5.3. Message-ID
The Message-ID-header contains the article's message identifier, a
unique identifier distinguishing the article from every other
article. The content syntax makes use of syntax defined in [RFC
2822], subject to the following revised definition of no-fold-quote
and no-fold-literal.
header =/ Message-ID-header
Message-ID-header = "Message-ID" ":" SP Message-ID-content
*( ";" other-parameter )
Message-ID-content = msg-id
id-left = dot-atom-text / no-fold-quote
id-right = dot-atom-text / no-fold-literal
no-fold-quote = DQUOTE *( strict-qtext / strict-quoted-pair )
DQUOTE
no-fold-literal = "[" *( dtext / strict-quoted-pair ) "]"
A msg-id MUST NOT contain any WSP within any strict-quoted-pair. The
msg-id MUST NOT be more than 250 octets in length.
NOTE: The syntax ensures that a msg-id is restricted to pure
US-ASCII, and is thus a strict subset of that defined by [RFC
2822]. Moreover, the syntax does not involve any quoted-string
or quoted-pair, and hence the semantic interpretations set out
in 2.4.2 do not apply. Rather, the semantic value of a msg-id is
exactly as it is written, so that two msg-ids are different if
their written forms are different (a property that does not
apply to the msg-ids defined in [RFC 2822]).
The exclusion of WSP is to ensure compatibility with existing
software. The length restriction ensures that systems which
accept message identifiers as a parameter when retrieving an
article (e.g. [NNTP]) can rely on a bounded length. Observe that
msg-id includes the '<' and '>'.
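A minimal Python sketch of a checker for this first version follows. Some of its definitions are assumptions made only for illustration. It treats strict-qtext and dtext as the RFC 2822 qtext and dtext without their obs- forms. It treats a strict-quoted-pair as "\" followed by any US-ASCII text character. The function name is hypothetical. It accepts all three of the examples A, B and C given earlier. That is the difficulty: the WSP rule does nothing about redundant quoting.

```python
import re

# Assumed character classes (RFC 2822 without the obs- forms).
NO_WS_CTL = r"\x01-\x08\x0b\x0c\x0e-\x1f\x7f"
ATEXT = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]"
DOT_ATOM_TEXT = ATEXT + r"+(?:\." + ATEXT + r"+)*"
STRICT_QTEXT = "[" + NO_WS_CTL + r"\x21\x23-\x5b\x5d-\x7e]"
DTEXT = "[" + NO_WS_CTL + r"\x21-\x5a\x5e-\x7e]"
# Assumption: strict-quoted-pair = "\" followed by any US-ASCII text
# character, which (as in RFC 2822) still lets SP and HTAB through.
STRICT_QUOTED_PAIR = r"\\[\x01-\x09\x0b\x0c\x0e-\x7f]"

NO_FOLD_QUOTE = '"(?:' + STRICT_QTEXT + "|" + STRICT_QUOTED_PAIR + ')*"'
NO_FOLD_LITERAL = r"\[(?:" + DTEXT + "|" + STRICT_QUOTED_PAIR + r")*\]"
MSG_ID = re.compile(
    "<(?:" + DOT_ATOM_TEXT + "|" + NO_FOLD_QUOTE + ")"
    "@(?:" + DOT_ATOM_TEXT + "|" + NO_FOLD_LITERAL + ")>"
)


def is_valid_msg_id(s: str) -> bool:
    """Hypothetical checker for the first version of 5.3."""
    if len(s) > 250:  # the '<' and '>' count towards the 250 octets
        return False
    if MSG_ID.fullmatch(s) is None:
        return False
    # Outside a strict-quoted-pair the syntax admits no WSP, so any SP
    # or HTAB still present must sit inside one: reject it.
    return " " not in s and "\t" not in s
```

A, B and C all pass this check. An id such as `<"Joe\ Doe"@example.com>` matches the grammar but is rejected by the WSP rule.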
[Alternative text]
no-fold-quote = DQUOTE *( strict-qtext / "\" "\" / "\" DQUOTE )
qspecial
*( strict-qtext / "\" "\" / "\" DQUOTE ) DQUOTE
qspecial = "(" / ")" / ; same as specials except
"<" / ">" / ; "\" and DQUOTE quoted
"[" / "]" /
":" / ";" /
"@" / "\" "\" /
"," / "." /
"\" DQUOTE
no-fold-literal = "[" *( dtext / "\" "[" / "\" "]" / "\" "\" ) "]"
The msg-id MUST NOT be more than 250 octets in length.
NOTE: The syntax ensures that a msg-id is restricted to pure
US-ASCII, that no string of characters is quoted unless strictly
necessary (it must contain at least one qspecial) and no single
character is prefixed by a "\" in the form of a quoted-pair
unless strictly necessary, and moreover there is no possibility
for WSP to occur, whether quoted or not. These exclusions are
to ensure compatibility with existing software. The length
restriction ensures that systems which accept message
identifiers as a parameter when retrieving an article (e.g.
[NNTP]) can rely on a bounded length. Observe that msg-id
includes the '<' and '>'.
Consequently, two msg-ids are different if their written forms
are different (irrespective of whether the semantic
interpretations of quoted-string and quoted-pair (2.4.2) are
applied). Msg-ids as defined by [RFC 2822] do not share
this property. Msg-ids as defined here are a strict subset of
those defined by [RFC 2822].
[end of Alternative text]
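For comparison, here is a sketch of a checker for the Alternative text. It makes the same assumptions about strict-qtext and dtext (RFC 2822 without the obs- forms), and the function name is again hypothetical. Of the examples A, B and C, only A is accepted. The quotes in B are not needed, and neither are the backslashes in C.

```python
import re

# Assumed character classes (RFC 2822 without the obs- forms).
NO_WS_CTL = r"\x01-\x08\x0b\x0c\x0e-\x1f\x7f"
ATEXT = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]"
DOT_ATOM_TEXT = ATEXT + r"+(?:\." + ATEXT + r"+)*"
STRICT_QTEXT = "[" + NO_WS_CTL + r"\x21\x23-\x5b\x5d-\x7e]"
DTEXT = "[" + NO_WS_CTL + r"\x21-\x5a\x5e-\x7e]"

# Inside the quotes, the only quoted-pairs allowed are \\ and \" ...
Q_ITEM = "(?:" + STRICT_QTEXT + r'|\\\\|\\")'
# ... and the quoted string must contain at least one qspecial.
QSPECIAL = r'(?:[()<>\[\]:;@,.]|\\\\|\\")'
NO_FOLD_QUOTE = '"' + Q_ITEM + "*" + QSPECIAL + Q_ITEM + '*"'
# Inside a literal, the only quoted-pairs allowed are \[, \] and \\.
NO_FOLD_LITERAL = r"\[(?:" + DTEXT + r"|\\\[|\\\]|\\\\)*\]"

MSG_ID = re.compile(
    "<(?:" + DOT_ATOM_TEXT + "|" + NO_FOLD_QUOTE + ")"
    "@(?:" + DOT_ATOM_TEXT + "|" + NO_FOLD_LITERAL + ")>"
)


def is_valid_alt_msg_id(s: str) -> bool:
    """Hypothetical checker for the Alternative text of 5.3.

    No separate WSP test is needed: no rule here admits SP or HTAB.
    """
    return len(s) <= 250 and MSG_ID.fullmatch(s) is not None
```

For example, `<"joe@home"@example.com>` is accepted because its "@" must be quoted. `<"Joe_Doe"@[127.0.0.1]>` is rejected.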
Following the provisions of [RFC 2822], an agent generating an
article's message identifier MUST ensure that it is unique and that
it is NEVER reused (either in Netnews or Email). Moreover, even
though commonly derived from the domain name of the originating site
(and domain names are case-insensitive), a message identifier MUST
NOT be altered in any way during transport, or when copied (as into a
References-header), and thus a simple (case-sensitive) comparison of
octets will always suffice to recognize that same message identifier
wherever it subsequently reappears.
NOTE: some old software may treat message identifiers that
differ only in case within their id-right part as equivalent,
and implementors of agents that generate message identifiers
should be aware of this.
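The comparison rule above, and the legacy behaviour described in the NOTE, might be sketched as follows. Both function names are hypothetical. The legacy model assumes that id-right is a dot-atom-text, so that the last "@" separates the two halves. It is only a guess at what such old software actually does.

```python
def same_msg_id(a: str, b: str) -> bool:
    """Usefor rule: a plain, case-sensitive comparison of octets."""
    return a.encode("ascii") == b.encode("ascii")


def legacy_same_msg_id(a: str, b: str) -> bool:
    """Hypothetical model of old software that ignores case in id-right.

    Assumes id-right is a dot-atom-text, so the last '@' separates
    id-left from id-right even when id-left is quoted.
    """
    left_a, _, right_a = a[1:-1].rpartition("@")
    left_b, _, right_b = b[1:-1].rpartition("@")
    return left_a == left_b and right_a.lower() == right_b.lower()


x = "<1234@news.Example.COM>"
y = "<1234@news.example.com>"
print(same_msg_id(x, y))         # False: the octets differ
print(legacy_same_msg_id(x, y))  # True: they differ only in id-right case
```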
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133 Web: http://www.cs.man.ac.uk/~chl
Email: chl@clw.cs.man.ac.uk Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9 Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5