Pete Resnick <presnick(_at_)qualcomm(_dot_)com> writes:
And again, I can't get all that excited (without some real examples)
about implementations that can't go through a text string and unquote
quoted pairs.
I know you may be asking for a mail implementation, for which this isn't
so useful.
The NNTP protocol absolutely bans all whitespace in message IDs, whether or
not it appears in a quoted-pair. Using any form of whitespace, however escaped, in a
message ID breaks the NNTP protocol at a fundamental level and breaks
every NNTP implementation I'm aware of. Hence, messages with message IDs
containing whitespace cannot be conveyed over NNTP, and therefore are
essentially impossible in netnews.
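
As a rough illustration of that rule, a pre-flight check before offering an article over NNTP might look like the Python sketch below. The function name `contains_whitespace` is made up for this example, and it tests only for space, tab, CR, and LF rather than the full RFC 3977 message-id syntax.

```python
def contains_whitespace(message_id: str) -> bool:
    """Return True if the message ID contains a space, tab, CR, or LF.

    Escaping does not matter here: a backslash-escaped space is still
    a space on the wire, so it is rejected like any other whitespace.
    """
    return any(ch in " \t\r\n" for ch in message_id)


# Hypothetical usage: decide whether an ID can be used over NNTP at all.
for candidate in ('<ab.cd@example.com>', '<"ab cd"@example.com>'):
    verdict = "unusable" if contains_whitespace(candidate) else "ok"
    print(candidate, verdict)
```
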
Apart from that, all netnews server implementations I'm aware of do a
character-by-character comparison of message IDs, without regard for
escaping, quoting, or other subtleties that may render two differently
encoded message IDs identical.
Appendix A.2 of RFC 3977 is a good summary from a netnews perspective.
Note that this specification allows for converting the message ID in the
message to a canonical form for NNTP, in part because the ongoing
difference between message ID specifications in netnews and e-mail may
make this useful, but I don't know of any existing implementations that
make use of this allowance.
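
To make the difference concrete, here is a minimal Python sketch of what such a canonicalization could look like for the three example IDs in the appendix quoted below. `canonical_message_id` is a hypothetical helper, not taken from any existing server. It assumes the ID already has its angle brackets, accepts only dot-atom local parts and domains after unquoting, and rejects everything else (including domain literals) rather than guessing.

```python
import re

# RFC 2822 "atext" characters; a dot-atom is runs of atext joined by single dots.
_ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+"
_DOT_ATOM = re.compile(_ATEXT + r"(?:\." + _ATEXT + r")*")


def canonical_message_id(msgid: str) -> str:
    """Undo quoting and quoted-pairs, returning the plain <local@domain> form.

    Raises ValueError if the unquoted result is not a simple dot-atom form,
    since such an ID has no equivalent in the restricted syntax.
    """
    if not (msgid.startswith("<") and msgid.endswith(">")):
        raise ValueError("message ID must be enclosed in angle brackets")
    out = []
    chars = iter(msgid[1:-1])
    for ch in chars:
        if ch == "\\":
            # Quoted-pair: the backslash is dropped, the next character kept.
            nxt = next(chars, None)
            if nxt is None:
                raise ValueError("dangling backslash in message ID")
            out.append(nxt)
        elif ch == '"':
            # Drop the quotes themselves; validity is checked afterwards.
            continue
        else:
            out.append(ch)
    local, sep, domain = "".join(out).rpartition("@")
    if not sep or not _DOT_ATOM.fullmatch(local) or not _DOT_ATOM.fullmatch(domain):
        raise ValueError("message ID has no unquoted dot-atom equivalent")
    return "<" + local + "@" + domain + ">"


ids = ['<ab.cd@example.com>', '<"ab.cd"@example.com>', '<"ab.\\cd"@example.com>']
# Compared character by character, as netnews servers do, these are three IDs.
assert len(set(ids)) == 3
# After canonicalization they collapse to the single unquoted form.
assert {canonical_message_id(i) for i in ids} == {"<ab.cd@example.com>"}
```
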
A.2. Message-IDs
Every article handled by an NNTP server MUST have a unique
message-id. For the purposes of this specification, a message-id is
an arbitrary opaque string that merely needs to meet certain
syntactic requirements and is just a way to refer to the article.
Because there is a significant risk that old articles will be
reinjected into the global Usenet system, RFC 1036 [RFC1036] requires
that message-ids are globally unique for all time.
This specification states that message-ids are the same if and only
if they consist of the same sequence of octets. Other specifications
may define two different sequences as being equal because they are
putting an interpretation on particular characters. RFC 2822
[RFC2822] has a concept of "quoted" and "escaped" characters. It
therefore considers the three message-ids:
<ab.cd@example.com>
<"ab.cd"@example.com>
<"ab.\cd"@example.com>
as being identical. Therefore, an NNTP implementation handing email
articles must ensure that only one of these three appears in the
protocol and that the other two are converted to it as and when
necessary, such as when a client checks the results of a NEWNEWS
command against an internal database of message-ids. Note that
RFC 1036 [RFC1036] never treats two different strings as being
identical. Its successor (as of the time of writing) restricts the
syntax of message-ids so that, whenever RFC 2822 would treat two
strings as equivalent, only one of them is valid (in the above
example, only the first string is valid).
This specification does not describe how the message-id of an article
is determined; it may be deduced from the contents of the article or
derived from some external source. If the server is also conforming
to another specification that contains a definition of message-id
compatible with this one, the server SHOULD use those message-ids. A
common approach, and one that SHOULD be used for email and Netnews
articles, is to extract the message-id from the contents of a header
with name "Message-ID". This may not be as simple as copying the
entire header contents; it may be necessary to strip off comments and
undo quoting, or to reduce "equivalent" message-ids to a canonical
form.
If an article is obtained through the IHAVE command, there will be a
message-id provided with the command. The server MAY either use it
or determine one from the article contents. However, whichever it
does, it SHOULD ensure that, if the IHAVE command is repeated with
the same argument and article, it will be recognized as a duplicate.
If an article does not contain a message-id that the server can
identify, it MUST synthesize one. This could, for example, be a
simple sequence number or be based on the date and time when the
article arrived. When email or Netnews articles are handled, a
Message-ID header SHOULD be added to ensure global consistency and
uniqueness.
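
The synthesis step in the paragraph above could be sketched in Python with the standard library's `email.utils.make_msgid`. The host name `news.example.net` and the wrapper name `synthesize_message_id` are placeholders for this example; a real server would use its own fully qualified name and might prefer a sequence number instead.

```python
from email.utils import make_msgid


def synthesize_message_id(domain: str = "news.example.net") -> str:
    """Build a fresh message ID of the form <unique-part@domain>.

    make_msgid combines a timestamp, the process ID, and a random number
    for the left-hand side; the domain here is only a placeholder.
    """
    return make_msgid(domain=domain)


# Hypothetical usage: add the header to an article that arrived without one.
new_id = synthesize_message_id()
print("Message-ID: " + new_id)
```
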
Note that, because the message-id might not have been derived from
the Message-ID header in the article, the following example is
legitimate (though unusual):
[C] HEAD <45223423@example.com>
[S] 221 0 <45223423@example.com>
[S] Path: pathost!demo!whitehouse!not-for-mail
[S] Message-ID: <1234@example.net>
[S] From: "Demo User" <nobody@example.net>
[S] Newsgroups: misc.test
[S] Subject: I am just a test article
[S] Date: 6 Oct 1998 04:38:40 -0500
[S] Organization: An Example Net, Uncertain, Texas
[S] .
--
Russ Allbery (rra(_at_)stanford(_dot_)edu)
<http://www.eyrie.org/~eagle/>