ietf-822
Re: draft-resnick-2822upd-02 and Netnews

2007-07-30 20:45:22

Pete Resnick <presnick(_at_)qualcomm(_dot_)com> writes:

> And again, I can't get all that excited (without some real examples)
> about implementations that can't go through a text string and unquote
> quoted pairs.

I know you may be asking for a mail implementation, for which this isn't
so useful.

The NNTP protocol absolutely bans whitespace in message IDs, whether
inside a quoted-pair or not.  Using any form of whitespace, however
escaped, in a message ID breaks NNTP at a fundamental level and breaks
every NNTP implementation I'm aware of.  Hence, messages with message IDs
containing whitespace cannot be conveyed over NNTP and are therefore
essentially impossible in netnews.
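
To make the whitespace ban concrete, here is a minimal Python sketch of a
wire-level check.  The function name is hypothetical, and the grammar in the
comment is my reading of the RFC 3977 message-id syntax (angle brackets
around 1 to 248 printable US-ASCII octets other than ">"); treat it as an
assumption rather than a reference implementation.

```python
def is_valid_nntp_message_id(msgid: bytes) -> bool:
    """Return True if msgid looks acceptable on the NNTP wire.

    Assumed grammar: "<" 1*248A-NOTGT ">", where A-NOTGT is any printable
    US-ASCII octet (0x21-0x7E) except ">" (0x3E).  Space, tab, CR and LF
    all fall outside that range, so no form of quoting can rescue them.
    """
    if not 3 <= len(msgid) <= 250:
        return False
    if not (msgid.startswith(b"<") and msgid.endswith(b">")):
        return False
    inner = msgid[1:-1]
    return all(0x21 <= octet <= 0x7E and octet != 0x3E for octet in inner)
```

Under this check `<"ab cd"@example.com>` and `<"ab\ cd"@example.com>` are
both rejected, even though each is a legal RFC 2822 msg-id.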

Apart from that, all netnews server implementations I'm aware of do a
character-by-character comparison of message IDs, without regard for
escaping, quoting, or other subtleties that may render two differently
encoded message IDs identical.
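
As an illustration of what that means in practice, the following Python
sketch models a history database keyed on the raw octets of the message ID.
The `History` class and its `offer` method are hypothetical names of mine and
are not taken from any particular server.

```python
class History:
    """Toy duplicate detector that compares message IDs octet by octet."""

    def __init__(self) -> None:
        self._seen: set[bytes] = set()

    def offer(self, msgid: bytes) -> bool:
        """Return True if msgid is new (and record it), False if already seen."""
        if msgid in self._seen:
            return False
        self._seen.add(msgid)
        return True


history = History()
# RFC 2822 would call these three equivalent; an octet comparison does not,
# so each one is accepted as a distinct article.
accepted = [
    history.offer(b"<ab.cd@example.com>"),
    history.offer(b'<"ab.cd"@example.com>'),
    history.offer(b'<"ab.\\cd"@example.com>'),
]
```

Only an exact repeat of one of those byte strings would be refused as a
duplicate.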

Appendix A.2 of RFC 3977 is a good summary from a netnews perspective.
Note that this specification allows for converting the message ID in the
message to a canonical form for NNTP, in part because the ongoing
difference between message ID specifications in netnews and e-mail may
make this useful, but I don't know of any existing implementations that
make use of this allowance.

A.2.  Message-IDs

   Every article handled by an NNTP server MUST have a unique
   message-id.  For the purposes of this specification, a message-id is
   an arbitrary opaque string that merely needs to meet certain
   syntactic requirements and is just a way to refer to the article.

   Because there is a significant risk that old articles will be
   reinjected into the global Usenet system, RFC 1036 [RFC1036] requires
   that message-ids are globally unique for all time.

   This specification states that message-ids are the same if and only
   if they consist of the same sequence of octets.  Other specifications
   may define two different sequences as being equal because they are
   putting an interpretation on particular characters.  RFC 2822
   [RFC2822] has a concept of "quoted" and "escaped" characters.  It
   therefore considers the three message-ids:

      <ab.cd@example.com>
      <"ab.cd"@example.com>
      <"ab.\cd"@example.com>

   as being identical.  Therefore, an NNTP implementation handling email
   articles must ensure that only one of these three appears in the
   protocol and that the other two are converted to it as and when
   necessary, such as when a client checks the results of a NEWNEWS
   command against an internal database of message-ids.  Note that
   RFC 1036 [RFC1036] never treats two different strings as being
   identical.  Its successor (as of the time of writing) restricts the
   syntax of message-ids so that, whenever RFC 2822 would treat two
   strings as equivalent, only one of them is valid (in the above
   example, only the first string is valid).

   This specification does not describe how the message-id of an article
   is determined; it may be deduced from the contents of the article or
   derived from some external source.  If the server is also conforming
   to another specification that contains a definition of message-id
   compatible with this one, the server SHOULD use those message-ids.  A
   common approach, and one that SHOULD be used for email and Netnews
   articles, is to extract the message-id from the contents of a header
   with name "Message-ID".  This may not be as simple as copying the
   entire header contents; it may be necessary to strip off comments and
   undo quoting, or to reduce "equivalent" message-ids to a canonical
   form.

   If an article is obtained through the IHAVE command, there will be a
   message-id provided with the command.  The server MAY either use it
   or determine one from the article contents.  However, whichever it
   does, it SHOULD ensure that, if the IHAVE command is repeated with
   the same argument and article, it will be recognized as a duplicate.

   If an article does not contain a message-id that the server can
   identify, it MUST synthesize one.  This could, for example, be a
   simple sequence number or be based on the date and time when the
   article arrived.  When email or Netnews articles are handled, a
   Message-ID header SHOULD be added to ensure global consistency and
   uniqueness.

   Note that, because the message-id might not have been derived from
   the Message-ID header in the article, the following example is
   legitimate (though unusual):

      [C] HEAD <45223423@example.com>
      [S] 221 0 <45223423@example.com>
      [S] Path: pathost!demo!whitehouse!not-for-mail
      [S] Message-ID: <1234@example.net>
      [S] From: "Demo User" <nobody@example.net>
      [S] Newsgroups: misc.test
      [S] Subject: I am just a test article
      [S] Date: 6 Oct 1998 04:38:40 -0500
      [S] Organization: An Example Net, Uncertain, Texas
      [S] .
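
For what it's worth, the "undo quoting" step that the appendix allows could
look something like the Python sketch below.  It is not taken from any
existing implementation (as noted above, I don't know of one that does
this); the function name is made up, and it deliberately ignores comments,
folding whitespace, and domain literals.

```python
import re

# RFC 2822 dot-atom-text: runs of atext separated by single dots.
_ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+"
_DOT_ATOM = re.compile(rf"{_ATEXT}(?:\.{_ATEXT})*")


def canonical_message_id(msgid: str) -> str:
    """Reduce a quoted left-hand side to dot-atom form when that is possible."""
    if not (msgid.startswith("<") and msgid.endswith(">")):
        raise ValueError("message ID must be enclosed in angle brackets")
    left, sep, right = msgid[1:-1].rpartition("@")
    if not sep:
        raise ValueError("message ID must contain '@'")
    if len(left) >= 2 and left.startswith('"') and left.endswith('"'):
        # Undo quoted-pairs: a backslash followed by any character
        # stands for that character.
        unquoted = re.sub(r"\\(.)", r"\1", left[1:-1])
        # Drop the quotes only if the result is valid without them.
        if _DOT_ATOM.fullmatch(unquoted):
            left = unquoted
    return f"<{left}@{right}>"
```

With this, all three message-ids from the appendix map to
`<ab.cd@example.com>`, while something like `<"ab cd"@example.com>` is left
alone because it has no unquoted equivalent (and is unusable in NNTP
regardless).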

-- 
Russ Allbery (rra(_at_)stanford(_dot_)edu)             
<http://www.eyrie.org/~eagle/>