ietf-822

Re: mail vs. news ???

2003-02-21 14:57:49

Mark Crispin <mrc(_at_)CAC(_dot_)Washington(_dot_)EDU> writes:
On Fri, 21 Feb 2003, Russ Allbery wrote:

Usenet's restrictions on the syntax of message ID headers are very
specific and very precise, and much stronger than those of RFC 2822, in
part because message IDs are used as part of the NNTP protocol.

What are those restrictions?

The primary ones are:

 * Absolutely no occurrences of either whitespace or the ">" character,
   escaped or not, are permitted inside the message ID.  Either is known
   to break existing software in various ways.

 * Nothing is permitted in the Message-ID header other than the message ID
   itself.  Comments either preceding or following the message ID will
   cause the message to be rejected by many news servers.

 * The message ID must not be longer than about 500 characters.  The
   failure mode for violating this rule tends to be rather nasty for some
   existing NNTP software, including things like desynchronization of the
   protocol between the client and server.  NNTP (unfortunately) has a
   maximum command length defined as part of the protocol.  In practice,
   many news servers enforce a 250 octet limit (including the surrounding
   angle brackets).

Please note that I'm not arguing that these restrictions are desirable,
simply that violating them *will* break existing news software.  I also
don't think that fixing one and possibly two is really worth the effort,
since there isn't much in the way of useful purpose served by not
following those rules anyway.
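
As an illustration only, the three rules above can be sketched as a small
check in Python. The function name, the problem strings, and the choice of
the stricter 250 octet figure are assumptions made for this sketch; it is
not taken from any news server and does not attempt full RFC 1036 syntax
validation.

```python
# Hypothetical constant: the conservative limit many servers enforce,
# counted with the surrounding angle brackets included.
MAX_MESSAGE_ID_OCTETS = 250


def check_message_id_header(value: str) -> list:
    """Return a list of problems with a Message-ID header body (empty if none)."""
    problems = []
    body = value.strip(" \t\r\n")

    # Rule 2: nothing but the message ID itself, so no leading or
    # trailing comments around the <...> token.
    if not (body.startswith("<") and body.endswith(">")):
        problems.append("header must contain only a single <...> message ID")
        return problems

    inner = body[1:-1]

    # Rule 1: no whitespace and no ">" inside the ID, escaped or not.
    if any(ch.isspace() for ch in inner):
        problems.append("whitespace inside message ID")
    if ">" in inner:
        problems.append("'>' inside message ID")

    # Rule 3: length limit, measured in octets.
    if len(body.encode("utf-8")) > MAX_MESSAGE_ID_OCTETS:
        problems.append("message ID longer than 250 octets")

    return problems
```

A header body such as "<abc@example.com> (comment)" fails the second check
here, which matches the rejection behavior described above.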

Comments, in the various places where mail supports them, are not
well supported by currently deployed Usenet software (although
supporting them in new code hurts nothing beyond the added
complexity).  The space after the colon in headers is not
optional on Usenet.  The syntax of the Date header is restricted in
ways somewhat similar to that of the Message-ID header.

Golly gee, where's the chorus of "these are bugs that should be fixed"
now?

Are you expecting me to serve as the chorus?  I certainly hope that you're
not expecting me to try to be consistent with statements made by other
people that I don't necessarily agree with.  I tend to hold my own
opinions and not necessarily agree with other people.  :)

First we hear the claim that 7-bit messaging restrictions in mail are a
"bug that should be fixed" even though 7-bit was specifically in the
standard.

Now we hear the claim that completely unnecessary restrictions in headers
are necessary because of news software.

These restrictions are published in RFC 1036, so I would not expect them
to be a surprise to news implementors.  Usenet has, since B news, used a
subset of the mail messaging format.

RFC 1036 is unfortunately imprecise about precisely what additional
restrictions it put on the message format, but at the least the space
after the colon in headers is quite explicit.  The message ID restrictions
are also fairly clear apart from the length limitation (which falls out of
the NNTP protocol instead).  (The bit in RFC 1036 about slashes being
strongly discouraged in message IDs is now completely obsolete.)

The Date specification in RFC 1036 is obnoxious, referring to a particular
software implementation that isn't documented as part of the standard.  In
practice, an RFC 2822 date that doesn't use any of the obsolete syntax is
fine provided that the header is not folded.
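
To make the two header rules just mentioned concrete (the mandatory space
after the colon, and a Date header that is not folded), here is a minimal
Python sketch.  The function names and the regular expression are
hypothetical, written for this illustration; they are not drawn from any
existing news software, and the sketch does not validate the date syntax
itself.

```python
import re

# Header name (printable ASCII except ':'), a colon, one required space,
# then the header body.
_HEADER_LINE = re.compile(r"^([!-9;-~]+): (.*)$")


def split_news_header(line: str):
    """Return (name, body), or None if no space follows the colon."""
    match = _HEADER_LINE.match(line)
    if match is None:
        return None
    return match.group(1), match.group(2)


def is_folded(raw_header: str) -> bool:
    """True if the raw header spans more than one physical line."""
    return "\n" in raw_header.rstrip("\r\n")
```

With these, "Subject:test" is refused while "Subject: test" splits into its
name and body, and a Date header continued onto a second line is reported
as folded.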

Issues surrounding comments are more complex.  Date and Message-ID are
the most sensitive of the headers that are also shared with mail.  Beyond
those, comments in References headers are unlikely to cause catastrophic
problems but may show up as oddities in the thread tree in a news reader.
News software can also be fairly picky about the From header (although one
is likely fine as long as one avoids the obsolete syntax rules).

And the IETF/IESG is supposed to respect this?

My message was solely addressing the differences *in practice* that exist
right now on the wire.  I was not attempting to make any sort of statement
about what the future should look like.

I personally am very strongly in favor of the unification of messaging
formats, and think that this is one of the most important things that
could come out of USEFOR.  I think that it's reasonable to simply require
that Usenet software going forward cope with comments in the References
header and with the full From syntax in RFC 2822 (possibly omitting the
obsolete rules, since they have never been supported on Usenet).

I'm ambivalent about folded dates.  The date parsing software that I've
written personally and that is used in the software I maintain supports
them.  I don't understand why anyone would generate a folded date, though,
so I can understand why people don't see what purpose is served in
supporting it.

I think that not requiring a space after the colon in headers (except for
compatibility with older messages) is silly, but I don't have a strong
opinion on it.  Changing news software to support this can be a rather
significant undertaking, however, given that this rule is clearly
specified in RFC 1036 and the assumption tends to be very widespread in
any code that parses headers in news messages.

The message ID restrictions hit the single hottest code path in every
Usenet transit server, and I really don't see any purpose served by
complicating the parsing algorithm for message IDs solely to support
rather questionable constructions that can be easily avoided.  Apart from
that personal opinion, I'll also note that removing those restrictions
would require extremely significant changes to the Usenet infrastructure
and would not be in any sense backwards-compatible; lots of software was
written on the basis of the guarantees provided by RFC 1036.

My primary consideration in the standards work that I do on Usenet article
formats is to support backward compatibility with existing software to the
degree that is feasible.  My secondary consideration is to support
unification of the messaging format in order to get rid of the various
places where gatewaying is difficult for silly and unnecessary reasons.  I
consider tighter integration of Usenet and e-mail to be obviously good, a
growing trend, and one of the more interesting applications for Usenet
technology going forward.  NNTP is an interesting alternative access
protocol for large public archives of mail messages because of its extreme
simplicity and very lightweight nature, although anonymous IMAP is
certainly a strong competitor with its much more advanced searching
support.  (Either is obviously utterly superior to converting all of the
messages to HTML and putting them behind a clumsy web page interface.)
NNTP also has some advantages when it comes to mass distribution of
messages.

This is because portions of the news community listened to the siren
song of "just send 8-bits" offered by those individuals whose song was
rejected in mail.  Now the news community has a non-interoperable
disaster.

It actually works pretty well on Usenet in those hierarchies that have
standardized on a character set.  It breaks down very badly whenever those
messages move outside of a pure NNTP system, but I'm afraid that I can't
agree with your characterization given the number of people who are very
happily using untagged character sets in their own hierarchies.

However, I *do* agree with you that using random untagged 8-bit character
sets is obviously not a solution to the problem.  It is a bad hack that
works in certain limited and specific situations and actively interferes
with movement to a unified messaging format, and Usenet is already running
hard against its limitations.

But rather than fix the disaster, they seem to want to inflict a new
disaster upon the email community.

I would greatly appreciate it if you would be somewhat careful about who
you choose to include in sweeping pronouns like "they."

The solution to interoperability is to stop claiming that news is
special, and start playing ball with the rest of the messaging world.

And many news implementors have been doing this for years, so please don't
make blanket statements about what everyone on the Usenet side is doing.
The active members of the USEFOR working group are not a representative
sample of Usenet implementors or users, and many of us who strongly
believe in a unified messaging format gave up on USEFOR in disgust years
ago.

-- 
Russ Allbery (rra(_at_)stanford(_dot_)edu)             
<http://www.eyrie.org/~eagle/>
