ietf-822

Re: Interpretation of RFC 2047

2002-12-18 11:29:43

Alan Barrett wrote:
On Wed, 18 Dec 2002, Keith Moore wrote:

comments are only valid in structured fields.  so in order to
recognize a comment you have to know the set of structured fields.

To be more precise, comments are only valid in certain parts of
structured fields. RFC 2822 is quite explicit about where comments
are and are not permitted for the header fields which it defines.
Several other recent RFCs do likewise, using 2822's CFWS notation.

[...]

OK, all structured header fields.  The entire lexical analyser described
in RFC 822 section 3.1.4, and RFC 2822 section 3.2 (read in conjunction
with section 2.2.2), seems to be intended to apply to all structured
header fields.  I have always assumed that this included any structured
fields that might be defined in the future.

I think the problem is the assumption. 2822 section 3.6.8 requires only
that extension header fields' content conform to the 2822 definition of
"unstructured" syntax, and specifically states that such fields are
uninterpreted as far as 2822 formal syntax is concerned.  The specified
interpretation of parenthesized comments in 2822 refers only to those
header fields specifically defined in 2822 (and of course by other RFCs
which explicitly reference CFWS and RFC 2822).
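
To make that concrete, here is a rough Python sketch of a parser that
strips comments only from fields it knows to permit them. The field
set COMMENT_FIELDS and the function names are hypothetical, chosen for
illustration; the set is not an authoritative or complete list of
fields that allow CFWS.

```python
# Hypothetical allow-list of fields whose definitions permit comments.
# Illustrative only; not an exhaustive or authoritative registry.
COMMENT_FIELDS = {"from", "to", "cc", "date", "message-id"}


def strip_comments(value: str) -> str:
    """Drop nested parenthesized comments, leaving quoted-strings intact."""
    out = []
    depth = 0          # current comment nesting level
    in_quotes = False  # inside a quoted-string (only tracked outside comments)
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value):
            # quoted-pair: keep both characters unless inside a comment
            if depth == 0:
                out.append(value[i:i + 2])
            i += 2
            continue
        if depth == 0 and ch == '"':
            in_quotes = not in_quotes
            out.append(ch)
        elif in_quotes:
            out.append(ch)
        elif ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(ch)
        i += 1
    return "".join(out)


def field_body_without_comments(name: str, value: str) -> str:
    """Strip comments only for fields known to allow them; pass others through."""
    if name.lower() in COMMENT_FIELDS:
        return strip_comments(value)
    return value
```

An extension field such as a hypothetical "X-Custom" passes through
untouched, so its parentheses are never treated as a comment.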

Apart from RFC 2912 Content-features, what else violates this assumption?

As previously mentioned, because URIs (RFC 2396) can include parentheses
(not necessarily balanced), any header field which includes a URI may
also have parentheses which are not part of a comment. That includes
Content-Location (RFC 2557), Content-Type (RFC 2017 provides for a URL
parameter with the message/external-body media type), and the obsolete
Content-Base (RFC 2110). RFC 2156 headers Discarded-X400-IPMS-Extensions,
Discarded-X400-MTS-Extensions, and X400-Received also have parenthesized
productions with specified structure within the parentheses (i.e. not a
comment), as does the Received header as defined in RFC 2821 (blame RFC
1123).  Various header fields are defined as structured fields but with
unstructured portions; those portions may of course contain parentheses
(not necessarily balanced) which should not be interpreted as comments
(or, in fact, at all). That includes RFC 1894 Original-Envelope-ID,
Reporting-MTA, DSN-Gateway, Received-From-MTA, Remote-MTA,
Original-Recipient, Final-Recipient; RFC 2298 Diagnostic-Code,
MDN-Gateway, and probably Reporting-UA (there is an ambiguity in the
Reporting-UA ABNF, but that does not affect this issue).
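
The URI case is easy to demonstrate. The following Python sketch shows
what a generic comment stripper does when applied blindly to a field
body that is just a URI. The function name and the example.com URIs
are made up for illustration, and the stripper is deliberately naive.

```python
def naive_strip_comments(value: str) -> str:
    """Treat every parenthesized run as a comment, regardless of the field."""
    out = []
    depth = 0
    for ch in value:
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(ch)
    return "".join(out)


# Balanced parentheses inside a URI: part of the path silently disappears.
balanced = naive_strip_comments("http://example.com/report(final).html")

# Unbalanced parenthesis: everything after it is swallowed.
unbalanced = naive_strip_comments("http://example.com/a(b/c.html")

print(balanced)    # http://example.com/report.html
print(unbalanced)  # http://example.com/a
```

In both cases the result is a different URI from the one that was
sent, which is why the parser has to know the field's syntax first.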

Of these, 2557 will have to be revised to exclude comments, as the
current ABNF leads to ambiguities. A case could also be made for a
revision to 2821: as it stands, it's impossible to distinguish a
parenthesized "TCP-Info" production from a comment, both of which
may appear after a domain or address literal in the from or by clauses
of a Received header [and unfortunately, the most reliable trace
information gets relegated to the inside of parentheses while the
easily-forged content is in well-structured parts of the header]. 2298
needs revision due to an unrelated ambiguity.
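
The Received ambiguity can be sketched in Python as follows. The
regular expressions are simplified stand-ins for the 2821 productions
and for a non-nested comment; they are assumptions made for
illustration, not a transcription of the ABNF, and the host names and
addresses are invented.

```python
import re

# Simplified stand-ins for RFC 2821 productions (assumptions, not the full ABNF).
DOMAIN = r"[A-Za-z0-9](?:[A-Za-z0-9.-]*[A-Za-z0-9])?"
ADDRESS_LITERAL = r"\[[0-9A-Fa-f:.]+\]"
TCP_INFO = re.compile(
    rf"\((?:{ADDRESS_LITERAL}|{DOMAIN}\s+{ADDRESS_LITERAL})\)"
)

# Simplified comment: parentheses around text with no nesting or quoted-pairs.
COMMENT = re.compile(r"\([^()\\]*\)")


def readings(parenthesized: str) -> list[str]:
    """Return every interpretation the parenthesized text is consistent with."""
    found = []
    if TCP_INFO.fullmatch(parenthesized):
        found.append("TCP-info")
    if COMMENT.fullmatch(parenthesized):
        found.append("comment")
    return found


print(readings("(mail.example.net [192.0.2.1])"))  # ['TCP-info', 'comment']
print(readings("(Postfix, from userid 1000)"))     # ['comment']
```

Anything shaped like TCP-info is also a well-formed comment, so a
parser has no syntactic basis for choosing between the two readings.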

The revisions to 2557 and 2298's Reporting-UA should not cause problems;
indeed they are necessary to resolve problems in the existing ABNF syntax
specifications. Revising 2821 would probably cause a certain amount of
wailing and gnashing of teeth, but would at least (if done properly)
provide more reliable information suitable for automated tracing of
mail paths (e.g. for tracking the sources of spam) than is currently
possible.  It would be difficult to present a compelling case for
revising the other headers, let alone to come up with replacement
syntax that doesn't break something -- and there does not appear to
be any potential benefit in doing so.