
Re: Revisiting RFC 2822 grammar

2004-01-15 22:07:30

On 1/15/04 at 9:59 PM +0000, Charles Lindsey wrote:

In <3FF7A5FC(_dot_)9080804(_at_)verizon(_dot_)net> 
blilly(_at_)verizon(_dot_)net writes:

quoted-pair     =       ("\" text)
[N.B. had redundant obs-qp alternative]

I think not. The obs version allows \NUL, \CR and \LF, which the regular version does not.
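
As a rough illustration of that difference, here is a minimal Python sketch. It assumes the RFC 2822 definitions `text = %d1-9 / %d11 / %d12 / %d14-127` (with the obs-text alternative deliberately left out) and `obs-qp = "\" (%d0-127)`. The names `STRICT_QP`, `OBS_QP` and `classify` are invented for the example and come from neither RFC.

```python
import re

# Strict form: "\" followed by text, taken here as
# %d1-9 / %d11 / %d12 / %d14-127 (assumption: obs-text is excluded).
STRICT_QP = re.compile(r"\\[\x01-\x09\x0b\x0c\x0e-\x7f]")

# Obsolete form: "\" followed by any US-ASCII character, %d0-127.
OBS_QP = re.compile(r"\\[\x00-\x7f]")


def classify(pair: str) -> str:
    """Say which quoted-pair form, if any, accepts the two-character string."""
    if STRICT_QP.fullmatch(pair):
        return "strict"
    if OBS_QP.fullmatch(pair):
        return "obs-only"
    return "invalid"


for sample in ("\\a", "\\\x00", "\\\r", "\\\n"):
    print(repr(sample), classify(sample))
```

Under these assumptions only \NUL, \CR and \LF fall into the "obs-only" class. If obs-text is counted as part of text, the picture changes, which is the point under dispute.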


[N.B. RFC 822 ASCII NUL not permitted, even with obs- rules]

Bruce gives many examples of differences from RFC 822. I will leave it to others to comment on the rights and wrongs, but some of them certainly look like bugs in RFC 2822 to me.

I think Bruce is wrong on this one. I see ASCII NUL in 822's CHAR, and that appears in ctext (minus a few characters).

zone            =       ( "+" / "-" ) 4DIGIT
[N.B. no CFWS between +- and 4DIGIT]

Indeed. Are you saying that such was allowed in RFC 822?

It's not entirely clear, but it probably is.
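
For reference, the non-obsolete zone rule quoted above is small enough to check mechanically. The sketch below is only an illustration of that one rule (`is_zone` is a made-up helper name); it says nothing about what RFC 822 did or did not allow.

```python
import re

# zone = ( "+" / "-" ) 4DIGIT, with nothing permitted between sign and digits.
ZONE = re.compile(r"[+-][0-9]{4}")


def is_zone(s: str) -> bool:
    """True if s is a sign immediately followed by exactly four ASCII digits."""
    return ZONE.fullmatch(s) is not None


for sample in ("+0000", "-0500", "+ 0000", "-(comment)0500"):
    print(repr(sample), is_zone(sample))
```

The last two samples are rejected because the rule leaves no room for whitespace or a comment after the sign.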

message         =       (fields / obs-fields) [CRLF body]

But this rule leads to horrendous ambiguities, with no prospect of avoiding them in less than 50 pages of syntax :-( . But I shall defer discussion of that till later, because there are other issues with it.

This is identical to 2822.

subject = "Subject:" [FWS] [("cmsg" / "Re: ") [FWS]] unstructured CRLF
[ RFC 1036 sect. 2.2.6 "cmsg" Subject hack, sect. 2.1.4 "Re: " ]

Please, no "Re: " or "cmsg" in the syntax.


path            =       ("<" [CFWS] [addr-spec] ">" [CFWS]) / obs-path

I think it would be better to say

path            = angle-addr / "<" [CFWS] ">" [CFWS]

That way you avoid the need for obs-path (obs-angle-address takes care of it)

Seems OK to me.

Now we come to the obs- syntax, where there are still many ambiguities. As things stand, sometimes the obs- syntax allows something that is already in the regular syntax (that is ambiguous). OTOH, sometimes it does not, and sometimes it allows only a part of what is in the regular syntax, all of which can be very confusing to the reader who tries to work out exactly how the obs- syntax differs from the regular.

I disagree completely. I think it's much easier for the reader to have some complete pieces in the obs- syntax even if there is redundancy. For example, I think the horrible hoop you have to jump through below for obs-local-part:

obs-local-part  =       *(word ".") word "." CFWS word *("." [CFWS] word) /
                        *(atom ".") atom "." quoted-string *("." word) /
                        *(quoted-string ".") quoted-string "." atom *("." word) /
                        1*(quoted-string ".") quoted-string

is just nuts. I don't think this is the correct approach. I think it makes the syntax completely unusable to a reader, and I would strongly object to anything like it.

I'm going to skip all of your examples of this.

obs-utext       =       *LF *CR *(obs-char *LF *CR)
[N.B. was obs-text]

No, that does not work because it allows CRLF not followed by WSP in the middle of an 'unstructured'.

Yup, because you can have "obs-utext obs-utext" which could be (abc CR)(LF def).
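
The (abc CR)(LF def) case can be reproduced with a short Python sketch. It assumes obs-char is `%d0-9 / %d11 / %d12 / %d14-127`, as in RFC 2822, and translates the proposed obs-utext rule directly into a regular expression. `OBS_CHAR`, `OBS_UTEXT` and `is_obs_utext` are names made up for the illustration.

```python
import re

# obs-char: any US-ASCII character except CR and LF (assumed from RFC 2822).
OBS_CHAR = r"[\x00-\x09\x0b\x0c\x0e-\x7f]"

# obs-utext = *LF *CR *(obs-char *LF *CR)
OBS_UTEXT = re.compile(rf"\n*\r*(?:{OBS_CHAR}\n*\r*)*")


def is_obs_utext(s: str) -> bool:
    """True if the whole string is a single obs-utext."""
    return OBS_UTEXT.fullmatch(s) is not None


left, right = "abc\r", "\ndef"
joined = left + right

print(is_obs_utext(left), is_obs_utext(right))  # each half on its own
print(is_obs_utext(joined))                     # the two halves as one obs-utext
print("\r\n" in joined)                         # bare CRLF with no WSP after it
```

Each half matches on its own, while the concatenation is not a single obs-utext and yet contains a CRLF that is not followed by WSP. Any rule that lets two obs-utext sit side by side therefore admits it.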

I think the only way out of that is to rewrite the rule for 'unstructured':

unstructured    =       *(utext [FWS]) obs-ltext
obs-utext       =       (1*LF *CR / 1*CR) obs-char / NUL
obs-ltext       =       *LF *CR

Blech. I'll take a look and see what I can figure out without resorting to that.

obs-phrase      =       word *(word / ("." [CFWS]))

OK, but that is not "obsolete". It is intended as an extension to be allowed sometime in the future on a "MUST accept, SHOULD NOT generate yet" basis. So please can we rename it as 'extended-phrase' (which is what I have currently put in Usefor).

I am not convinced this is worth it. It's explained perfectly well in the text.

Currently, RFC2822 requires:

1. Return-Path
2. 1*Received
3. *Resent-xxx
4. Other headers

No, it doesn't. Look at the parens and the repeats. It requires:

*(*(return-path 1*(received)) *(resent-xxx))

followed by other headers.

Yes, it is a good idea that tracing headers be added at the top, so you can tell the order in which the message passed through various agents, but there are some useful cases which have been excluded, for example:

Received: from D by E
Received: from C by D
Resent-To: bar(_at_)E
Resent-From: foo(_at_)C
Received: from B by C
Received: from A by B

That's legal in 2822.

Here is another example (a real one this time, which some readers of may recognize):

Received:  from (localhost [])
 by (8.11.7+Sun/8.11.7) with ESMTP id i05HCjF01021
 for <chl(_at_)clerew(_dot_)man(_dot_)ac(_dot_)uk>; Mon, 5 Jan 2004 17:12:45 GMT
Delivered-To:   postmaster(_at_)A
Received:  (qmail 81124 invoked by uid 800); 5 Jan 2004 12:54:22 -0000

You're right, that's not allowed, and I think that is a bug that needs to be fixed.
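
Both header sequences can be checked against the ordering rule with a small Python sketch. It assumes the RFC 2822 shape `fields = *(trace *(resent-xxx)) *(other fields)` with `trace = [return] 1*received`, and it looks only at field names, not field bodies. The one-letter encoding and the names `encode` and `order_ok` are invented for the example.

```python
import re


def encode(names):
    """Map each field name to one letter: P=Return-Path, R=Received,
    S=any Resent-* field, O=anything else."""
    letters = []
    for name in names:
        lowered = name.lower()
        if lowered == "return-path":
            letters.append("P")
        elif lowered == "received":
            letters.append("R")
        elif lowered.startswith("resent-"):
            letters.append("S")
        else:
            letters.append("O")
    return "".join(letters)


# Same language as (P? R+ S*)* O*, written without nested repeats so that
# matching stays fast: the block part starts with PR or R, and every later
# P must be followed at once by R.
ORDER = re.compile(r"(?:(?:PR|R)(?:PR|R|S)*)?O*")


def order_ok(names) -> bool:
    return ORDER.fullmatch(encode(names)) is not None


first = ["Received", "Received", "Resent-To", "Resent-From",
         "Received", "Received"]
second = ["Received", "Delivered-To", "Received"]

print(order_ok(first))
print(order_ok(second))
```

Under this reading the first sequence is accepted and the second is rejected, because Delivered-To falls outside the trace block and a Received field follows it.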

Now there are all sorts of perfectly genuine "tracing headers" in there, all added in transit, and all useful.

So, likely we need optional-field to appear in trace. I think that's the logical answer.

Pete Resnick <>
QUALCOMM Incorporated - Direct phone: (858)651-4478, Fax: (858)651-1102