Re: Malformed header - what would you do?

2005-07-21 18:47:19

On Thu July 21 2005 18:20, Frank Ellermann wrote:

Bruce Lilly wrote:

unstructured    =       *([FWS] utext) [FWS]

The trailing [FWS] in the definition of unstructured provides
for trailing whitespace on such a single-line unstructured

Sure, we're talking about _replacing_ this one [FWS] here (= in
a future 2822bis) and all similar USEFOR [FWS] by *WSP, because
[1*WSP] is the same as *WSP.
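As a quick check of the claim that [1*WSP] and *WSP describe the same set of strings, here is a minimal Python sketch. The regular expressions and the helper name `same_language` are illustrative assumptions, not part of any draft grammar; the comparison is brute force over short strings only.

```python
import re
from itertools import product

WSP = "[ \t]"  # WSP = SP / HTAB

OPTIONAL_ONE_OR_MORE = re.compile(f"(?:{WSP}+)?")  # [1*WSP]
ZERO_OR_MORE = re.compile(f"{WSP}*")               # *WSP


def same_language(max_len=4, alphabet=" \tx"):
    """Return True if both patterns accept exactly the same strings
    among all strings over `alphabet` up to length `max_len`."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            a = OPTIONAL_ONE_OR_MORE.fullmatch(s) is not None
            b = ZERO_OR_MORE.fullmatch(s) is not None
            if a != b:
                return False
    return True
```

This only shows that the two spellings are interchangeable; it says nothing about whether replacing [FWS] is safe for a parse grammar.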

That alone would break the parse grammar, which needs to accommodate
The same exercise for the obs-FWS = 1*WSP *(CRLF 1*WSP) case
has the same result:  *WSP CRLF is what we really want if we
don't like any CRLF 1*WSP CRLF at the end of an <unstructured>.
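To make the unwanted case concrete: CRLF 1*WSP CRLF is a continuation line that holds nothing but whitespace. A minimal Python sketch for spotting it in a raw header field follows; the function name `has_whitespace_only_line` and the sample fields are assumptions for illustration.

```python
import re

# CRLF 1*WSP CRLF: a folded line containing only whitespace.
WS_ONLY_LINE = re.compile(r"\r\n[ \t]+\r\n")


def has_whitespace_only_line(field: str) -> bool:
    """Return True if the raw field (including its final CRLF)
    contains a continuation line made up solely of WSP."""
    return WS_ONLY_LINE.search(field) is not None
```

A field ending in *WSP CRLF, such as "Subject: a  \r\n", passes this check, while "Subject: a\r\n \r\n" does not.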

But that *is* what is desired for obs-FWS (see prose in 4.2).
This was discussed more than a year ago:

Very vaguely I recall this, but I probably didn't look at the
details of your <unstructured> at this time.  What you have is:

| unstructured    =       [[FWS] *(1*utext FWS) 1*utext] *WSP
| unstructured    =       [1*utext] *(FWS 1*utext) *WSP

(skipping the two "uew" incarnations addressing encoded words)

Obviously we agree on ... *WSP replacing the trailing [FWS].
And you guarantee that any FWS is followed by 1*utext, good, I
missed that part.

do you believe there is a problem with the "end" rule and its
use in the revised grammar proposed?

All I can say without digging deep into your idea:  I know that
it's in theory possible to fix "CFWS-separated references" into
"FWS-separated references allowing comments".  So it should be
possible to get rid of all unwanted cases of CRLF 1*WSP CRLF

And apparently you've avoided the [CFWS] [CFWS] problem in your
version of the references, by msg-id = [CFWS] stuff  instead of
msg-id = [CFWS] stuff [CFWS]  moving the critical right [CFWS]
to the end of <references> or <message-id> in the form of <end>

And end = [[FWS] comment] *WSP avoids the CRLF 1*WSP CRLF trap.

Hm, I miss the case of more than one <comment> near the <end>,
how about end = *( [FWS] comment ) *WSP ?

Good point; that will be in the next revision.
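The difference between the two forms of <end> can be sketched with regular expressions. This is only an approximation under stated assumptions: FWS is taken as ([*WSP CRLF] 1*WSP) without obs-FWS, and comments are flat (no nesting, no quoted-pair, no folding inside the comment). The names `END_ONE` and `END_MANY` are made up for this example.

```python
import re

# Simplified building blocks (assumptions, not the full RFC 2822 rules):
FWS = r"(?:[ \t]*\r\n)?[ \t]+"   # ([*WSP CRLF] 1*WSP), obs-FWS omitted
COMMENT = r"\([^()\\\r\n]*\)"     # flat comment: no nesting, no quoted-pair

# end = [[FWS] comment] *WSP
END_ONE = re.compile(rf"(?:(?:{FWS})?{COMMENT})?[ \t]*")
# end = *( [FWS] comment ) *WSP
END_MANY = re.compile(rf"(?:(?:{FWS})?{COMMENT})*[ \t]*")


def matches(pattern, text):
    """True if the whole of `text` is matched by `pattern`."""
    return pattern.fullmatch(text) is not None
```

Under these assumptions both forms reject a bare "\r\n " (the CRLF 1*WSP CRLF trap, once the field's final CRLF follows), but only the repeated form accepts " (a) (b)".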

A conforming generator is still bound by the normative
provisions of section 3, including 3.2.3.

That talks only about "CFWS"..."MUST NOT", but the problem is
the FWS in <unstructured>.  If you'd want to fix it in prose:
s/where CFWS occurs/where CFWS or FWS occurs/ in 3.2.3

Not really, because every instance of FWS is necessarily CFWS
(but not the converse):

CFWS            =       *([FWS] comment) (([FWS] comment) / FWS)

(zero repetitions of the first group, and taking the last alternative
of the second group yields FWS. Q.E.D.)
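The same argument can be spot-checked mechanically. The Python sketch below transcribes the CFWS rule quoted above into a regular expression, assuming a simplified FWS (no obs-FWS) and flat comments (no nesting or quoted-pair); the names `FWS_RE`, `CFWS_RE` and `is_match` are illustrative only.

```python
import re

# Simplified building blocks (assumptions, not the full RFC 2822 rules):
FWS = r"(?:[ \t]*\r\n)?[ \t]+"   # ([*WSP CRLF] 1*WSP), obs-FWS omitted
COMMENT = r"\([^()\\\r\n]*\)"     # flat comment only

FWS_RE = re.compile(FWS)
# CFWS = *([FWS] comment) (([FWS] comment) / FWS)
CFWS_RE = re.compile(
    rf"(?:(?:{FWS})?{COMMENT})*(?:(?:(?:{FWS})?{COMMENT})|(?:{FWS}))"
)


def is_match(pattern, text):
    """True if the whole of `text` is matched by `pattern`."""
    return pattern.fullmatch(text) is not None


# Every FWS sample is also CFWS; the converse fails for comments.
FWS_SAMPLES = [" ", "\t ", "\r\n ", "  \r\n\t"]
```

Each string in `FWS_SAMPLES` matches both patterns, whereas "(note)" matches only `CFWS_RE`, which is the "but not the converse" part.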

If you read CFWS as "comments and/or folding whitespace" there's
no problem.
And in fact that is what the section 3.2.3 restriction does.

Sorry, that's what it _should_ do, but it's only a near miss.

No, because FWS is subsumed by CFWS (see above).
That's why a revised grammar was produced and discussed here.

Fine, how about a more readable version without nroff, and for
starters without the "uew" etc. encoded words ?

The nroff facilitates checking and formatting (abnff), and permits
comments which are not part of the ABNF (including ABNF comments).
Use abnff or deroff if you don't like it.  The encoded-word grammar
is a practical necessity for a message parser.  You can of course ignore
it if you don't care about MIME conformance (egrep -v "ew|WSWS|WSCF"),
as non-MIME alternatives are provided.  The encoded-word rules are
somewhat complex; having gone through the trouble to figure out the ABNF,
I'm not going to throw that work away.  Whether it is in the 2822
successor or in a subsequent MIME revision ("updates WXYZ", where WXYZ
is the 2822 successor) is academic at this point.
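For anyone without egrep at hand, the same filter can be approximated in Python. This is a sketch only: the function name `drop_encoded_word_lines` is an assumption, and like the egrep pattern it drops any line containing "ew", "WSWS" or "WSCF" anywhere, so unrelated lines that happen to contain those letters are removed too.

```python
import re

# Same pattern as: egrep -v "ew|WSWS|WSCF"
MIME_RULE = re.compile(r"ew|WSWS|WSCF")


def drop_encoded_word_lines(lines):
    """Keep only the lines that do not mention the encoded-word rules."""
    return [line for line in lines if not MIME_RULE.search(line)]
```

Applied to the grammar file line by line, this leaves the non-MIME alternatives in place.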

Note especially the second line of the "To:" field.  It
starts with two space characters.  (Note that "__" represent
blank spaces.)

Sure, what do you think why I mentioned it ?  But "To:" has no
[FWS] problem, it's covered by the "CFWS"..."MUST NOT".  Only
<unstructured> has this problem and there's no example with a
"Subject:" and "__" in the appendix.

To    : Mary Smith

contains folding whitespace but no comments.  It's covered by 3.2.3
because FWS is CFWS.

Besides you'd shoot me if I'd claim that an example is in any
way "normative" without explicitly saying so.

I abhor unnecessary violence.
the 3.2.3 rule applies to all fields.

To all fields with CFWS, e.g. "To:", but not "X-SPAM-foobar:".

You're going out on a limb because you don't know what the syntax
definition of X-Spam-Foobar is.  Anyway, a field with FWS has CFWS
(but not necessarily comments).

If RFC 2822 is used as a normative reference (possibly with
implementation notes where appropriate), there is no problem,
as the 2822 rules, including 3.2.3, apply.

Argh... yes, for header fields ending with [CFWS] CRLF.  But we
have about ten NetNews header fields ending with [FWS] CRLF.

So add a note affirming that 2822 3.2.3 applies.  Or don't.  The
2822 successor will probably be published before the USEFOR documents
are in a state where there's any hope of passing IESG muster, so the
revised grammar can be used, eliminating any chance of ambiguity.
Unlike 2822 USEFOR has at least a prose MUST covering [FWS].
But saying [FWS] when the only allowed case is *WSP is bogus.

You're forgetting the parse grammar again.