Re: Malformed header - what would you do?

2005-07-21 15:40:52

Bruce Lilly wrote:

it is perfectly valid for a single-line unstructured field to
end in one or more whitespace characters, which is what the
1*WSP in the definition of FWS allows.

unstructured    =       *([FWS] utext) [FWS]

The trailing [FWS] in the definition of unstructured provides
for trailing whitespace on such a single-line unstructured

Sure, we're talking about _replacing_ this one [FWS] here (= in
a future 2822bis) and all similar USEFOR [FWS] by *WSP, because
[1*WSP] is the same as *WSP.

  FWS = ([*WSP CRLF] 1*WSP) /   ; Folding white space
        obs-FWS

In [FWS] CRLF we don't want the *WSP CRLF 1*WSP CRLF case, in
other words we don't want the [*WSP CRLF] part of the FWS, that
leaves us with 1*WSP instead of FWS, putting it all together it
is [1*WSP] CRLF, or in canonical form *WSP CRLF.
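
The difference between the two endings can be checked mechanically.
What follows is only a rough sketch in Python: the ABNF is translated
by hand into regular expressions, the obs-FWS alternative is left out,
and the names END_FWS, END_WSP and accepts are made up for this
illustration.

```python
import re

WSP = r"[ \t]"
CRLF = r"\r\n"
# FWS without the obsolete alternative: [*WSP CRLF] 1*WSP
FWS = rf"(?:(?:{WSP}*{CRLF})?{WSP}+)"

# End of a field body written as "[FWS] CRLF" versus "*WSP CRLF".
END_FWS = re.compile(rf"(?:{FWS})?{CRLF}")
END_WSP = re.compile(rf"{WSP}*{CRLF}")

def accepts(pattern, tail):
    """True if the whole tail is matched by the given ending rule."""
    return pattern.fullmatch(tail) is not None

tails = {
    "plain CRLF": "\r\n",
    "trailing spaces": "  \r\n",
    "whitespace-only last line": "\r\n \r\n",
}
for label, tail in tails.items():
    print(label, accepts(END_FWS, tail), accepts(END_WSP, tail))
```

Both rules accept the first two tails.  Only the third one, the
CRLF 1*WSP CRLF case, is accepted by [FWS] CRLF and rejected by
*WSP CRLF.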

The same exercise for the obs-FWS = 1*WSP *(CRLF 1*WSP) case
has the same result:  *WSP CRLF is what we really want if we
don't like any CRLF 1*WSP CRLF at the end of an <unstructured>.

Including the X-SPAM-example or similar USEFOR header fields.

This was discussed more than a year ago:

Very vaguely I recall this, but I probably didn't look at the
details of your <unstructured> at that time.  What you have is:

| unstructured    =       [[FWS] *(1*utext FWS) 1*utext] *WSP
| unstructured    =       [1*utext] *(FWS 1*utext) *WSP

(skipping the two "uew" incarnations addressing encoded words)

Obviously we agree on ... *WSP replacing the trailing [FWS].
And you guarantee that any FWS is followed by 1*utext, good, I
missed that part.
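
A small sketch of the second form quoted above, again as a
hand-translated regular expression in Python.  Assumptions made only
for this illustration: utext is reduced to printable ASCII (%d33-126),
FWS is taken without obs-FWS, and the name is_unstructured is
hypothetical.

```python
import re

WSP = r"[ \t]"
CRLF = r"\r\n"
FWS = rf"(?:(?:{WSP}*{CRLF})?{WSP}+)"
UTEXT = r"[\x21-\x7e]"  # assumption: printable ASCII only

# unstructured = [1*utext] *(FWS 1*utext) *WSP
UNSTRUCTURED = re.compile(rf"(?:{UTEXT}+)?(?:{FWS}{UTEXT}+)*{WSP}*")

def is_unstructured(body):
    """True if the body (without its final CRLF) fits the rule."""
    return UNSTRUCTURED.fullmatch(body) is not None

print(is_unstructured("hello world  "))    # trailing WSP
print(is_unstructured("hello\r\n world"))  # ordinary folding
print(is_unstructured("hello\r\n "))       # fold followed by WSP only
```

Because every FWS must be followed by 1*utext, the last body does not
match: a fold cannot be the final thing in the field, only plain *WSP
can.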

do you believe there is a problem with the "end" rule and its
use in the revised grammar proposed?

All I can say without digging deep into your idea:  I know that
it's in theory possible to fix "CFWS-separated references" into
"FWS-separated references allowing comments".  So it should be
possible to get rid of all unwanted cases of CRLF 1*WSP CRLF

And apparently you've avoided the [CFWS] [CFWS] problem in your
version of the references, by msg-id = [CFWS] stuff  instead of
msg-id = [CFWS] stuff [CFWS]  moving the critical right [CFWS]
to the end of <references> or <message-id> in the form of <end>

And end = [[FWS] comment] *WSP avoids the CRLF 1*WSP CRLF trap.

Hm, I miss the case of more than one <comment> near the <end>,
how about end = *( [FWS] comment ) *WSP ?  Example:

Message-ID: <402FA041.6040808@verizon.net> (one) (two)
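
To compare the two <end> variants on the part after the msg-id, here
is a rough Python sketch.  It assumes a flat <comment> (no nesting, no
quoted-pairs, no folding inside) and FWS without obs-FWS; END_ONE,
END_MANY and is_end are names invented for the illustration.

```python
import re

WSP = r"[ \t]"
CRLF = r"\r\n"
FWS = rf"(?:(?:{WSP}*{CRLF})?{WSP}+)"
# Assumption: a flat comment, no nesting, quoted-pairs or folding inside.
COMMENT = r"\([^()\\\r\n]*\)"

# end = [[FWS] comment] *WSP        (as quoted)
END_ONE = re.compile(rf"(?:(?:{FWS})?{COMMENT})?{WSP}*")
# end = *( [FWS] comment ) *WSP     (the suggested variant)
END_MANY = re.compile(rf"(?:(?:{FWS})?{COMMENT})*{WSP}*")

def is_end(pattern, tail):
    """True if the whole tail after the msg-id fits the rule."""
    return pattern.fullmatch(tail) is not None

print(is_end(END_ONE, " (one) (two)"))
print(is_end(END_MANY, " (one) (two)"))
print(is_end(END_MANY, " (one)\r\n "))
```

The single-comment rule rejects " (one) (two)", the repeated form
accepts it, and neither accepts a tail that ends in CRLF 1*WSP.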

A conforming generator is still bound by the normative
provisions of section 3, including 3.2.3.

That talks only about "CFWS"..."MUST NOT", but the problem is
the FWS in <unstructured>.  If you'd want to fix it in prose:
s/where CFWS occurs/where CFWS or FWS occurs/ in 3.2.3

Actually we're not worried about completely empty comments, so
s/where CFWS occurs/where FWS occurs (directly or indirectly)/
in 3.2.3 would also fix it.

And in fact is what the section 3.2.3 restriction does.

Sorry, that's what it _should_ do, but it's only a near miss.

FWS includes obs-FWS and is necessary for parsing.

Is that something your grammar does ?  Stupid question, yes,
you have a separate <obs-unstructured> which can end with an
<obs-FWS>.  Nice.

That's why a revised grammar was produced and discussed here.

Fine, how about a more readable version without nroff, and for
starters without the "uew" etc. encoded words ?

Note especially the second line of the "To:" field.  It
starts with two space characters.  (Note that "__" represent
blank spaces.)

Sure, why do you think I mentioned it ?  But "To:" has no
[FWS] problem, it's covered by the "CFWS"..."MUST NOT".  Only
<unstructured> has this problem and there's no example with a
"Subject:" and "__" in the appendix.

Besides you'd shoot me if I'd claim that an example is in any
way "normative" without explicitly saying so.

the 3.2.3 rule applies to all fields.

To all fields with CFWS, e.g. "To:", but not "X-SPAM-foobar:".

If RFC 2822 is used as a normative reference (possibly with
implementation notes where appropriate), there is no problem,
as the 2822 rules, including 3.2.3, apply.

Argh... yes, for header fields ending with [CFWS] CRLF.  But we
have about ten NetNews header fields ending with [FWS] CRLF.

And we have the same ten starting with "name:" SP [FWS], which
should be "name:" SP *WSP to avoid a CRLF before the first
non-WSP character of the header field body.
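
The same effect at the start of a field, as a rough Python sketch.
Assumptions for the illustration: the field name is one or more ftext
characters, the first body character is printable ASCII, FWS is taken
without obs-FWS, and START_FWS, START_WSP and starts_ok are invented
names.

```python
import re

WSP = r"[ \t]"
CRLF = r"\r\n"
FWS = rf"(?:(?:{WSP}*{CRLF})?{WSP}+)"
NAME = r"[!-9;-~]+"      # ftext: printable ASCII except ":"
UTEXT = r"[\x21-\x7e]"   # assumption: first body character is printable

# "name:" SP [FWS] versus "name:" SP *WSP, each up to the first
# non-WSP character of the field body.
START_FWS = re.compile(rf"{NAME}: (?:{FWS})?{UTEXT}")
START_WSP = re.compile(rf"{NAME}: {WSP}*{UTEXT}")

def starts_ok(pattern, field):
    """True if the field start fits the rule."""
    return pattern.match(field) is not None

folded = "Subject: \r\n text"
print(starts_ok(START_FWS, folded), starts_ok(START_WSP, folded))
```

With [FWS] the body may begin on the continuation line, leaving a
first line that holds only the name and white space.  With *WSP the
first non-WSP character has to be on the same line as the name.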

Unlike 2822 USEFOR has at least a prose MUST covering [FWS].
But saying [FWS] when the only allowed case is *WSP is bogus.

we know what the intent was, as it has been discussed; at

I didn't read this list in May 2003.  Or any other IETF list.

The 2822-author talks about <mailbox-list> => structured =>
CFWS => covered by the MUST NOT in 3.2.3, nobody doubted that.

Now that it's finally clear that the next step for RfC 2822
will be historic


Okay, that was a bit harsh for _one_ s/[FWS]/*WSP/.  OTOH you
showed that it would kill the trailing obs-FWS, which is wrong.
A minimal erratum could be to fix the 3.2.3 prose (see above).

                        Bye, Frank