Re: Malformed header - what would you do?

2005-07-21 07:40:03

On Thu July 21 2005 05:02, Frank Ellermann wrote:

Bruce Lilly wrote:

Where do you see this in 2822 ?

| 2.2. Header Fields
| A field body may be composed of any US-ASCII characters,
| except for CR and LF.  However, a field body may contain
| CRLF when used in header "folding" and  "unfolding" as
| described in section 2.2.3.  All field bodies MUST conform
| to the syntax described in sections 3 and 4

That doesn't talk about the trailing [FWS] in <unstructured>.

Why do you think it should? (hint: the FWS rule also includes
obs-FWS for parsing)

FWS             =       ([*WSP CRLF] 1*WSP) /   ; Folding white space
                        obs-FWS

Note also that it is perfectly valid for a single-line unstructured
field to end in one or more whitespace characters, which is what the
1*WSP in the definition of FWS allows.

unstructured    =       *([FWS] utext) [FWS]

The trailing [FWS] in the definition of unstructured provides for
trailing whitespace on such a single-line unstructured field.
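
As a rough illustration of the two rules just quoted, here is a minimal Python sketch. It transcribes only the non-obsolete alternatives of FWS and utext into regular expressions (the obs- forms are deliberately left out), and the names UNSTRUCTURED and is_unstructured are invented for this example rather than taken from any library.

```python
import re

# Non-obsolete forms only (assumption: obs-FWS and obs-utext are ignored here).
WSP = r"[ \t]"
FWS = rf"(?:(?:{WSP}*\r\n)?{WSP}+)"            # ([*WSP CRLF] 1*WSP)
UTEXT = r"[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x7f]"  # NO-WS-CTL / %d33-126
UNSTRUCTURED = re.compile(rf"(?:{FWS}?{UTEXT})*{FWS}?")  # *([FWS] utext) [FWS]

def is_unstructured(body: str) -> bool:
    """Return True if the whole field body matches the pattern above."""
    return UNSTRUCTURED.fullmatch(body) is not None

print(is_unstructured("some text   "))      # True: single line, trailing spaces
print(is_unstructured("some text\r\n   "))  # True: fold followed only by spaces
print(is_unstructured("some\r\ntext"))      # False: CRLF not followed by WSP
```

The second case is the one at issue in this thread: read purely as a pattern, the trailing [FWS] also matches a fold followed by nothing but whitespace, so the ABNF rule by itself does not exclude it.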

This was discussed more than a year ago:
(oddly absent from the IMC web-based archive)
Do you believe that there is something new to be said w.r.t. revising
the 2822 ABNF?  In particular, do you believe there is a problem with
the "end" rule and its use in the revised grammar proposed?
It's also not straightforward with the CR and LF as found in:

| obs-qp          =       "\" (%d0-127)
| obs-text        =       *LF *CR *(obs-char *LF *CR)

2822 obs- rules are for parsing; the discussion here is about a
field which was recently generated.
| 3.6.8. Optional fields
| They MUST conform to the syntax of an optional-field.
| This is a field name, made up of the printable US-ASCII
| characters except SP and colon, followed by a colon,
| followed by any text which conforms to unstructured.

That allows the trailing [FWS] in the X-SPAM header field (or
whatever its name was) reported by the OP.

For parsing, of course, since 822 and its predecessors did not have
a rule against whitespace-only continuation lines (in turn because a
certain company's "programmers" hadn't been around to botch parsing).
A conforming generator is still bound by the normative provisions of
section 3, including 3.2.3.
| Appendix B:
| 13. Folding continuation lines cannot contain only white
| space.*

That's a list of _claimed_ differences from STD 11:

And in fact is what the section 3.2.3 restriction does.
| Items marked with an asterisk (*) below are items which
| appear in section 4 of this document and therefore can no
| longer be generated.

Therefore point 13 is talking about obs-FWS, not trailing FWS:

Sort of.  It says that while whitespace-only continuation lines
used to be allowed, and conforming parsers must still handle them
properly, they "can no longer be generated" by conforming generators.
| 4.2. Obsolete folding white space
| In the obsolete syntax, any amount of folding white space MAY
| be inserted where the obs-FWS rule is allowed.  This creates
| the possibility of having two consecutive "folds" in a line,
| and therefore the possibility that a line which makes up a
| folded header field could be composed entirely of white
| space.
|   obs-FWS         =       1*WSP *(CRLF 1*WSP)

The trailing [FWS] in <unstructured> is erroneous, in RfC 2822

No, FWS includes obs-FWS and is necessary for parsing.  Now I agree
that it's not an ideal way to indicate parse vs. generate grammars,
and that some things which could be specified in ABNF aren't.  That's
why a revised grammar was produced and discussed here.

and in its numerous USEFOR children, starting with the USEFOR
incarnation of <unstructured>.  Apparently RFC 2822 didn't want
it this way, but it forgot to say so explicitly.

It sounds like you're confusing the parse grammar and the generate
grammar.  Section 3 (including 3.2.3) applies to generation.

And we're _copying_ this bug to USEFOR, I can't believe it. :-(

There's no bug (at least not in that area).
The Appendix B text clearly identifies the problem which is
the subject of the current discussion.

It addresses only obs-FWS as indicated by the (*).

The reason it's labeled obs- is precisely because it is no longer
permitted for generation (but is required to be handled by parsers).
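
To make the parse/generate split concrete, a hedged sketch in Python: the parser side unfolds by removing any CRLF that is immediately followed by WSP (section 2.2.3), which copes with whitespace-only continuation lines, while a separate check lets a generator refuse to emit them. The function names and the sample field are made up for illustration.

```python
import re

def unfold(field: str) -> str:
    """Parser side: remove every CRLF that is immediately followed by SP or HTAB."""
    return re.sub(r"\r\n(?=[ \t])", "", field)

def has_blank_continuation(field: str) -> bool:
    """Generator side: True if a continuation line holds only SP/HTAB characters."""
    continuation_lines = field.split("\r\n")[1:]
    return any(line and not line.strip(" \t") for line in continuation_lines)

# Hypothetical field with a whitespace-only middle line (obs-FWS style folding).
old_style = "X-Example: first\r\n   \r\n second"
print(repr(unfold(old_style)))            # a parser still recovers one logical line
print(has_blank_continuation(old_style))  # a generator can use this to reject it
```

Keeping the two functions separate mirrors the point above: the lenient path is always available for input, and the strict check is applied only to output.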

A.6.3 illustrates the [CFWS] rule, the trailing [FWS] somehow
escaped in the wild - hard to find it, obs-FWS hides it almost

   Note especially the second line of the "To:" field.  It starts with
   two space characters.  (Note that "__" represent blank spaces.)

it only affects <unstructured> in RfC 2822.

No, the 3.2.3 rule applies to all fields.

But in USEFOR it hits about ten header fields ending with [FWS] CRLF.

If RFC 2822 is used as a normative reference (possibly with implementation
notes where appropriate), there is no problem, as the 2822 rules, including
3.2.3, apply.  Of course if instead of citing 2822 as a normative reference
and using it as such, the approach of rewriting syntax is taken, then the
issue needs to be addressed anew.  That is precisely why the first paragraph
of section 1.6 of the usefor-usefor draft points out the problems that arise
when cite-by-reference is not used.
The 2822 grammar as written does not correspond to the prose
specifying no continuation lines containing only whitespace.

The prose is misleading.  We are only guessing the intention
based on an unrelated example, an unrelated point in the list
of differences, and a related MUST NOT valid only for the CFWS.

No, we know what the intent was, as it has been discussed; at minimum:
No big issue for RfC 2822, it only affects X-header fields and
other unstructured header fields, but for USEFOR it's far more

No.  Believe it or not, the To field is not unstructured and does not begin
with "X-".  That *is* the A.6.3 example.
so far nobody supported a s/[FWS]/*WSP/ cleanup in USEFOR.

USEFOR has no jurisdiction over RFC 2822.

Now that it's finally clear that the next step for RfC 2822 
will be historic