Re: Revisiting RFC 2822 grammar (obs-utext and unstructured)


Charles Lindsey wrote:

Yes, but that is getting a long way from what seems to be the established
convention that *text things consist of just a single character (or
perhaps a single character with some naked CR or LF attached).


?!?  "text" should instantiate a single character, and it does both
in 2822 and in the revised grammar.  "*text" by definition (see RFC
2234) means any number (0 to infinity) of occurrences of "text" and
therefore could not possibly be restricted to a single character.
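
To make the distinction concrete, here is a minimal Python sketch. It
assumes the non-obsolete "text" of RFC 2822 section 3.2.1 (any US-ASCII
character except NUL, CR and LF) and ignores obs-text; the names TEXT,
text_one, text_star and matches are invented for illustration only.

```python
import re

# RFC 2822 "text" without obs-text: %d1-9 / %d11 / %d12 / %d14-127,
# i.e. any US-ASCII character except NUL, CR and LF.
TEXT = r"[\x01-\x09\x0b\x0c\x0e-\x7f]"

text_one = re.compile(TEXT)         # "text":  exactly one character
text_star = re.compile(TEXT + "*")  # "*text": zero or more occurrences


def matches(pattern, s):
    """True if the whole string s is an instance of the pattern."""
    return pattern.fullmatch(s) is not None
```

With these definitions, "text" accepts "a" but not "ab", while "*text"
accepts "", "a" and "ab" alike.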

And I don't think we want two adjacent FWS. Your revised grammar went to
much trouble to avoid adjacent CFWS (or FWS in some cases), and that was
seen as a great improvement. Now they seem to have come back in.


RFC 2822 section 4:

   Another key difference between the obsolete and the current syntax is
   that the rule in section 3.2.3 regarding lines composed entirely of
   white space in comments and folding white space does not apply.  See
   the discussion of folding white space in section 4.2 below.

Permitting multiple adjacent FWS instances in the obs- constructs was
intended to comply with those parsing requirements, but I suppose that's
handled by obs-FWS (which, unlike the other productions in 2822, is not
left-justified in the RFC text).

May I suggest you take another look at the grammar I originally suggested.


Blech.  Having a non-optional obs- construct in the definition of a
non-obs- production is at best confusing.

Note that an unstructured field body begins with [FWS], explicitly at
least in the non-obs cases.  Therefore, in
 Subject: foo
the field body is " foo", not "foo", and
 Subject: Re: foo
begins with " Re:", not with "Re:", so the wording of section 3.6.5
should be revised (or the field name/field body delimiter formally
redefined to include any [FWS] or [CFWS] (as the case may be) following
the colon). [And A.2 needs to warn about line length limits, including
those in effect when encoded-words are present; "prepending" is
inadequate as a means of implementation.]
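
A small Python sketch of the point about the leading white space,
assuming the field name/field body delimiter is the colon alone and the
line is already unfolded; split_field is a hypothetical helper, not
anything defined in RFC 2822.

```python
def split_field(line):
    """Split an unfolded header line at the first colon.

    Everything after the colon, including any leading white space,
    is returned as the field body.
    """
    name, sep, body = line.partition(":")
    if not sep:
        raise ValueError("no colon in header line")
    return name, body


name, body = split_field("Subject: Re: foo")
# body keeps its leading SP, so it starts with " Re:" rather than "Re:".
```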



Yes, I regard that as a problem. What people usually mean by a "subject"
starts at the first non-blank character, and a subject consisting of just
FWS would normally be described as "empty". An instruction in some
protocol that "the subject of the reply SHOULD be the same as the subject
of the original" should not extend to having exactly the same number of
initial SPs. So that needs thinking about.
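
One possible reading of "subject" in that sense, as a Python sketch. It
assumes that leading SP, HTAB, CR and LF can all be dropped as a rough
stand-in for leading FWS; the function names are hypothetical.

```python
def visible_subject(body):
    """Return the field body from its first non-blank character on."""
    # Rough stand-in for skipping leading FWS: drop SP, HTAB, CR, LF.
    return body.lstrip(" \t\r\n")


def is_empty_subject(body):
    """A body consisting only of white space counts as empty."""
    return visible_subject(body) == ""


def same_subject(a, b):
    """Compare two field bodies, ignoring leading white space only."""
    return visible_subject(a) == visible_subject(b)
```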


The issue only arises when one discusses modifying the start of a field
body, which only happens in the "Re:" discussion.  Now,
   Subject:Re:  foo
is fine per RFC 2822 rules, but it's not going to make you happy when you're
wearing your Usefor hat.
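
As a sketch of what "modifying the start of a field body" can mean, the
Python below puts literal prepending (which yields the Subject:Re: form
above) next to two alternatives that treat leading white space as
separable from the rest. All function names are invented for
illustration, and the body is assumed to be already unfolded.

```python
def split_leading_ws(body):
    """Return (leading SP/HTAB run, remainder) of a field body."""
    rest = body.lstrip(" \t")
    return body[:len(body) - len(rest)], rest


def re_at_colon(body):
    # Literal prepending: "Re:" lands directly after the colon.
    return "Re:" + body


def re_before_text(body):
    # Keep the original white space, insert "Re: " just before the text.
    ws, rest = split_leading_ws(body)
    return ws + "Re: " + rest


def re_after_one_space(body):
    # One SP, then "Re:", then the original body unchanged.
    return " Re:" + body


print("Subject:" + re_at_colon("  foo"))  # prints "Subject:Re:  foo"
```

Each function is a legitimate way to "add Re:", and each encodes a
different assumption about which part of the body is the subject proper.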

I still think it's best to simply leave Subject defined as unstructured,
with no comment about "Re:" (since, as you pointed out, that is already
allowed by "unstructured"), just as RFC 822 did.  Anything else inevitably
imposes structure on Subject.  For example if one starts with
   Subject:          foo
what should an implementor do to add "Re:":
   Subject:Re:           foo
   Subject:          Re: foo
   Subject: Re:         foo
etc.?  Clearly an implementor who chooses to support adding "Re:" has to
choose _some_ method for doing so in such a circumstance, and that means
assuming some specific type of structure.  If the document specifies some
structure, then it's silly to maintain that Subject is unstructured.  If
the document is vague about the matter, that opens the possibility that
implementor B will accuse implementor A of being inadequately conservative
in what he generates, or that implementor A will accuse implementor B of
being insufficiently liberal in his interpretation of what he receives, or
(most likely) both. And that's not limited to where "Re:" goes w.r.t.
whitespace, but also "Re" vs. "re" vs. "RE" vs. "rE" vs.
"=?us-ascii*la?q?Re?=" ...