Re: Revisiting RFC 2822 grammar (quoted-pair)

2004-01-18 00:37:23

Bruce Lilly <blilly(_at_)verizon(_dot_)net> wrote:

Moreover, the semantics of \CRLF as given in RFC 822 are quite
different from \CR followed by a lone LF.

I'm having trouble making sense of \CRLF in RFC 822.

Consider the following example:

From: "foo\
 bar" <blah(_at_)example>

This could not have been created by starting with a one-line field
and then folding it, because 3.1.1 says that folding may happen
"wherever there may be linear-white-space (NOT simply LWSP-chars)".
Linear-white-space is allowed in qtext, but not in quoted-pair, so there
would be no way for the folding process to wedge a CRLF between the
backslash and the next CHAR.

The only way to parse this field, according to the grammar, is as:

"From" ":" <"> CHAR CHAR CHAR quoted-pair CHAR CHAR CHAR CHAR CHAR <"> ...
 From   :   "   f    o    o       \CR      LF   SP   b    a    r    "  ...

There is no way to parse the field in a way that involves the
linear-white-space token, and no way to parse it in a way that involves
the CRLF token.
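
To make that parse concrete, here is a minimal Python sketch of a
tokenizer for the RFC 822 quoted-string grammar.  The function name and
the token labels are invented for illustration; nothing here comes from
the RFC beyond the three rules it encodes: quoted-pair is "\" followed
by any CHAR, folding is CRLF immediately followed by a LWSP-char, and
qtext excludes <">, "\" and bare CR.

```python
def tokenize_quoted_string(s):
    """Split an RFC 822 quoted-string (quotes included) into tokens.

    Hypothetical helper for illustration; returns a list of (kind, text).
    """
    if not s.startswith('"'):
        raise ValueError("quoted-string must start with a double quote")
    tokens = []
    i = 1
    while i < len(s):
        c = s[i]
        if c == '"':
            # Closing quote: the quoted-string ends here.
            return tokens
        if c == "\\":
            # quoted-pair = "\" CHAR, and CHAR may be CR.
            if i + 1 >= len(s):
                raise ValueError("backslash at end of input")
            tokens.append(("quoted-pair", s[i:i + 2]))
            i += 2
        elif s.startswith("\r\n", i) and s[i + 2:i + 3] in (" ", "\t"):
            # Folding inside linear-white-space: CRLF then a LWSP-char.
            tokens.append(("folded-lwsp", s[i:i + 3]))
            i += 3
        elif c == "\r":
            raise ValueError("bare CR is not allowed in qtext")
        else:
            # Any other CHAR, including a bare LF, is plain qtext.
            tokens.append(("qtext", c))
            i += 1
    raise ValueError("unterminated quoted-string")


for kind, text in tokenize_quoted_string('"foo\\\r\n bar"'):
    print(kind, repr(text))
```

Run on the example field's quoted-string, the backslash consumes the CR
as a quoted-pair, the LF and SP follow as ordinary qtext, and the
"folded-lwsp" branch is never taken.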

Question:  Can this field be unfolded?

3.1.1 says "Unfolding is accomplished by regarding CRLF immediately
followed by a LWSP-char as equivalent to the LWSP-char."  But the CRLF
token is not present in the parsing of the field, so we cannot unfold it.

On the other hand, 3.1.2 says "The field-body may be composed of any
ASCII characters, except CR or LF.  (While CR and/or LF may be present
in the actual text, they are removed by the action of unfolding the
field.)"  Therefore, we ought to be able to remove the CR and the LF by
unfolding the field.
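
For comparison, this is roughly what unfolding looks like if it is done
purely at the character level, before any parsing.  It is a sketch in
Python under that assumption (the name unfold_naive is made up here);
RFC 822 does not say that unfolding must be implemented this way.

```python
import re


def unfold_naive(raw):
    """Apply the 3.1.1 wording blindly to the raw characters.

    Every CRLF that is immediately followed by SP or HTAB is dropped,
    and the LWSP-char itself is kept.  Hypothetical helper, no parsing.
    """
    return re.sub(r"\r\n(?=[ \t])", "", raw)


folded = 'From: "foo\\\r\n bar"\r\n'
unfolded = unfold_naive(folded)
print(repr(unfolded))
```

The internal CR and LF do disappear, as 3.1.2 wants, but the backslash
is now followed by SP, so the quoted-pair in the result is \SP rather
than \CR.  Textual unfolding changes the parse instead of preserving it.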

On the other hand, the grammar clearly allows bare LF, implicitly in
qtext/dtext/ctext and explicitly in text, contradicting the previous
statement from 3.1.2.

On the other hand, 3.4.8 says "Each header field may be represented
on exactly one line consisting of the name of the field and its body,
and terminated by a CRLF; this is what the parser sees."  Therefore,
we ought to be able to remove the internal line break by unfolding the
field.

On the other hand, 3.4.5 says "the presence of the quoting character
(backslash) explicitly indicates that the CRLF is data to the quoted
string."  How can the line break be data to the quoted string if the
parser doesn't even see the line break as stated in 3.4.8?

By the way, 3.4.3 says that \CRLF within a comment "must be followed
by at least one LWSP-char."  The same statement is not made regarding
quoted-strings, but could be inferred.  But nothing in the grammar
enforces this.  According to the grammar, this is a valid field:

From: "foo\
bar" <blah(_at_)example> (hi\
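
Since the grammar does not enforce the 3.4.3 rule, a parser that wanted
to honor it would need a separate check.  A minimal Python sketch, with
an invented function name, assuming the rule is applied to quoted-strings
as well as comments (which the RFC does not actually state):

```python
import re

# Backslash, CR, LF, then something other than SP or HTAB (or end of input).
# Simplification: a backslash that is itself quoted ("\\" before CRLF)
# is not distinguished, so this can over-report.
_BAD_QUOTED_CRLF = re.compile(r"\\\r\n(?![ \t])")


def violates_prose_rule(raw):
    """Return True if some \\CRLF is not followed by a LWSP-char."""
    return _BAD_QUOTED_CRLF.search(raw) is not None


print(violates_prose_rule('"foo\\\r\nbar"'))
print(violates_prose_rule('"foo\\\r\n bar"'))
```

The first call uses the quoted-string from the field above and is
flagged; the second, with a SP after the line break, is not.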

This all seems like a big mess that should be deprecated.  And indeed,
in RFC 2822, lone CR, lone LF, and \CR are all relegated to obsolete
syntax.  I'm thinking that was a good move.

As for the obsolete grammar, parsing \CRLF as \CR followed by LF is
consistent with the 822 grammar, even if it doesn't seem to jibe with
the phrase "quoted CRLF" in the prose.