Re: Format=Flowed/RFC 2646 Bis (-02)


Keith Moore <moore(_at_)cs(_dot_)utk(_dot_)edu> wrote:

> if the format= parameter has a value of "flowed" the sequence SP CR LF
> is not treated as a "line break".
>
> 3. A "line" consists of zero or more characters which start
> immediately at the beginning of the canonical form of the body part,
> or immediately following a line break.
>
> 4. "Lines" in body parts for which format=flowed MAY be "wrapped" as
> necessary to fit the width of the display or output medium,


Your description misses one subtlety: while one or more flowed lines and
a following fixed line constitute a unit that invites re-wrapping, a
fixed line not preceded by a flowed line does not invite re-wrapping.
Also, I think "paragraph" is a more intuitive term for the concept
you're calling "line".  It's useful to let "line" continue to mean what
it always has meant: a sequence of characters terminated by CR LF.
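
A minimal Python sketch of that distinction follows. It assumes the body has already been split on CRLF and ignores quote depth for simplicity; a line is treated as flowed if it ends in a space. The function name group_units and the "paragraph"/"fixed" labels are hypothetical:

```python
def group_units(lines):
    """Group body lines (CRLF already removed) into re-wrappable units.

    Returns (kind, lines) pairs: "paragraph" for one or more flowed lines plus
    the fixed line that ends them, "fixed" for a fixed line that is not
    preceded by a flowed line.  Quote depth is ignored in this sketch.
    """
    units = []
    pending = []  # flowed lines still waiting for their closing fixed line
    for line in lines:
        pending.append(line)
        if not line.endswith(" "):  # a fixed line closes the current unit
            kind = "paragraph" if len(pending) > 1 else "fixed"
            units.append((kind, pending))
            pending = []
    if pending:
        # Assumption: flowed lines at the very end of the body, with no fixed
        # line after them, are kept together as an unterminated paragraph.
        units.append(("paragraph", pending))
    return units
```

For example, group_units(["One ", "two", "Sig"]) returns [("paragraph", ["One ", "two"]), ("fixed", ["Sig"])]: only the first unit invites re-wrapping.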

You raise a very good point about the charset.  The grammar in section 7
allows only ASCII characters, even though the delsp parameter is being
introduced for the express purpose of better supporting non-ASCII
scripts.  I think that can easily be fixed by redoing the non-sp
production:

non-sp = <any character except NUL, CR, LF, SP>
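
As a quick illustration, the revised production amounts to the predicate below. This is a sketch that assumes the body part has already been decoded from its charset, so that a "character" is a Unicode code point rather than an octet; the name is_non_sp is hypothetical:

```python
def is_non_sp(ch):
    """Revised non-sp production: any single character except NUL, CR, LF, SP.

    Assumes the body was decoded from its charset first, so ch is a Unicode
    character (code point), not a raw octet.
    """
    return len(ch) == 1 and ch not in "\x00\r\n "
```

With this definition, "é" and "字" both count as non-sp, which is the point of the change.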

There are some other issues with the grammar.  First, I don't see the
reason for distinguishing between unquoted lines and quoted lines.
Every line has a quote depth, which might happen to be zero, but lines
with zero quote depth are not treated specially.  The grammar currently
says:

flowed-line   = flow-qt / flow-unqt
flow-qt       = quote [stuffing] *text-char 1*SP CRLF
flow-unqt     = [stuffing] *text-char 1*SP CRLF
quote         = 1*">"

Why not simply:

flowed-line   = quote [stuffing] *text-char 1*SP CRLF
quote         = *">"

In any case, the grammar for flowed-line is ambiguous: the spaces before
the last space could match either 1*SP or *text-char.  It's only the
last space that's special; the others are arbitrary text that ought to
match *text-char.  So the production should be:

flowed-line   = quote [stuffing] *text-char SP CRLF

(Similarly, section 5.2 makes the rule sound more complex than it really
is when it says "If the line ends in one or more spaces, the line is
flowed."  The rule is really just "If the line ends in a space, the line
is flowed.")
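
To make the first ambiguity concrete, here is a brute-force Python sketch that counts how many ways an unquoted line can be split under the current rule ([stuffing] *text-char 1*SP) versus the revised one ([stuffing] *text-char SP). It assumes stuffing = SP and text-char = any character except NUL, CR, LF, and it counts distinct splits of the string; the function count_parses is hypothetical:

```python
TEXT_EXCLUDED = "\x00\r\n"


def count_parses(line, exactly_one_soft_space):
    """Count the ways line (CRLF removed, no quote marks) splits as

        [stuffing] *text-char 1*SP   (exactly_one_soft_space=False, current rule)
        [stuffing] *text-char SP     (exactly_one_soft_space=True, revised rule)

    assuming stuffing = SP and text-char = any character except NUL, CR, LF.
    """
    if any(c in TEXT_EXCLUDED for c in line):
        return 0
    count = 0
    # [stuffing] is either absent (text starts at 0) or takes a leading SP.
    starts = [0, 1] if line.startswith(" ") else [0]
    for start in starts:
        for k in range(start, len(line)):
            tail = line[k:]  # the part the trailing-space production must match
            all_spaces = tail.strip(" ") == ""
            if all_spaces and (len(tail) == 1 or not exactly_one_soft_space):
                count += 1
    return count
```

Here count_parses("abc   ", False) gives 3 and count_parses("abc   ", True) gives 1, but count_parses(" abc ", True) still gives 2, because the leading space can be read either as stuffing or as text.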

Even after that last adjustment, the production is still ambiguous.  A
space at the beginning of a line could match [stuffing] or *text-char.
A line that does not begin with a space could still match stuffing,
because stuffing is defined as [SP], which also matches the empty
string.  The decoder needs to be able to
distinguish between stuffed lines and unstuffed lines, because it's
supposed to display them differently.  Also, an initial greater-than
sign could match quote or *text-char.  The decoder needs to determine
the quote depth unambiguously.  This will do the trick:

flowed-line   = quote (stuffing stuffed / unstuffed) SP CRLF
stuffed       = *text-char
unstuffed     = non-sp-quote *text-char
quote         = *">"
stuffing      = SP
non-sp-quote  = <any character except NUL, CR, LF, SP, ">">

I think that is unambiguous.  The productions for fixed-line have
similar issues.

The grammar gives a name to the space at the beginning of a line because
it is in some sense not part of the regular text.  Now that the delsp
parameter is introduced, there should be a name for the space at the end
of a line, because it's not part of the regular text when delsp=yes.
And there should be a name for the regular text itself.
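
A rough Python sketch of why the trailing space deserves a name: when a paragraph is unwrapped, the soft space is kept as text with delsp=no and deleted with delsp=yes. It assumes quote marks and stuffing have already been removed from each line; the function unflow is hypothetical:

```python
def unflow(lines, delsp=False):
    """Join one paragraph (flowed lines plus the closing fixed line) into one string.

    lines must be non-empty.  Assumes quote marks and stuffing were already
    removed, so every line except the last ends in the soft space.
    """
    parts = []
    for line in lines[:-1]:
        # delsp=yes: the soft space is not text, so drop it before joining.
        # delsp=no: the soft space stays part of the text.
        parts.append(line[:-1] if delsp else line)
    parts.append(lines[-1])  # the fixed line contributes all of its text
    return "".join(parts)
```

So unflow(["Hello ", "world"]) yields "Hello world", while unflow(["foo ", "bar"], delsp=True) yields "foobar", which is what a script without inter-word spaces needs.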

Here's a stab at an unambiguous grammar with a name for every noteworthy
syntactic unit:

flowed-body      = * ( paragraph / fixed-line )
paragraph        = 1*flowed-line fixed-line
flowed-line      = quote (stuffing stuffed-flowed / unstuffed-flowed) soft CRLF
fixed-line       = quote (stuffing stuffed-fixed / unstuffed-fixed) CRLF
stuffed-flowed   = *text-char
unstuffed-flowed = non-sp-quote *text-char
stuffed-fixed    = [*text-char non-sp]
unstuffed-fixed  = non-sp-quote [*text-char non-sp]
quote            = *">"
stuffing         = SP
soft             = SP
non-sp-quote     = <any character except NUL, CR, LF, SP, ">">
non-sp           = non-sp-quote / ">"
text-char        = non-sp / SP
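
One way to sanity-check the line-level productions is to transcribe them into a regular expression with a named group for each unit. The Python sketch below does that for flowed-line and fixed-line only (not flowed-body or paragraph); the regex and the classify function are hypothetical. Because it follows the productions literally, a line that matches neither (for example, an empty line) yields None:

```python
import re

# Hypothetical regex transcription of flowed-line / fixed-line (CRLF removed).
LINE_RE = re.compile(
    r"(?P<quote>>*)"                           # quote = *">"
    r"(?:(?P<stuffing> )|(?=[^\x00\r\n >]))"   # stuffing, or unstuffed text starting with non-sp-quote
    r"(?P<text>[^\x00\r\n]*?)"                 # stuffed-* / unstuffed-* text (*text-char)
    r"(?P<soft> ?)"                            # soft = SP makes the line flowed
)


def classify(line):
    """Return the named units of a line, or None if no production matches."""
    m = LINE_RE.fullmatch(line)  # fullmatch: the whole line must match
    if m is None:
        return None
    return {
        "kind": "flowed" if m["soft"] else "fixed",
        "quote_depth": len(m["quote"]),
        "stuffed": m["stuffing"] is not None,
        "text": m["text"],
    }
```

The greedy quote is safe because neither alternative after it can begin with ">": stuffed text follows an SP, and unstuffed text must begin with non-sp-quote.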

There is an additional rule that is impossible to express in the
grammar: a flowed line must have the same quote depth as the next line.
A flowed line that breaks this rule (has a quote depth different from
the next line) is to be interpreted as if it were a fixed line.
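
A minimal sketch of that extra rule, applied after each line's quote depth and flowed/fixed status are known. It assumes lines are given as (quote_depth, is_flowed) pairs; the function name is hypothetical, and leaving a flowed last line unchanged is an assumption the rule above does not address:

```python
def apply_quote_depth_rule(lines):
    """lines: (quote_depth, is_flowed) pairs in body order.

    Returns the effective is_flowed flags: a flowed line whose quote depth
    differs from the next line's is treated as a fixed line.  The last line
    has no next line, so this sketch leaves its flag unchanged.
    """
    effective = []
    for i, (depth, flowed) in enumerate(lines):
        if flowed and i + 1 < len(lines) and lines[i + 1][0] != depth:
            flowed = False  # quote depth changes: the paragraph ends here
        effective.append(flowed)
    return effective
```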

AMC