ietf-822
Re: Format=Flowed/RFC 2646 Bis (-03)

2003-11-24 20:46:06

Randall Gellens <randy(_at_)qualcomm(_dot_)com> wrote:

The grammar in the -03 and -04 versions is based on the suggested
replacement you provided, plus fixes...to distinguish between quoted
and unquoted.  I agree that this latter change does make the grammar
larger and more complex.

Perhaps there is a middle ground.  Here is an excerpt from the -04
grammar:

    flowed-line      = ( flowed-line-qt / flowed-line-unqt ) flow CRLF
    flowed-line-qt   = quote ( ( stuffing stuffed-flowed ) /
                               unstuffed-flowed )
    flowed-line-unqt = ( stuffing stuffed-flowed ) / unstuffed-flowed
    fixed-line       = fixed-line-qt / fixed-line-unqt
    fixed-line-qt    = quote ( ( stuffing stuffed-fixed ) /
                               unstuffed-fixed ) CRLF
    fixed-line-unqt  = ( stuffed-fixed / unstuffed-fixed ) CRLF
    quote            = 1*">"

Here is what I had proposed:

    fixed-line       = quote ( (stuff stuffed-fixed) /
                               unstuffed-fixed         ) CRLF
    flowed-line      = quote ( (stuff stuffed-flowed) /
                               unstuffed-flowed         ) flow CRLF
    quote            = *">"

Here is a tweak to that proposal:

    fixed-line       = [quote] ( (stuff stuffed-fixed) /
                                 unstuffed-fixed         ) CRLF
    flowed-line      = [quote] ( (stuff stuffed-flowed) /
                                 unstuffed-flowed         ) flow CRLF
    quote            = 1*">"

My original proposed grammar presented the view that the quoting is
always present, but might have a depth of zero.  The tweaked grammar
presents a view more in line with the text of the draft: the quoting can
be present or not present, and if it is present it has a non-zero depth.
Unlike the -04 draft, the tweaked proposal doesn't give explicit names
to the quoted/unquoted versions of the lines; it just shows "[quote]"
in brackets, implying that the line is either quoted or not.  Forgoing
the explicit names shaves four rules off the grammar.  Just something to
consider.

I believe this has been cleaned up in -04 so that it is now clear from
all references in the text as well as the grammar that signature lines
can be quoted, or quoted and stuffed, but they can't be stuffed without
being quoted.

Section 5.1 is the first introduction to the syntax, and it says:

    Logically, this test for quoted lines is done before any other tests
    (that is, before checking for space-stuffed and flowed).

    Logically, this leading space is deleted before examining the line
    further (that is, before checking for flowed).

    If the line ends in a space, the line is flowed.  Otherwise it is
    fixed.  The exception to this rule is a signature separator line,
    described in Section 5.3.  Such lines end in a space but are neither
    flowed nor fixed.

It seems clear from reading 5.1 that the test for signature separator
lines happens along with the test for flowed lines, after the quoting
and space-stuffing have been stripped off.  But that inference is
inconsistent with the grammar.  I was hoping that section 5.1 would
give the complete procedure for analyzing the structure of a flowed
body.  Later sections tell what to do with that structure & why & how,
but I was hoping that 5.1 would contain all the same information as
the grammar (in plain English procedural form rather than a formal
declarative form).

Section 5.3 in -04 says:

    A receiving agent needs to test for a signature line both before the
    test for a quoted line (see Section 5.5) and also after logically
    counting and deleting quote marks and stuffing (see Section 5.4)
    from a quoted line.

If that's true, then I'd like to see those two tests in section 5.1
along with the other tests.  But I'm skeptical of testing for a
signature line after deleting the stuffing.  I don't see how that's
useful, because after the stuffing is deleted, there is no memory of it
(unlike the quote indicators, which are remembered in the quote depth).
If the line is dash-dash-space at this point, it might be a signature
line, but it might not be (if it had been stuffed and not quoted).

To decode the line syntax indicated in the -04 grammar, I think the
actual decoding steps are:

 1. Count & strip quote indicators.
 2. Check for signature separator:
    dash-dash-space is a separator (always);
    space-dash-dash-space is a separator if quote depth is nonzero.
 3. Unstuff: delete leading space if present.
 4. Check for trailing space to determine flowed/fixed (unless the
    line has already been classified as a signature separator).
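
To make the four steps above concrete, here is a rough Python sketch.
It is only my reading of the -04 grammar, not text from the draft.  The
function name classify_draft04 and the (depth, kind) return shape are
made up for illustration, and the line is assumed to arrive with its
CRLF already removed.

```python
def classify_draft04(line):
    """Classify one line (CRLF already removed) as sig-sep, flowed or fixed.

    Hypothetical helper: follows the four steps listed above.
    Returns (quote_depth, kind).
    """
    # Step 1: count and strip quote indicators.
    body = line.lstrip(">")
    depth = len(line) - len(body)

    # Step 2: check for a signature separator before unstuffing.
    # "-- " always counts; " -- " counts only when the line is quoted.
    if body == "-- " or (depth > 0 and body == " -- "):
        return depth, "sig-sep"

    # Step 3: unstuff by deleting one leading space, if present.
    if body.startswith(" "):
        body = body[1:]

    # Step 4: a trailing space means flowed, otherwise fixed.
    return depth, "flowed" if body.endswith(" ") else "fixed"


if __name__ == "__main__":
    for sample in ["-- ", " -- ", "> -- ", ">-- ", "> text ", "text"]:
        print(repr(sample), classify_draft04(sample))
```

Note how " -- " is a separator only when quoted; unquoted, it falls
through to step 4 and comes out as an ordinary flowed line.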

Another way to achieve the same effect is:

 1. Count & strip quote indicators.
 2. Unstuff: delete leading space if present, but remember that the line
    was stuffed.
 3. If the line was quoted or neither-quoted-nor-stuffed, check for
    signature separator: dash-dash-space.
 4. Check for trailing space to determine flowed/fixed (unless the
    line has already been classified as a signature separator).
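
Here is a rough Python sketch of this alternative ordering, where the
stuffing is deleted first but remembered.  Again this is only an
illustration: the name classify_remember_stuffing and the (depth, kind)
return shape are invented, and the CRLF is assumed to be gone already.

```python
def classify_remember_stuffing(line):
    """Classify one line (CRLF already removed), remembering the stuffing.

    Hypothetical helper: follows the alternative four steps above.
    Returns (quote_depth, kind).
    """
    # Step 1: count and strip quote indicators.
    body = line.lstrip(">")
    depth = len(line) - len(body)

    # Step 2: unstuff, but remember whether the line was stuffed.
    stuffed = body.startswith(" ")
    if stuffed:
        body = body[1:]

    # Step 3: only quoted lines, or lines that are neither quoted nor
    # stuffed, are eligible to be signature separators.
    if (depth > 0 or not stuffed) and body == "-- ":
        return depth, "sig-sep"

    # Step 4: a trailing space means flowed, otherwise fixed.
    return depth, "flowed" if body.endswith(" ") else "fixed"


if __name__ == "__main__":
    for sample in ["-- ", " -- ", "> -- ", ">-- ", "> text ", "text"]:
        print(repr(sample), classify_remember_stuffing(sample))
```

The "stuffed" flag is exactly the memory that would otherwise be lost
when the leading space is deleted; without it, an unquoted " -- " could
not be told apart from a plain "-- ".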

I still think it's very counterintuitive that I can use space-stuffing
to hide a signature line in unquoted text, but I can't do the same in
quoted text.  I think the grammar and parsing would be simpler and more
intuitive if the sequence were either:

 1. Count & strip quote indicators.
 2. Unstuff: delete leading space if present.
 3. Check for signature separator: dash-dash-space.
 4. Check for trailing space to determine flowed/fixed (unless the
    line has already been classified as a signature separator).

or:

 1. Count & strip quote indicators.
 2. Check for signature separator: dash-dash-space.
 3. Unstuff: delete leading space if present.
 4. Check for trailing space to determine flowed/fixed (unless the
    line has already been classified as a signature separator).

In other words, stuffing either hides signature separators or it
doesn't, regardless of quoting.
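
The two simpler orderings differ only in where the separator check sits
relative to unstuffing, so they can be sketched as one Python function
with a flag.  The names classify_uniform and stuffing_hides_sig are
invented for this illustration, and the line is assumed to have had its
CRLF removed.

```python
def classify_uniform(line, stuffing_hides_sig):
    """Classify one line (CRLF already removed); quoting never matters.

    Hypothetical helper.  stuffing_hides_sig=False is the first ordering
    (unstuff, then check); True is the second (check, then unstuff).
    Returns (quote_depth, kind).
    """
    # Step 1: count and strip quote indicators.
    body = line.lstrip(">")
    depth = len(line) - len(body)

    # Second ordering: check before unstuffing, so " -- " is hidden.
    if stuffing_hides_sig and body == "-- ":
        return depth, "sig-sep"

    # Unstuff: delete one leading space, if present.
    if body.startswith(" "):
        body = body[1:]

    # First ordering: check after unstuffing, so " -- " is still found.
    if not stuffing_hides_sig and body == "-- ":
        return depth, "sig-sep"

    # Last step: a trailing space means flowed, otherwise fixed.
    return depth, "flowed" if body.endswith(" ") else "fixed"


if __name__ == "__main__":
    for hides in (False, True):
        for sample in ["-- ", " -- ", "> -- ", ">-- "]:
            print(hides, repr(sample), classify_uniform(sample, hides))
```

With either setting, " -- " and "> -- " get the same classification,
which is the point: the answer no longer depends on the quote depth.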

In any case, I'd like to see section 5.1 include all the steps for
decoding a line, whatever they are.

Can you produce a concrete example of a line that ought to be flowed
and doesn't match my suggested flowed-line production?

flowed-line       = quote (stuff stuffed-flowed / unstuffed-flowed) flow CRLF
stuffed-flowed    = [non-dash *text-char] /
                    "-" [non-dash *text-char / "-" 1*text-char]
                    ; Is not "--".
unstuffed-flowed  = non-sp-quote-dash *text-char /
                    "-" [non-dash *text-char / "-" 1*text-char]
                    ; Not empty, not "--", does not begin with SP or ">".
quote             = *">"
stuff             = SP
flow              = SP

I think it's a good idea to use parentheses to explicitly group the 
ABNF constructs, to avoid confusion.

So, taking your suggested 'unstuffed-flowed' to be

    unstuffed-flowed  = ( non-sp-quote-dash *text-char ) /
                        ( "-" [non-dash *text-char ) /
                        ( "-" 1*text-char] )

Notice the position of your parentheses relative to my square brackets.
Perhaps I should have put spaces around the brackets:

stuffed-flowed    = [ non-dash *text-char ] /
                    "-" [ (non-dash *text-char) / ("-" 1*text-char) ]
                    ; Is not "--".
unstuffed-flowed  = (non-sp-quote-dash *text-char) /
                    "-" [ (non-dash *text-char) / ("-" 1*text-char) ]
                    ; Not empty, not "--", does not begin with SP or ">".

Maybe it would be clearer in a more verbose form without the brackets
and without any nesting:

stuffed-flowed    = "" / "-" /
                    (non-dash *text-char) /
                    ("-" non-dash *text-char) /
                    ("--" 1*text-char)
                    ; Is not "--".
unstuffed-flowed  = "-" /
                    (non-sp-quote-dash *text-char) /
                    ("-" non-dash *text-char) /
                    ("--" 1*text-char)
                    ; Not empty, not "--", does not begin with SP or ">".
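
As a sanity check on the verbose form, it can be transcribed into
regular expressions.  The sketch below is Python and rests on
assumptions that are not spelled out in this message: text-char is taken
to be any character other than CR or LF, non-dash is a text-char other
than "-", and non-sp-quote-dash is a text-char other than SP, ">" or
"-".  The variable names are invented.

```python
import re

# Assumed character classes (not defined in this message):
TEXT_CHAR = r"[^\r\n]"              # text-char
NON_DASH = r"[^\-\r\n]"             # non-dash
NON_SP_QUOTE_DASH = r"[^ >\-\r\n]"  # non-sp-quote-dash

# stuffed-flowed = "" / "-" / (non-dash *text-char) /
#                  ("-" non-dash *text-char) / ("--" 1*text-char)
STUFFED_FLOWED = re.compile(
    rf"(?:|-|{NON_DASH}{TEXT_CHAR}*|-{NON_DASH}{TEXT_CHAR}*|--{TEXT_CHAR}+)"
)

# unstuffed-flowed = "-" / (non-sp-quote-dash *text-char) /
#                    ("-" non-dash *text-char) / ("--" 1*text-char)
UNSTUFFED_FLOWED = re.compile(
    rf"(?:-|{NON_SP_QUOTE_DASH}{TEXT_CHAR}*|-{NON_DASH}{TEXT_CHAR}*"
    rf"|--{TEXT_CHAR}+)"
)


def is_stuffed_flowed(s):
    """True if s matches the stuffed-flowed rule in its entirety."""
    return STUFFED_FLOWED.fullmatch(s) is not None


def is_unstuffed_flowed(s):
    """True if s matches the unstuffed-flowed rule in its entirety."""
    return UNSTUFFED_FLOWED.fullmatch(s) is not None


if __name__ == "__main__":
    for s in ["", "-", "--", "---", "-x", " x", ">x", "x"]:
        print(repr(s), is_stuffed_flowed(s), is_unstuffed_flowed(s))
```

Under those assumptions the comments in the grammar hold: neither rule
accepts "--", and unstuffed-flowed rejects the empty string and anything
beginning with SP or ">".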

Remember that these rules were written under the assumption that
signature separators can be quoted and/or stuffed in any combination.
These rules would need to be adjusted to reflect draft -04 sig-sep
syntax, or to reflect a syntax that never allows stuffed sig-sep lines.
But I have no doubt that we can write an unambiguous grammar for any of
these syntaxes.

AMC