ietf-822

Re: Format=Flowed/RFC 2646 Bis (-03)

2003-11-18 16:38:35

At 5:47 AM +0000 11/17/03, Adam M. Costello wrote:

 Two of the steps I listed need to be swapped.  Step 3 checks for the
 sig line, and step 4 unstuffs.  I wrote them in that order because
 section 5.3 says "an (optionally quoted) line consisting of DASH DASH
 SP is not considered flowed."  But now I notice that the grammar
 says sig-sep = [quote [stuffing]] "--" SP CRLF.  To be consistent,
 section 5.3 should say "(optionally quoted, optionally stuffed)", and
 the interpreting section should not check for a sig line until after
 unstuffing.

 (Or do you want to resolve the inconsistency the other way, by changing
 the grammar?)

The -03 version fixed the grammar to treat signature separator lines as a third type of line. I just created an -04 which changes "an (optionally quoted) line consisting of DASH DASH SP" to "an (optionally quoted or quoted and stuffed) line consisting of DASH DASH SP".

Note that the idea was to allow signature separator lines to be quoted, but not stuffed unless also quoted. That way, stuffing can be used to guard against a line being confused with a signature separator.
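
A minimal Python sketch of that rule, as I read it. The function name `is_sig_separator` is made up for illustration and is not from the draft; the line is assumed to have had its CRLF removed already.

```python
def is_sig_separator(line: str) -> bool:
    """Return True if `line` (CRLF already removed) is a signature separator.

    Assumed rule: "-- " may be quoted, or quoted and stuffed, but it is
    not a separator if it is stuffed without being quoted.
    """
    body = line.lstrip(">")            # logically delete any quote marks
    quoted = len(body) < len(line)
    if quoted and body.startswith(" "):
        body = body[1:]                # remove one stuffing space, quoted lines only
    return body == "-- "
```

With this reading, "-- ", ">-- " and "> -- " are separators, while an unquoted " -- " is not, so a generator can stuff an unquoted line to keep it from being taken for a separator.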


  > Adam also had a number of concerns over ambiguity in the grammar, with
 suggestions for improvement.  I generally like the replacement text,
 except for the removal of the distinction between quoted and unquoted
 lines.  I thought it was helpful to identify a non-quoted line on
 its own, and not just as a line with a quote-depth of zero.  This
 is based on a perceived need to treat the two somewhat differently,
 in particular, quoted lines need extra handling, state, and display
 semantics.

 I would think it would be easier for an implementor to write a single
 handler & data structure for all paragraphs of all quote depths, rather
 than make quote-depth-zero a special case.

Making the distinction in the grammar still allows a client to implement either way.


  Of course it's possible to
 expand the grammar to make the distinction, but I think it makes the
 grammar appear more complex than it really is.

This is a good point. The grammar in the -03 and -04 versions is based on the suggested replacement you provided, plus fixes to treat signature lines as a third type, and to distinguish between quoted and unquoted. I agree that this latter change does make the grammar larger and more complex.


  (And I just noticed that
 while the grammar in the current draft distinguishes between quoted
 flowed and unquoted flowed, it does not distinguish between quoted fixed
 and unquoted fixed.)

This was fixed in -03.


 By the way, my proposed grammar forgot to handle the Usenet sig
 exception.

I fixed this in -03.


   Here's a fixed version that also incorporates my three
 suggestions regarding improperly terminated paragraphs:

 flowed-body       = * ( paragraph / fixed-line / sig-line )
 paragraph         = 1*flowed-line fixed-line
                     ; That is the grammar for proper paragraphs, which
                     ; always end with a fixed line.  Improper paragraphs
                     ; are instead terminated by a change in quote-depth,
                     ; end of input, or a sig-line (which is not included
                     ; in the paragraph).
 sig-line          = quote [stuff] "--" SP CRLF
 fixed-line        = quote (stuff stuffed-fixed / unstuffed-fixed) CRLF
 flowed-line       = quote (stuff stuffed-flowed / unstuffed-flowed) flow CRLF
 stuffed-fixed     = [*text-char non-sp]
                     ; Does not end with SP.
 unstuffed-fixed   = non-sp-quote [*text-char non-sp]
                     ; Does not begin with SP or ">", does not end with SP.
 stuffed-flowed    = [non-dash *text-char] /
                     "-" [non-dash *text-char / "-" 1*text-char]
                     ; Is not "--".
 unstuffed-flowed  = non-sp-quote-dash *text-char /
                     "-" [non-dash *text-char / "-" 1*text-char]
                     ; Not empty, not "--", does not begin with SP or ">".
 quote             = *">"
 stuff             = SP
 flow              = SP
 non-sp-quote-dash = <any character except NUL, CR, LF, SP, ">", "-">
 non-sp-quote      = <any character except NUL, CR, LF, SP, ">">
 non-sp            = <any character except NUL, CR, LF, SP>
 text-char         = <any character except NUL, CR, LF>
 non-dash          = <any character except NUL, CR, LF, "-">

That definition for sig-line allows it to be stuffed but not quoted, which we have been prohibiting.

The suggested definitions for flowed lines attempt to eliminate the ambiguity between a flowed line and a signature separator but I don't think they allow for all cases of flowed lines. I think we'd need something like:
        non-sp-quote-dash *text-char /
        "-" non-dash *text-char /
        "--" non-sp *text-char /
        "--" 2*text-char

But this wouldn't work since the definition of flowed-line includes the flow (space) at the end. I don't see any way to resolve the ambiguity without really convoluting the ABNF. I think it may be OK to just note the potential ambiguity, especially since a check for signature line can be made before checking for a flowed line. I've added a note in the ABNF section of -04 about it.
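
To illustrate why the ordering sidesteps the ambiguity, here is a rough Python sketch of a receiver that tests for a signature separator before it applies the trailing-space test. The name `classify` and the string labels are assumptions for this example only, not anything defined in the draft.

```python
def classify(line: str) -> str:
    """Classify one line (CRLF already removed) as 'sig', 'flowed' or 'fixed'."""
    body = line.lstrip(">")                      # strip quote marks
    quoted = len(body) < len(line)
    # Logically delete one stuffing space, if present.
    unstuffed = body[1:] if body.startswith(" ") else body
    # Signature test comes first; "-- " may be stuffed only when also quoted.
    if body == "-- " or (quoted and unstuffed == "-- "):
        return "sig"
    # Only now apply the trailing-space test for flowed lines.
    return "flowed" if unstuffed.endswith(" ") else "fixed"
```

Because the separator check runs first, "-- " never reaches the flowed test, even though it would match it. An unquoted, stuffed " -- " falls through and is classified as flowed.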


 One of my suggested grammar tweaks was that a sig line should not get
 sucked into a paragraph, even if it is preceded by a flowed line,
 because then it could get re-wrapped and no longer appear at the start
 of a line (and therefore cease to be a sig line).  I notice now that
 this suggestion amounts to having a third type of line.  A sig line is
 neither fixed nor flowed, because fixed and flowed lines can be inside
 paragraphs, while sig lines can never be inside paragraphs.

I agree, and I believe this was fixed in -03.


 Another comment regarding the grammar:  It is nice for a grammar to give
 names to the meaningful syntactic constructs.  For example, we'd like
 a name for the quote-marks (and we have one), we'd like names for the
 special spaces that act as flags (and we have them), and we'd like a
 name for the actual content of the line without the quotes and flags,
 but the grammar in the draft doesn't give us that.  Consider for example
 stuffed-flowed.  In the draft, this means a line that *was* flowed and
 *is* stuffed (it includes the stuff space but not the flow space).  In
 the grammar in the old message above, stuffed-flowed means a line that
 *was* flowed and *was* stuffed (it includes neither the stuff space nor
 the flow space, only the actual content).

This is a nice feature, and I believe I've achieved it in -04 (at the expense of some extra parentheses).


 Section 5.1 says:

     If the line ends in a space, the line is flowed.  Otherwise it is
     fixed.  The exception to this rule is a signature separator line,
     described in Section 5.3.  Such lines end in a space but are not
     flowed.

 That leaves the following question unanswered:  Are separator lines
 fixed, or are they a third type of line?

 According to the grammar in section 7, signature separator lines do not
 match fixed-line.  That seems to suggest that they are a third type of
 line, which is the view that seems most intuitive to me.  That could be
 clarified by changing the last sentence of the quoted paragraph to "Such
 lines end in a space but are neither flowed nor fixed."

I agree and made this change in -04.


 Section 5.3 says:

     This is a special case; an (optionally quoted) line consisting of
     DASH DASH SP is not considered flowed.

 Sections 5.1 (interpreting) and 7 (grammar) both indicate that a
 signature line can be quoted and/or stuffed.  It is confusing for 5.3
 to mention "optionally quoted" without also mentioning "optionally
 stuffed".  Also, if in section 5.1 "not flowed" is changed to "neither
 flowed nor fixed", the same change ought to be made here.

Thanks; I believe this has been cleaned up in -04. All references in the text, as well as the grammar, now make clear that signature lines can be quoted, or quoted and stuffed, but cannot be stuffed without being quoted.


 Section 5.3 goes on to say:

     Generating agents MUST NOT end a paragraph with such a signature
     line, since doing so would indicate that the separator line is part
     of the paragraph.

 It would not indicate that the separator line is part of the paragraph,
 it would indicate that the body is malformed (according to the grammar
 and according to section 5.1); the receiver would not believe that the
 separator is part of the paragraph (according to 5.1).

That jumped out at me as I was making another change. I deleted the second clause, so now it just says "Generating agents MUST NOT end a paragraph with such a signature line".


 Perhaps the intention is something like this:

     When placing soft line breaks in a paragraph, generating agents MUST
     NOT place them in a way that causes any line of the paragraph to
     be a signature separator line, because paragraphs cannot contain
     signature separator lines (see sections 5.1 and 7).

I'm not sure if that was the original intent or not, but I liked the text you suggested, so I added it to the section on generating f=f (with references to the sections on signature lines and on the ABNF).
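
One way a generator might honor that requirement is sketched below in Python. This is a simplified greedy wrapper for a single unquoted paragraph; `wrap_flowed` and its `width` parameter are hypothetical, and the sketch ignores quoting and space-stuffing entirely. It simply refuses to place a soft break right after a line whose only content is "--", at the cost of letting that line run past the width.

```python
def wrap_flowed(text: str, width: int = 72) -> list[str]:
    """Greedy-wrap one unquoted paragraph into format=flowed lines."""
    lines: list[str] = []
    current = ""
    for word in text.split(" "):
        candidate = word if not current else current + " " + word
        # The +1 accounts for the trailing SP that marks a soft break.
        too_long = len(candidate) + 1 > width
        # Never break after a bare "--": that would emit "-- ", a sig separator.
        if current and too_long and current != "--":
            lines.append(current + " ")
            current = word
        else:
            current = candidate
    lines.append(current)  # last line is fixed: no trailing space
    return lines
```

For example, `wrap_flowed("aaa -- bbb", width=4)` yields `["aaa ", "-- bbb"]` rather than a line consisting of "-- ".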


 Section 5.4 says:

     Space-stuffing adds a single space to the start of any line which
     needs protection when the message is generated.  On reception, if
     the first character of a line is a space, it is logically deleted.
     This occurs after the test for a quoted line, and before the test
     for a flowed line.

 It's not only after testing for a quoted line, but more importantly
 after stripping the quoting.

But the test for a quoted line is what deletes the quote marks. I changed the text in -04 to say "This occurs after the test for a quoted line (which logically counts and deletes any quote marks)".


   And it's not only before the test for
 a flowed line, but also before the test for a separator line.

I think the test for a signature line has to happen both before the test for a quoted line and also after deleting quote marks and stuffing.


 Section 5.5 says:

     When generating quoted flowed lines, an agent needs to pay attention
     to changes in quote depth.  A sequence of quoted lines of the same
     quote depth immediately followed by lines of a different quote
     depth MUST be encoded so that lines of the same quote depth are a
     paragraph, with the last line generated as fixed and prior lines
     generated as flowed.

 That seems to be a much stronger requirement than you intend.  Within a
 single quote depth, there might be multiple paragraphs, non-paragraph
 fixed-lines, and separator lines.  But the sentence quoted above seems
 to say that because all of this text is a bunch of "lines at the same
 quote depth", it must be encoded as "a paragraph", with the last line
 fixed and all other lines flowed.  Perhaps the intention is something
 like this:

     When generating quoted flowed lines, an agent needs to pay attention
     to changes in quote depth.  All lines of a paragraph MUST be
     unquoted, or else they MUST all be quoted and have the same quote
     depth.  Therefore, whenever there is a change in quote depth, or a
     change from quoted to unquoted, or change from unquoted to quoted,
     the line immediately preceding the change MUST NOT be a flowed
     line.

Indeed.  Thanks for catching this.
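
For what it's worth, a generator could check its own output against this requirement with something like the following Python sketch. The helper names are invented for illustration, lines are assumed to have CRLF removed, and quote depth is taken to be the count of leading ">" characters (zero for unquoted lines).

```python
def quote_depth(line: str) -> int:
    """Number of leading '>' characters (0 for an unquoted line)."""
    return len(line) - len(line.lstrip(">"))


def is_flowed(line: str) -> bool:
    """True if the line ends in SP and is not a signature separator."""
    body = line.lstrip(">")
    quoted = len(body) < len(line)
    if body == "-- " or (quoted and body == " -- "):
        return False                   # signature separator: neither flowed nor fixed
    if body.startswith(" "):
        body = body[1:]                # logically delete the stuffing space
    return body.endswith(" ")


def flowed_before_depth_change(lines: list[str]) -> list[int]:
    """Indexes of flowed lines that are followed by a line of different depth."""
    return [
        i
        for i in range(len(lines) - 1)
        if quote_depth(lines[i]) != quote_depth(lines[i + 1]) and is_flowed(lines[i])
    ]
```

An empty result means no flowed line immediately precedes a change in quote depth.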


 Section 5.5 goes on to say:

     If a receiving agent wishes to reformat flowed quoted lines (joining
     and/or wrapping them) on display or when generating new messages,
     the lines SHOULD be de-quoted, reformatted, and then re-quoted.  To
     de-quote, the number of close angle brackets in the quote indicator
     at the start of each line is counted.  Consecutive lines with the
     same quote depth are considered one paragraph and are reformatted
     together.  To re-quote after reformatting, a quote indicator
     containing the same number of close angle brackets originally
     present are prefixed to each line.

 I think one sentence there is inaccurate: "Consecutive lines with the
 same quote depth are considered one paragraph and are reformatted
 together."  Consecutive lines with the same quote depth could be one
 paragraph or several paragraphs or non-paragraph fixed lines (in which
 case no reformatting is requested) or separator lines.  I think that
 sentence can simply be removed.  Reformatting is covered elsewhere; this
 section is about quoting.

Another good catch.  The sentence is deleted in -04.


 The next two paragraphs are inconsistent with section 5.1:

     On reception, if a change in quote depth occurs on a flowed line,
     this is an improperly formatted message.  The receiver SHOULD handle
     this error by using the 'quote-depth-wins' rule, which is to ignore
     the flowed indicator and treat the line as fixed.  That is, the
     change in quote depth ends the paragraph.

     In other words, whenever two adjacent lines have different quote
     depths, senders MUST ensure that the earlier line is fixed (does
     not end in a space), and receivers SHOULD treat the earlier line as
     fixed regardless of whether it ends with a space.

 According to section 5.1, the paragraph ends with the flowed line; it is
 possible therefore to have an improperly terminated paragraph consisting
 of a single flowed line, and such a paragraph would be reformatted.  If
 the flowed indicator is ignored and the line is treated as fixed, then
 we have a single fixed line, which is not a paragraph at all and would
 not be reformatted.  Also, it is possible for the line before the change
 in quote depth to be a separator line, which is arguably not fixed (see
 the discussion above).  The inconsistency could be resolved like so:

     ...the 'quote-depth-wins' rule, which is to consider the paragraph
     to end with the flowed line immediately preceding the change in
     quote depth.

     In other words, whenever two adjacent lines have different quote
     depths, senders MUST ensure that the earlier line is not flowed
     (does not end in a space), and receivers finding a flowed line there
     SHOULD treat it as the last line of a paragraph.

I agree; thanks again.
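
A rough Python sketch of how a receiver might apply the revised rule follows. The name `split_units` and the `(depth, text)` tuples are assumptions for this example. It keeps each flowed line's trailing space when joining, and it treats signature separators as their own units that are never merged into a paragraph.

```python
def split_units(lines: list[str]) -> list[tuple[int, str]]:
    """Group lines (CRLF removed) into (quote depth, text) units."""
    out: list[tuple[int, str]] = []
    buf: list[str] = []
    buf_depth = 0

    def flush() -> None:
        nonlocal buf
        if buf:
            out.append((buf_depth, "".join(buf)))
        buf = []

    for line in lines:
        body = line.lstrip(">")
        depth = len(line) - len(body)
        if buf and depth != buf_depth:
            flush()                    # quote-depth-wins: the paragraph ends here
        is_sig = body == "-- " or (depth > 0 and body == " -- ")
        if body.startswith(" "):
            body = body[1:]            # logically delete the stuffing space
        if is_sig:
            flush()                    # a sig line also ends any open paragraph
            out.append((depth, "-- "))
            continue
        buf.append(body)
        buf_depth = depth
        if not body.endswith(" "):
            flush()                    # a fixed line ends the paragraph properly
    flush()                            # end of input ends any open paragraph
    return out
```

Given `[">one ", ">two ", "reply ", "end"]`, the flowed line ">two " is followed by a change in depth, so it is kept as the last line of the quoted paragraph instead of being joined with "reply ".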


 Here we have more instances of the phrase "change in quote depth".  If
 we keep the current view that unquoted lines have no quote depth and
 quoted lines have non-zero quote depth, then we really ought to be
 saying "change in quote depth, or change from quoted to unquoted, or
 change from unquoted to quoted".  If we adopt the view that all lines
 have a quote depth, which can be zero, then the simple phrase "change in
 quote depth" will mean what we want it to mean.

I think "change in quote depth" can include a change between quoted and unquoted. Even though I have retained the ABNF distinction between quoted and unquoted, I don't think the text has to be too rigid about it.

--
Randall Gellens
Opinions are personal;    facts are suspect;    I speak for myself only