ietf-822

Re: Format=Flowed/RFC 2646 Bis (-03)

2003-11-16 22:47:28

Randall Gellens <randy(_at_)qualcomm(_dot_)com> wrote:

I've created a -03 based on comments received in the past few days.

Doh, I just noticed that a message that I sent to the list on Nov-09 had
the wrong From: address and therefore was not permitted to go to the
list (presumably it is still awaiting moderator approval).  I'll include
that message here.  Please remember that when it says "current draft",
it means draft -02.

---begin-old-message---

Date: Mon, 10 Nov 2003 07:46:05 +0000
To: IETF RFC-822 list <ietf-822(_at_)imc(_dot_)org>
Subject: Re: Format=Flowed/RFC 2646 Bis (-02)
Message-ID: <20031110074605(_dot_)GA29291(_at_)nicemice(_dot_)net>
References: 
<200311091913(_dot_)hA9JDVhk004902(_at_)crowley(_dot_)qualcomm(_dot_)com>
In-Reply-To: 
<200311091913(_dot_)hA9JDVhk004902(_at_)crowley(_dot_)qualcomm(_dot_)com>

randy(_at_)qualcomm(_dot_)com wrote:

it would have been even better had this activity occurred sometime
earlier in the past few years, since most of the text in question
hasn't changed in some time.

Indeed.  Somehow I had never heard of format=flowed until quite
recently.

I also don't see the problem with calling a group of lines intended to
be re-flowed a "paragraph".

"Paragraph" seems like a nice intuitive term to me.  In TeX, for
example, a "paragraph" is the one and only construct that flows (has
line breaks inserted automatically).  Keith hinted that "paragraph" is
too loaded a term; maybe he'd like to explain that in more detail.

Adam also suggests mentioning, in the section on interpreting f=f, the
exceptions for Usenet signatures and for changes in quote depth, which
seem like good ideas to me.

Two of the steps I listed need to be swapped.  Step 3 checks for the
sig line, and step 4 unstuffs.  I wrote them in that order because
section 5.3 says "an (optionally quoted) line consisting of DASH DASH
SP is not considered flowed."  But now I notice that the grammar
says sig-sep = [quote [stuffing]] "--" SP CRLF.  To be consistent,
section 5.3 should say "(optionally quoted, optionally stuffed)", and
the interpreting section should not check for a sig line until after
unstuffing.

(Or do you want to resolve the inconsistency the other way, by changing
the grammar?)
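A minimal Python sketch of the per-line steps in the corrected order: strip the quotes, unstuff, test for a sig line, then test for flow.  The names interpret_line and Line are hypothetical, and the input is assumed to be one line already decoded to characters with its CRLF removed.  Testing for the sig line before unstuffing would miss a stuffed separator such as " -- ", because after unstuffing it becomes "-- " and would then be taken as flowed.

```python
from dataclasses import dataclass

@dataclass
class Line:
    depth: int    # number of leading ">" characters (0 = unquoted)
    kind: str     # "sig", "flowed", or "fixed"
    content: str  # text after quotes and stuffing are removed; flowed
                  # content keeps its trailing SP so lines can be concatenated

def interpret_line(raw: str) -> Line:
    """Hypothetical helper: interpret one line (CRLF already removed)."""
    # 1. Count and strip the quote marks.
    depth = len(raw) - len(raw.lstrip(">"))
    rest = raw[depth:]
    # 2. Remove space-stuffing: a single leading SP is deleted.
    if rest.startswith(" "):
        rest = rest[1:]
    # 3. Only now test for the sig line, so a stuffed " -- " is still found.
    if rest == "-- ":
        return Line(depth, "sig", rest)
    # 4. Any other line ending in SP is flowed; everything else is fixed.
    if rest.endswith(" "):
        return Line(depth, "flowed", rest)
    return Line(depth, "fixed", rest)
```

For example, interpret_line(" -- ") is classified as a sig line at depth 0, whereas a check made before unstuffing would not have recognized it.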

Adam also had a number of concerns over ambiguity in the grammar, with
suggestions for improvement.  I generally like the replacement text,
except for the removal of the distinction between quoted and unquoted
lines.  I thought it was helpful to identify a non-quoted line on
its own, and not just as a line with a quote-depth of zero.  This
is based on a perceived need to treat the two somewhat differently,
in particular, quoted lines need extra handling, state, and display
semantics.

I would think it would be easier for an implementor to write a single
handler & data structure for all paragraphs of all quote depths, rather
than make quote-depth-zero a special case.  Of course it's possible to
expand the grammar to make the distinction, but I think it makes the
grammar appear more complex than it really is.  (And I just noticed that
while the grammar in the current draft distinguishes between quoted
flowed and unquoted flowed, it does not distinguish between quoted fixed
and unquoted fixed.)

By the way, my proposed grammar forgot to handle the Usenet sig
exception.  Here's a fixed version that also incorporates my three
suggestions regarding improperly terminated paragraphs:

flowed-body       = * ( paragraph / fixed-line / sig-line )
paragraph         = 1*flowed-line fixed-line
                    ; That is the grammar for proper paragraphs, which
                    ; always end with a fixed line.  Improper paragraphs
                    ; are instead terminated by a change in quote-depth,
                    ; end of input, or a sig-line (which is not included
                    ; in the paragraph).
sig-line          = quote [stuff] "--" SP CRLF
fixed-line        = quote (stuff stuffed-fixed / unstuffed-fixed) CRLF
flowed-line       = quote (stuff stuffed-flowed / unstuffed-flowed) flow CRLF
stuffed-fixed     = [*text-char non-sp]
                    ; Does not end with SP.
unstuffed-fixed   = [non-sp-quote [*text-char non-sp]]
                    ; May be empty.  Does not begin with SP or ">",
                    ; does not end with SP.
stuffed-flowed    = [non-dash *text-char] /
                    "-" [non-dash *text-char / "-" 1*text-char]
                    ; Is not "--".
unstuffed-flowed  = non-sp-quote-dash *text-char /
                    "-" [non-dash *text-char / "-" 1*text-char]
                    ; Not empty, not "--", does not begin with SP or ">".
quote             = *">"
stuff             = SP
flow              = SP
non-sp-quote-dash = <any character except NUL, CR, LF, SP, ">", "-">
non-sp-quote      = <any character except NUL, CR, LF, SP, ">">
non-sp            = <any character except NUL, CR, LF, SP>
text-char         = <any character except NUL, CR, LF>
non-dash          = <any character except NUL, CR, LF, "-">

One of my suggested grammar tweaks was that a sig line should not get
sucked into a paragraph, even if it is preceded by a flowed line,
because then it could get re-wrapped and no longer appear at the start
of a line (and therefore cease to be a sig line).  I notice now that
this suggestion amounts to having a third type of line.  A sig line is
neither fixed nor flowed, because fixed and flowed lines can be inside
paragraphs, while sig lines can never be inside paragraphs.
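A sketch of how a receiver might group already-classified lines according to the grammar above, treating sig lines as a third kind of line that always closes any open paragraph and is never absorbed into one.  The tuple layouts and the name group_lines are hypothetical; contents are assumed to be de-quoted and unstuffed already.

```python
from typing import List, Optional, Tuple

# Hypothetical input layout: (quote depth, kind, content), kind being
# "flowed", "fixed", or "sig".  Output items: (kind, depth, contents).
Line = Tuple[int, str, str]
Item = Tuple[str, int, List[str]]

def group_lines(lines: List[Line]) -> List[Item]:
    """Group classified lines into paragraphs, lone fixed lines, and sig lines."""
    items: List[Item] = []
    open_depth: Optional[int] = None   # quote depth of the open paragraph, if any
    open_lines: List[str] = []
    for depth, kind, content in lines:
        # Improper end: a change in quote depth or a sig line closes the paragraph.
        if open_depth is not None and (depth != open_depth or kind == "sig"):
            items.append(("paragraph", open_depth, open_lines))
            open_depth, open_lines = None, []
        if kind == "sig":
            items.append(("sig", depth, [content]))  # never part of a paragraph
        elif kind == "flowed":
            if open_depth is None:
                open_depth = depth
            open_lines.append(content)
        elif open_depth is not None:                   # fixed line: proper end
            open_lines.append(content)
            items.append(("paragraph", depth, open_lines))
            open_depth, open_lines = None, []
        else:                                          # fixed line on its own
            items.append(("fixed", depth, [content]))
    if open_depth is not None:                         # improper end: end of input
        items.append(("paragraph", open_depth, open_lines))
    return items
```

A flowed line followed by a sig line thus yields a one-line (improperly terminated) paragraph, then the sig line as a separate item.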

Someone should double-check that grammar, especially the rules for
[un]stuffed-{fixed,flowed}.
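One rough way to do the double-checking is to transcribe the line rules into Python regular expressions and confirm that sample lines match exactly one of sig-line, fixed-line, and flowed-line.  The transcription is unverified and the names are hypothetical.  CRLF is assumed to be stripped beforehand, and unstuffed-fixed is treated as optional so that a blank line counts as a fixed line.

```python
import re

# Character classes from the grammar (NUL, CR and LF are never allowed).
TEXT_CHAR = r"[^\x00\r\n]"
NON_SP = r"[^\x00\r\n ]"
NON_SP_QUOTE = r"[^\x00\r\n >]"
NON_SP_QUOTE_DASH = r"[^\x00\r\n >-]"
NON_DASH = r"[^\x00\r\n-]"

QUOTE = r">*"
STUFFED_FIXED = rf"(?:{TEXT_CHAR}*{NON_SP})?"
# Assumption: unstuffed-fixed may be empty, so that a blank line is fixed.
UNSTUFFED_FIXED = rf"(?:{NON_SP_QUOTE}(?:{TEXT_CHAR}*{NON_SP})?)?"
# Content that begins with "-" but is not exactly "--".
DASH_START = rf"-(?:{NON_DASH}{TEXT_CHAR}*|-{TEXT_CHAR}+)?"
STUFFED_FLOWED = rf"(?:(?:{NON_DASH}{TEXT_CHAR}*)?|{DASH_START})"
UNSTUFFED_FLOWED = rf"(?:{NON_SP_QUOTE_DASH}{TEXT_CHAR}*|{DASH_START})"

RULES = {
    "sig-line": re.compile(rf"{QUOTE} ?-- "),
    "fixed-line": re.compile(rf"{QUOTE}(?: {STUFFED_FIXED}|{UNSTUFFED_FIXED})"),
    "flowed-line": re.compile(rf"{QUOTE}(?: {STUFFED_FLOWED}|{UNSTUFFED_FLOWED}) "),
}

def matching_rules(line: str) -> list:
    """Names of the line rules that match the whole line (CRLF removed)."""
    return [name for name, rx in RULES.items() if rx.fullmatch(line)]
```

Note that this only checks each line in isolation; it says nothing about ambiguity in how lines combine into paragraphs.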

AMC

---end-old-message---

Sorry you didn't get a chance to see that before revising the draft.
Henceforth, when I say "current draft", I mean draft -03.

* Added mention of quoting to Abstract and Introduction.
* Deleted line analysis table.
* Added note that c-t-e is irrelevant to flowed text processing
* Added text indicating that end of data terminates a paragraph
* Moved sig-sep out of fixed-line ABNF
* Mentioned exceptions in section on interpreting
* Moved section on interpreting before section on generating.
* Reworded non-normative "should"s.

All good.

* Changed some SHOULDs to MUSTs (space-stuffing, quoted paragraphs)

I haven't given the distinction much thought for this protocol.

* Added MUST NOT for OpenPGP and SHOULD for OpenPGP-MIME.

I'll defer to Simon and Cyrus, who seem to have that issue covered.

* Added note to ABNF that space and ">" are encoded according to charset

But what about the decoding side?  I think Ned has the right idea: the
clarification could say that the grammar is in terms of characters.
An encoder using the grammar to generate sequences of characters would
then need to transform the characters to bytes according to the
charset, and a decoder using the grammar to parse sequences of
characters would first need to transform the bytes to characters
according to the charset.
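To make the decoding-side point concrete: a receiver that looks for the bytes 0x3E and 0x20 gets the wrong answer whenever the charset does not encode ">" and SP as those bytes.  The sketch below uses Python's cp037 codec (an EBCDIC code page) purely as an example of such a charset; the function names are hypothetical, and stuffing and the sig-line exception are ignored for brevity.

```python
def naive_quote_depth(raw: bytes) -> int:
    # Byte-level test: assumes ">" is the single byte 0x3E.
    return len(raw) - len(raw.lstrip(b">"))

def naive_is_flowed(raw: bytes) -> bool:
    # Byte-level test: assumes SP is the single byte 0x20.
    return raw.endswith(b" ")

def quote_depth(raw: bytes, charset: str) -> int:
    # Decode bytes to characters first, then apply the character-level rule.
    text = raw.decode(charset)
    return len(text) - len(text.lstrip(">"))

def is_flowed(raw: bytes, charset: str) -> bool:
    return raw.decode(charset).endswith(" ")
```

For an ASCII-compatible charset such as UTF-8 the two approaches agree; for cp037 the byte-level versions report depth 0 and "not flowed" for the line "> hello ".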

* Replaced ABNF rules to remove ambiguity

One ambiguity still remains.  Consider these two lines:

-- 
foo

The first ends with a space, and the second does not.  We would like
this to parse as a sig-sep and a fixed-line, but according to the
current grammar it can also parse as a paragraph, because "-- " matches
the flowed-line production.

It is possible to eliminate that ambiguity; see the grammar in the old
message above.
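A small Python sketch that counts how many ways a sequence of lines can be derived from flowed-body = *( paragraph / fixed-line / sig-sep ) with paragraph = 1*flowed-line fixed-line.  The line predicates are simplified, hypothetical stand-ins that ignore quoting and stuffing: flowed_loose approximates a flowed-line rule that accepts "-- ", and flowed_strict excludes "--" content, as the grammar in the old message does.  Neither is a transcription of the actual draft ABNF.

```python
from functools import lru_cache
from typing import Callable, Sequence

Rule = Callable[[str], bool]

def count_parses(lines: Sequence[str], is_flowed: Rule, is_fixed: Rule,
                 is_sig: Rule) -> int:
    """Count derivations of
         flowed-body = *( paragraph / fixed-line / sig-sep )
         paragraph   = 1*flowed-line fixed-line
    over whole lines, given a predicate for each line rule."""
    n = len(lines)

    @lru_cache(maxsize=None)
    def body(i: int) -> int:
        if i == n:
            return 1
        total = 0
        if is_fixed(lines[i]):
            total += body(i + 1)
        if is_sig(lines[i]):
            total += body(i + 1)
        # paragraph: one or more flowed lines, then a fixed line
        j = i
        while j < n and is_flowed(lines[j]):
            j += 1
            if j < n and is_fixed(lines[j]):
                total += body(j + 1)
        return total

    return body(0)

# Simplified stand-ins (hypothetical; quoting and stuffing ignored).
def fixed_rule(line: str) -> bool:
    return not line.endswith(" ")

def sig_rule(line: str) -> bool:
    return line == "-- "

def flowed_loose(line: str) -> bool:
    return line.endswith(" ")                    # accepts "-- " as flowed

def flowed_strict(line: str) -> bool:
    return line.endswith(" ") and line != "-- "  # content "--" excluded
```

With the two lines above, count_parses(["-- ", "foo"], flowed_loose, fixed_rule, sig_rule) returns 2 (sig-sep then fixed-line, or a two-line paragraph), while the strict predicate leaves exactly 1.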

Another comment regarding the grammar:  It is nice for a grammar to give
names to the meaningful syntactic constructs.  For example, we'd like
a name for the quote-marks (and we have one), we'd like names for the
special spaces that act as flags (and we have them), and we'd like a
name for the actual content of the line without the quotes and flags,
but the grammar in the draft doesn't give us that.  Consider for example
stuffed-flowed.  In the draft, this means a line that *was* flowed and
*is* stuffed (it includes the stuff space but not the flow space).  In
the grammar in the old message above, stuffed-flowed means a line that
*was* flowed and *was* stuffed (it includes neither the stuff space nor
the flow space, only the actual content).

Section 5.1 says:

    If the line ends in a space, the line is flowed.  Otherwise it is
    fixed.  The exception to this rule is a signature separator line,
    described in Section 5.3.  Such lines end in a space but are not
    flowed.

That leaves the following question unanswered:  Are separator lines
fixed, or are they a third type of line?

According to the grammar in section 7, signature separator lines do not
match fixed-line.  That seems to suggest that they are a third type of
line, which is the view that seems most intuitive to me.  That could be
clarified by changing the last sentence of the quoted paragraph to "Such
lines end in a space but are neither flowed nor fixed."

Another way to view the situation is that sig-lines are fixed, and
paragraphs end with non-sig-sep fixed lines.  The grammar would then be:

flowed-body = *( paragraph / fixed-line )
fixed-line  = sig-sep / non-sig-sep-fixed-line
paragraph   = 1*flowed-line non-sig-sep-fixed-line

But that looks unnaturally convoluted to me.  I prefer the existing
grammar:

flowed-body = *( paragraph / fixed-line / sig-sep )
paragraph   = 1*flowed-line fixed-line

Section 5.3 says:

    This is a special case; an (optionally quoted) line consisting of
    DASH DASH SP is not considered flowed.

Sections 5.1 (interpreting) and 7 (grammar) both indicate that a
signature line can be quoted and/or stuffed.  It is confusing for 5.3
to mention "optionally quoted" without also mentioning "optionally
stuffed".  Also, if in section 5.1 "not flowed" is changed to "neither
flowed nor fixed", the same change ought to be made here.

Section 5.3 goes on to say:

    Generating agents MUST NOT end a paragraph with such a signature
    line, since doing so would indicate that the separator line is part
    of the paragraph.

It would not indicate that the separator line is part of the paragraph.
It would indicate that the body is malformed (according to the grammar
and to section 5.1), and a receiver following 5.1 would not treat the
separator as part of the paragraph anyway.  Perhaps the intention is
something like this:

    When placing soft line breaks in a paragraph, generating agents MUST
    NOT place them in a way that causes any line of the paragraph to
    be a signature separator line, because paragraphs cannot contain
    signature separator lines (see sections 5.1 and 7).
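A sketch of a generator that honors that MUST NOT: wrap greedily at spaces, keeping the space at each soft break, then glue any non-final line that came out as exactly "-- " onto the following line.  wrap_flowed is a hypothetical name; the sketch assumes an unquoted paragraph with single spaces between words and no leading or trailing SP, and it does not handle space-stuffing.

```python
from typing import List

def wrap_flowed(text: str, width: int = 72) -> List[str]:
    """Hypothetical helper: wrap one unquoted paragraph into flowed lines.

    Every line but the last ends with SP (a soft break); the last is fixed.
    """
    words = text.split(" ")
    tokens = [w + " " for w in words[:-1]] + [words[-1]]  # keep SP at soft breaks
    lines: List[str] = []
    current = ""
    for tok in tokens:
        if current and len(current) + len(tok.rstrip(" ")) > width:
            lines.append(current)
            current = ""
        current += tok
    lines.append(current)

    # A non-final line of exactly "-- " would be read as a signature
    # separator, so glue it onto the start of the following line.
    result: List[str] = []
    for i, line in enumerate(lines):
        if line == "-- " and i + 1 < len(lines):
            lines[i + 1] = line + lines[i + 1]
        else:
            result.append(line)
    return result
```

For example, wrap_flowed("aaaa -- bbbb", width=4) gives ["aaaa ", "-- bbbb"]; without the final pass the middle line would have been "-- ".  Gluing can push a line past the width, which seems a reasonable price for not manufacturing a separator.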

Section 5.4 says:

    Space-stuffing adds a single space to the start of any line which
    needs protection when the message is generated.  On reception, if
    the first character of a line is a space, it is logically deleted.
    This occurs after the test for a quoted line, and before the test
    for a flowed line.

It's not only after testing for a quoted line, but more importantly
after stripping the quoting.  And it's not only before the test for
a flowed line, but also before the test for a separator line.  Maybe
change the last sentence to:

    This occurs after deleting quote marks, and before testing for
    fixed, flowed, and separator lines.

Section 5.5 says:

    When generating quoted flowed lines, an agent needs to pay attention
    to changes in quote depth.  A sequence of quoted lines of the same
    quote depth immediately followed by lines of a different quote
    depth MUST be encoded so that lines of the same quote depth are a
    paragraph, with the last line generated as fixed and prior lines
    generated as flowed.

That seems to be a much stronger requirement than you intend.  Within a
single quote depth, there might be multiple paragraphs, non-paragraph
fixed-lines, and separator lines.  But the sentence quoted above seems
to say that because all of this text is a bunch of "lines at the same
quote depth", it must be encoded as "a paragraph", with the last line
fixed and all other lines flowed.  Perhaps the intention is something
like this:

    When generating quoted flowed lines, an agent needs to pay attention
    to changes in quote depth.  All lines of a paragraph MUST be
    unquoted, or else they MUST all be quoted and have the same quote
    depth.  Therefore, whenever there is a change in quote depth, or a
    change from quoted to unquoted, or change from unquoted to quoted,
    the line immediately preceding the change MUST NOT be a flowed
    line.

The wording could be simplified if an unquoted line were simply a line
with a quote depth of zero:

    When generating quoted flowed lines, an agent needs to pay attention
    to changes in quote depth.  All lines of a paragraph MUST have the
    same quote depth.  Therefore, whenever there is a change in quote
    depth, the line immediately preceding the change MUST NOT be a
    flowed line.
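Under the depth-zero view, the proposed requirement is easy to check mechanically.  A sketch with hypothetical helper names, assuming decoded lines with CRLF removed and treating a sig separator as not flowed (section 5.3):

```python
from typing import List

# Hypothetical helpers; unquoted lines simply have quote depth 0.
def quote_depth(line: str) -> int:
    return len(line) - len(line.lstrip(">"))

def is_flowed(line: str) -> bool:
    rest = line[quote_depth(line):]
    if rest.startswith(" "):
        rest = rest[1:]                           # undo space-stuffing first
    return rest.endswith(" ") and rest != "-- "   # a sig separator is not flowed

def depth_change_violations(lines: List[str]) -> List[int]:
    """Indices of flowed lines immediately followed by a change in quote depth."""
    return [i for i in range(len(lines) - 1)
            if is_flowed(lines[i])
            and quote_depth(lines[i]) != quote_depth(lines[i + 1])]
```

Because unquoted lines are just depth 0, the single comparison also covers changes from quoted to unquoted and from unquoted to quoted.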

Section 5.5 goes on to say:

    If a receiving agent wishes to reformat flowed quoted lines (joining
    and/or wrapping them) on display or when generating new messages,
    the lines SHOULD be de-quoted, reformatted, and then re-quoted.  To
    de-quote, the number of close angle brackets in the quote indicator
    at the start of each line is counted.  Consecutive lines with the
    same quote depth are considered one paragraph and are reformatted
    together.  To re-quote after reformatting, a quote indicator
    containing the same number of close angle brackets originally
    present are prefixed to each line.

I think one sentence there is inaccurate: "Consecutive lines with the
same quote depth are considered one paragraph and are reformatted
together."  Consecutive lines with the same quote depth could be one
paragraph or several paragraphs or non-paragraph fixed lines (in which
case no reformatting is requested) or separator lines.  I think that
sentence can simply be removed.  Reformatting is covered elsewhere; this
section is about quoting.
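A sketch of de-quote, reformat, re-quote in which the unit of reformatting is a paragraph (as determined by flowed and fixed lines), not a run of lines at one quote depth; fixed lines and separator lines at the same depth pass through unchanged.  The item layout and the name requote are hypothetical, paragraph contents are assumed to keep their flow spaces so that plain concatenation restores the text, and Python's textwrap stands in for whatever reflow an agent really uses (it neither space-stuffs unquoted lines nor guards against emitting a "-- " line).

```python
import textwrap
from typing import List, Tuple

# Hypothetical layout: (kind, quote depth, de-quoted and unstuffed contents),
# kind being "paragraph", "fixed", or "sig".  Paragraph contents keep flow SPs.
Item = Tuple[str, int, List[str]]

def requote(items: List[Item], width: int = 72) -> List[str]:
    out: List[str] = []
    for kind, depth, contents in items:
        # Quote marks plus a stuffing SP for quoted lines; nothing at depth 0.
        prefix = (">" * depth + " ") if depth else ""
        if kind == "paragraph":
            text = "".join(contents)  # rejoin the de-quoted paragraph
            lines = textwrap.wrap(text, width=max(1, width - len(prefix))) or [""]
            out.extend(prefix + line + " " for line in lines[:-1])  # flowed
            out.append(prefix + lines[-1])                          # fixed
        else:
            # Non-paragraph fixed lines and separator lines are not reflowed.
            out.extend(prefix + c for c in contents)
    return out
```

Only the paragraph is rewrapped here; a fixed line at the same quote depth keeps its original spacing.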

The next two paragraphs are inconsistent with section 5.1:

    On reception, if a change in quote depth occurs on a flowed line,
    this is an improperly formatted message.  The receiver SHOULD handle
    this error by using the 'quote-depth-wins' rule, which is to ignore
    the flowed indicator and treat the line as fixed.  That is, the
    change in quote depth ends the paragraph.

    In other words, whenever two adjacent lines have different quote
    depths, senders MUST ensure that the earlier line is fixed (does
    not end in a space), and receivers SHOULD treat the earlier line as
    fixed regardless of whether it ends with a space.

According to section 5.1, the paragraph ends with the flowed line; it is
possible therefore to have an improperly terminated paragraph consisting
of a single flowed line, and such a paragraph would be reformatted.  If
the flowed indicator is ignored and the line is treated as fixed, then
we have a single fixed line, which is not a paragraph at all and would
not be reformatted.  Also, it is possible for the line before the change
in quote depth to be a separator line, which is arguably not fixed (see
the discussion above).  The inconsistency could be resolved like so:

    ...the 'quote-depth-wins' rule, which is to consider the paragraph
    to end with the flowed line immediately preceding the change in
    quote depth.

    In other words, whenever two adjacent lines have different quote
    depths, senders MUST ensure that the earlier line is not flowed
    (does not end in a space), and receivers finding a flowed line there
    SHOULD treat it as the last line of a paragraph.

Here we have more instances of the phrase "change in quote depth".  If
we keep the current view that unquoted lines have no quote depth and
quoted lines have non-zero quote depth, then we really ought to be
saying "change in quote depth, or change from quoted to unquoted, or
change from unquoted to quoted".  If we adopt the view that all lines
have a quote depth, which can be zero, then the simple phrase "change in
quote depth" will mean what we want it to mean.

AMC