ietf-822

Re: Format=Flowed/RFC 2646 Bis (-02)

2003-11-08 12:07:51

Keith Moore <moore(_at_)cs(_dot_)utk(_dot_)edu> wrote:

> I avoided using the term "paragraph" because that has semantics other
> than for presentation.

Okay, but I think we need to keep the term "line" referring to its
intuitive and customary sense, so that we can say things like "each
line may begin with one or more quote characters" and "a space at the
beginning or end of a line (after quoting has been removed) acts as a
flag".

I think it would be confusing, therefore, to use a term like "flowed
line" to refer to something that spans multiple lines.  I think the
current meaning of "flowed line" (a line that ends in a space) is more
intuitive.

If "paragraph" needs to be avoided, perhaps a new term could be
invented, like "snake" or "flow-group".

> I don't like using the term "soft" for the SP CR LF sequence in a
> flowed line because it's too easily confused with Q-P soft line
> breaks, even though they have nothing to do with one another.

I was using "soft" to refer just to the SP, not the CR LF.  I was
calling the single spaces at the beginning and end of the line
"stuffing" and "soft", respectively.  Perhaps better names would be
"stuff" and "flow".

> I'm concerned that people will implement format=flowed differently
> for objects depending on the content-transfer-encoding, when the
> content-transfer-encoding should be irrelevant.

That ought to follow from the MIME architecture (format=flowed is a
parameter of the content-type, therefore the content-transfer-encoding
is indeed irrelevant), but it wouldn't hurt to mention it explicitly.
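
A minimal Python sketch of that point, for illustration only.  The
sample text and the helper name flow_flags are hypothetical, and the
quoted-printable form is written by hand.  The same text/plain data is
carried once in base64 and once in quoted-printable; after the transfer
encoding is undone, the lines (and their trailing flow spaces) are
identical, and the Q-P soft line break has disappeared without a trace.

```python
import base64
import quopri

# Hypothetical sample text/plain data: one flowed line, then one fixed line.
raw = b"This is a flowed \r\nparagraph.\r\n"

# Two transfer encodings of that same data.  In the quoted-printable form,
# "=20" protects the trailing space and "=\r\n" is a Q-P soft line break.
as_base64 = base64.b64encode(raw)
as_qp = b"This is a =\r\nflowed=20\r\nparagraph.\r\n"


def flow_flags(data: bytes) -> list:
    """For each CRLF-terminated line, report whether it ends in a space."""
    lines = data.split(b"\r\n")[:-1]  # drop the empty piece after the last CRLF
    return [line.endswith(b" ") for line in lines]


# Undo the transfer encoding first; only then look for flow spaces.
decoded_b64 = base64.b64decode(as_base64)
decoded_qp = quopri.decodestring(as_qp)

print(flow_flags(decoded_b64))
print(flow_flags(decoded_qp))
```

Both calls see the same two lines, so an interpreter that works on the
decoded data cannot behave differently depending on the encoding.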

> It's more important to clean up the text than the grammar.

Try this:  Forget everything you've read in the draft and just read this
excerpt (45 lines):
    
    If the first character of a line is a quote mark (">"), the line is
    considered to be quoted (see section 5.5).  Logically, all quote
    marks are counted and deleted, resulting in a line with a non-zero
    quote depth, and content. (The agent is of course free to display
    the content with quote marks or excerpt bars or anything else.)
    Logically, this test for quoted lines is done before any other tests
    (that is, before checking for space-stuffed and flowed).
    
    If the first character of a line is a space, the line has been
    space-stuffed (see section 5.4).  Logically, this leading space is
    deleted before examining the line further (that is, before checking
    for flowed).

    If the line ends in one or more spaces, the line is flowed.
    Otherwise it is fixed.
    
    If the line is flowed and DelSp is "yes", the trailing space
    immediately prior to the line's CRLF is logically deleted.  If the
    DelSp parameter is "no" (or not specified, or set to an unrecognized
    value), the trailing space is not deleted.

    Any remaining trailing spaces are part of the line's content, but
    the CRLF of a soft line break is not.
    
    A series of one or more flowed lines followed by one fixed line is
    considered a paragraph, and MAY be flowed (wrapped and unwrapped) as
    appropriate on display and in the construction of new messages (see
    section 5.5).
    
    A line consisting of one or more spaces (after deleting a stuffed
    space) is considered a flowed line.
    
    An empty line (just a CRLF) is a fixed line.

    There is a convention in Usenet news of using "-- " as the separator
    line between the body and the signature of a message.  When
    generating a Format=Flowed message containing a Usenet-style
    separator before the signature, the separator line is sent as-is.
    This is a special case; an (optionally quoted) line consisting of
    DASH DASH SP is not considered flowed.

    whenever two adjacent lines have different quote depths, senders
    should ensure that the earlier line is fixed (does not end in
    a space), and receivers should treat the earlier line as fixed
    regardless of whether it ends with a space.

That's just sections 5.2 and 5.3 and one paragraph from 5.5.  I find
that if I focus only on this 45-line excerpt and ignore the rest, I
completely understand format=flowed and delsp=yes.  Do you agree (except
for the content-transfer-encoding concern)?
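
The per-line rules in the excerpt can be sketched in a few lines of
Python.  This is only an illustration under stated assumptions, not text
from the draft: the names Line and classify are hypothetical, the line
is assumed to have had its CRLF removed already, and the "-- " test is
assumed to happen after quote marks are deleted but before the stuffing
space is removed (the excerpt does not fix that order).

```python
from typing import NamedTuple


class Line(NamedTuple):
    quote_depth: int  # number of leading ">" characters
    flowed: bool      # True if flowed, False if fixed
    content: str      # what remains after the flag characters are deleted


def classify(line: str, delsp: bool = False) -> Line:
    """Classify one line (CRLF already removed) using the excerpt's rules."""
    # Quote marks are counted and deleted before any other test.
    content = line.lstrip(">")
    depth = len(line) - len(content)
    # Special case: an (optionally quoted) "-- " separator is not flowed.
    if content == "-- ":
        return Line(depth, False, content)
    # A leading space means the line was space-stuffed; delete exactly one.
    if content.startswith(" "):
        content = content[1:]
    # A line that still ends in a space is flowed, otherwise it is fixed.
    flowed = content.endswith(" ")
    # With DelSp=yes, delete only the single space just before the CRLF.
    if flowed and delsp:
        content = content[:-1]
    return Line(depth, flowed, content)


print(classify(">> Some quoted text "))
print(classify("trailing  ", delsp=True))
print(classify("-- "))
```

Note that an empty line comes out fixed and a line of two spaces comes
out flowed (one space is stuffing, the other is the flow flag), which
matches the two single-sentence rules in the excerpt.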

Perhaps it might help to move this material earlier.  Imagine swapping
sections 5.1 (Generating format=flowed) and 5.2 (Interpreting
format=flowed).  I find it much easier to understand a format first from
a decoder's point of view ("this is what can appear, and this is what
it means").  Then, after I understand the format, I can more easily
understand rules/recommendations aimed at encoders.

I think it might also help to include all the essentials of the decoding
algorithm in the interpreting section.  Right now, two details are
omitted and sprung on the reader in later sections: the Usenet sig
exception and the changing-quote-depth exception.  The interpreting
section could include them concisely and provide forward references,
just like it already does for quoting and space-stuffing.

It's easier to follow an explanation if you have a preview of where
it's heading.  Consider an interpreting section that began with such a
preview and then gave all the steps:

    An interpreter of format=flowed text processes the text line by
    line.  ("Line" refers to lines of the text/plain data, because
    format=flowed is a parameter of text/plain.  "Line" does not refer
    to lines of any quoted-printable, base64, or other encoding of the
    text/plain data.)  Each line can have characters removed from the
    beginning and/or end, and each line is tagged with a quote-depth (a
    non-negative integer) and a flow-type ("fixed" or "flowed").  The
    tags are used to group lines into paragraphs/snakes/flow-groups that
    can be re-wrapped for display or construction of other messages.

    For each line (in order), the following steps are applied (in
    order):

    1. All quote marks (">") are removed from the beginning of the line
    and counted; the count becomes the quote-depth of the line. [other
    remarks] [forward reference to quoting section]

    2. If the line is not the first line, and if its quote-depth differs
    from the quote-depth of the previous line, then the previous line
    is expected to have a flow-type of "fixed".  In properly generated
    text, that will be true; if the previous line's flow-type is
    "flowed" then the text was generated improperly.  In that case,
    reset the flow-type of the previous line to "fixed", and re-do
    step 7 for that line. [forward reference to quoting section]

        [[ Suggestion:  Perhaps, instead of overriding the
        flow-type, the line should be left as flowed, but the
        paragraph/snake/flow-group is nevertheless terminated, resulting
        in an improper paragraph/snake/flow-group that does not end
        with a fixed line.  This would yield different treatment for a
        single flowed line followed by a change in quote-depth.  The
        existing rules change it to a single fixed line, which would not
        be re-wrapped.  But clearly it was intended to be re-wrappable.
        The suggested new rule would allow it to be re-wrapped.  The
        existing draft contains the sentence "the change in quote depth
        ends the paragraph", which is inconsistent with the rest of the
        existing changing-quote-depth rule, because when a single flowed
        line is changed to fixed, there is no paragraph.  But that
        sentence would be consistent with the suggested new rule. ]]

    3. If the line is "-- " (dash dash space) then set the flow-type
    to "fixed" and go on to the next line (do not proceed to step 4).
    [forward reference to Usenet sig section]

        [[ Suggestion:  Maybe a sig line should also terminate a
        paragraph/snake/flow-group, same as a change in quote-depth.
        Otherwise a sig line could get re-wrapped so that it no longer
        appears on a line by itself. ]]

    4. If the line begins with a space, the space is removed. [other
    remarks] [forward reference to space-stuffing section]

    5. If the line ends with a space then set the flow-type to "flowed",
    otherwise set it to "fixed".

    6. If the line is flowed (that is, its flow-type is "flowed") and
    delsp is "yes" then remove the space at the end of the line.

    7. If the line is flowed then it is part of a growing
    paragraph/snake/flow-group.  If the line is fixed and is preceded
    by a flowed line, then the fixed line is the last line of the
    paragraph/snake/flow-group, which MAY be wrapped and unwrapped as
    appropriate for display and construction of new messages.  If the
    line is fixed and is not preceded by a flowed line, then it is not
    part of a paragraph/snake/flow-group. [forward reference] [Unicode
    reference].

        [[ Suggestion:  The current draft says nothing about what
        a decoder should do if the very last line is flowed.  I
        suggest that end-of-input be yet another thing (along with
        quote-depth changes and sig lines) that can improperly terminate
        a paragraph/snake/flow-group. ]]

    Note that multiple consecutive spaces have no special significance.
    Only the single spaces at the beginning and end of a line have a
    special meaning; any others are simply part of the line's content.
    Steps 4 and 6 delete at most one space each.

Would an early section along those lines be at all helpful?
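
To check that the seven steps hang together, here is a rough Python
sketch of them.  It is an illustration under assumptions, not a
reference implementation: the names Tagged, interpret and flow_groups
are hypothetical, input lines are assumed to have had their CRLF
removed, and grouping runs as a second pass once all tags are final
(which stands in for "re-do step 7").  Following the bracketed
suggestions rather than the current draft, a sig line and end-of-input
each close an open flow-group.

```python
from dataclasses import dataclass


@dataclass
class Tagged:
    quote_depth: int
    flow_type: str       # "fixed" or "flowed"
    content: str
    is_sig: bool = False


def interpret(lines, delsp=False):
    """Steps 1-6: tag each line with a quote-depth and a flow-type."""
    tagged = []
    for raw in lines:
        # Step 1: remove and count the leading quote marks.
        content = raw.lstrip(">")
        depth = len(raw) - len(content)
        # Step 2: a change in quote-depth forces the previous line to fixed.
        if tagged and tagged[-1].quote_depth != depth:
            tagged[-1].flow_type = "fixed"
        # Step 3: the sig separator is fixed and skips the remaining steps.
        if content == "-- ":
            tagged.append(Tagged(depth, "fixed", content, is_sig=True))
            continue
        # Step 4: remove at most one stuffing space.
        if content.startswith(" "):
            content = content[1:]
        # Step 5: a trailing space marks the line as flowed.
        flow_type = "flowed" if content.endswith(" ") else "fixed"
        # Step 6: with delsp=yes, remove at most one trailing space.
        if flow_type == "flowed" and delsp:
            content = content[:-1]
        tagged.append(Tagged(depth, flow_type, content))
    return tagged


def flow_groups(tagged):
    """Step 7: return lists of line indexes, one list per flow-group."""
    groups, current = [], []
    for i, line in enumerate(tagged):
        if line.is_sig:
            # Assumption (bracketed suggestion): a sig line ends an open
            # group without joining it.
            if current:
                groups.append(current)
            current = []
        elif line.flow_type == "flowed":
            current.append(i)
        elif current:
            # A fixed line after flowed lines is the last line of the group.
            current.append(i)
            groups.append(current)
            current = []
    if current:
        # Assumption (bracketed suggestion): end-of-input ends an open group.
        groups.append(current)
    return groups


sample = ["> Quoted text that ", "> continues here.",
          "A reply that ", "wraps.", "-- ", "AMC"]
print(flow_groups(interpret(sample)))
```

With this structure, a single flowed line followed by a change in
quote-depth ends up as a lone fixed line outside any group, which is
the behavior of the existing rule that the first suggestion questions.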

By the way, I wonder what purpose section 5.7 serves.  It looks like
just a very verbose way of saying "A line is quoted iff it begins with a
quote indicator.  A line is flowed iff it ends with a space."  I don't
see the point of the first statement, because every quote depth needs
to be displayed differently--I see nothing special about the boundary
between zero and non-zero.  The second statement is imprecise because it
neglects the Usenet sig exception.  The author might want to consider
simply removing section 5.7.

AMC
