ietf-822

Re: Format=Flowed/RFC 2646 Bis (-02)

2003-11-06 08:38:09


I have to say that I find both RFC 2646 and this draft fairly opaque and
somewhat ambiguous.

In particular, what is a "line"?  is it:

- zero or more characters from the canonical form of a body part,
  beginning either at the start of the body part
  or immediately following a CRLF, and ending with a CRLF?

- zero or more characters from the encoded form of a body part,
  beginning either at the start of the body part
  or immediately following a CRLF, and ending with a CRLF
  whether or not it is preceded by a SP?

- zero or more characters from the canonical form of a body part,
  beginning either at the start of the body part
  or immediately following a CRLF, and ending with a CRLF
  that isn't preceded by a SP?

I'll leave this for Randy to answer.

In a charset that isn't compatible with ASCII, are the characters
">", SP, CR, LF treated specially using the values of those characters
from that charset, or are the octet values 0x3E, 0x20, 0x0D, 0x0A,
treated specially?

If a charset isn't compatible with ASCII insofar as CR and LF are concerned it
cannot be used with the text top-level type. (RFC 2046 section 4.1.1.) So this
is a vacuous concern.

As for ">" and SP, the usual way this is handled is for the specification
to work with those characters regardless of how they are represented.
(You won't find many charsets that muck with space, but there are some
that are modal and depending on mode may use ">" for some other purpose.)

I suppose you could define a charset without a ">" character, or even
one without space, but as a practical concern this seems a bit farfetched.

Does the answer depend on the format in which
the message is stored?  (e.g. if the message is stored in a file
on a system whose native charset is ASCII compatible, line endings
in the storage format might still be a combination of CR and/or LF,
but they will have no significance for the canonical form of the
text at all, since that will be UTF-16, EBCDIC, whatever.)

This was decided eons ago. We define things in terms of the canonical form of
the data, and in the canonical form of MIME text line breaks are CRLF and CRLF
is always represented the same way. If a different representation is used on
some system (EBCDIC, counted records, whatever) it is for that system to figure
out how to adapt the specification accordingly. Trying to account for all the
vagaries of local storage is a sure path to madness.
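
To make the "adapt locally, specify canonically" point concrete, here is a
minimal sketch of what such an adaptation might look like on a system whose
local store uses bare LF, bare CR, or a mix of line endings. The function name
to_canonical is hypothetical, and treating a bare CR as a line ending is an
assumption about the local format, not something the specifications require.

```python
def to_canonical(local_text: str) -> str:
    """Convert locally stored text to the canonical form, where every
    line break is CRLF.

    Assumption: the local store marks line endings with CRLF, bare LF,
    or bare CR. Other local conventions (counted records, EBCDIC NL,
    and so on) would need their own adapter.
    """
    # Collapse every local line-ending style to LF first, so that an
    # existing CRLF is not turned into CR CR LF by the final step.
    unified = local_text.replace("\r\n", "\n").replace("\r", "\n")
    # Then emit the canonical CRLF line break.
    return unified.replace("\n", "\r\n")
```

All of the format=flowed rules would then be applied to the output of such an
adapter, never to the local representation itself.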

Here's a stab at defining this more succinctly and precisely.
(or perhaps, it's an indication of how much I misunderstood the
draft...)

If the format= parameter is set to "fixed" or the parameter is unspecified,
text/plain is to be interpreted per RFC 2046.

If the format= parameter is set to "flowed", text/plain is to be interpreted
per RFC 2046, with the following exceptions:

1. The sequence SP CR LF from the canonical form of the body part is to
be treated as follows:

a. if the delsp= parameter is set to "yes", the sequence SP CR LF is to
be ignored when displaying, printing, or otherwise presenting the body part.

b. if the delsp= parameter is set to "no", or the delsp= parameter is
unspecified, the sequence SP CR LF is to be treated as SP when displaying,
printing, or otherwise presenting the body part.

c. regardless of the value of the delsp= parameter, if the format=
parameter has a value of "flowed" the sequence SP CR LF is not treated
as a "line break".  (this changes the rule in section 4.1.1 of RFC 2046
which states that CR and LF are forbidden outside of line breaks)

2. The sequence CR LF from the canonical form of the body part, when
not immediately preceded by SP, is interpreted as a line break.

3. A "line" consists of zero or more characters which start immediately at
the beginning of the canonical form of the body part, or immediately following
a line break.

4. "Lines" in body parts for which format=flowed MAY be "wrapped" as necessary
to fit the width of the display or output medium, by ceasing the output of
characters along one horizontal row of the output device or medium, and
continuing the output of subsequent characters along the next horizontal row
of the output device or medium.  Such wrapping SHOULD, when possible, be done
when a character sequence that is to be interpreted as SP is detected (either
a SP character, or if delsp=no or is unspecified, the sequence SP CR LF).

5. One or more ">" characters at the start of a line are taken as an indicator
that the text on that line is a quotation.  The greater the number of ">"
characters, the greater the "depth" of the quotation.

6. User agents MAY display or present quotations using leading ">" characters
or in any other manner which is suitable for the output device or medium.  If
">" characters are used to indicate quotations for display or presentation,
the number of ">" characters displayed SHOULD equal the number of ">"
characters at the beginning of the line in the canonical form, if the display
reasonably permits this.  If some other means is used to indicate quotations
in the display or output medium, different levels of quotations SHOULD be
displayed or presented differently, so they can be distinguished by the
recipient.

7. Since the ">" notation applies to the entire "line" (as defined in #3 above),
when a quotation line is "wrapped", the entire line SHOULD be presented as if
it were a single quotation (and all at the same level of depth), even if the
line is "wrapped" for display or presentation purposes.

8. The vertical spacing between output display rows SHOULD be the same between
rows of characters within a "wrapped" line as between separate lines.

9. In all of the rules in this section, the characters CR LF SP and ">"
have code values as defined by the charset parameter, even if those values
do not correspond to those in ASCII.
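
As a rough illustration of rules 1, 2, 3 and 5 above, the following sketch
splits a canonical-form body into "lines" and reports the quote depth of each.
It follows the reformulation literally and is not an implementation of
RFC 2646 or the draft: it reads rule 2 as applying to CR LF that is not
preceded by SP (consistent with rule 1c), it does not handle space-stuffing,
and it does not strip ">" characters that follow a SP CR LF sequence. The
function name unflow and its return shape are hypothetical, and the body is
assumed to be already decoded into a Python string.

```python
def unflow(canonical: str, delsp: bool = False):
    """Return a list of (quote_depth, text) pairs, one per "line".

    `canonical` is the canonical form of the body part, with CRLF line
    breaks. `delsp` mirrors the delsp= parameter (True for "yes").
    """
    # Rule 1: SP CR LF is not a line break. It is dropped when
    # delsp=yes, and treated as a single SP otherwise.
    joined = canonical.replace(" \r\n", "" if delsp else " ")

    # Rules 2 and 3: every remaining CR LF is a line break, and a
    # "line" runs from the start of the body or from just after a
    # line break.
    lines = joined.split("\r\n")
    if lines and lines[-1] == "":
        lines.pop()  # a trailing CRLF does not start another line

    result = []
    for line in lines:
        # Rule 5: the number of leading ">" characters is the depth.
        text = line.lstrip(">")
        result.append((len(line) - len(text), text))
    return result
```

For example, unflow("> a \r\nb\r\nc\r\n") yields one line at depth 1 with the
text " a b" and one line at depth 0 with the text "c". How each line is then
wrapped and how the depth is rendered (rules 4 and 6 through 8) is left to
the display code.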

I guess I have no objection to this reformulation, but OTOH it really isn't
necessary to restate that we're dealing with the canonical form of the data.

                                Ned
