At 5:47 AM +0000 11/17/03, Adam M. Costello wrote:
Two of the steps I listed need to be swapped. Step 3 checks for the
sig line, and step 4 unstuffs. I wrote them in that order because
section 5.3 says "an (optionally quoted) line consisting of DASH DASH
SP is not considered flowed." But now I notice that the grammar
says sig-sep = [quote [stuffing]] "--" SP CRLF. To be consistent,
section 5.3 should say "(optionally quoted, optionally stuffed)", and
the interpreting section should not check for a sig line until after
unstuffing.
(Or do you want to resolve the inconsistency the other way, by changing
the grammar?)
The -03 version fixed the grammar to treat signature separator lines
as a third type of line. I just created an -04 which changes "an
(optionally quoted) line consisting of DASH DASH SP" to "an
(optionally quoted or quoted and stuffed) line consisting of DASH
DASH SP".
Note that the idea was to allow signature separator lines to be
quoted, but not stuffed unless also quoted. That way, stuffing can
be used to guard against a line being confused with a signature
separator.
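As a rough illustration of that rule (a sketch, not text from the draft), a receiver could recognise separator lines with a single pattern. The helper name `is_sig_separator` is made up here, and the line is assumed to have had its CRLF removed already:

```python
import re

# Quote marks are optional; a stuffing space is accepted only after at
# least one ">", so a stuffed-but-unquoted " -- " is NOT a separator.
_SIG_SEPARATOR = re.compile(r"(?:>+ ?)?-- ")


def is_sig_separator(line: str) -> bool:
    """Return True if `line` (without CRLF) is a signature separator."""
    return _SIG_SEPARATOR.fullmatch(line) is not None
```

With this, "-- ", ">-- " and "> -- " are separators, while " -- " is an ordinary stuffed line.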
> Adam also had a number of concerns over ambiguity in the grammar, with
suggestions for improvement. I generally like the replacement text,
except for the removal of the distinction between quoted and unquoted
lines. I thought it was helpful to identify a non-quoted line on
its own, and not just as a line with a quote-depth of zero. This
is based on a perceived need to treat the two somewhat differently,
in particular, quoted lines need extra handling, state, and display
semantics.
I would think it would be easier for an implementor to write a single
handler & data structure for all paragraphs of all quote depths, rather
than make quote-depth-zero a special case.
Making the distinction in the grammar still allows a client to
implement either way.
Of course it's possible to
expand the grammar to make the distinction, but I think it makes the
grammar appear more complex than it really is.
This is a good point. The grammar in the -03 and -04 versions is
based on the suggested replacement you provided, plus fixes to treat
signature lines as a third type, and to distinguish between quoted
and unquoted. I agree that this latter change does make the grammar
larger and more complex.
(And I just noticed that
while the grammar in the current draft distinguishes between quoted
flowed and unquoted flowed, it does not distinguish between quoted fixed
and unquoted fixed.)
This was fixed in -03.
By the way, my proposed grammar forgot to handle the Usenet sig
exception.
I fixed this in -03.
Here's a fixed version that also incorporates my three
suggestions regarding improperly terminated paragraphs:
   flowed-body       = *( paragraph / fixed-line / sig-line )
   paragraph         = 1*flowed-line fixed-line
                       ; That is the grammar for proper paragraphs, which
                       ; always end with a fixed line. Improper paragraphs
                       ; are instead terminated by a change in quote-depth,
                       ; end of input, or a sig-line (which is not included
                       ; in the paragraph).
   sig-line          = quote [stuff] "--" SP CRLF
   fixed-line        = quote (stuff stuffed-fixed / unstuffed-fixed) CRLF
   flowed-line       = quote (stuff stuffed-flowed / unstuffed-flowed) flow CRLF
   stuffed-fixed     = [*text-char non-sp]
                       ; Does not end with SP.
   unstuffed-fixed   = non-sp-quote [*text-char non-sp]
                       ; Does not begin with SP or ">", does not end with SP.
   stuffed-flowed    = [non-dash *text-char] /
                       "-" [non-dash *text-char / "-" 1*text-char]
                       ; Is not "--".
   unstuffed-flowed  = non-sp-quote-dash *text-char /
                       "-" [non-dash *text-char / "-" 1*text-char]
                       ; Not empty, not "--", does not begin with SP or ">".
   quote             = *">"
   stuff             = SP
   flow              = SP
   non-sp-quote-dash = <any character except NUL, CR, LF, SP, ">", "-">
   non-sp-quote      = <any character except NUL, CR, LF, SP, ">">
   non-sp            = <any character except NUL, CR, LF, SP>
   text-char         = <any character except NUL, CR, LF>
   non-dash          = <any character except NUL, CR, LF, "-">
That definition for sig-line allows it to be stuffed without being
quoted, which we have been prohibiting.
The suggested definitions for flowed lines attempt to eliminate the
ambiguity between a flowed line and a signature separator but I don't
think they allow for all cases of flowed lines. I think we'd need
something like:
   non-sp-quote-dash *text-char /
   "-" non-dash *text-char /
   "--" non-space *text-char /
   "--" 2*text-char
But this wouldn't work since the definition of flowed-line includes
the flow (space) at the end. I don't see any way to resolve the
ambiguity without really convoluting the ABNF. I think it may be OK
to just note the potential ambiguity, especially since a check for
signature line can be made before checking for a flowed line. I've
added a note in the ABNF section of -04 about it.
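To make the "check for a signature line first" ordering concrete, here is a minimal sketch of how a receiver might classify a line. The function name `classify_line` and its return strings are invented for illustration, and it assumes the -04 rule that a separator may be stuffed only when it is also quoted:

```python
import re

_SIG = re.compile(r"(?:>+ ?)?-- ")


def classify_line(raw: str) -> str:
    """Classify one line (CRLF already removed) as 'sig', 'flowed' or 'fixed'."""
    # Test for the separator first so that "-- " is never taken as flowed.
    if _SIG.fullmatch(raw):
        return "sig"
    content = raw.lstrip(">")        # drop any quote marks
    if content.startswith(" "):      # drop one stuffing space, if present
        content = content[1:]
    return "flowed" if content.endswith(" ") else "fixed"
```

Because the separator test runs first, the grammar's overlap between sig-line and flowed-line never has to be resolved by the flowed test.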
One of my suggested grammar tweaks was that a sig line should not get
sucked into a paragraph, even if it is preceded by a flowed line,
because then it could get re-wrapped and no longer appear at the start
of a line (and therefore cease to be a sig line). I notice now that
this suggestion amounts to having a third type of line. A sig line is
neither fixed nor flowed, because fixed and flowed lines can be inside
paragraphs, while sig lines can never be inside paragraphs.
I agree, and I believe this was fixed in -03.
Another comment regarding the grammar: It is nice for a grammar to give
names to the meaningful syntactic constructs. For example, we'd like
a name for the quote-marks (and we have one), we'd like names for the
special spaces that act as flags (and we have them), and we'd like a
name for the actual content of the line without the quotes and flags,
but the grammar in the draft doesn't give us that. Consider for example
stuffed-flowed. In the draft, this means a line that *was* flowed and
*is* stuffed (it includes the stuff space but not the flow space). In
the grammar in the old message above, stuffed-flowed means a line that
*was* flowed and *was* stuffed (it includes neither the stuff space nor
the flow space, only the actual content).
This is a nice feature, and I believe I've achieved it in -04 (at the
expense of some extra parentheses).
Section 5.1 says:
If the line ends in a space, the line is flowed. Otherwise it is
fixed. The exception to this rule is a signature separator line,
described in Section 5.3. Such lines end in a space but are not
flowed.
That leaves the following question unanswered: Are separator lines
fixed, or are they a third type of line?
According to the grammar in section 7, signature separator lines do not
match fixed-line. That seems to suggest that they are a third type of
line, which is the view that seems most intuitive to me. That could be
clarified by changing the last sentence of the quoted paragraph to "Such
lines end in a space but are neither flowed nor fixed."
I agree and made this change in -04.
Section 5.3 says:
This is a special case; an (optionally quoted) line consisting of
DASH DASH SP is not considered flowed.
Sections 5.1 (interpreting) and 7 (grammar) both indicate that a
signature line can be quoted and/or stuffed. It is confusing for 5.3
to mention "optionally quoted" without also mentioning "optionally
stuffed". Also, if in section 5.1 "not flowed" is changed to "neither
flowed nor fixed", the same change ought to be made here.
Thanks; I believe this has been cleaned up in -04 so that it is now
clear from all references in the text as well as the grammar that
signature lines can be quoted or quoted and stuffed but they can't be
stuffed without being quoted.
Section 5.3 goes on to say:
Generating agents MUST NOT end a paragraph with such a signature
line, since doing so would indicate that the separator line is part
of the paragraph.
It would not indicate that the separator line is part of the paragraph,
it would indicate that the body is malformed (according to the grammar
and according to section 5.1); the receiver would not believe that the
separator is part of the paragraph (according to 5.1).
That jumped out at me as I was making another change. I deleted the
second clause, so now it just says "Generating agents MUST NOT end a
paragraph with such a signature line".
Perhaps the intention is something like this:
When placing soft line breaks in a paragraph, generating agents MUST
NOT place them in a way that causes any line of the paragraph to
be a signature separator line, because paragraphs cannot contain
signature separator lines (see sections 5.1 and 7).
I'm not sure if that was the original intent or not, but I liked the
text you suggest and so added it to the section on generating f=f
(with references to the section on signature lines and on the ABNF).
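One way a generating agent might honor that requirement is sketched below. This is only an assumption about how it could be done, not something the draft prescribes: `wrap_flowed` is a hypothetical helper that wraps with Python's standard `textwrap` module and then merges any line that came out as a bare "--" into the following line, so that adding the flow space can never produce "-- ".

```python
import textwrap


def wrap_flowed(paragraph: str, width: int = 72) -> list:
    """Wrap one paragraph into format=flowed lines (no quoting, no CRLF)."""
    # Leave one column for the trailing flow space.
    chunks = textwrap.wrap(paragraph, width=width - 1,
                           break_long_words=False, break_on_hyphens=False)
    safe = []
    for chunk in chunks:
        if safe and safe[-1] == "--":
            # A bare "--" plus the flow space would be a separator line,
            # so join it with the next chunk (this may exceed `width`).
            safe[-1] = "-- " + chunk
        else:
            safe.append(chunk)
    # Every line but the last is flowed (ends in SP); the last is fixed.
    return [line + " " for line in safe[:-1]] + safe[-1:]
```

A final line of "--" is left alone, since without a trailing space it is a fixed line rather than a separator.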
Section 5.4 says:
Space-stuffing adds a single space to the start of any line which
needs protection when the message is generated. On reception, if
the first character of a line is a space, it is logically deleted.
This occurs after the test for a quoted line, and before the test
for a flowed line.
It's not only after testing for a quoted line, but more importantly
after stripping the quoting.
But the test for quoted line is what deletes the quote marks. I
changed the text in -04 to say "This occurs
after the test for a quoted line (which logically counts and deletes
any quote marks)".
And it's not only before the test for
a flowed line, but also before the test for a separator line.
I think the test for a signature line has to happen both before the
test for a quoted line and also after deleting quote marks and
stuffing.
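For the quote-then-stuffing order described in 5.4, a small sketch may help. `split_line` is a hypothetical name, the input is assumed to have its CRLF removed, and the signature-line test is deliberately left out since where it belongs is the open question here:

```python
from typing import Tuple


def split_line(raw: str) -> Tuple[int, str]:
    """Return (quote_depth, content) for one received line."""
    # The test for a quoted line counts and logically deletes the ">" marks.
    depth = len(raw) - len(raw.lstrip(">"))
    content = raw[depth:]
    # Only then is a single leading stuffing space logically deleted.
    if content.startswith(" "):
        content = content[1:]
    return depth, content
```

The flowed test (does `content` end in a space?) would then be applied to the returned content.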
Section 5.5 says:
When generating quoted flowed lines, an agent needs to pay attention
to changes in quote depth. A sequence of quoted lines of the same
quote depth immediately followed by lines of a different quote
depth MUST be encoded so that lines of the same quote depth are a
paragraph, with the last line generated as fixed and prior lines
generated as flowed.
That seems to be a much stronger requirement than you intend. Within a
single quote depth, there might be multiple paragraphs, non-paragraph
fixed-lines, and separator lines. But the sentence quoted above seems
to say that because all of this text is a bunch of "lines at the same
quote depth", it must be encoded as "a paragraph", with the last line
fixed and all other lines flowed. Perhaps the intention is something
like this:
When generating quoted flowed lines, an agent needs to pay attention
to changes in quote depth. All lines of a paragraph MUST be
unquoted, or else they MUST all be quoted and have the same quote
depth. Therefore, whenever there is a change in quote depth, or a
change from quoted to unquoted, or change from unquoted to quoted,
the line immediately preceding the change MUST NOT be a flowed
line.
Indeed. Thanks for catching this.
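On the generating side, that rule could be sketched roughly as follows. The `(depth, text, flowed)` tuple layout and the name `enforce_depth_boundaries` are assumptions made for the example, with depth 0 standing for an unquoted line:

```python
def enforce_depth_boundaries(lines: list) -> list:
    """Mark as not flowed any line that directly precedes a depth change.

    `lines` is a list of (depth, text, flowed) tuples.
    """
    out = []
    for i, (depth, text, flowed) in enumerate(lines):
        has_next = i + 1 < len(lines)
        if has_next and lines[i + 1][0] != depth:
            flowed = False  # the paragraph must end before the change
        out.append((depth, text, flowed))
    return out
```

The generator would then append the flow space only to lines still marked as flowed.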
Section 5.5 goes on to say:
If a receiving agent wishes to reformat flowed quoted lines (joining
and/or wrapping them) on display or when generating new messages,
the lines SHOULD be de-quoted, reformatted, and then re-quoted. To
de-quote, the number of close angle brackets in the quote indicator
at the start of each line is counted. Consecutive lines with the
same quote depth are considered one paragraph and are reformatted
together. To re-quote after reformatting, a quote indicator
containing the same number of close angle brackets originally
present are prefixed to each line.
I think one sentence there is inaccurate: "Consecutive lines with the
same quote depth are considered one paragraph and are reformatted
together." Consecutive lines with the same quote depth could be one
paragraph or several paragraphs or non-paragraph fixed lines (in which
case no reformatting is requested) or separator lines. I think that
sentence can simply be removed. Reformatting is covered elsewhere; this
section is about quoting.
Another good catch. The sentence is deleted in -04.
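The re-quote step that remains in that paragraph is simple enough to sketch. `requote` is a hypothetical helper, and the choice to stuff content that begins with a space or ">" is my reading of "any line which needs protection", not a quotation from the draft:

```python
def requote(depth: int, lines: list) -> list:
    """Prefix each reformatted line with `depth` quote marks."""
    prefix = ">" * depth
    out = []
    for text in lines:
        # Stuff when the content itself starts with SP or ">", so the
        # receiver recovers the same quote depth and the same content.
        needs_stuffing = text.startswith((" ", ">"))
        out.append(prefix + (" " if needs_stuffing else "") + text)
    return out
```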
The next two paragraphs are inconsistent with section 5.1:
On reception, if a change in quote depth occurs on a flowed line,
this is an improperly formatted message. The receiver SHOULD handle
this error by using the 'quote-depth-wins' rule, which is to ignore
the flowed indicator and treat the line as fixed. That is, the
change in quote depth ends the paragraph.
In other words, whenever two adjacent lines have different quote
depths, senders MUST ensure that the earlier line is fixed (does
not end in a space), and receivers SHOULD treat the earlier line as
fixed regardless of whether it ends with a space.
According to section 5.1, the paragraph ends with the flowed line; it is
possible therefore to have an improperly terminated paragraph consisting
of a single flowed line, and such a paragraph would be reformatted. If
the flowed indicator is ignored and the line is treated as fixed, then
we have a single fixed line, which is not a paragraph at all and would
not be reformatted. Also, it is possible for the line before the change
in quote depth to be a separator line, which is arguably not fixed (see
the discussion above). The inconsistency could be resolved like so:
...the 'quote-depth-wins' rule, which is to consider the paragraph
to end with the flowed line immediately preceding the change in
quote depth.
In other words, whenever two adjacent lines have different quote
depths, senders MUST ensure that the earlier line is not flowed
(does not end in a space), and receivers finding a flowed line there
SHOULD treat it as the last line of a paragraph.
I agree; thanks again.
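A receiver applying the revised 'quote-depth-wins' wording might group lines along these lines. This is a sketch under assumptions: the input is already split into `(depth, kind, text)` tuples with kind being "flowed", "fixed" or "sig", and the name `group_paragraphs` and the output tuples are invented for the example.

```python
def group_paragraphs(lines: list) -> list:
    """Group classified lines into paragraphs, fixed lines and sig lines."""
    out = []
    current = None  # (depth, [texts]) of the paragraph being collected

    def close():
        nonlocal current
        if current is not None:
            out.append(("para", current[0], current[1]))
            current = None

    for depth, kind, text in lines:
        # Quote-depth-wins: a depth change (or a sig line) ends the open
        # paragraph with the flowed line that came just before it.
        if current is not None and (depth != current[0] or kind == "sig"):
            close()
        if kind == "sig":
            out.append(("sig", depth, text))
        elif kind == "flowed":
            if current is None:
                current = (depth, [])
            current[1].append(text)
        elif current is None:
            out.append(("fixed", depth, text))  # fixed line outside a paragraph
        else:
            current[1].append(text)             # fixed line ends the paragraph
            close()
    close()  # end of input also terminates an improper paragraph
    return out
```

Note that a lone flowed line before a depth change still comes out as a one-line paragraph, so it remains eligible for reformatting.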
Here we have more instances of the phrase "change in quote depth". If
we keep the current view that unquoted lines have no quote depth and
quoted lines have non-zero quote depth, then we really ought to be
saying "change in quote depth, or change from quoted to unquoted, or
change from unquoted to quoted". If we adopt the view that all lines
have a quote depth, which can be zero, then the simple phrase "change in
quote depth" will mean what we want it to mean.
I think "change in quote depth" can include a change between quoted
and unquoted. Even though I have retained the ABNF distinction
between quoted and unquoted, I don't think the text has to be too
rigid about it.
--
Randall Gellens
Opinions are personal; facts are suspect; I speak for myself only
-------------- Randomly-selected tag: ---------------
Nothing astonishes men so much as common sense and plain dealing.
--Ralph Waldo Emerson.