> While working on BBN/Slate, a multimedia mail system of the same ilk as
> CMU's Andrew, I tried to implement text/richtext input/output converters.
> I found that 1) text/richtext had many Andrew-based prejudices in its text
> model that did not jibe with the model in either Slate or your typical
> Mac/Windows simple text processor, and 2) there was an amazing amount of
> implied semantics that came out of Andrew's handling of formatting that was
> not in any way part of the spec.
> [...]
> text/enriched is much more tightly specified and much more constrained in
> the kind of markup it is trying to provide.
Amen. I had exactly the same experience when writing the text/richtext and
text/enriched converters for NeXT's (RTF-based) mail UA. The updated
text/enriched spec is much clearer, and one thing I'm particularly happy to
see stated explicitly is that paragraph formatting commands like <center>
imply line breaks around them.
<nofill> still seems a bit loose, though. In particular, it's not clear to
me exactly what the extent of the affected text really is supposed to be.
From the draft-03 spec, it sounds like it will begin immediately after the
right bracket in "<nofill>" and continue to immediately before the left
bracket in "</nofill>". However, this means that an example like this:
--------------------------------
<nofill>
aaa
bbb
</nofill>

<nofill>
xxx
yyy
</nofill>
--------------------------------
will generate the somewhat surprising:
--------------------------------

aaa
bbb


xxx
yyy
--------------------------------
since each nofill block will include both the CRLF just after <nofill> and
the one just before </nofill>, and the double CRLF between the two blocks
will generate an extra line break of its own.
HTML's <pre> command has a special rule for this: a newline directly
adjoining the <pre> or </pre> tag is excluded from the affected text. With
that rule in effect, you'd get this instead:
--------------------------------
aaa
bbb
xxx
yyy
--------------------------------
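To make the difference concrete, here is a rough Python sketch of both readings applied to the nofill example above, with a blank line (double CRLF) between the two blocks as described. It is only an illustration under my own assumptions: LF stands in for CRLF, <nofill> is the only command recognized, and outside nofill it applies the newline rule as I read it (one CRLF becomes a space, n > 1 CRLFs become n - 1 line breaks). The names render, fill_newlines and pre_style are made up for the example.

```python
import re


def fill_newlines(s):
    # Outside <nofill>: a single newline becomes a space, and a run of
    # n > 1 newlines becomes n - 1 line breaks.
    def repl(m):
        n = len(m.group(0))
        return " " if n == 1 else "\n" * (n - 1)
    return re.sub(r"\n+", repl, s)


def render(text, pre_style=False):
    # re.split with one capture group puts the <nofill> bodies at odd indices.
    parts = re.split(r"<nofill>(.*?)</nofill>", text, flags=re.S)
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 0:
            out.append(fill_newlines(part))
            continue
        if pre_style:
            # HTML <pre>-style: drop one newline directly adjoining each tag.
            if part.startswith("\n"):
                part = part[1:]
            if part.endswith("\n"):
                part = part[:-1]
        # Inside <nofill>, the remaining newlines are kept verbatim.
        out.append(part)
    return "".join(out)


EXAMPLE = "<nofill>\naaa\nbbb\n</nofill>\n\n<nofill>\nxxx\nyyy\n</nofill>"
print(repr(render(EXAMPLE)))                  # '\naaa\nbbb\n\n\nxxx\nyyy\n'
print(repr(render(EXAMPLE, pre_style=True)))  # 'aaa\nbbb\nxxx\nyyy'
```

The only difference between the two calls is the trimming of one adjoining newline per tag; everything else, including the double CRLF between the blocks, is handled the same way.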
That is more in line with the other commands. For example, if you have:
--------------------------------
<center>
aaa
bbb
</center>
<center>
xxx
yyy
</center>
--------------------------------
you currently get:
--------------------------------
aaa bbb
xxx yyy
--------------------------------
if I read the spec right.
On a related topic, the rule that turns a single CRLF into a space seems a
bit simplistic. For example, this:
--------------------------------
first
<flushleft>
<bold>
second
</bold>
</flushleft>
third
--------------------------------
will generate this:
--------------------------------
first
second
third
--------------------------------
That is, there will be an extra space before both "second" and "third", and
possibly one after "first" and "second" as well. It would probably be better
to borrow an idea from the paragraph commands and say something like "a
single CRLF produces a space unless that space would fall immediately next
to another space or a newline".
My apologies if I misread something in the spec.
--Lennart