ietf-822

Text/Enhanced straw man

1993-02-12 14:14:28
All-

Thinking about simplemail a bit more, I've thrown together a straw man
for such a system that I think is quite workable.  I'll call this
"Text/Enhanced" to differentiate it from Bill's simplemail proposal,
although I would hope that we could all agree on a single system that
combines the best of all proposals.  

This description will be formatted as I would expect a text/enhanced
message to be, to give you an idea of what it would look like to
someone with a non-compliant UA.

I'd be interested in any feedback.

:: Introduction

The goals and requirements of an enhanced text format are:

        -o It should be trivial to type.  In fact, as much as
           possible, it should mimic the current email and news
           conventions.  It should be unlikely that a user would
           accidentally imply markup.

        -o It should be possible to apply the simple formatting that
           people are used to in handwritten and typewritten letters
           and email, but not the fancy control provided by word
           processors and document formatters.

        -o It should be possible to read an enhanced message as
           straight text on a non-compliant UA and get a good feel for
           what the markups mean.

        -o It should be simple to display an enhanced message on a
           UA that understands the format, with varying levels of
           markup.  That is, proportional fonts and faces such as bold
           and italics may be used when available, but are not
           necessary.  The end user should be able to decide how the
           markup looks.

        -o It should be possible to quote fragments of enhanced and
           plain messages without worrying about escaping embedded
           characters or getting errors due to omitted delimiters.

:: Paragraphs

Text in an enhanced message is formatted into filled paragraphs by
default.  The initial left and right margins, the first-line indent,
and whether the text is justified or hyphenated are all left up to the
UA (or end user).  A paragraph ends with a blank line or a change of
quotation level (see below).  A blank line is a line that contains
only whitespace.  Multiple paragraph ends in a row are merged into a
single paragraph end.

Initial whitespace in a text line is always ignored.
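
As a rough illustration of these rules, here is a minimal Python
sketch of how a compliant UA might group lines into paragraphs and
refill them.  It assumes quote prefixes have already been stripped (so
changes of quotation level are not handled), and the function names
are hypothetical, not part of the proposal.

```python
from __future__ import annotations

import textwrap


def split_paragraphs(text: str) -> list[list[str]]:
    """Group the lines of a message body into paragraphs.

    Assumes quote prefixes have already been removed, so only blank
    lines (lines containing nothing but white space) end a paragraph.
    """
    paragraphs: list[list[str]] = []
    current: list[str] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the paragraph; a run of blank lines
            # yields only one paragraph end because 'current' is empty.
            if current:
                paragraphs.append(current)
                current = []
        else:
            # Initial white space on a text line is always ignored.
            current.append(line.lstrip())
    if current:
        paragraphs.append(current)
    return paragraphs


def fill_paragraph(lines: list[str], width: int = 60) -> str:
    """Refill one paragraph to the width chosen by the UA or end user."""
    return textwrap.fill(" ".join(lines), width=width)
```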

:: Word Markup

Individual words in a paragraph may be *emphasized* or set in an
_alternate_font_.  The latter is expected to be used for introduction
of new terms as well as book titles.  These conventions are already
common on the net.  The phrase must begin and end with the markup
character (star or underscore), and spaces between words must also be
changed to the markup character.  When formatting, the words can be
shown as entered or in an alternate face (probably an italic face for
_alternate_ text and an italic or bold face for *emphasized* text).
If the markup characters are displayed and line breaking occurs within
a marked string, both the pre-break and post-break text should be
marked, as in

        - : ... _a_long_
          : _phrase_ ...
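
One way a compliant UA might recognize word markup is sketched below
in Python.  The regular expression and the choice of HTML-style <b>
and <i> tags for display are assumptions made for illustration; a
"*<" note opener is deliberately not treated as emphasis.

```python
import re

# A run of non-blank characters that starts and ends with the same
# markup character ('*' or '_'), preceded by start-of-line or white
# space and followed by white space, common punctuation, or end of line.
# The first character may not be '<', so a note opener "*<" never matches.
WORD_MARKUP = re.compile(
    r"(?<!\S)([*_])([^\s*_<>](?:\S*?[^\s*_])?)\1(?=[\s.,;:!?)]|$)"
)

# Assumed display mapping: bold for *emphasis*, italic for _alternate_.
TAGS = {"*": "b", "_": "i"}


def render_word_markup(text: str) -> str:
    def replace(match: re.Match) -> str:
        marker, phrase = match.group(1), match.group(2)
        words = phrase.replace(marker, " ")  # inner markers stand for spaces
        tag = TAGS[marker]
        return f"<{tag}>{words}</{tag}>"

    return WORD_MARKUP.sub(replace, text)
```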

Enclosing a string in backquotes (`...`) represents a _literal_.
Markup characters are not significant within a literal, and a UA may
want to display it in a contrasting (probably fixed-width) font, as it
is expected to be used for code fragments.  A literal may contain
blanks or newlines (which are equivalent to blanks).  It may not cross
a paragraph boundary.  If the closing quote is not seen before the
paragraph boundary, the literal is silently terminated.
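
A minimal Python sketch of literal handling within one paragraph
follows.  Because the input is assumed to be a single paragraph, an
unclosed literal simply runs to its end, which gives the silent
termination described above.  The function name is hypothetical.

```python
from __future__ import annotations


def split_literals(paragraph: str) -> list[tuple[str, str]]:
    """Split one paragraph into ("text", ...) and ("literal", ...) parts.

    Backquotes alternate between opening and closing a literal.  Markup
    inside a literal is left untouched, and newlines in it become blanks.
    An unclosed literal runs to the end of the paragraph.
    """
    segments: list[tuple[str, str]] = []
    for index, part in enumerate(paragraph.split("`")):
        if index % 2 == 1:  # odd-numbered pieces lie between backquotes
            segments.append(("literal", part.replace("\n", " ")))
        elif part:
            segments.append(("text", part))
    return segments
```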

:: Unfilled Lines

Sometimes it is necessary to specify line breaks manually.  This is
useful when showing code fragments or lists of words.  Enhanced text
provides two ways to say that a line within a paragraph is not to be
filled with the lines which precede and follow it.  (The UA can decide
to insert line breaks *within* a long unfilled line, but there must be
a line break before it and a line break after it.)

If a line begins with ": " (colon, space), it is treated as an
unfilled paragraph line, subject to full markup.  White space
following the initial space is ignored.

If a line begins with "~ " (tilde, space), it is treated as a literal
line.  No markup is done on this line, and white space following the
initial space is significant.  This is the format expected to be used
for code fragments. *<I decided that tilde was a better prefix for
literal lines than Bill's comma because it makes a better "border"
when several lines occur in sequence as they generally do.>*  

When displayed by a compliant UA, the initial characters should be
stripped. 
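
The line-prefix rules above might be sketched in Python as follows.
The function and the "filled"/"unfilled"/"literal" labels are
hypothetical, and the sketch assumes quote and indentation prefixes
have already been removed.

```python
from typing import NamedTuple


class Line(NamedTuple):
    kind: str  # hypothetical labels: "filled", "unfilled" or "literal"
    text: str  # the line with its ": " or "~ " prefix stripped


def classify_line(raw: str) -> Line:
    """Classify one line (quote and indentation prefixes already removed)."""
    line = raw.lstrip()  # initial white space is always ignored
    if line == "~" or line.startswith("~ "):
        # Literal line: no markup, and spacing after "~ " is significant.
        return Line("literal", line[2:])
    if line.startswith(": "):
        # Unfilled line: markup still applies; extra white space is ignored.
        return Line("unfilled", line[2:].lstrip())
    return Line("filled", line)
```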

:: Indentation and Lists

Enhanced text has a simple method for specifying indented paragraphs
and bulleted or enumerated lists of paragraphs.  If the first line of
a paragraph begins with "- " (dash, space), following any quote
prefixes and white space, it is read as indented one level.  (A UA is
free to choose how much to alter the left and right margins.)  All
lines in the paragraph, including unfilled lines, should use the new
margins.  When displaying, the initial dash and space should be
stripped.

Deeper levels of indentation are specified by more dashes.  Only the
first line of the paragraph needs to (or may) be so specified.  Since
initial white space is ignored, the writer may also indent the
remaining lines for readability, as in the following example:

        - This is a paragraph at indentation level one.  We indent it
          to show the structure to ourselves and to users of
          non-compliant UAs.
          : This is an unfilled line still at level one.
          Now we're back to filled stuff.

          -- This is a paragraph at indentation level two.
             ~ Here's a literal line at level two.

          -- ~ Here's another literal line at level 2 starting a paragraph.
             ~ Here's another literal line in the same paragraph.

             --- Etc.

I don't see any reason to arbitrarily set a limit to the amount of
nesting, but I would be surprised if people found a need to go much
beyond three levels.  
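
A possible Python sketch of reading the indentation level from a
paragraph's first line and applying it to the whole filled paragraph.
The four-spaces-per-level step is an assumption (the choice is left to
the UA), only untagged dash prefixes are handled, and unfilled or
literal lines inside the paragraph are not treated specially.

```python
from __future__ import annotations

import re
import textwrap

# "- ", "-- ", "--- " ... at the start of a paragraph's first line.
LEVEL_PREFIX = re.compile(r"\s*(-+) (.*)")

INDENT_STEP = 4  # assumed: the amount of indentation is left to the UA


def paragraph_level(first_line: str) -> tuple[int, str]:
    """Return (indentation level, text after the dash prefix)."""
    match = LEVEL_PREFIX.fullmatch(first_line)
    if match is None:
        return 0, first_line.lstrip()
    return len(match.group(1)), match.group(2)


def render_indented(lines: list[str], width: int = 60) -> str:
    """Fill a paragraph using the margin set by its first line."""
    level, first = paragraph_level(lines[0])
    words = " ".join([first] + [line.strip() for line in lines[1:]])
    margin = " " * (INDENT_STEP * level)
    return textwrap.fill(words, width=width, initial_indent=margin,
                         subsequent_indent=margin)
```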

For lists, the dashes may be followed *immediately* (before the space)
by a string indicating the item tag.  There is no auto-numbering; it
is up to the writer to provide the appropriate tags.  The special
tag "o" (lower-case o) should be replaced by the UA's best approximation
of a bullet (if it wants to get fancy, it could have different bullets
for different levels).  Item paragraphs should be displayed indented
to the appropriate level with the tag set as a hanging indent.

Examples of list items:

        -o A bulleted list item.

        -1. An item numbered "1.".

            --a) A subitem numbered "a)".

            -- A paragraph at the same level as the previous item, but
               without a tag.  Use this for multi-paragraph items.
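
Tag parsing and hanging-indent display might look like this in
Python.  The bullet character, the indentation step, and the function
names are assumptions for illustration; tags wider than one
indentation step simply push the first line's text to the right.

```python
from __future__ import annotations

import re
import textwrap

# Dashes give the level; non-blank characters glued to them form the tag.
ITEM_PREFIX = re.compile(r"\s*(-+)(\S*) (.*)")

INDENT_STEP = 4      # assumed indentation per level (left to the UA)
BULLET = "\u2022"    # assumed "best approximation of a bullet" for tag "o"


def parse_item(first_line: str) -> tuple[int, str, str]:
    """Return (level, display tag, text) for a paragraph's first line."""
    match = ITEM_PREFIX.fullmatch(first_line)
    if match is None:
        return 0, "", first_line.lstrip()
    dashes, tag, text = match.groups()
    return len(dashes), (BULLET if tag == "o" else tag), text


def render_item(lines: list[str], width: int = 60) -> str:
    """Fill an item paragraph, setting its tag as a hanging indent."""
    level, tag, first = parse_item(lines[0])
    words = " ".join([first] + [line.strip() for line in lines[1:]])
    if level == 0:
        return textwrap.fill(words, width=width)
    body = " " * (INDENT_STEP * level)
    # The tag occupies the last indentation step of the first line.
    lead = " " * (INDENT_STEP * (level - 1)) + (tag + " ").ljust(INDENT_STEP)
    return textwrap.fill(words, width=width, initial_indent=lead,
                         subsequent_indent=body)
```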

:: Notes

It is often useful to be able to make a digression without
interrupting the point being made.  In formatted text, this function
is served by footnotes, but that option is not generally available in
email.  A convention has developed of inserting a footnote mark*
(generally an asterisk or bracketed number) in the paragraph and
setting the footnote as a separate paragraph immediately following the
one that contains the mark.

* Like this.

Enhanced text encourages this form of display while allowing a simpler
form of expression.  Footnotes are represented in the message
delimited by "*<" (star, less-than) and ">*" (greater-than, star).
*<Like this.>* There should be a space before the initial delimiter
and following the closing delimiter.

This is readable as is on a non-compliant UA.  A compliant UA should
choose a mark (probably from "*", "**", etc. or "[1]", "[2]", etc.
starting anew each paragraph), and insert it in the paragraph
(removing preceding white space).  When the paragraph is finished, the
notes should be set as separate paragraphs at the same indentation
level (or slightly indented) preceded by their mark.  Of course, a
sufficiently powerful UA may display notes as true footnotes or
hypertext buttons.

Notes may not contain paragraph breaks, *<They are silently ended when
a paragraph break is encountered>* and they may not contain notes.  If
they are quoted fragmentally so that one of the delimiters is missing,
the formatting may be incorrect, but the effect should be localized.
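
The following Python sketch shows one way a compliant UA could pull
notes out of a single paragraph, replace them with "[1]", "[2]", etc.
marks, and return the note paragraphs to be set after it.  The regular
expression and function name are hypothetical; a note missing its
closing ">*" runs to the end of the paragraph.

```python
from __future__ import annotations

import re

# "*<" opens a note and the first ">*" closes it (notes cannot nest).
# Leading white space is consumed so the mark attaches to the preceding
# word.  With no ">*", the note runs to the end of the paragraph.
NOTE = re.compile(r"\s*\*<(.*?)(?:>\*|\Z)", re.DOTALL)


def extract_notes(paragraph: str) -> tuple[str, list[str]]:
    """Return (paragraph with marks, note paragraphs in order)."""
    notes: list[str] = []

    def to_mark(match: re.Match) -> str:
        notes.append(" ".join(match.group(1).split()))  # refill note text
        return f"[{len(notes)}]"  # numbering starts anew in each paragraph

    body = NOTE.sub(to_mark, paragraph)
    return body, [f"[{n}] {text}" for n, text in enumerate(notes, start=1)]
```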

:: Sections

Many articles (such as this one) are naturally divided into
_sections_.  In Enhanced Text, a section title is a paragraph that
begins with ":: " (colon, colon, space).  The rest of the paragraph is
the section title, including any section number.  Compliant UAs can
choose an appropriate style for sections and can even extract a "Table
of Contents".  Users of non-compliant UAs benefit as well, as the
standardized prefix makes it possible to search for a section in most
UAs or editors.
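
Extracting a "Table of Contents" could be as simple as the following
Python sketch, which assumes the message has already been split into
paragraphs with quote prefixes removed; the function name is
hypothetical.

```python
from __future__ import annotations


def table_of_contents(paragraphs: list[str]) -> list[str]:
    """Collect section titles from a list of paragraph strings.

    A section title is a paragraph beginning with ":: "; the rest of the
    paragraph (possibly wrapped over several lines) is the title.
    """
    titles = []
    for paragraph in paragraphs:
        text = paragraph.lstrip()  # initial white space is ignored
        if text.startswith(":: "):
            titles.append(" ".join(text[3:].split()))
    return titles
```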

:: Quotations

One of the most common forms of markup in email today is quoting other
articles.  The quoted article may in turn quote other articles, and
the nesting can get quite deep.  Currently, the standard convention is
to insert a _prefix_ (generally ">") on each quoted line.  Deeper
quotes will have more than one prefix (as in ">>").

In Enhanced Text, a line beginning with ">" (greater-than), "]"
(right-bracket), or "|" (vertical-bar) (optionally followed by white
space) is a quotation, and the quote level is given by the number of
such prefixes found.  Unquoted text is at quote level zero.

When displaying, extra indentation should be added for each quote
level over and above the indent level of the paragraph being
displayed.  The quote prefix (or some normalized form of it) should be
displayed in the left margin of each line as an aid in following the
conversation.  The UA may also want to display quotations from
different articles in contrasting styles. *<In Emacs, I display
odd-level quotations in red and even-level quotations in green; it
makes it much easier to follow long conversations.>*

The quote prefix is stripped before looking for blank lines, section
titles, indent symbols, or unfilled-line symbols.  A change in quote
level from one line to the next signifies the end of a paragraph.

When composing a reply or follow-up, if the message was enhanced, the
editor should insert "> " before each line.  If the message was not
enhanced, the editor should insert "> ~ " to signal to UAs that they
should not try to further interpret the line, fill paragraphs, etc.

Several different quote prefixes are allowed in order to be able to
quote more than one article in the same follow-up.
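
Inserting the reply prefix is mechanical; a hypothetical Python helper
might look like this, with the prefix character selectable so that a
second quoted article can use "]" or "|".

```python
def quote_for_reply(body: str, enhanced: bool, prefix: str = ">") -> str:
    """Prefix each line of a message quoted in a reply or follow-up.

    Enhanced text gets "> " so it can still be refilled; plain text gets
    "> ~ " so compliant UAs treat each line as a literal line.  The
    'prefix' argument (hypothetical) lets a second quoted article use
    "]" or "|" instead of ">".
    """
    marker = f"{prefix} " if enhanced else f"{prefix} ~ "
    return "\n".join(marker + line for line in body.splitlines())
```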

:: Signatures

Signature blocks in general consist of literal lines.  A line at quote
level zero consisting entirely of "--" (dash, dash), optionally
followed by white space, signals the beginning of the signature, and
all lines from there to the end of the part should be treated as
literal lines.  When quoting an article, lines in the signature should
be quoted with "> ~ " to prevent wrapping.
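
A Python sketch of finding the signature separator and quoting the
signature with "> ~ " follows.  The function names are hypothetical,
and the sketch assumes the first such separator line starts the
signature.

```python
from __future__ import annotations

import re

# An unquoted line consisting only of "--" and optional white space.
SIGNATURE_SEPARATOR = re.compile(r"--[ \t]*")


def find_signature(lines: list[str]) -> int | None:
    """Return the index of the first signature separator, or None."""
    for index, line in enumerate(lines):
        # Quoted lines start with a quote character, so they never match.
        if SIGNATURE_SEPARATOR.fullmatch(line):
            return index
    return None


def quote_enhanced(lines: list[str]) -> list[str]:
    """Quote an enhanced message for a follow-up.

    Body lines get "> "; the separator and everything after it get
    "> ~ " so the signature is not refilled.
    """
    start = find_signature(lines)
    quoted = []
    for index, line in enumerate(lines):
        in_signature = start is not None and index >= start
        quoted.append(("> ~ " if in_signature else "> ") + line)
    return quoted
```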

:: Pointing

One common piece of markup in usenet articles is _pointing_ into a
quotation to signal the word or phrase that the writer wishes to
address.  The pointer may be followed immediately by the comment, as
in

        - ~ > Here is a controversial word
        - ~             ^^^^^^^^^^^^^  You said it!

The comment may be the following paragraph, as in

        - ~ > Here is a controversial word
        - ~             ^^^^^^^^^^^^^
        - ~ You said it!

or it may follow later

        - ~ > Here is a controversial word followed by some
        - ~             ^^^^^^^^^^^^^
        - ~ > more text to end the sentence
        - ~
        - ~ I couldn't agree more with your choice of words!

This is already problematic, as many people use UAs which display the
text in a proportional font, and because characters differ in width,
what appears to be pointed to is often not what the writer meant.
This will only get worse as UAs are allowed to fill paragraphs.

I have to confess that I don't know a good solution to this problem.
I do think that it is an important ability, akin to circling words in
handwritten replies.  Whatever scheme we come up with must allow for
both the pointer and the comment to be quoted in further follow-ups,
with it being apparent which article the pointer/comment came from.

--
Evan Kirshenbaum                       +------------------------------------
    HP Laboratories                    | Never ascribe to malice that which
    3500 Deer Creek Road, Building 26U | can adequately be explained by
    Palo Alto, CA  94304               | stupidity.
                                       | 
    kirshenbaum(_at_)hpl(_dot_)hp(_dot_)com            | 
    (415)857-7572
