Re: MIME to Draft Standard

Keith writes:

I don't know SGML.  I know "C".  I don't want to have to learn much about
SGML to write a richtext interpreter.   I want a spec that defines what
*richtext* commands look like, how they nest, what their semantics are, and
how to extend the command set in a way that doesn't break things.  I want it
to be *easy* to implement, because I want every MIME implementation to support
...


Keith,
   Given an appropriate set of tags and definitions (a DTD in the larger
sense), there are two ways to approach the job of building an SGML
processor.  One of them *starts* from an extant processor that one plugs
into, much as one might, as you suggest elsewhere, plug other existing
processors into other types of MIME body parts.  The thing into which
one plugs is a complicated creature, since it is expected to support a
language structure that is pretty general (and, from my point of view as
someone who has used SGML in a couple of fairly important applications--
one quite atypical and one of the first uses for structured data
interchange whose output is not "text processing"--somewhat too general
and flexible for use).  But, as with using a JPEG engine, one can
implement a mail environment without knowing (or caring) much about what
is going on inside.  As with a lot of current technology parsing
applications, a general-purpose SGML front end takes two inputs: a
grammar (the DTD or, strictly speaking, part of it) and the text to be
processed.  Then it goes its merry way, probably supplemented by some
environment-specific macros whose functionality is described in the rest
of the DTD, but whose definitions aren't.
   The other alternative is often more suitable for specialized SGML
applications where the [IMHO, excessive and confusing] possible
flexibilities have been stripped down in the definitions, where the
number of documents to be processed and the time the definition will
live without changes are both large compared to the complexity of the
definition and, especially, where there are performance concerns.  That
alternative is to just go build a parser--in C, in lex, or whatever--to
do the job, possibly producing final form (whatever that means) and
avoiding the "macro" stage.  The lexical rules for an SGML-ized functional
equivalent to richtext could be frighteningly simple--all you would need
would be the ability to recognize a half-dozen lexically equivalent
generic identifiers (tags) and the corresponding end-tags and to handle
nesting, and then you need one action per tag type.  Same as richtext.
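
To make that concrete, here is a minimal sketch of such a hand-built
parser, in Python rather than C purely to keep it short.  The three tag
names and the plain-text emphasis marks they map to are assumptions for
illustration only; they are not taken from RFC 1341 or from any DTD.

```python
import re

# Illustrative tag set only: one (start action, end action) pair per tag.
# Neither the names nor the output marks come from RFC 1341 or a real DTD.
ACTIONS = {
    "bold": ("*", "*"),
    "italic": ("/", "/"),
    "underline": ("_", "_"),
}

# A token is a start-tag, an end-tag, or a run of ordinary characters.
TOKEN = re.compile(r"<(/?)([A-Za-z]+)>|([^<]+)")


def render(text):
    """Turn tagged text into plain text with simple emphasis marks."""
    out = []
    stack = []  # currently open tags, innermost last
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"stray '<' at offset {pos}")
        pos = m.end()
        closing, name, chars = m.groups()
        if chars is not None:
            out.append(chars)
            continue
        name = name.lower()
        if name not in ACTIONS:
            raise ValueError(f"unknown tag: {name}")
        if closing:
            # An end-tag must match the innermost open tag.
            if not stack or stack[-1] != name:
                raise ValueError(f"mismatched end-tag: {name}")
            stack.pop()
            out.append(ACTIONS[name][1])
        else:
            stack.append(name)
            out.append(ACTIONS[name][0])
    if stack:
        raise ValueError(f"unclosed tag: {stack[-1]}")
    return "".join(out)


print(render("a <bold>b <italic>c</italic></bold> d"))
```

The whole job is one token pattern, one stack for nesting, and one table
entry per tag type; adding a tag means adding a table entry.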

However, Erik responded to you by saying:

#To force a new specification of an SGML-conformant language to the size
#limits of richtext in RFC 1341 is an exercise in futility.  I refuse to
#waste my time to play games under such conditions.

This confuses me, and I know SGML and am an SGML fan.  The "size limits"
may be due to the fact that, as many of the implementers have pointed
out, richtext is a little underspecified in 1341.  Because DTDs are a
formal specification, it is difficult to underspecify them (an
advantage) but, as a result, they tend to not be brief (life is hard
sometimes and this becomes another argument for "separate document").

By contrast, Erik may be looking for a language with more extensive
capabilities, "richertext".  I think that defining such a thing would be
A Good Idea.  Not to replace richtext, though, but to exist as a (I would
hope proper) superset of it.  And, with luck, to push the size of richtext
*down* on the theory that we have another place to send folks who need
mark-up capabilities that, while important, are rarely used in email.

I could see our trying to insist that people implement the subset
("richtext") and encouraging support for the superset ("richertext").  I
could see an implementation trying to handle the former in-line and
passing the latter off to a plug-in SGML processor.

What would be useful, and encouraging to the folks who choose to do the
more extensive language, would be to be absolutely sure that richtext is
sufficiently conformant to the basic SGML grammar, and that the concepts
of richtext and richertext were similar enough, that we really could
have a subset/superset relationship--that an implementation was not
forced to have two processors (or two DTDs) and could, if desired, treat
richtext as an instance of richertext that didn't happen to use some
tags or other constructions.
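
As a rough sketch of what that subset/superset relationship would buy
us, the Python fragment below uses a single checker for both languages.
Both tag sets are hypothetical: no richertext tag list has been proposed
in this thread, and "table" and "footnote" are invented here purely for
illustration.

```python
import re

# Hypothetical tag sets, for illustration only.  No "richertext" tag list
# exists yet; "table" and "footnote" are made up.
RICHTEXT = frozenset({"bold", "italic", "underline"})
RICHERTEXT = RICHTEXT | {"table", "footnote"}

TAG = re.compile(r"<(/?)([A-Za-z]+)>")


def conforms(text, allowed):
    """True if every tag is in `allowed` and the tags are properly nested."""
    stack = []
    for closing, name in TAG.findall(text):
        name = name.lower()
        if name not in allowed:
            return False
        if not closing:
            stack.append(name)
        elif not stack or stack.pop() != name:
            return False
    return not stack


message = "<bold>hi <italic>there</italic></bold>"
print(conforms(message, RICHTEXT), conforms(message, RICHERTEXT))
```

Because the only difference between the two languages in this sketch is
the tag table, anything that passes with the smaller table also passes
with the larger one, and one processor serves for both.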

Erik, if you want to see this happen, we need a draft richertext
proposal--at least wrt what kinds of tags and constructions are in it
and what they do.  Less criticism of richtext for what it isn't and more
detail about what it should be.

And, while I'd like to see text processing experts look at the results
to provide sanity checks and architectural stupidity checks, I think the
"separate WG consisting of those folks" approach is the wrong one.  This
is an applications problem--we need more email *users* (small shortage
in the WG) to tell us what they need in most messages or experienced
and careful observers of email traffic and usage (no shortage today) to
tell us what they see as being necessary.  With those functional
requirements, creation of an appropriate DTD or other definition should
be a small technical matter.
  The WG hypothesis right now is that the definition of richtext in 1341
is isomorphic with the right set of functional requirements.  If you
think that is not true, please address what needs to be changed about
the (implicit) functional requirements.
  Turning the text processing folks loose on this is a very good way to
be told what we might, someday, want to do, not what the minimum set of
things we need is.  We know 90% of the answer to the latter
question -- an application subtype for SGML/SDIF with no pretensions
about using it inline for mail in most environments -- and it is really
irrelevant to the problem richtext is trying to address.

     --john