| At present richtext lets you define your own commands, which will be
| treated as no-ops on a viewer that doesn't support them...but they
| have to have matching end-tags, and this is sometimes inappropriate.
| So I'd like a richtext parser to be able to determine from syntax,
| that no end-tag is expected for a particular command.
I'd like to pause to consider the different kinds of markup languages that
have been developed, and how richtext and SGML fit into this picture, and
where your suggestion fits, too.
First, there are the procedural markup languages, where the markup is only
an instruction to an interpreter to do something. Procedural markup languages
often come with macro support to make things a little easier for people. A
typical procedural markup language is troff. (Binary encoded streams are
also procedural, except that by "markup language" we usually mean one that
can be read and typed by humans.)
Second, there are the generic markup languages, where the markup is drawn
from a defined vocabulary that identifies the contents of elements of the
text as being of a particular type. Generic markup languages are typically
bracketed languages (i.e., have "begin" and "end" markup). The meaning of
a particular word in the markup vocabulary is defined by the language. A
typical generic markup language is LaTeX.
Third, there are the generalized markup languages, where the markup is a
generalization of the generic markup idea. The language defines a general
syntax, and a means to define a generic markup language, sometimes called
an "application". The semantics of the resulting markup is not defined in
the language, but by the application.
Procedural markup languages require a set of instructions to be defined,
and a macro facility to build upon them. Generic markup languages require
a vocabulary to be defined, and a means to associate words with processing,
so that the words can have several processing semantics associated with
them. Generalized markup languages (and SGML in particular) only require a
general syntax to be defined, so that a generic markup language can be
expressed in this language (as well as the concrete rules of this markup
language for documents using the generic markup vocabulary).
richtext is a mixture of a procedural and a generic markup language, and
your request for something to help you know what kind of content a
particular element has is squarely in the domain of generalized markup
languages. SGML, for instance, has a means to say that an element does not
have any content, and thus needs only a start-tag. An SGML parser will
thus know that an element has no contents by the declaration of the allowed
content of the element. (However, the application interfacing with the
parser will still see an end-tag immediately after the start-tag.)
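To make the last point concrete, here is a minimal sketch in Python of a parser that learns from a declaration, not from the text, that an element has no content. It is not SGML and not richtext: the tag syntax is simplified, and the names DECLARED_EMPTY, parse_events and the element "rule" are hypothetical, chosen only for illustration. The point it shows is that the application still receives an end event right after the start event, although no end-tag was typed.

```python
import re

# Hypothetical declaration table: elements declared to have no content.
# In SGML this information would come from the element declarations;
# here it is simply a set of names (an assumption for this sketch).
DECLARED_EMPTY = {"rule"}

# Simplified tag syntax: <name> or </name>, no attributes.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9-]*)>")


def parse_events(text, declared_empty=DECLARED_EMPTY):
    """Return a list of ("start", name), ("end", name) and ("data", text)."""
    events = []
    pos = 0
    for match in TAG.finditer(text):
        # Any text between the previous tag and this one is character data.
        if match.start() > pos:
            events.append(("data", text[pos:match.start()]))
        closing, name = match.group(1), match.group(2).lower()
        if closing:
            events.append(("end", name))
        else:
            events.append(("start", name))
            if name in declared_empty:
                # Declared to have no content: no end-tag is expected in
                # the text, so the end is reported to the application
                # immediately after the start.
                events.append(("end", name))
        pos = match.end()
    # Trailing character data after the last tag.
    if pos < len(text):
        events.append(("data", text[pos:]))
    return events


if __name__ == "__main__":
    for event in parse_events("<bold>hi</bold><rule>bye"):
        print(event)
```

A viewer that does not know "rule" can still skip it safely, because the declaration, and not the viewer's knowledge of the command, tells the parser where the element ends.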
I'd like to see richtext cleaned up conceptually, and it should aim at
being only one of the three types of markup languages described above.
Note that a generalized markup language is capable of representing all
Erik Naggum ISO 8879 SGML +47 295 0313
Oslo, Norway ISO 10744 HyTime Watch this ^ space
<erik(_at_)naggum(_dot_)no> ISO 9899 C Memento,
<SGML(_at_)ifi(_dot_)uio(_dot_)no> ISO 10646 UCS Memento,