Keith Moore <moore(_at_)cs(_dot_)utk(_dot_)edu> writes:
It appears that if I want to design an application using SHAVE, I have
to know enough SGML to write a DTD.
That's right. One of the main advantages of using SGML for this sort
of application is that the DTD is a formal way of describing the legal
syntax, such that a document or body part can be automatically validated.
We don't want to lose sight of that capability.
It also appears that if I want to
parse a SHAVE formatted body part, I need to either have the DTD or
know which SHAVE parameters have which kinds of values. (else how do
I know when to look for a matching end-tag and when not to?)
What I'd like to have is a generalized parser for any SHAVE body part,
that does not need to know the DTD or equivalent information, in order
to extract the relevant parameters and values from the body part. Of
course such a parser wouldn't be able to do full syntax checking
(e.g. it wouldn't know which attribute names were valid for a
particular element), but for many applications this would not be a problem.
I guess there's not too much difference between this kind of parser and
what I call a "naive" parser (which does understand something about the
application, but perhaps not very much).
One way to do this might be to require each parameter name to have a
suffix character which indicated what kind of values it takes. For
example, a parameter name that takes a single text value could end
with a '+', one that takes a sequence of parameter/value pairs could
end with a '/', one that takes a list of text values could end with a
'%', etc. (I don't know which of these would work within SGML, but
surely there are a few non-alphabetic characters which are legal for
I just tried + % / with sgmls - it doesn't like any of them. The dot
is the only non-alphameric I've seen used in element names. I guess
we could use .t, .pv and .tt suffixes instead.
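
As a rough sketch of how a naive parser might use such suffixes, assuming
the .t / .pv / .tt convention floated above (nothing in the SHAVE draft
defines it, and the names value_kind and SUFFIX_KINDS are made up here):

```python
# Hypothetical suffix convention from this thread; not part of any SHAVE draft.
SUFFIX_KINDS = {
    ".t": "text",        # single text value
    ".pv": "pairs",      # sequence of parameter/value pairs
    ".tt": "text-list",  # list of text values
}

def value_kind(name):
    """Return (base_name, kind); kind is None if the suffix is unknown."""
    base, dot, suffix = name.rpartition(".")
    kind = SUFFIX_KINDS.get(dot + suffix) if dot else None
    return (base, kind) if kind else (name, None)
```

With this, a parser that has never seen the DTD could still tell from the
name alone whether to expect a matching end-tag.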
SHAVE would be easier to use if it first defined the SHAVE syntax in
non-SGML terms (say with an 822-style grammar). A later section would
describe how SHAVE fits into the SGML world.
Yes, this would certainly be a good way to describe it - in fact, it would
be better than the current "restriction rules" method.
It would also be helpful
if there were a simple way to describe a SHAVE document, which could
be mechanically translated into an SGML DTD.
It should be possible to create a specification language which is:
* easy to describe
* easy to write in
* easy to convert to a DTD
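
To give a feel for the "easy to convert to a DTD" point, here is a minimal
sketch. The spec format (a mapping from element name to either "text" or a
non-empty list of child element names) and the function spec_to_dtd are
invented for illustration; a real specification language would need more
than this.

```python
def spec_to_dtd(spec):
    """Translate {element: content} into SGML element declarations.

    content is either the string "text" or a non-empty list of
    child element names (an assumed, hypothetical spec format).
    """
    lines = []
    for name, content in spec.items():
        if content == "text":
            model = "(#PCDATA)"
        else:
            # Any number of the listed children, in any order.
            model = "(" + " | ".join(content) + ")*"
        # "- -" says neither the start-tag nor the end-tag may be omitted.
        lines.append(f"<!ELEMENT {name} - - {model}>")
    return "\n".join(lines)
```

For example, spec_to_dtd({"msg": ["subject", "body"], "subject": "text"})
yields one declaration for msg with the model (subject | body)* and one for
subject with (#PCDATA).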
What follows are nits:
+ a limit of 8 characters on element and attribute names would be very
painful. Does it break SGML compatibility to extend this a bit?
Pick a number and set the NAMELEN to that value in the SGML declaration
(which in the current draft is implicit in SHAVE rules 1 & 2).
+ it might be nice to provide an alternate way of including arbitrary
octet-strings as values, say by encoding them in base64. (maybe a
reserved tag/end-tag pair?)
Or an SGML attribute on the element.
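
A sketch of the attribute variant, assuming an attribute called "enc" and
an element called "blob" (both names are invented here; the draft defines
neither):

```python
import base64

def encode_octets(data, name="blob"):
    # "enc" and the default element name are hypothetical, for illustration only.
    text = base64.b64encode(data).decode("ascii")
    # Tags go on their own lines, since tags must start a line.
    return f'<{name} enc="base64">\n{text}\n</{name}>'

def decode_octets(text):
    # Reverse of the encoding step: base64 text back to the original octets.
    return base64.b64decode(text)
```

Because base64 output never contains "<" or line breaks that matter, the
value survives whatever newline handling the parser applies.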
+ if I understand rule 17, there's no way to encode a string like
"this is a string\r\n" because the trailing \r\n will be discarded
(even if there are multiple newlines).
True. This is a consequence of requiring all tags to occur at the start
of a line (except for white space). I guess we could change it to recognise
two newlines as meaning \r\n at the end of a value.
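
One possible reading of that change, sketched below: the final line break
before the next tag is treated as a delimiter and dropped, and any line
break before it is kept as data. This is only an interpretation of the
suggestion, not the wording of rule 17, and the function names are made up.

```python
CRLF = "\r\n"

def encode_value(value):
    # Always append one CRLF as the delimiter before the next tag line.
    return value + CRLF

def decode_value(raw):
    # Drop exactly one trailing CRLF (the delimiter); keep any others as data.
    return raw[:-len(CRLF)] if raw.endswith(CRLF) else raw
```

Under this reading "this is a string\r\n" is written with two line breaks
and comes back with one, so the round trip is lossless.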