My comments on this draft will come in two parts: in this message, I
comment on the clarity of the description of the mechanisms as
proposed. In another message, I'll criticize the mechanisms
themselves, and suggest a modification, using the SGML Open entity
catalog.
As an implementor of the receiving end of a MIME/SGML user agent, the
number one question I want answered from the spec is:
From an application/sgml or multipart/sgml mime body part,
how do I construct an SGML document entity for parsing?
Then, how do I construct the other entities that comprise
the document, and how are they identified?
Conversely, for the sending side, I need to know exactly this:
Given an SGML document entity and the associated collection
of entities that comprise an SGML document, how do I
construct a MIME body part for transmission?
In other words, how do I use MIME as an entity manager? I would very
much like to see the above two questions addressed explicitly in the
spec. They are answered in parts throughout the draft, but the result,
from my reading, is that they are answered incompletely.
The draft talks much about the logical structure of an SGML document:
1.1 SGML
...
A complete SGML document consists of an SGML declaration, a
prolog, and a document instance.
But I wish it gave more consideration to the _physical_ structure
of an SGML document, i.e. the entity structure of an SGML document,
which is all we need concern ourselves with for transmission.
From this perspective, an SGML document is:
* an SGML document entity (i.e. a sequence of characters)
* zero or more additional entities; either text entities
(sequences of characters) or data entities
(data in unspecified form).
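To make the physical view concrete, here is a minimal Python sketch of that structure. The class and field names are mine, not from the draft or from ISO 8879, and keying the additional entities by system identifier is only an assumption for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class TextEntity:
    """An entity that is a sequence of characters."""
    text: str


@dataclass
class DataEntity:
    """An entity holding data in an unspecified form (e.g. an image)."""
    data: bytes
    notation: str  # hypothetical label, such as "gif"


@dataclass
class SgmlDocument:
    """Physical view: one document entity plus zero or more others."""
    document_entity: TextEntity
    # Keyed by system identifier; this keying scheme is an assumption.
    entities: Dict[str, Union[TextEntity, DataEntity]] = field(default_factory=dict)


doc = SgmlDocument(
    document_entity=TextEntity("<!doctype widget ...>"),
    entities={"ch1.sgml": TextEntity("<chap>Building Widgets ... </chap>")},
)
```

Nothing here says anything about declarations, prologs, or instances; that is the point. For transmission, only the entities and their identifiers matter.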
The SGML declaration must be part of the SGML document entity, though
the SGML standard allows for the case where the SGML declaration may
be implicit in the SGML document entity; the MIME draft appears to
specify an out-of-band way to designate the SGML declaration, though
it seems to be missing from the grammar production in 2.2:
sgml-part := "instance" / "prolog"
/ "dtd" / "fosi" / extension-token
The prolog and instance may be completely contained in the document
entity, or they may reside partially in the additional text entities.
From my experience, the following is the typical entity/file structure
of a document:
widget.decl -- the SGML declaration for the "widget" SGML application
<!SGML
...
>
widget.dtd -- the DTD for the "widget" SGML application
<!element widget ...>
<!element ...>
<!entity ...>
copy.sgml -- boilerplate text used in lots of documents
<p>This document shall not be reproduced in whole or in part...
ch1.sgml -- chapter one
<chap>Building Widgets ... </chap>
ch2.sgml
<chap>Using Widgets ... </chap>
report.sgml -- the document entity
<!doctype widget PUBLIC "-//Widget Consortium//DTD Widget//EN"
"widget.dtd"
[
<!entity copy-statement SYSTEM "copy.sgml">
<!entity ch1 SYSTEM "ch1.sgml">
<!entity ch2 SYSTEM "ch2.sgml">
]>
<widget>
&copy-statement;
&ch1;
&ch2;
</widget>
From my reading of the draft, this could be transmitted as:
Content-Type: multipart/sgml; instance="cid-report.sgml-2";
prolog="cid-report.sgml-1";
dtd="-//Widget Consortium//DTD Widget//EN";
declaration="cid-widget.decl";
boundary="next-part"
--next-part
Content-Description: pointer to widget.decl on ftp server
Content-Id: cid-widget.decl
Content-Type: message/external-body; type="anon-ftp";
site="ftp.widget.com";
directory="/pub/widget/sgml";
name="widget.decl"
Content-Type: application/sgml
--next-part
Content-Description: pointer to widget.dtd on ftp server
Content-Id: cid-widget.dtd
Content-Type: message/external-body; type="anon-ftp";
site="ftp.widget.com";
directory="/pub/widget/sgml";
name="widget.dtd"
Content-Type: application/sgml
--next-part
Content-Id: cid-report.sgml-1
Content-Type: application/sgml
<!doctype widget PUBLIC "-//Widget Consortium//DTD Widget//EN"
"cid-widget.dtd"
[
<!entity copy-statement SYSTEM "cid-copy.sgml">
<!entity ch1 SYSTEM "cid-ch1.sgml">
<!entity ch2 SYSTEM "cid-ch2.sgml">
]>
--next-part
Content-Id: cid-report.sgml-2
Content-Type: application/sgml
<widget>
&copy-statement;
&ch1;
&ch2;
</widget>
--next-part
Content-Id: cid-copy.sgml
Content-Type: application/sgml
<p>This document shall not be reproduced in whole or in part...
--next-part
Content-Id: cid-ch1.sgml
Content-Type: application/sgml
<chap>Building Widgets ... </chap>
--next-part
Content-Id: cid-ch2.sgml
Content-Type: application/sgml
<chap>Using Widgets ... </chap>
--next-part--
[In my next message, I'll argue that splitting report.sgml into two
body parts, and rewriting the system identifiers in the <!entity ...>
and <!doctype ...> declarations are cumbersome and unnecessary.]
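For the sending side, this is roughly how I would expect to assemble such a body part with Python's standard email package. It is a sketch of my reading of the draft, not something the draft specifies: the helper names and the content-ids "cid-prolog" and "cid-instance" are made up, and external-body parts are left out.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.nonmultipart import MIMENonMultipart


def sgml_part(content_id, text):
    """Wrap one text entity as an application/sgml body part."""
    part = MIMENonMultipart("application", "sgml")
    part["Content-Id"] = content_id
    part.set_payload(text)
    return part


def build_multipart_sgml(prolog, instance, entities):
    """Assemble a multipart/sgml body from a prolog, an instance, and
    a mapping of content-id -> entity text."""
    outer = MIMEMultipart(
        "sgml",
        boundary="next-part",
        prolog="cid-prolog",      # hypothetical content-id
        instance="cid-instance",  # hypothetical content-id
    )
    outer.attach(sgml_part("cid-prolog", prolog))
    outer.attach(sgml_part("cid-instance", instance))
    for content_id, text in entities.items():
        outer.attach(sgml_part(content_id, text))
    return outer


message = build_multipart_sgml(
    prolog='<!doctype widget [\n<!entity ch1 SYSTEM "cid-ch1.sgml">\n]>\n',
    instance="<widget>\n&ch1;\n</widget>\n",
    entities={"cid-ch1.sgml": "<chap>Building Widgets ... </chap>\n"},
)
wire_text = message.as_string()
```

Note that the sender has to rewrite the system identifier for ch1 to match the content-id it chose, which is exactly the bookkeeping I find cumbersome.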
Is the usage of both the dtd and prolog parameters above correct and
"conventional"? It's certainly redundant, but in a potentially useful
way: it allows a user agent to see what DTD the document uses just by
examining the headers of the multipart body part, without looking at
the individual parts.
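As an illustration of that header-only check, here is a small sketch using Python's standard email.parser.HeaderParser. The function name and the sample header text are mine; it assumes the dtd parameter carries a formal public identifier as in the widget example.

```python
from email.parser import HeaderParser

# Sample headers only; no body parts are needed for this check.
RAW_HEADERS = (
    'Content-Type: multipart/sgml; instance="cid-report.sgml-2";\n'
    '    prolog="cid-report.sgml-1";\n'
    '    dtd="-//Widget Consortium//DTD Widget//EN";\n'
    '    boundary="next-part"\n'
    '\n'
)


def declared_dtd(raw):
    """Return the dtd parameter of the Content-Type header, or None."""
    headers = HeaderParser().parsestr(raw)
    return headers.get_param("dtd")


dtd = declared_dtd(RAW_HEADERS)
```

A user agent could use the returned identifier to pick a viewer or consult a local catalog before touching any of the parts.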
I find the examples in the draft misleading, confusing, and incomplete.
The example in 3.1 is reasonable, except for a couple nits:
Thus a complete SGML document can appear as the following
MIME message.
But the example has no To:, From: or Mime-Version headers; it appears
to be a MIME body part, not a complete MIME message.
There's a '>' missing at the end of:
<!DOCTYPE radio
PUBLIC -//USA-DOD//DTD DRAFT TEMPLATE 911201//EN"
[<!ENTITY figure1
SYSTEM
"9312161426.figure1@ryc.pa.nj.us" NDATA gif
-- a reference to the file "figure1" --
>]
It might be better laid out as:
<!DOCTYPE radio
PUBLIC -//USA-DOD//DTD DRAFT TEMPLATE 911201//EN" [
<!ENTITY figure1
SYSTEM
"9312161426.figure1@ryc.pa.nj.us" NDATA gif
-- a reference to the file "figure1" -->
]>
I'd also suggest that rather than just:
< ... an SGML instance >
you make the example a little more explicit, along the lines of:
<radio> ... SGML markup and data ... </radio>
But as an implementor, I can see how to construct the document
entity: take the body identified as the prolog and the body identified
as the instance, and concatenate them.
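In Python, that receiving-side step might look like the sketch below. It assumes every part is an inline application/sgml body (no message/external-body parts), that the prolog and instance parameters name content-ids, and that plain concatenation is what the draft intends; the function names and sample message are mine.

```python
import email


def content_id(part):
    """Return a part's Content-Id without optional angle brackets."""
    return (part.get("Content-Id") or "").strip().strip("<>")


def document_entity(raw_message):
    """Concatenate the body named as prolog and the body named as instance."""
    msg = email.message_from_string(raw_message)
    parts = {content_id(p): p.get_payload() for p in msg.get_payload()}
    return parts[msg.get_param("prolog")] + parts[msg.get_param("instance")]


RAW = """\
Content-Type: multipart/sgml; prolog="cid-prolog"; instance="cid-instance";
    boundary="next-part"

--next-part
Content-Id: cid-prolog
Content-Type: application/sgml

<!doctype widget SYSTEM "widget.dtd">
--next-part
Content-Id: cid-instance
Content-Type: application/sgml

<widget> ... </widget>
--next-part--
"""

entity_text = document_entity(RAW)
```

The result is what I would hand to the parser as the SGML document entity. How the remaining entities get resolved is a separate question.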
By the way... does the DTD used in that example (referenced as
"-//USA-DOD//DTD DRAFT TEMPLATE 911201//EN") work with the default
SGML declaration? As I recall, the DOD used slightly different SGML
declarations for its applications. It might be illustrative to indicate
how one would reference one of these out-of-band SGML declarations.
The example in 3.2, on the other hand, is very confusing:
Content-Type: application/SGML;
prolog="-//XYZ-CORP//SUBDOC RFC2010 100401//EN"
<! ... an SGML instance >
First, the formal public identifier given as the value of the prolog
parameter points to a public SUBDOC, which would not likely serve as a
prolog. From my experience, the FPI for a DTD looks something like:
"-//XYZ-CORP//DTD RFC2010 100401//EN"
but note that the text to which this FPI refers is typically a
document type declaration subset, and not a complete document type
declaration; i.e. it's missing the markup "<!DOCTYPE [" and "]>".
Moreover, an SGML instance almost _never_ begins with <! as in the example.
<! is the beginning of a markup declaration, which is much more common
in the prolog of a document than in the instance.
The net result is that, as an implementor, I am completely baffled as
to how to construct an SGML document entity for parsing from this
example.
I suggest that the draft be enhanced to include several more complete
examples, mixing and matching mechanisms proposed in the draft. Feel
free to use the widget example above.
Hmmm... the draft seems to make the common mistake of misusing the
term "SGML application". An SGML application is where you take SGML
and apply it to a problem. It's not (just) a piece of software. It's
more like in mathematics where you take the axioms, prove a few
theorems, and use that theory as the basis for discussion. So group
theory, real analysis, and differential equations are "mathematical
applications" the way DocBook, HTML, and CALS are SGML applications.
Where the draft uses the term "application" e.g. "a process is
interposed between the SGML application and the MIME user agent" I
would suggest you use the term "SGML system" from ISO-8879. An SGML
system, not an SGML application, is the abstraction that contains an
SGML parser.
Yikes... here's something that scares me:
2.2 Specifying the Document Parts
... If neither the SGML
declaration nor prolog is specified the recipient is free to
apply a local default.
This might make a useful "note to implementors," but I should hope we
would discourage folks from transmitting anything but complete,
conforming SGML documents.
The prolog and instance should be required, with the declaration, dtd,
fosi, etc. as optional parameters. Otherwise, it is unclear how the
receiver should construct the document entity.
Daniel W. Connolly "We believe in the interconnectedness of all things"
Software Engineer, Hal Software Systems, OLIAS project (512) 834-9962 x5010
<connolly(_at_)hal(_dot_)com>
http://www.hal.com/%7Econnolly