Comments on MIME/SGML
1994-02-18 17:16:56

Proposed Changes for MIME representation of SGML

Daniel W. Connolly <connolly(_at_)hal(_dot_)com>

SGML Entities in MIME

I believe that the goals of Mr. Levinson's MIME Content-types for SGML Documents are essential to the success of the Internet as an Integrated Open Hypermedia (cf. HyTime) system.

However, after a careful reading of the SGML standard (specifically section 6: Entity Structure), I believe that SGML/MIME fails to specify the most important machine representation of an SGML document; that is, the SGML document entity (production 2, section 6.2).

In conventional practice, an SGML document "doc" of type "T" is represented as a file doc.sgml, which looks like:

    <!DOCTYPE T SYSTEM "t.dtd" [
    <!ENTITY fig1 SYSTEM "foo.ps" NDATA postscript>
    ]>
    <T>blah blah blah <figure graphic=fig1> blah blah blah</T>

along with an SGML declaration in the file T.decl.

Technically speaking, the SGML document entity is the concatenation of T.decl and doc.sgml (with perhaps some system-specific newline->RS/RE conversions). But according to the standard it's OK to interchange SGML documents with an implied SGML declaration, and in practice, the SGML declaration is often compiled into the processing software.

So for all intents and purposes, the file doc.sgml is an SGML document entity. And it seems critical that the file doc.sgml should correspond to the body of some MIME body part.

The draft misuses the term "DTD", aka "Document Type Definition" (defn 4.104). An SGML document indeed has three parts: the SGML declaration, the prologue, and the instance. And the distinction between the term prologue and the term DTD is not trivial.

First, to be pedantic, a DTD is not generally representable in SGML syntax. The concept of the DTD includes not only the SGML-representable formal part, but also the associated application conventions, which cannot be represented in SGML. Also, a document may have more than one DTD in its prologue.

Second, to be practical, the conventional machine representation of a DTD is just an SGML text entity in the file t.dtd, which looks like:

    <!NOTATION postscript PUBLIC "-/Adobe/Postscript">
    <!ELEMENT T - - (#PCDATA) >
    ...

Note that it does not correspond exactly to the prologue of the document, in that it does not contain the <!DOCTYPE [ ... ]> markup.

For these reasons, I suggest the following modifications to the proposed MIME representation of SGML documents:

1. We make the following correspondence between the terms of the SGML standard and the MIME RFC:
3. We change the parameters of the multipart/SGML content type from

    sgml-part := "instance" / "declaration" / "dtd" / "fosi" / extension-token

to

    sgml-part := "document" / "declaration" / "dtd" / "fosi" / extension-token

where "document" is required, "declaration" is optional, and "dtd" is actually redundant (since it's in the document entity) but useful, since a MIME UA might want to know what kind of document it is without parsing the document.
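
The rule in item 3 (the "document" part is required, the others are optional) can be illustrated with a small check. This is only a sketch under stated assumptions: the function name `check_sgml_parts` and the constant `KNOWN_SGML_PARTS` are hypothetical, values are compared case-insensitively, and `extension-token` is approximated by an "x-" prefix in the style of MIME private tokens. None of these details are fixed by the draft or by this proposal.

```python
# Sketch only: checks a list of sgml-part values against the proposed grammar.
# The names below are hypothetical and not part of any draft or library.

# Keywords named in the proposed grammar ("document" replaces "instance").
KNOWN_SGML_PARTS = {"document", "declaration", "dtd", "fosi"}


def check_sgml_parts(parts):
    """Return a list of problems; an empty list means the parts look acceptable."""
    problems = []
    seen = set()
    for part in parts:
        name = part.lower()  # assumption: values compare case-insensitively
        seen.add(name)
        # Assumption: extension-token is approximated by an "x-" prefix.
        if name not in KNOWN_SGML_PARTS and not name.startswith("x-"):
            problems.append(f"unrecognized sgml-part: {part}")
    # The proposal makes "document" the one required part.
    if "document" not in seen:
        problems.append('required sgml-part "document" is missing')
    return problems


if __name__ == "__main__":
    print(check_sgml_parts(["document", "dtd"]))
    print(check_sgml_parts(["declaration", "instance"]))
```

A message carrying only "document" and "dtd" parts passes with an empty problem list, while one that still uses the draft's "instance" keyword is reported both for the unrecognized value and for the missing "document" part.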
Editorial Note

I used HTML because it works to a certain extent, not because I think it is exactly how Internet IOH should work. My comments on HTML are still under development. See my notebook on the design of an HTML successor.

Production Note

This document is brought to you by the following tools:

<!DOCTYPE HTML SYSTEM "10024(_dot_)761615492(_dot_)6(_at_)ulua"
  -- PUBLIC "-//IETF/DRAFT/ietf-iiir-html-01" @@ --
[
<!-- $Id$ -->
<!ENTITY web-node SYSTEM "10024(_dot_)761615492(_dot_)4(_at_)ulua">
]>
<HTML>
&web-node;
</HTML>

<!-- Jul 1 93 -->

<!-- Regarding clause 6.1, SGML Document:

  [1] SGML document = SGML document entity,
      (SGML subdocument entity |
       SGML text entity | non-SGML data entity)*

  The role of SGML document entity is filled by this DTD,
  followed by the conventional HTML data stream.
-->

<!-- DTD definitions -->

<!ENTITY % heading "H1|H2|H3|H4|H5|H6" >
<!ENTITY % list " UL | OL | DIR | MENU ">
<!ENTITY % literal " XMP | LISTING ">
<!ENTITY % headelement " TITLE | NEXTID |ISINDEX" >
<!ENTITY % bodyelement "P | HR | %heading | %list | DL | ADDRESS | PRE | BLOCKQUOTE | %literal">
<!ENTITY % oldstyle "%headelement | %bodyelement | #PCDATA">

<!ENTITY % URL "CDATA"
  -- The term URL means a CDATA attribute whose value is a
     Uniform Resource Locator, as defined.
     (A URN may also be usable here when defined.) -->

<!ENTITY % linkattributes
  "NAME NMTOKEN #IMPLIED
   HREF %URL; #IMPLIED
   REL CDATA #IMPLIED -- forward relationship type --
   REV CDATA #IMPLIED -- reversed relationship type to referent data: PARENT CHILD, SIBLING, NEXT, TOP, DEFINITION, UPDATE, ORIGINAL etc. --
   URN CDATA #IMPLIED -- universal resource number --
   TITLE CDATA #IMPLIED -- advisory only --
   METHODS NAMES #IMPLIED -- supported public methods of the object: TEXTSEARCH, GET, HEAD, ... --
  ">

<!-- Document Element -->

<!ELEMENT HTML O O (( HEAD | BODY | %oldstyle )*, PLAINTEXT?)>
<!ELEMENT HEAD - - ( TITLE? & ISINDEX? & NEXTID? & LINK* & BASE?)>
<!ELEMENT TITLE - - RCDATA
  -- The TITLE element is not considered part of the flow of text.
     It should be displayed, for example as the page header or window title. -->
<!ELEMENT ISINDEX - O EMPTY
  -- WWW clients should offer the option to perform a search on
     documents containing ISINDEX. -->
<!ELEMENT NEXTID - O EMPTY>
<!ATTLIST NEXTID N NAME #REQUIRED
  -- The number should be a name suitable for use for the ID of a new element.
     When used, the value has its numeric part incremented. EG Z67 becomes Z68 -->
<!ELEMENT LINK - O EMPTY>
<!ATTLIST LINK %linkattributes>
<!ELEMENT BASE - O EMPTY -- Reference context for URLS -->
<!ATTLIST BASE HREF %URL; #IMPLIED >

<!ENTITY % inline "EM | TT | STRONG | B | I | U | CODE | SAMP | KBD | KEY | VAR | DFN | CITE " >
<!ELEMENT (%inline;) - - (#PCDATA)>
<!ENTITY % text "#PCDATA | IMG | %inline;">
<!ENTITY % htext "A | %text" -- Plus links, no structure -->
<!ENTITY % stext -- as htext but also nested structure --
  "P | HR | %list | DL | ADDRESS | PRE | BLOCKQUOTE | %literal | %htext">

<!ELEMENT BODY - - (%bodyelement|%htext;)*>
<!ELEMENT A - - (%text)>
<!ATTLIST A %linkattributes; >
<!ELEMENT IMG - O EMPTY -- Embedded image -->
<!ATTLIST IMG SRC %URL; #IMPLIED -- URL of document to embed -- >
<!ELEMENT P - O EMPTY -- separates paragraphs -->
<!ELEMENT HR - O EMPTY -- horizontal rule -->
<!ELEMENT ( %heading ) - - (%htext;)+>
<!ELEMENT DL - - (DT | DD | %stext;)*>
<!-- Content should match ((DT,(%htext;)+)+,(DD,(%stext;)+))
     But mixed content is messy. -Dan Connolly -->
<!ELEMENT DT - O EMPTY>
<!ELEMENT DD - O EMPTY>
<!ELEMENT (UL|OL) - - (%htext;|LI|P)+>
<!ELEMENT (DIR|MENU) - - (%htext;|LI)+>
<!-- Content should match ((LI,(%htext;)+)+)
     But mixed content is messy. -->
<!ATTLIST (%list) COMPACT NAME #IMPLIED -- COMPACT, etc.-- >
<!ELEMENT LI - O EMPTY>
<!ELEMENT BLOCKQUOTE - - (%htext;|P)+ -- for quoting some other source -->
<!ELEMENT ADDRESS - - (%htext;|P)+>
<!ELEMENT PRE - - (#PCDATA|%inline|A|P)+>
<!ATTLIST PRE WIDTH NUMBER #implied >

<!-- Mnemonic character entities. -->

<!ENTITY AElig "Æ" -- capital AE diphthong (ligature) -->
<!ENTITY Aacute "Á" -- capital A, acute accent -->
<!ENTITY Acirc "Â" -- capital A, circumflex accent -->
<!ENTITY Agrave "À" -- capital A, grave accent -->
<!ENTITY Aring "Å" -- capital A, ring -->
<!ENTITY Atilde "Ã" -- capital A, tilde -->
<!ENTITY Auml "Ä" -- capital A, dieresis or umlaut mark -->
<!ENTITY Ccedil "Ç" -- capital C, cedilla -->
<!ENTITY ETH "Ð" -- capital Eth, Icelandic -->
<!ENTITY Eacute "É" -- capital E, acute accent -->
<!ENTITY Ecirc "Ê" -- capital E, circumflex accent -->
<!ENTITY Egrave "È" -- capital E, grave accent -->
<!ENTITY Euml "Ë" -- capital E, dieresis or umlaut mark -->
<!ENTITY Iacute "Í" -- capital I, acute accent -->
<!ENTITY Icirc "Î" -- capital I, circumflex accent -->
<!ENTITY Igrave "Ì" -- capital I, grave accent -->
<!ENTITY Iuml "Ï" -- capital I, dieresis or umlaut mark -->
<!ENTITY Ntilde "Ñ" -- capital N, tilde -->
<!ENTITY Oacute "Ó" -- capital O, acute accent -->
<!ENTITY Ocirc "Ô" -- capital O, circumflex accent -->
<!ENTITY Ograve "Ò" -- capital O, grave accent -->
<!ENTITY Oslash "Ø" -- capital O, slash -->
<!ENTITY Otilde "Õ" -- capital O, tilde -->
<!ENTITY Ouml "Ö" -- capital O, dieresis or umlaut mark -->
<!ENTITY THORN "Þ" -- capital THORN, Icelandic -->
<!ENTITY Uacute "Ú" -- capital U, acute accent -->
<!ENTITY Ucirc "Û" -- capital U, circumflex accent -->
<!ENTITY Ugrave "Ù" -- capital U, grave accent -->
<!ENTITY Uuml "Ü" -- capital U, dieresis or umlaut mark -->
<!ENTITY Yacute "Ý" -- capital Y, acute accent -->
<!ENTITY aacute "á" -- small a, acute accent -->
<!ENTITY acirc "â" -- small a, circumflex accent -->
<!ENTITY aelig "æ" -- small ae diphthong (ligature) -->
<!ENTITY agrave "à" -- small a, grave accent -->
<!ENTITY amp "&" -- ampersand -->
<!ENTITY aring "å" -- small a, ring -->
<!ENTITY atilde "ã" -- small a, tilde -->
<!ENTITY auml "ä" -- small a, dieresis or umlaut mark -->
<!ENTITY ccedil "ç" -- small c, cedilla -->
<!ENTITY eacute "é" -- small e, acute accent -->
<!ENTITY ecirc "ê" -- small e, circumflex accent -->
<!ENTITY egrave "è" -- small e, grave accent -->
<!ENTITY eth "ð" -- small eth, Icelandic -->
<!ENTITY euml "ë" -- small e, dieresis or umlaut mark -->
<!ENTITY gt ">" -- greater than -->
<!ENTITY iacute "í" -- small i, acute accent -->
<!ENTITY icirc "î" -- small i, circumflex accent -->
<!ENTITY igrave "ì" -- small i, grave accent -->
<!ENTITY iuml "ï" -- small i, dieresis or umlaut mark -->
<!ENTITY lt "<" -- less than -->
<!ENTITY nbsp " " -- should be NON_BREAKING space -->
<!ENTITY ntilde "ñ" -- small n, tilde -->
<!ENTITY oacute "ó" -- small o, acute accent -->
<!ENTITY ocirc "ô" -- small o, circumflex accent -->
<!ENTITY ograve "ò" -- small o, grave accent -->
<!ENTITY oslash "ø" -- small o, slash -->
<!ENTITY otilde "õ" -- small o, tilde -->
<!ENTITY ouml "ö" -- small o, dieresis or umlaut mark -->
<!ENTITY szlig "ß" -- small sharp s, German (sz ligature) -->
<!ENTITY thorn "þ" -- small thorn, Icelandic -->
<!ENTITY uacute "ú" -- small u, acute accent -->
<!ENTITY ucirc "û" -- small u, circumflex accent -->
<!ENTITY ugrave "ù" -- small u, grave accent -->
<!ENTITY uuml "ü" -- small u, dieresis or umlaut mark -->
<!ENTITY yacute "ý" -- small y, acute accent -->
<!ENTITY yuml "ÿ" -- small y, dieresis or umlaut mark -->

<!-- deprecated elements -->

<!ELEMENT (%literal) - - CDATA>
<!ELEMENT PLAINTEXT - O EMPTY>

<!-- Local Variables: -->
<!-- mode: sgml -->
<!-- compile-command: "sgmls -s -p " -->
<!-- end: -->

<!SGML "ISO 8879:1986"
--
  Document Type Definition for the HyperText Markup Language
  as used by the World Wide Web application (HTML DTD).

  NOTE: This is a definition of HTML with respect to SGML, and
  assumes an understanding of SGML terms.

  If you find bugs in this DTD or find it does not compile under
  some circumstances please mail www-bug(_at_)info(_dot_)cern(_dot_)ch
--

CHARSET
  BASESET "ISO 646:1983//CHARSET International Reference Version (IRV)//ESC 2/5 4/0"
  DESCSET   0   9  UNUSED
            9   2  9
           11   2  UNUSED
           13   1  13
           14  18  UNUSED
           32  95  32
          127   1  UNUSED
  BASESET "ISO Registration Number 100//CHARSET ECMA-94 Right Part of Latin Alphabet Nr. 1//ESC 2/13 4/1"
  DESCSET 128  32  UNUSED
          160  95  32
          255   1  UNUSED

CAPACITY SGMLREF
  TOTALCAP 150000
  GRPCAP 150000

SCOPE DOCUMENT

SYNTAX
  SHUNCHAR CONTROLS 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 127 255
  BASESET "ISO 646:1983//CHARSET International Reference Version (IRV)//ESC 2/5 4/0"
  DESCSET 0 128 0
  FUNCTION RE 13
           RS 10
           SPACE 32
           TAB SEPCHAR 9
  NAMING LCNMSTRT ""
         UCNMSTRT ""
         LCNMCHAR ".-"
         UCNMCHAR ".-"
         NAMECASE GENERAL YES
                  ENTITY NO
  DELIM GENERAL SGMLREF
        SHORTREF SGMLREF
  NAMES SGMLREF
  QUANTITY SGMLREF
           NAMELEN 34
           TAGLVL 100
           LITLEN 1024
           GRPGTCNT 150
           GRPCNT 64

FEATURES
  MINIMIZE DATATAG NO OMITTAG NO RANK NO SHORTTAG NO
  LINK SIMPLE NO IMPLICIT NO EXPLICIT NO
  OTHER CONCUR NO SUBDOC NO FORMAL YES

APPINFO NONE
>