ietf-822

MIME, SGML

1994-03-12 14:50:23
A few questions and remarks about Internet Draft: MIME/SGML, 
draft-levinson-sgml-01.txt, of 17 Jan 94.

Charsets.

On page 5 the draft says that "the charset parameter 
specifies the body part character set."  Page 8, section 3.3, 
asserts that "SGML documents use, by default, 
the ASCII character set" and remarks that for other 
charsets "the charset= parameter of the Content-Type: field 
specifies the actual character set."  But SGML documents 
use whatever character sets are specified in their SGML 
declaration, and may mix charsets.  ASCII cannot be assumed.
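
To make the MIME side of this concrete, here is a minimal Python sketch
(mine, not from the draft) of how a mail reader sees the charset
parameter: one value per body part.  The "text/sgml" content type and
the sample header are assumptions for illustration only.

```python
from email import message_from_string

# Hypothetical message; the "text/sgml" type and the charset value are
# assumptions for illustration, not taken from the draft.
RAW = (
    "MIME-Version: 1.0\n"
    "Content-Type: text/sgml; charset=ISO-8859-1\n"
    "\n"
    "<PARA>...</PARA>\n"
)

def part_charset(raw):
    """Return the single charset parameter declared for this body part."""
    msg = message_from_string(raw)
    # get_content_charset() gives the lowercased value, or None if absent.
    return msg.get_content_charset()

print(part_charset(RAW))
```

Whatever the SGML declaration says about mixed charsets, a header of
this kind can name only one of them for the whole part.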

As a constrained example, for the Docbook DTD we are considering the 
following approach, which discussion has suggested is workable:  
Markup is in ASCII.  Every element has a charset 
attribute (allowing charsets to be switched at the element 
level), and when switching to a wide-byte charset 
one must also include escapes as discussed in ISO
TR 9573 and ISO 2022; these escapes are also defined in the SGML
declaration.  On the model of the runic example in ISO TR 9573, this 
would come out something like the following (imagine that
the charset values are correct registered values):

<PARA charset=usascii>On the back of the stone an older form
of runes, with 24 letters in the alphabet, is used.  At the head
we read <FOREIGNPHRASE charset=roekrunes>sagwmogmeni Tad
hoaR igold iga iaRi goldin ad goanaR hosli</FOREIGNPHRASE>.</PARA>

That covers the case of an alternate charset that isn't wide-byte.  If
it were wide-byte, the example would be something like this (imagine
the escape sequences are correct):

<PARA charset=usascii>On the back of the stone an older form
of runes, with 24 letters in the alphabet, is used.  At the head
we read <FOREIGNPHRASE charset=roekrunes>ESC 2/13 3/1 sagwmogmeni Tad
hoaR igold iga iaRi goldin ad goanaR hosli SI</FOREIGNPHRASE>.</PARA>

where "SI" indicates the end of wide-byte characters.  (For clarity,
I've left out the country and territory attributes we think we'll want 
also.)  Note that, while we don't plan to do it, an SGML document could use
the escapes anywhere inline, not necessarily at the boundaries
of elements.
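
As a rough illustration of that last point, the Python sketch below
scans a byte string for ESC and SI and reports where they fall.  It is
only a sketch under assumptions: the sample bytes are made up (as in the
example above, the escape sequence is not claimed to be a correct one),
and the helper names colrow and switch_points are hypothetical.

```python
ESC = 0x1B  # ISO 2022 escape character
SI = 0x0F   # shift-in

def colrow(notation):
    """Convert column/row notation such as '2/13' to a byte value."""
    col, row = notation.split("/")
    return int(col) * 16 + int(row)

def switch_points(data):
    """Return (offset, name) for each ESC or SI byte found in data."""
    names = {ESC: "ESC", SI: "SI"}
    return [(i, names[b]) for i, b in enumerate(data) if b in names]

# Made-up sample: an escape sequence after the start tag, SI before
# the end tag.  Nothing stops the same bytes appearing mid-paragraph.
sample = (
    b"we read <FOREIGNPHRASE charset=roekrunes>"
    + bytes([ESC, colrow("2/13"), colrow("3/1")])
    + b"sagwmogmeni"
    + bytes([SI])
    + b"</FOREIGNPHRASE>"
)

for offset, name in switch_points(sample):
    print(offset, name)
```

A scanner of this kind finds switch points wherever they occur, whether
or not they coincide with element boundaries.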

Now does the language of the draft imply that at every point where
charsets switch there must be a separate MIME part?  If so, is there
no other choice?


Notations.

On page 6 the draft requires that any Notation be a valid MIME
content type.  Is there such a thing for Docbook's "linespecific"
notation, which says that line breaks and leading white space
must be preserved in output?


Security.

I believe the draft misrepresents the force of a Notation.  On
page 11 it is called a Trojan Horse, on the argument that one
could define a Notation that resolves to "delete *.*" .  But
the text string of a Notation is only advisory to the SGML
application:  it isn't intended as a system command.  To make
a real Trojan Horse, you'd have to use a recognized Notation
that meant something like "SunOS4.1systemcommand",
and then provide "delete *.*" as the content of an element 
bearing that Notation.  That might have a chance of working,
if your SGML-consuming software were as dumb as Trojans.
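
A minimal Python sketch of the distinction, under assumptions of my
own: the application keeps a table of the Notations it recognizes, and
the handler table, function names, and return shapes here are all
hypothetical.  Content under an unrecognized Notation is treated as
inert data and nothing is ever handed to the operating system.

```python
def render_linespecific(text):
    """Keep line breaks and leading white space, as the notation asks."""
    return text.splitlines()

# Hypothetical table: notation name -> harmless handler.
HANDLERS = {"linespecific": render_linespecific}

def process(notation, content):
    """Dispatch on a recognized notation name; never execute content."""
    handler = HANDLERS.get(notation)
    if handler is None:
        # Unrecognized notation: the name is only advisory, so the
        # content is passed through unchanged as data.
        return [content]
    return handler(content)

print(process("linespecific", "  indented\nnext line"))
print(process("SunOS4.1systemcommand", "delete *.*"))
```

The danger arises only if someone adds a handler to such a table that
passes element content to a command interpreter.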

Regards,

-- 
Terry Allen  (terry(_at_)ora(_dot_)com)
Editor, Digital Media Group
O'Reilly & Associates, Inc.
Sebastopol, Calif., 95472
