Re: Comments on MIME/SGML

Dan,

Re: Separation of prolog and instance

My local SGML expert who is familiar with a number of commercial SGML
implementations suggested the separation of prolog and instance.
Perhaps some other readers can provide insight.  I am not tied to a
particular approach as long as we achieve interworkability, that is,
being able to exchange SGML documents across h/w and s/w platforms.

Hmmm... my reasons were chiefly practical too; they were based on
experience with the SGMLs package. Could you give some background (or
pointers to materials I should read) about these "various
implementations" that treat the prologue and the instance differently?

... application/sgml is fine. I just
don't like it when MH uses base64 encoding on my html body parts when
I know most of my audience can read html source -- perhaps I just need
to learn to use my tools better.


I suspect a configuration issue with MH creates the behaviour you
dislike.  Perhaps it assumes that application/* means binary content.

... Let's take another whack at the SGML->MIME correspondence:
...
I can't find the term "notation type" in the SGML standard. I have
found:
      4.75 data content notation: An application-specific ...


Once again my language is sloppy.  Perhaps I should be embarrassed,
but instead I am encouraged and challenged to clean it up.  I will
start working on it this week.

... there's some redundancy: in SGML, the
choice of notations is expressed in the ENTITY declaration along with
the "filename" info. In MIME, the content type is expressed in the
referenced body part. When using MIME/SGML, we have to put it in both
places.


Yes, there is redundancy.  In effect we take specific SGML elements,
external entities, and promote (and copy) them to the MIME
encapsulation level.  Doing so makes the MIME body parts self
contained; one can default Multipart/SGML to Multipart/Mixed and view
the pictures, assuming you have the proper viewers.  Knowledge of SGML
is not needed to reduce the message body to a sequence of local files;
when the external entity filenames are replaced, SGML knowledge is
needed, but it is compartmentalized.  Thus, the layering of SGML inside
MIME creates the need for the redundancy.  Redundancy itself isn't
bad; only unnecessary redundancy is.

Is the connection between SGML notation identifiers and the MIME
Content-Type syntax supposed to be explicit, or is there an implicit
correspondence ...


Your example, duplicated below, of explicit connections is what I
intended.

      --8<
      Content-Type: application/postscript
      Content-ID: id1

      %!PS-Adobe...

      --8<
      Content-Type: application/sgml

      <!DOCTYPE T SYSTEM [
      <!NOTATION ps SYSTEM "application/postscript">
      <!ENTITY fig1 SYSTEM "id1" NDATA ps>
      ]>
      ...
      --8<--
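
To make the explicit mapping concrete, here is a minimal Python sketch
(not part of the proposal) that parses a message like the one above and
checks that each NOTATION's SYSTEM identifier agrees with the
Content-Type of the body part its entity points to.  The boundary
string "demo", the name check_notations, and the regular expressions
are assumptions for illustration only; a real implementation would need
a proper SGML parser rather than pattern matching.

```python
import re
from email import message_from_string

# Same content as the example above; "demo" stands in for the "8<" boundary.
RAW = """\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="demo"

--demo
Content-Type: application/postscript
Content-ID: id1

%!PS-Adobe...

--demo
Content-Type: application/sgml

<!DOCTYPE T SYSTEM [
<!NOTATION ps SYSTEM "application/postscript">
<!ENTITY fig1 SYSTEM "id1" NDATA ps>
]>
...
--demo--
"""

# Simplified patterns: they only cover declarations shaped like the example.
NOTATION_RE = re.compile(r'<!NOTATION\s+(\S+)\s+SYSTEM\s+"([^"]*)"\s*>')
ENTITY_RE = re.compile(
    r'<!ENTITY\s+(\S+)\s+SYSTEM\s+"([^"]*)"\s+NDATA\s+([^\s>]+)\s*>'
)


def check_notations(raw):
    """Return a list of mismatches between NOTATIONs and body-part types."""
    msg = message_from_string(raw)
    parts = msg.get_payload()  # list of sub-messages for a multipart
    by_cid = {p["Content-ID"]: p for p in parts if p["Content-ID"]}
    problems = []
    for part in parts:
        if part.get_content_type() != "application/sgml":
            continue
        text = part.get_payload()
        notations = dict(NOTATION_RE.findall(text))  # name -> MIME type
        for name, cid, notation in ENTITY_RE.findall(text):
            target = by_cid.get(cid)
            if target is None:
                problems.append(f"{name}: no body part with Content-ID {cid}")
                continue
            declared = notations.get(notation)
            actual = target.get_content_type()
            if declared != actual:
                problems.append(
                    f"{name}: notation says {declared}, body part says {actual}"
                )
    return problems


if __name__ == "__main__":
    print(check_notations(RAW))
```

On the example as given the list comes back empty.  If the first body
part were relabelled image/gif, the check would report a mismatch for
fig1, which is the kind of error the redundancy invites.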

Hmmm... the implicit connection is probably more practical, but it
introduces redundancy and the chance for errors. The explicit mapping
causes the namespace of SYSTEM identifiers to include MIME
content-types. Blech.


Say more about that.  I think users are more likely to be able to
relate valid MIME Content-types to the display software, and more
likely to have the appropriate software.  The idea, again, is to
promote interworkability.  Are people more likely to have viewers for
the MIME content-type?  Can I expect you to have a viewer?  Using the
MIME content-types increases the possibility that I will recognize
what you sent.  In summary, I don't see the disadvantages you do.

[More comments on my informal language]


One of the problems I implicitly address is how to communicate the
issues to non-SGML savvy readers.  The informality makes it easier.
Perhaps.  I will look at this issue as I re-look at the lack of
precision you point out.

The idea here is that MIME plays the role of entity manager, and MIME
body parts map 1-1 to SGML entities.


The entity manager could use the MIME message body directly and an
implementation might choose to do that.  The standard should not
require that nor need it prevent that.  The question to be asked is,
"Is using the MIME message body directly reasonable?"  Someone
needs to answer that question and it's probably not me.

The first production in the standard is:

      [1] SGML document = SGML document entity
              (SGML subdocument entity |
              SGML text entity | non-SGML data entity)*

You can't split the prologue and the instance across SGML entities.
But you _can_ split the SGML document entity across system-specific
objects:

      NOTES
      1 This International Standard does not constrain the physical
      organization of the document within the data stream, message
      handling protocol, filesystem, etc., that contains it. In
      particular, separate entities could occur in the same physical
      object, a single entity could be divided between multiple
      objects, and the objects could occur in any order

Using the example I originally sent, we had:

                                                              SGML term or
      Content-ID:                     Contents                App convention
      <10024.761615492.3@ulua>        SGML document           App
      <10024.761615492.4@ulua>        external entity         SGML
      <10024.761615492.5@ulua>        SGML document entity    SGML
      <10024.761615492.6@ulua>        SGML text entity        SGML
      <10024.761615492.7@ulua>        SGML declaration        App

Your suggestion makes it look like:
      Content-ID:                     Contents
      <10024.761615492.3@ulua>        SGML document           App
      <10024.761615492.4@ulua>        external entity         SGML
      <10024.761615492.5@ulua>        prolog                  App
      <10024.761615492.6@ulua>        external entity         SGML
      <10024.761615492.7@ulua>        declaration             App
      <10024.761615492.8@ulua>        instance                App

But in the end, it's not really critical that SGML text entities map
exactly to MIME body parts (even my proposal did app-specific stuff
with the SGML declaration). [Hmmm... until you start talking about
subdocument entities... I think a concrete example of this is in
order.]

The critical thing is how all this interacts with available (and
conceivable) tools. For example, with either of the above examples, I
could do
      
      mhn store cur

and get several files: 4.sgml, 5.sgml, 6.sgml, ...
After I replace system identifiers
(SYSTEM "10024.761615492.6@ulua") with filenames (SYSTEM "6.sgml") in
those files, I could validate the document using:

      sgmls -s 7.sgml 5.sgml          # Connolly's version, or
      sgmls -s 7.sgml 5.sgml 8.sgml   # Levinson's version

Hmmm... about replacing system identifiers... this could be a _really_
tedious process. ...


The unpacker will take care of the tedium.  It will have the
information it needs from the Content-ID and Content-type headers.  Of
course it must parse the subdoc and the entity structure to do that.
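
As a rough illustration of that step, the following Python sketch
rewrites SYSTEM identifiers using a Content-ID-to-filename table that
the unpacker would have built while storing the body parts.  The
function name rewrite_system_ids, the entity name "chap", and the use
of a regular expression are my assumptions for the sake of a short
example.  A real unpacker would have to parse the entity structure
properly, since a pattern like this would also touch SYSTEM literals
inside comments or quoted text.

```python
import re

# Matches: SYSTEM "some-identifier"  (double-quoted literals only).
SYSTEM_ID_RE = re.compile(r'(SYSTEM\s+)"([^"]*)"')


def rewrite_system_ids(sgml_text, cid_to_file):
    """Replace SYSTEM "<content-id>" with SYSTEM "<filename>".

    Identifiers that are not in cid_to_file (for example a NOTATION
    whose system identifier is a MIME content type) are left untouched.
    """
    def swap(match):
        keyword, ident = match.group(1), match.group(2)
        return f'{keyword}"{cid_to_file.get(ident, ident)}"'

    return SYSTEM_ID_RE.sub(swap, sgml_text)


if __name__ == "__main__":
    # Hypothetical declarations; only the Content-ID comes from the example.
    doc = (
        '<!NOTATION ps SYSTEM "application/postscript">\n'
        '<!ENTITY chap SYSTEM "10024.761615492.6@ulua">\n'
    )
    table = {"10024.761615492.6@ulua": "6.sgml"}
    print(rewrite_system_ids(doc, table))
```

The table is the only coupling between the MIME side and the SGML side,
so the MIME half of the unpacker needs no SGML knowledge beyond handing
that table over.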

* As it stands, the MIME/SGML packer/unpacker cannot be implemented as
an SGML layer over MIME or as a MIME layer over SGML -- it must be a
piece of software that understands both simultaneously (see the above
entity usage). I suggest that instead of messing with the SYSTEM
identifiers in the data stream, we do an external mapping. Using the
above example, the packer would write:


This approach has been proposed to the SGMLOpen Technical Committee.
I intend to comment on that proposal so I will defer for now except to
make some general comments and give an outline of my thoughts.  Not
all entity managers are alike, nor are their configuration files.
To use the manager approach requires standards for configuration
files, or for updates to them, and the ability to dynamically add
entries to the configuration file.  In contrast, using the Content-ID
allows a layer of impedance-matching software to straighten out the
differences between implementations.  The cost, as you point out, is
in the complexity of the packer and unpacker.  The issue, though, is
interworkability, and the Content-ID approach provides, I believe,
greater interworkability.

* It's not clear how the single application/sgml body part works.
[example deleted]
This implies an algorithm for producing an SGML document entity from
a public identifier for a DTD and an instance. I don't quite see how
to do this in general (what's the name of the DOCTYPE?).


Given the previous discussion, dtd=... would become prolog=, and that
might be quite simple, perhaps referencing the public dtd in the
example.

Dan, I know I did not respond to everything you wrote but I will
review your comments further.  Again I travel this week so responses
will be slow.  I acknowledge the effort you've put into understanding
the proposal and working out the implications.  I appreciate it.

Best.../Ed