Re: Comments on MIME/SGML

In message <940314040726_70312(_dot_)1656_CHV52-1(_at_)CompuServe(_dot_)COM>,
Ed Levinson writes:

My local SGML expert who is familiar with a number of commercial SGML
implementations suggested the separation of prolog and instance.
Perhaps some other readers can provide insight.  I am not tied to a
particular approach as long as we achieve interworkability, that is,
being able to exchange SGML documents across h/w and s/w platforms.


I agree -- interoperability is the critical factor. But in the absence
of empirical evidence to motivate ad-hoc ways to slice and dice the
SGML document entity, we can at least be as precise as the SGML
standard about the representation of a document by saying that an
entity is, in general, represented as one MIME body part.

Any conceivable SGML toolset must implement the notion of an entity.
It is not obliged, on the other hand, to make the prolog and
instance conveniently available.

Hmmm... the implicit connection is probably more practical, but it
introduces redundancy and the chance for errors. The explicit mapping
causes the namespace of SYSTEM identifiers to include MIME
content-types. Blech.


Say more about that.


Well, earlier we made a correspondence between SYSTEM identifiers
and Content-IDs, and now we're polluting that same namespace with
MIME content types. It just means that one must consider the context
of a SYSTEM identifier to see whether it can be interpreted as a
Content-ID or a Content-Type.
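
To make the context dependence concrete, here is a rough sketch in
Python. It uses a regular expression rather than a real SGML parser,
and the function name, the sample declarations and the Content-ID
value are all made up for illustration. It only shows that the
declaration keyword, not the identifier itself, decides the reading.

```python
import re

# Matches declarations like <!ENTITY fig1 SYSTEM "..."> or
# <!NOTATION ps SYSTEM "...">. A regex is a stand-in for a real
# SGML parser and will miss many legal declarations.
DECL = re.compile(
    r'<!(ENTITY|NOTATION)\s+(?:%\s+)?(\S+)\s+SYSTEM\s+"([^"]*)"',
    re.IGNORECASE,
)

def classify_system_ids(prolog):
    """Return (name, system_id, interpretation) for each SYSTEM identifier.

    Assumption: under the explicit mapping, a SYSTEM identifier in a
    NOTATION declaration is read as a Content-Type, and one in an
    ENTITY declaration is read as a Content-ID.
    """
    results = []
    for keyword, name, system_id in DECL.findall(prolog):
        if keyword.upper() == "NOTATION":
            kind = "Content-Type"
        else:
            kind = "Content-ID"
        results.append((name, system_id, kind))
    return results

# Hypothetical sample prolog fragment.
sample = '''
<!NOTATION ps SYSTEM "application/postscript">
<!ENTITY fig1 SYSTEM "fig1@example.org" NDATA ps>
'''

for row in classify_system_ids(sample):
    print(row)
```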

 I think users are more likely to be able to
relate valid MIME Content-types to the display software, and more
likely to have the appropriate software.  The idea, again, is to promote
interworkability.  Are people more likely to have viewers for the MIME
content-type?  Can I expect you to have a viewer?  Using the MIME
content-types increases the possibility that I will recognize what you
sent.  In summary, I don't see the disadvantages you do.


The entity will be tagged with a MIME Content-Type in the Body part
whether or not it is so tagged in the SGML data stream. So all the
MIME user agent viewer stuff will work even with an implicit
connection between data content notations and MIME Content-Types.

Writing
<!NOTATION ps SYSTEM "application/postscript">
is only better than writing
<!NOTATION ps PUBLIC "-//Adobe/Postscript" -- @@exact syntax?-->
if there are tools out there that display SGML documents
using MIME conventions to map data content notations to viewers.

The advantage of the implicit correspondence is that the packer is
simpler -- it doesn't touch <!NOTATION> declarations.
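
A minimal sketch of what the implicit correspondence might look like
on the packer side, assuming Python's standard email package. The
table NOTATION_TO_MIME, the function pack_entity, the fallback to
application/octet-stream and the Content-ID value are hypothetical;
nothing here is taken from the draft.

```python
from email.message import EmailMessage

# Hypothetical packer-side table: data content notation name -> MIME type.
# It lives outside the SGML data stream, so <!NOTATION> declarations
# are left exactly as the author wrote them.
NOTATION_TO_MIME = {
    "ps": ("application", "postscript"),
}

def pack_entity(data, notation, content_id):
    """Wrap one entity's bytes in a MIME body part.

    Assumption: unknown notations fall back to application/octet-stream.
    """
    maintype, subtype = NOTATION_TO_MIME.get(
        notation, ("application", "octet-stream")
    )
    part = EmailMessage()
    part.set_content(
        data, maintype=maintype, subtype=subtype, cid="<%s>" % content_id
    )
    return part

part = pack_entity(b"%!PS-Adobe-2.0\n", "ps", "fig1@example.org")
print(part.get_content_type())
print(part["Content-ID"])
```

The MIME user agent sees only the Content-Type header on the body
part, so viewer lookup works the same either way.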

[More comments on my informal language]


One of the problems I implicitly address is how to communicate the
issues to non-SGML savvy readers.  The informality makes it easier.
Perhaps.  I will look at this issue as I re-look at the lack of
precision you point out.


<Diversion>

There are two kinds of writing: writing that effectively preserves
information, and writing that effectively communicates information.
Exhaustive enumeration is a good technique for preserving, while
successive elaboration is much more effective for communicating. It's
tough to be complete and precise without being boring and impenetrable
(try reading ISO 8879 if you're not convinced...).

On the other hand, the task at hand is a specification; that is,
convincing your audience that you have completely and unambiguously
explained the techniques.

I find formal arguments and language much more convincing than
informal language. But I'm extremely left-brained compared to most
folks.

I think perhaps the best compromise is: lots of detailed examples.
They have the power to make issues clear and they capture a lot of
information.

</Diversion>

Hmmm... about replacing system identifiers... this could be a _really_
tedious process. ...


The unpacker will take care of the tedium.  It will have the
information it needs from the Content-ID and Content-type headers.  Of
course it must parse the subdoc and the entity structure to do that.


This sounds like famous last words to me. Do you have even a
proof-of-concept implementation of the unpacker you have described? I
would estimate a complete implementation to be a 4 to 6 person-month
effort.

It's important that the MIME/SGML strategy be rapidly deployed, and I
think this is a MAJOR obstacle.

* As it stands, the MIME/SGML packer/unpacker cannot be implemented as
an SGML layer over MIME or as a MIME layer over SGML -- it must be a
piece of software that understands both simultaneously (see the above
entity usage). I suggest that instead of messing with the SYSTEM
identifiers in the data stream, we do an external mapping. Using the
above example, the packer would write:


This approach has been proposed to the SGMLOpen Technical Committee.
I intend to comment on that proposal so I will defer for now except to
make some general comments and give an outline of my thoughts.


This is background info that I don't have... is there a way I can
get at it? Is there an internet resource I can consult?

Not all entity managers are alike, nor are entity manager
configuration files.
To use the manager approach requires standards for configuration
files, or updates to them and the ability to dynamically add entries
to the configuration file.


I don't follow your line of reasoning. I made no mention of entity
manager configuration files. In fact, in most cases, the
sgml-entity-map wouldn't be used at all! It's only necessary in cases
where the sending and receiving systems are vastly incompatible. And
in this case, the unpacking algorithm is only trivially more complex
than the one that deals with Content-IDs inside SYSTEM identifiers.
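
As a rough illustration of "trivially more complex", here are the two
lookups side by side in Python. The actual sgml-entity-map syntax is
not reproduced in this message, so the dict below is only a stand-in
for it; the function names, identifiers and data are hypothetical.

```python
def resolve_inline(system_id, parts_by_cid):
    """Content-ID approach: the SYSTEM identifier itself is the Content-ID."""
    return parts_by_cid.get(system_id)

def resolve_external(system_id, entity_map, parts_by_cid):
    """External mapping: one extra lookup, SYSTEM identifier -> Content-ID.

    Assumption: entity_map is a plain dict standing in for whatever the
    sgml-entity-map would carry. What should happen when there is no
    entry is not settled here; this sketch just returns None.
    """
    cid = entity_map.get(system_id)
    if cid is None:
        return None
    return parts_by_cid.get(cid)

# Hypothetical data: body parts keyed by Content-ID.
parts = {"fig1@example.org": b"%!PS-Adobe-2.0\n"}

# External map leaves the author's original SYSTEM identifier untouched.
entity_map = {"/usr/local/figs/fig1.ps": "fig1@example.org"}

print(resolve_inline("fig1@example.org", parts))
print(resolve_external("/usr/local/figs/fig1.ps", entity_map, parts))
```

In the second form the SGML data stream is never rewritten; only the
map travels alongside it.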

Hmm... in fact, if the Content-Disposition stuff ever solidifies, it
may obviate the need for this whole proposal.

 In contrast, using the content-ID allows
a layer of impedance matching software to straighten out the
differences between implementation.  The cost, as you point out, is in
the complexity of the packer and unpacker.  The issue though is
interworkability and the content-ID approach provides, I believe,
greater interworkability.


Could you give an example where the content-id approach is
significantly simpler than the external mapping approach? Or where it
provides greater interworkability? I suggest that in the vast majority
of cases, the opposite is the case.

* It's not clear how the single application/sgml body part works.
[example deleted]
This implies an algorithm for producing an SGML document entity from
a public identifier for a DTD and an instance. I don't quite see how
to do this in general (what's the name of the DOCTYPE?).
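
A small Python sketch of the gap, under the assumption that the
document entity is formed by putting a DOCTYPE declaration in front of
the instance. The function name, the public identifier and the sample
instance are invented for illustration. Note that the document type
name has to come from somewhere other than the DTD's public
identifier.

```python
def build_document_entity(doctype_name, dtd_public_id, instance):
    """Prefix an instance with a DOCTYPE declaration naming a public DTD.

    doctype_name is a required argument: the public identifier alone
    does not say what the document type name is.
    """
    prolog = '<!DOCTYPE %s PUBLIC "%s">\n' % (doctype_name, dtd_public_id)
    return prolog + instance

# Hypothetical public identifier and instance.
doc = build_document_entity(
    "memo",
    "-//Example//DTD Memo//EN",
    "<memo><to>Ed</to><body>Hello.</body></memo>\n",
)
print(doc)
```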


Given the previous discussion dtd=... would become prolog= and that
might be quite simple, perhaps referencing the public dtd in the
example.


This implies that there are no cases where a single application/sgml
body part outside of a multipart/sgml body part makes sense. Seems
clumsy, but ok...

Dan