ietf-xml-mime

Mixed-format and Unpacking Expectations

1999-04-10 13:50:10

I'd like to add remarks on topics I haven't seen dealt with at any
length yet: mixed format compound documents and expectations for
MIME unpacking.  Here are some quotes of what has been said:

| MURATA Makoto, Paul Hoffman, Frank Dawson, Jim Whitehead
| 1. Problem statement
| 
| We would like the MIME parser to be able to dispatch different sorts
| of XML documents to different applications, such as specialized
| programs that handle just one type of XML document.  Because MIME
| parsers do not look inside the MIME parts, identifying the sort of
| documents must be done in the MIME headers.  However, neither text/xml
| nor application/xml allow such information.
| ...
| (3) A new parameter "externalid" for text/xml and application/xml
| 
| This parameter specifies the externalID from the DOCTYPE of the XML
| document (if the DOCTYPE is present).  Examples would be:
| ...
| Note: None of the above proposals handle non-monolithic XML documents
| very well, since different islands of non-monolithic XML documents
| belong to different namespaces and thus different schemata.
| ===================
| Later, Murata Makoto wrote:
| Issue 5: Packaging
| 
| There should be a mechanism for packaging an XML document together
| with its stylesheet, catalog, and referenced resources (e.g., links,
| external entities).  One possibility is MHTML.
| ===================
| Larry asked:
| In the case where you're allowed to have a document that mixes
| traditional HTML, MathML, Vector Graphics ML, etc., are these
| separate "applications" or are they one "application" ("renderable
| XML document")?
| ====================

Mixed-format compound documents.

I have imagined complex documents that are not pure XML.  Such
a document would have a root entity (cover page, table of contents)
that gives access to others (which is why it's a document rather than
merely a web).  That root entity could be PDF, XML, Word, plain
text, CGM, you name it.  Other entities could be XML, Word, PDF,
etc. and on and on.  I want to be able to package up such a document
in MIME, and I think of "nonmonolithic" XML documents as a special
case of this format-generic complex document.  I'd like to solve
the general problem AND solve the XML problem together.  I've made
several proposals for doing so:  "Package or Perish", an SGML '97 
paper, and "MIME Multipart/Related for CBL", possibly accessible
s.v. "Related Documents" at 

  http://www.veosystems.com/xml/cbl/cbl-1.1/doc/index.html

In both cases I used the basis of the MHTML work, but tried to avoid
putting information about relations of parts in the MIME headers, so
that you could discard the MIME packaging without loss of information.
I don't have any attachment to the details of either of those proposals:
the week before the SGML '97 conference my colleagues and I worked up
half a dozen combinations of MIME semantics to do the job, and any 
proposal that works is fine by me.
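
To make the idea concrete, here is a minimal sketch (not either of the
proposals above) of a mixed-format package built as multipart/related
with Python's standard email package.  The Content-IDs, the
example.invalid domain, the placeholder PDF bytes and the function name
build_package are all assumptions made up for illustration.  The
relations between parts live in the root entity's own content (as cid:
references); the optional "start" parameter only names the root part.

```python
from email import message_from_bytes
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_package():
    """Build a small mixed-format compound document (hypothetical content)."""
    # 'type' gives the root part's media type; 'start' gives its Content-ID.
    package = MIMEMultipart(
        "related", type="text/plain", start="<root@example.invalid>"
    )

    # Root entity: a plain-text cover page that points at the other parts.
    root = MIMEText(
        "Cover page\n"
        "Chapter 1: cid:chapter1@example.invalid\n"
        "Entry 1:   cid:entry1@example.invalid\n",
        "plain",
    )
    root["Content-ID"] = "<root@example.invalid>"
    package.attach(root)

    # A non-XML part (placeholder bytes, not a real PDF).
    chapter = MIMEApplication(b"%PDF-1.3 placeholder", "pdf")
    chapter["Content-ID"] = "<chapter1@example.invalid>"
    package.attach(chapter)

    # An XML part.
    entry = MIMEText("<entry id='e1'/>", "xml")
    entry["Content-ID"] = "<entry1@example.invalid>"
    package.attach(entry)
    return package


if __name__ == "__main__":
    # Round-trip through bytes and list what a recipient would see.
    received = message_from_bytes(build_package().as_bytes())
    for part in received.get_payload():
        print(part.get_content_type(), part["Content-ID"])
```

Because the cover page itself carries the references, a recipient who
throws away the MIME wrapper after unpacking still has the relations,
provided the cid: names are mapped to whatever names the unpacked
pieces are stored under.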

I invented a text/x-compounddoc subtype for mixed-format compound
documents, perhaps pointlessly.  Is it the opinion of this group that
the root entity's MIME type is what should be used to label the whole
(which is related to what Larry was asking)?  Or maybe the MIME type
of the manifest (see next)?


Expectations for MIME unpacking.

There may be XML documents composed of very many pieces:

My-enormous-catalogue-container
        Catalogue-entry-1
        Catalogue-entry-2
        Catalogue-entry-3
        ...
        Catalogue-entry-100,000

Each of these catalogue entries may have an identifier, which could
be listed in a manifest (in XML, say) that is the first body part of 
a multipart MIME message.  The recipient may know he is interested only 
in Catalogue-entry-98,256.  It seems to me that it might be more
efficient to obtain that part by extracting and parsing the manifest and 
then searching for the MIME header of the wanted body part (thus making
recursive use of the MIME packaging) rather than having the MIME unpacker 
unpack all of the MIME message first.  I'm no MIME expert, so I don't know 
if that's reasonable, but I am uneasy about assuming that unpacking should 
be done before any subsequent processing.  (Perhaps no such assumption has
been made ...)
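
A rough sketch of the lookup I have in mind, in Python, under several
assumptions: the manifest is the first body part and maps each entry
identifier to a Content-ID (the manifest/entry element names and the
id/cid attributes are invented here), the boundary string is already
known from the top-level Content-Type header, and every body part has
headers that are written exactly as "Content-ID: <...>" with no folding
or case variation.  A real message would need more careful header
matching.  Only the manifest and the one wanted part are ever handed to
the MIME parser.

```python
import xml.etree.ElementTree as ET
from email import message_from_bytes
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

BOUNDARY = "catalogue-boundary"  # hypothetical fixed boundary for the sketch
DELIMITER = b"\n--" + BOUNDARY.encode("ascii")


def build_catalogue(count):
    """Build a demo package: an XML manifest first, then one part per entry."""
    package = MIMEMultipart("related", boundary=BOUNDARY)
    rows = "".join(
        f'<entry id="Catalogue-entry-{n}" cid="entry{n}@example.invalid"/>'
        for n in range(1, count + 1)
    )
    package.attach(MIMEText(f"<manifest>{rows}</manifest>", "xml"))
    for n in range(1, count + 1):
        part = MIMEText(f"<item>entry {n}</item>", "xml")
        part["Content-ID"] = f"<entry{n}@example.invalid>"
        package.attach(part)
    return package.as_bytes()


def part_around(raw, position):
    """Return the raw bytes of the body part whose delimiter precedes `position`."""
    opening = raw.rfind(DELIMITER, 0, position)
    begin = raw.index(b"\n", opening + 1) + 1  # skip the delimiter line itself
    end = raw.index(DELIMITER, begin)          # stop at the next delimiter
    return raw[begin:end]


def fetch_entry(raw, entry_id):
    """Parse only the manifest and the one wanted body part."""
    first = raw.index(DELIMITER)
    manifest_part = message_from_bytes(part_around(raw, first + len(DELIMITER)))
    manifest = ET.fromstring(manifest_part.get_payload(decode=True))
    wanted = next(
        e.get("cid") for e in manifest.iter("entry") if e.get("id") == entry_id
    )
    # Naive byte search for the wanted part's header (see assumptions above).
    hit = raw.index(b"Content-ID: <" + wanted.encode("ascii") + b">")
    return message_from_bytes(part_around(raw, hit))


if __name__ == "__main__":
    raw = build_catalogue(5)
    part = fetch_entry(raw, "Catalogue-entry-4")
    print(part.get_content_type(), part.get_payload(decode=True))
```

The byte search still has to read through the message up to the wanted
part, so the saving is in parsing and decoding, not in I/O; an index of
byte offsets in the manifest would be needed to avoid the scan as well.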

regards, Terry

Terry Allen                             Commerce One, Inc.
Business Language Designer              1600 Riviera Ave., Suite 200
Advanced Technology Group               Walnut Creek, Calif., 94596
tallen[at]sonic.net

