Re: Finishing the XML-tagging discussion

At 08:29 PM 3/16/00 -0800, Paul Hoffman / IMC wrote:

> The original problem is: how do I have my MIME handler automatically know
> that it should hand a body part that has an unknown tag on it to the
> generic XML processor?


You seem to be assuming in this case that the handler already has the
document, and that the only use of MIME types is for convenience in
processing documents you've already retrieved and know something about.

Also, 'unknown tag'?  Do you mean 'unknown content-type' here?  I don't
think we're talking about using MIME to dispatch handling of document
fragments.  (At least I hope not!)

> We have explored many, many alternatives in detail. They all have
> drawbacks, some of them severe. However, there is a simple solution that
> involves no changes to any protocol *and* will stop this discussion so we
> can move on with more interesting aspects of XML.


This non-solution makes many of the more interesting aspects of XML
difficult to use.  No changes to any protocol in return for a hobbled
infrastructure is not a solution I will accept happily.

> Proposed solution: every time the MIME handler comes across an unknown
> media type, it looks in the body part and sees if it is XML. If it is XML,
> add this media type to the "hand to the generic XML processor" list. If it
> is not XML, add this media type to the "don't hand to the generic XML
> processor" list. If you are really paranoid about missing something, clear
> the latter list every so often.
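
To make the quoted proposal concrete, here is a minimal sketch in Python of
the procedure as I read it. The class and method names, the return labels,
and the use of a well-formedness parse (via expat) as the "is it XML?" test
are all assumptions for illustration; the proposal itself does not say how
the check is done.

```python
import xml.parsers.expat


def looks_like_xml(body: bytes) -> bool:
    """Return True if the body parses as well-formed XML (assumed test)."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(body, True)
    except xml.parsers.expat.ExpatError:
        return False
    return True


class SniffingDispatcher:
    """Hypothetical handler that learns XML media types by inspecting bodies."""

    def __init__(self, known_types):
        self.known_types = set(known_types)  # types with dedicated handlers
        self.xml_types = set()       # "hand to the generic XML processor"
        self.non_xml_types = set()   # "don't hand to the generic XML processor"

    def route(self, media_type: str, body: bytes) -> str:
        media_type = media_type.strip().lower()
        if media_type in self.known_types:
            return "specific"
        if media_type in self.xml_types:
            return "generic-xml"
        if media_type in self.non_xml_types:
            return "unhandled"
        # Unknown type: the body must already be in hand to decide.
        if looks_like_xml(body):
            self.xml_types.add(media_type)
            return "generic-xml"
        self.non_xml_types.add(media_type)
        return "unhandled"

    def forget_rejections(self) -> None:
        """Clear the negative list "every so often"."""
        self.non_xml_types.clear()
```

Note that route() cannot classify a new type without the body, and that a
type which lands on the negative list stays there, even for later bodies
that are XML, until forget_rejections() is called.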


This is the fallback case which applications are going to have to use in
the event that the -xml suffix is rejected, but I strongly doubt that it
is either the best way to do it or the way that has the lowest cost.

If we assume that there will only be a relatively small number of XML media
types created, the cost of trying document types to find out if they
contain XML, in order to have some idea whether to try them again, is
minimal.  You load a document here and there and figure out that it's
useless junk or potentially valuable, and you keep track of the name.

As the number of document types grows, however, this processing and the
associated list become an ever-growing burden.  In situations where human
users are managing MIME types across more than one program, keeping track
of this list (which of these are XML?) will grow more difficult quite
rapidly.  Programs and networks will both have to spend significant
resources, especially in cases (like search engines) where a generic XML
processor is exploring millions of documents.  Retrieving documents to test
if they're in a usable format becomes a very bad idea as the number of
document types grows.

Also, there are lots of cases where both application-specific and generic
XML processing may be appropriate.  Crazy people like me who hand-code their
documents may want to be able to choose SVG viewing in a browser and
editing in an XML-based environment rather than an SVG drawing program.
It'll be much easier to set up those mappings if the description of the
documents - typically the MIME content-type - describes both possibilities.

If this growing and distributed burden is more attractive than a 4-byte
naming convention that doesn't interfere with existing processing, then I
suppose we should drop the suffix, and find out how popular this
non-approach proves to be in a couple of years.  At that point, it will be
very difficult to fix things.  I continue to argue that the suffix is a
remarkably low-cost solution with significant benefits.
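
For comparison, a sketch of what the suffix buys you, again in Python. The
function names, the handler labels, and the media type "image/svg-xml" are
hypothetical and chosen only for illustration; nothing here is a registered
type or an existing API. The point is that the decision is made from the
content-type string alone, before any document is retrieved.

```python
GENERIC_XML = "generic-xml-processor"  # hypothetical handler label

# Hypothetical table of application-specific handlers.
SPECIFIC_HANDLERS = {
    "image/svg-xml": "svg-viewer",
}


def bare_media_type(content_type: str) -> str:
    """Strip parameters such as charset and normalize case."""
    return content_type.split(";", 1)[0].strip().lower()


def has_xml_suffix(content_type: str) -> bool:
    """True if the subtype ends in the proposed "-xml" suffix."""
    _, _, subtype = bare_media_type(content_type).partition("/")
    return subtype.endswith("-xml")


def candidate_handlers(content_type: str) -> list:
    """List the handlers a user could choose from, most specific first."""
    media_type = bare_media_type(content_type)
    handlers = []
    if media_type in SPECIFIC_HANDLERS:
        handlers.append(SPECIFIC_HANDLERS[media_type])
    if has_xml_suffix(media_type):
        handlers.append(GENERIC_XML)
    return handlers
```

With this table, candidate_handlers("image/svg-xml") offers both the SVG
viewer and the generic XML processor, while a type with no specific handler
but the suffix still reaches the generic processor. The existing text/xml
and application/xml types do not carry the suffix and would have to be
listed explicitly.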

> Done. No changes to the naming schemes needed, no hoping that if the naming
> scheme changes that all future media types follow it, no worrying about
> 'x-' names getting it wrong. Making these lists won't be that hard; there
> are only 330 types in the IANA registry now, and that includes all of the
> 'vnd.' names.


There's never been a syntax for creating your own document types (apart
from SGML, which never caught fire) without going through a complex
process, whether it was a standards- or vendor-based process. I suspect
there are at least 330 XML vocabularies out there right now, in various
stages of development, each of which could probably benefit from a
registered MIME type.  As for the x- names, I think their authors might take
the hint as registered names go out and the convention becomes part of XML
best practices.

Past performance is no guarantee of future returns.  

Simon St.Laurent
XML Elements of Style / XML: A Primer, 2nd Ed.
Building XML Applications
Inside XML DTDs: Scientific and Technical
Cookies / Sharing Bandwidth
http://www.simonstl.com