My 2 cents worth:
Issue 1: Proposals for additional parameters in the past
I agree with Tim Bray that it is inappropriate to give the DTD in a
parameter: in particular, because XML content models are basically glorified
assert() statements. DOCTYPE declarations are fine and appropriate where
they are.
But I think we need to have some clearer idea of what the MIME headers are
supposed to do; is there a nice functional demarcation between the headers
and the resource?
In particular, I think we are missing a key distinction: the MIME
content-type is not so much the "type" of a resource as the type of a
particular *publication* of a resource.
For example, let's take a VML file: if I want to publish it for viewing on a
VML browser, I should be able to send it out with the MIME type image/vml
(or whatever). If I want to publish it for viewing by generic XML tools,
I send it as text/xml. If I want to publish it for viewing as text, I send
it as text/plain. (Browsers are, of course, free to implement any
application dispatching system they like, which could circumvent my
intentions, but that is life.)
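To make the "publication" idea concrete, here is a minimal sketch in Python. The names PUBLICATION_TYPES and publish are hypothetical, and image/vml is only the placeholder type from the paragraph above, not a registered one. The point is just that the bytes stay the same while the label changes with the intended use.

```python
# Hypothetical mapping from the publisher's intended use to a Content-Type.
# "image/vml" is the placeholder from the text, not a registered MIME type.
PUBLICATION_TYPES = {
    "vml-browser": "image/vml",
    "xml-tools": "text/xml",
    "plain-text": "text/plain",
}


def publish(body, intended_use):
    """Return (headers, body) for one publication of the same resource."""
    headers = {"Content-Type": PUBLICATION_TYPES[intended_use]}
    return headers, body


# The same resource, published three different ways.
resource = b"<v:shape/>"
publications = [publish(resource, use) for use in PUBLICATION_TYPES]
```

Each entry in publications carries identical bytes; only the header differs.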
So IMHO the MIME content-types should be geared at helping information
providers publish documents in forms they think are useful. In other words,
it is not the function of the MIME header to describe the resource in
general; the MIME header should describe the document enough for the
particular uses that the information provider has in mind.
With this distinction in mind, we can judge the various parameters suggested
using the simple maxim that "when I publish a resource, I publish it with a
specific use in mind (even if you use it differently)", which means that we
don't need to be overly concerned with "graceful degradation" or to provide
an elaborate class mechanism (not to deny that the class relations exist).
Furthermore, let me bring up another problem with parameters: they
have to be sourced from somewhere. Every time we have to duplicate
information from inside an XML document into a header, we create the chance
for a mismatch.
If we need fast indexing to elements inside a document, we should invent an
index format, e.g.
html 23
head 32
meta 44
meta 55
meta 68
to allow (normalized) character indexing into an XML instance. That would be
far more useful in many circumstances than cluttering up the header with
lots of parameters.
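As a rough illustration of how such an index could be produced, here is a sketch in Python. It is not a proposal for the actual format: build_index and START_TAG are hypothetical names, the regular expression is a shortcut rather than a real XML parser (it would be fooled by tag-like text inside CDATA sections or comments), and offsets count characters of the decoded text, not bytes.

```python
import re

# Hypothetical shortcut: matches the name at the start of a start tag or
# empty-element tag. End tags, comments and PIs are skipped because the
# character after "<" is not a name-start character.
START_TAG = re.compile(r"<([A-Za-z_][\w.\-:]*)")


def build_index(xml_text):
    """Return (element name, character offset) pairs in document order."""
    return [(m.group(1), m.start()) for m in START_TAG.finditer(xml_text)]


doc = "<html><head><meta/><meta/></head></html>"
index = build_index(doc)
# index == [("html", 0), ("head", 6), ("meta", 12), ("meta", 19)]
```

Because the index is derived from the document rather than typed into a header by hand, it can be regenerated whenever the document changes, which avoids the mismatch problem mentioned above.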
Issue 2: Type of XML MIME entities
It would be useful to have a text/dtd (and therefore application/xml) MIME
type. text/xml should be used for XML entities (well-formed or not).
Issue 3: UTF-16
RFC 2376 should be revised when charsets for UTF-16 are registered.
Yes. And XML appendix F should be revised simultaneously, so that the
specifications are kept in sync.
Issue 4: Characters vs. bytes
An XML MIME entity is a sequence of characters as opposed to a
sequence of bytes. RFC 2376 is not really clear about this.
This is the old question. When we discussed it before, didn't we say that:
* a text/xml entity is a sequence of characters
* an application/xml entity is a sequence of bytes?
I hope an application/* is not a sequence of characters.
Issue 5: Packaging
There should be a mechanism for packaging an XML document together
with its stylesheet, catalog, and referenced resources (e.g., links,
external entities).
Relative to the issue of bundling resources with a document, I recently put
up a proposal for Document Resource Links, for XML. See
http://www.ascc.net/~ricko/drlove.htm
Rather than extending the MIME headers with lots of parameters, perhaps just
the URI of a single resource like that would allow greater extensibility.
The URI would then act as a "publication type name".
Issue 6: Ambiguity of CCS conversion
If this is the case, it might make sense to introduce a parameter
"map" to precisely specify which mapping should be used.
This also could have bearing on the PUA (private use area) character
problem, and the problem of corporate character sets (e.g. Hong Kong's
GCCS).
Issue 7: The default of the charset parameter
Chris Lilley's recent proposal to revise RFC 2376 is as follows:
1) Require explicit charset for overriding the internal encoding
declaration, so if one really wants to re-label a document as US-ASCII
one actually has to send it out as text/xml; charset="US-ASCII"
2) Define the absence of an explicit charset encoding in the MIME
header not as "US-ASCII" but as "use encoding in XML instance" in
accordance with the XML 1.0 Recommendation.
Yes. But... we should encourage the use of application/xml when data
integrity is at a premium, and text/xml when data accessibility is at a
premium.
I wonder whether the following is actually what is required to make text/xml
work:
* if the MIME charset and the XML encoding PI concur, the entity is
accepted with the MIME charset;
* if the MIME charset and the XML encoding PI disagree, *and* the
content-type is application/xml, the resource should be accepted using
the XML encoding PI;
* if the MIME charset and the XML encoding PI disagree, *and* the
content-type is text/xml, the user-agent may (at user selection):
- accept the document using the MIME charset (default; status quo in
RFC; indicative of transcoding)
- accept the document using the XML PI (indicative of poor server
labelling);
- follow some policy and heuristics determined by the user-agent
(indicative that data integrity is not the highest priority);
- reject the entity and request it again as application/xml (safest);
where "concurring" takes into account the different defaults.
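The rules above can be sketched as a small decision function in Python. Everything named here (resolve_encoding, EncodingConflict, the policy strings) is hypothetical. Two assumptions are made about "the different defaults": a missing MIME charset is treated as concurring with the document (per Chris Lilley's point 2), and a missing encoding declaration is treated as UTF-8. The "policy and heuristics" option is left out, since it is up to the user-agent.

```python
class EncodingConflict(Exception):
    """Raised when the entity should be rejected and re-requested."""


def resolve_encoding(content_type, mime_charset, xml_encoding, policy="mime"):
    """Pick the encoding used to read an XML entity.

    policy applies only to text/xml conflicts:
    "mime" (default), "xml", or "reject".
    """
    # Assumption: no MIME charset means "no opinion", and no encoding
    # declaration means UTF-8.
    mime = mime_charset.lower() if mime_charset else None
    decl = (xml_encoding or "utf-8").lower()

    # The two labels concur: accept with the MIME charset.
    if mime is None or mime == decl:
        return mime or decl

    # They disagree: application/xml trusts the document itself.
    if content_type == "application/xml":
        return decl

    # They disagree on text/xml: the user chooses.
    if content_type == "text/xml":
        if policy == "mime":
            return mime  # status quo; suggests transcoding en route
        if policy == "xml":
            return decl  # suggests poor server labelling
        if policy == "reject":
            raise EncodingConflict("re-request as application/xml")
        raise ValueError("unknown policy: " + policy)

    raise ValueError("not an XML content type: " + content_type)
```

For example, resolve_encoding("text/xml", "us-ascii", "utf-8") returns "us-ascii", while the same labels on application/xml give "utf-8".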
This is all so complicated, maybe we should just always recommend
application/xml!
Rick Jelliffe
Academia Sinica Computing Center
Taipei, Taiwan