Re: Requesting a revision of RFC3023


MURATA Makoto wrote:

> First, Simon and I were asked by the W3C team not to take any action
> on RFC 3023.  This is because the MIME type registration procedure was
> expected to change (see [1] and [2]). So, Simon, Dan, and I can't do
> anything right now.

Hmm, the TAG is pretty convinced that 3023 needs to change, so maybe Dan
or Chris or TimBL could take this up internally. I disagree that this
should be frozen at the moment, since the TAG is quite likely to publish
a document saying "RFC 3023 is wrong".

> As for the charset parameter, I am still uneasy about disallowing or
> deprecating it.  But I agree to make it "clear that nobody sending a
> media-type should send a charset for an XML media-type unless it
> REALLY REALLY KNOWS what it's sending," and to deprecate text/xml, not
> because the charset parameter is harmful but because most XML is not
> text for casual users.

I think I provided a detailed explanation of why the charset parameter
is in fact actively harmful in the context of XML. If you're not
convinced, it would be helpful if you could address those points. If you
already have, my apologies; perhaps you could give a pointer.

> I have repeatedly asked (e.g., [3]) what the position of the TAG is on
> charset detection for non-XML formats.  The latest version of the TAG
> finding document "Client handling of MIME headers" appears to
> recommend:

I read [3], and while I agree with much of it, it's obviously far too
late to change the XML encoding declaration. For the moment, I think
that the architecturally sound position for Web data formats is either
(a) use XML, or (b) use the charset parameter. I'm generally in favor
of a general-purpose encoding-detection scheme such as you propose, but
I'm pessimistic about getting it widely deployed for legacy formats.

>         (1) non-self-describing data formats should rely on the
>             charset parameter, and
>         (2) self-describing data formats should introduce their own
>             mechanism for specifying charsets.

I'll review the webarch doc; I suspect we haven't thought about this
closely enough.

> As far as I know, the charset parameter is the only generic mechanism.
> I know the charset parameter is not working well, but I do not see any
> other generic mechanisms.

I agree, but for XML formats I still think the charset parameter is
actively harmful and should be deprecated or even forbidden. This is
orthogonal to the larger question you (correctly) raise, of charset
detection for non-XML formats.
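
To make the conflict concrete, here is a minimal Python sketch, not
anyone's reference implementation, of two ways a receiver might pick a
decoding for an XML entity. Both helper names are hypothetical.
sniff_xml_encoding is a simplified take on the autodetection in
Appendix F of XML 1.0 (byte order mark first, then the encoding
declaration, else UTF-8); it ignores UTF-32 and EBCDIC.
rfc3023_encoding encodes one reading of RFC 3023: a charset parameter,
when present, is authoritative; text/xml without one defaults to
us-ascii; application/xml without one falls back to the document.

```python
import codecs
import re
from typing import Optional

# Hypothetical helper: a simplified XML 1.0 Appendix F sniffer.
# Checks for a byte order mark, then for an ASCII-compatible encoding
# declaration; otherwise assumes UTF-8. UTF-32 and EBCDIC are ignored.
_ENC_DECL = re.compile(
    rb"<\?xml[^>]*?encoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']"
)


def sniff_xml_encoding(data: bytes) -> str:
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    match = _ENC_DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"  # XML's default when nothing else is declared


# Hypothetical helper: one reading of the RFC 3023 precedence rules.
def rfc3023_encoding(media_type: str, charset: Optional[str], data: bytes) -> str:
    if charset is not None:
        return charset.lower()        # charset parameter is authoritative
    if media_type == "text/xml":
        return "us-ascii"             # text/* default inherited from RFC 2046
    return sniff_xml_encoding(data)   # e.g. application/xml: trust the document


if __name__ == "__main__":
    doc = '<?xml version="1.0" encoding="UTF-8"?><p>caf\u00e9</p>'.encode("utf-8")
    print(sniff_xml_encoding(doc))                         # utf-8
    print(rfc3023_encoding("application/xml", None, doc))  # utf-8
    try:
        doc.decode(rfc3023_encoding("text/xml", None, doc))
    except UnicodeDecodeError:
        print("text/xml, no charset: us-ascii cannot decode this document")
    # A server that labels everything ISO-8859-1 silently overrides the document.
    label = rfc3023_encoding("text/xml", "ISO-8859-1", doc)
    print(doc.decode(label).endswith("caf\u00c3\u00a9</p>"))  # True: mojibake
```

With a correctly declared UTF-8 document, only the path that trusts the
document recovers the author's encoding: the text/xml default fails
outright, and a blanket server-side label wins silently and garbles the
text.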


--
Cheers, Tim Bray
        (ongoing fragmented essay: http://www.tbray.org/ongoing/)