Re: Requesting a revision of RFC3023

Hmm, the TAG is pretty convinced that 3023 needs to change, so maybe Dan 
or Chris or TimBL could take this up internally.  I disagree that this 
should be frozen at the moment, since the TAG is quite likely to publish 
a document saying "RFC 3023 is wrong".


Since the TAG and the W3C team share some members, please have some 
internal discussion.  Our hands are tied, yet we are also being asked to 
take some action.  Please do not blame us.

As for the charset parameter, I am still uneasy about disallowing or
deprecating it.

...


I think I provided a detailed explanation of why the charset is in fact 
actively harmful in the context of XML.  If you're not convinced, it 
would be helpful if you could address those points.  If you already 
have, my apologies; perhaps you could give a pointer.


First, I agree with your request beginning "it should be made clear 
that nobody sending a media-type should send a charset for an XML 
media-type unless it REALLY REALLY KNOWS what it's sending".  Is 
this an acceptable position?

You explained that an ad-hoc mechanism (i.e., encoding declarations) works 
almost perfectly for XML, and I agree.  The above position implies that 
(1) the ad-hoc mechanism is allowed, (2) the generic mechanism is 
recommended but optional, and (3) if specified, the generic mechanism 
takes precedence.  You provided a detailed explanation of (1), but your 
argument against (3) is not persuasive.
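
To make (1)-(3) concrete, here is a rough Python sketch of the precedence 
rule as I read it.  It is not taken from RFC 3023.  The name 
effective_encoding and the UTF-8 fallback are assumptions made for 
illustration, and the regex only handles ASCII-compatible encodings (no BOM 
or UTF-16 detection).

```python
import re
from email.message import Message

# Illustrative pattern (an assumption, not a full XML parser): finds the
# encoding declaration in an XML prolog written in an ASCII-compatible encoding.
_ENCODING_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def effective_encoding(content_type: str, body: bytes) -> str:
    """Return the encoding a receiver would use under points (1)-(3)."""
    # (3) The generic mechanism (charset parameter) wins when it is present.
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_param("charset")
    if charset:
        return str(charset).lower()

    # (1) Otherwise the ad-hoc mechanism (encoding declaration) is used.
    match = _ENCODING_DECL.match(body)
    if match:
        return match.group(1).decode("ascii").lower()

    # (2) Neither is mandatory; fall back to UTF-8 (simplifying assumption).
    return "utf-8"
```

With a body that declares Shift_JIS, a Content-Type of 
"application/xml; charset=ISO-8859-1" makes this sketch answer iso-8859-1, 
while a bare "application/xml" makes it answer shift_jis.  That disagreement 
between the two mechanisms is exactly the case under dispute.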

I read [3] and while I agree with much of it, it's obviously far too 
late to change the XML encoding declaration.


True.  But what happens to upcoming non-XML formats?

For the moment, I think 
that the architecturally-sound position is, for Web data formats, either 
(a) use XML, or (b) use the charset parameter.


It would be great if the TAG finding explicitly stated this.  If 
we reach a consensus on this principle, my worries will be lessened.  
I am worried because I think the TAG is trying to deprecate the generic 
mechanism without establishing any principle.

    (1) non-self-describing data formats should rely on the
        charset parameter, and
    (2) self-describing data formats should introduce their own
        mechanism for specifying charsets.


I'll review the webarch doc, I suspect we haven't thought closely enough 
about this.


Please think about this.  Frankly, on this issue, I think that the 
TAG does not have an architecture, only an ad-hoc solution.  Does 
the I18N WG agree with the position of the TAG?

I agree, but for XML formats, I still think the charset parameter is 
actively harmful and should be deprecated or even forbidden.  This is 
orthogonal to the larger question you (correctly) raise, of charset 
detection for non-XML formats.


I do not think that this is orthogonal.  This is the point.  Should we 
deprecate the generic mechanism in the particular case where an 
ad-hoc mechanism works better?  WWW programmers are already tired of 
ad-hoc mechanisms.

Cheers,

Makoto