Hmm, the TAG is pretty convinced that 3023 needs to change, so maybe Dan
or Chris or TimBL could take this up internally. I disagree that this
should be frozen at the moment, since the TAG is quite likely to publish
a document saying "RFC 3023 is wrong".
Since the TAG and the W3C team share some members, please have some
internal discussion. Our hands are tied, yet we are also being asked to
take some action. Please do not blame us.
As for the charset parameter, I am still uneasy about disallowing or
deprecating it.
...
I think I provided a detailed explanation of why the charset parameter is
in fact actively harmful in the context of XML. If you're not convinced, it
would be helpful if you could address those points. If you already
have, my apologies, perhaps you could give a pointer.
First, I agree with your request beginning "it should be made clear
that nobody sending a media-type should send a charset for an XML
media-type unless it REALLY REALLY KNOWS what it's sending". Is
this an acceptable position?
You explained that an ad-hoc mechanism (i.e., encoding declarations) works
almost perfectly for XML, and I agree. The above position implies that (1) the
ad-hoc mechanism is allowed, (2) the generic mechanism is recommended but
optional, and (3) if specified, the generic mechanism takes precedence. You
provided a detailed explanation of (1), but your argument against (3) is not
persuasive.
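
To make point (3) concrete, here is a rough sketch of the precedence I have
in mind for application/xml: the charset parameter wins when it is present,
and the encoding declaration is consulted only when it is absent. The function
name effective_encoding and the regular expression are illustrative
assumptions of mine, not text from any specification. The sketch also assumes
an ASCII-compatible encoding and ignores byte-order marks and UTF-16
detection, and it falls back to UTF-8 when neither mechanism says anything.

```python
import re
from email.message import Message

# Simplified pattern for an XML encoding declaration at the very start of
# the entity. Assumption: the bytes are ASCII-compatible (no BOM, no UTF-16).
_ENCODING_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def effective_encoding(content_type, body):
    """Return (encoding, source) for an application/xml entity.

    Hypothetical helper: the generic mechanism (charset parameter) takes
    precedence over the ad-hoc one (encoding declaration).
    """
    msg = Message()
    msg["Content-Type"] = content_type

    # (3) If specified, the generic mechanism takes precedence.
    charset = msg.get_content_charset()
    if charset:
        return charset, "charset parameter"

    # (1) Otherwise the ad-hoc mechanism is allowed and is used.
    match = _ENCODING_DECL.match(body)
    if match:
        return match.group(1).decode("ascii").lower(), "encoding declaration"

    # Neither mechanism is present: assume UTF-8 for this sketch.
    return "utf-8", "default"
```

With this rule, a body declaring encoding="ISO-8859-1" that is served as
"application/xml; charset=utf-8" is treated as UTF-8, which is exactly the
case where you and I disagree.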
I read [3] and while I agree with much of it, it's obviously far too
late to change the XML encoding declaration.
True. But what happens to upcoming non-XML formats?
For the moment, I think
that the architecturally-sound position is, for Web data formats, either
(a) use XML, or (b) use the charset parameter.
It would be great if the TAG finding document explicitly stated this. If
we reach a consensus on this principle, my worries will be lessened.
I am worried because I think the TAG is trying to deprecate the generic
mechanism without establishing any principle.
(1) non-self-describing data formats should rely on the
charset parameter, and
(2) self-describing data formats should introduce their own
mechanism for specifying charsets.
I'll review the webarch doc, I suspect we haven't thought closely enough
about this.
Please think about this. Frankly, on this issue, I think that the
TAG does not have an architecture but only has an ad-hoc solution. Does
the I18N WG agree with the position of the TAG?
I agree, but for XML formats, I still think the charset parameter is
actively harmful and should be deprecated or even forbidden. This is
orthogonal to the larger question you (correctly) raise, of charset
detection for non-XML formats.
I do not think that this is orthogonal. This is the point. Should we
deprecate the generic mechanism in the particular case that an
ad-hoc mechanism works better? WWW programmers are already tired of
ad-hoc mechanisms.
Cheers,
Makoto