ietf-xml-mime

Re: Some text that may be useful for the update of RFC 2376

2000-03-24 07:26:15
In message "Re: Some text that may be useful for the update of RFC 2376",
Rick Jelliffe wrote...

>> HTTP already has the accept-charset field.  I do not understand your claim.

> If I am using a DOM parser, I cannot ask it "what encodings do you
> support?" If I am using SAX in Java, I can assume that the encodings
> underlying Java are the ones available. I don't recall any C or C++ XML
> parser that exposes this information: I don't think Expat does, for
> example.

I agree.  Although such information is described in their manuals, I do 
not think that they provide any API for asking "what encodings do you 
support?".
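(As an aside: later versions of Java did grow such an API.  The class 
java.nio.charset.Charset, introduced in J2SE 1.4 after this thread, can 
at least be asked whether a given encoding is supported.  A minimal 
sketch; the class name EncodingProbe is mine:)

```java
import java.nio.charset.Charset;

public class EncodingProbe {
    // Ask the runtime whether it can decode a given charset name.
    // This is the closest Java equivalent of asking an XML processor
    // "what encodings do you support?".
    static boolean supports(String name) {
        return Charset.isSupported(name);
    }

    public static void main(String[] args) {
        // UTF-8 is mandatory on every Java platform; Big5 usually
        // ships in the extended charsets but is not guaranteed.
        System.out.println("UTF-8: " + supports("UTF-8"));
        System.out.println("Big5:  " + supports("Big5"));
    }
}
```

(Even so, an XML parser layered on top of this may not expose the 
information to its callers, which is the point above.)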

> If my browser cannot ask its XML processor "what encodings do you
> support?" in order to perform content negotiation for XML, then either the
> poor user must configure it themselves or the HTTP code has to take on the
> responsibility for providing transcoding services itself (perhaps not a
> bad thing for the future).  And configuration has to be done
> application-by-application: for example James Clark's vanilla Expat did
> not accept Big5, so every XML application built on it was pretty unusable
> for Traditional Chinese here.

Ideally, XML processors should silently provide the accept-charset field.  
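(For concreteness, the accept-charset field is the Accept-Charset 
request header of HTTP/1.1, RFC 2616, section 14.2.  The host name and 
quality values below are only an illustration:)

```
GET /doc.xml HTTP/1.1
Host: example.org
Accept-Charset: utf-8, iso-8859-1;q=0.8, big5;q=0.5
```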

In addition to document entities, an XML processor may silently fetch 
external parsed entities, external parameter entities, and external DTD 
subsets.  (I know Expat doesn't, but other parsers do.)  Since application 
programmers cannot control such fetching, the best solution is to hardcode 
the accept-charset field in the XML processor.  Certainly, the person who 
writes the XML processor knows which encodings are supported.  (It would 
be great if we could register callback routines for unsupported charsets.)
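(To illustrate the idea, a hedged sketch rather than any actual parser's 
code: an XML processor written in Java could derive the Accept-Charset 
value from the decoders its runtime actually provides, and send it on 
every silent entity fetch.  Charset.availableCharsets() is a real 
J2SE 1.4+ API; the class and method names here are hypothetical:)

```java
import java.nio.charset.Charset;

public class AcceptCharsetBuilder {
    // Build an Accept-Charset header value from the encodings the
    // runtime actually supports, so that external parsed entities,
    // parameter entities, and DTD subsets can be fetched with
    // transparent content negotiation.
    static String acceptCharset() {
        StringBuilder sb = new StringBuilder();
        for (String name : Charset.availableCharsets().keySet()) {
            if (sb.length() > 0) sb.append(", ");
            sb.append(name.toLowerCase());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("Accept-Charset: " + acceptCharset());
    }
}
```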


> Even if you are using DOM (or
> SAX) it is quite possible that the system integrator has chosen to use a
> different implementation from the one which the software developer used.
> So you cannot ask DOM, and the programmer cannot be sure which
> implementation is being used.

Right.  But we can always assume that the same UCS characters will be 
received.

> So it seems to me that content negotiation of character encoding for XML
> is a bit of a myth: it requires that the user test applications rather
> than it being transparent.

If content negotiation is hardcoded in XML processors, application 
programmers do not have to worry about it.

> That is an unreasonable and unworkable
> requirement. At the moment, the browser has to guess which encodings are
> available, or the poor user has to test if the local encoding is
> supported. (I suppose systems could also have some automated system which
> requested a big5 XML file and then tried to parse it. Not really
> elegant. Presumably the XML file would have to be sourced internally. )

I am afraid that I do not fully understand your claim.  Could you 
try again?

> For content negotiation of MIME types, a browser knows which content-types
> have handlers. But it doesn't know this information for character-encoding
> for the XML applications it has. That is why I

Probably, you sent this mail before you finished the last sentence.  

----
MURATA Makoto  muraw3c(_at_)attglobal(_dot_)net
