In message "RE: Some text that may be useful for the update of RFC 2376",
Langer, Paul wrote...
We are developing an XML-database that gets input via HTTP.
In a previous release we implemented RFC 2376 correctly (for
media type text/xml we used the value of the charset parameter to
determine the encoding of input documents; if this parameter was
omitted we used the default "us-ascii").
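For readers who want the quoted rule spelled out, here is a minimal sketch in Python. The function name text_xml_charset is hypothetical; the parsing relies on the standard library's email.message.Message, and the "us-ascii" fallback is the default described in the message above.

```python
from email.message import Message


def text_xml_charset(content_type: str) -> str:
    """Return the charset that applies to a text/xml entity.

    Follows the rule quoted above: use the charset parameter if present,
    otherwise fall back to "us-ascii". The name is hypothetical.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    if msg.get_content_type() != "text/xml":
        raise ValueError("not a text/xml entity: %r" % content_type)
    # get_content_charset() lowercases the value and returns the
    # fallback when the charset parameter is missing.
    return msg.get_content_charset("us-ascii")
```

For example, text_xml_charset("text/xml") gives "us-ascii", while text_xml_charset('text/xml; charset="Shift_JIS"') gives "shift_jis".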
We are all aware of this problem. We are also aware of transcoders
that change the charset parameter but do not rewrite encoding
declarations.
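To illustrate the transcoder problem, here is a rough sketch in Python of how a receiver might notice that the charset parameter and the encoding declaration disagree. The names declared_encoding and charset_conflicts are hypothetical. The sketch assumes the body is in an ASCII-compatible encoding (so it will not find a declaration in UTF-16 data), and it compares names literally without resolving aliases.

```python
import re

# Encoding pseudo-attribute of an XML declaration at the very start of
# the entity. Assumes an ASCII-compatible byte encoding.
_ENCODING_DECL = re.compile(
    rb'^<\?xml[^>]*?\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_encoding(body: bytes):
    """Return the lowercased encoding declaration, or None if absent."""
    match = _ENCODING_DECL.match(body)
    return match.group(1).decode("ascii").lower() if match else None


def charset_conflicts(charset: str, body: bytes) -> bool:
    """True if the body declares an encoding that differs from charset.

    Names are compared literally (case-insensitively); aliases such as
    "sjis" versus "shift_jis" are not resolved in this sketch.
    """
    declared = declared_encoding(body)
    return declared is not None and declared != charset.lower()
```

A document that was transcoded to UTF-8 but still begins with <?xml version="1.0" encoding="Shift_JIS"?> would be flagged by charset_conflicts("utf-8", body).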
In Japan, we have a very interesting problem. We have XML, XSL,
Javascript, VBScript, CSS, and HTML, which reference each other. Some
formats provide inline declarations. Other formats do not. IE 5.0
appears to assume that if an HTML document is in UTF-16, anything
referenced from this HTML is also in UTF-16. Unfortunately, even
when the XML, XSL, and CSS are all in Shift_JIS, the internally generated
HTML is in UTF-16. Thus, the referenced files are misread and we have
data corruption.
I have come to believe that we need a single solution for every format.
The charset parameter is such a solution. We should not try to bend
specifications only to invent an ad-hoc solution for a particular format.
Let us strongly request internationalized WWW browsers and servers from
Microsoft and Netscape.
Cheers,
----
MURATA Makoto muraw3c(_at_)attglobal(_dot_)net