In message "Re: Some text that may be useful for the update of RFC 2376",
Chris Lilley wrote...
Unfortunately, we do not have "fairly good state of encoding declaration
of XML files". People generate XML documents by XSLT or their own programs,
and fail to specify the correct charset.
That is not a problem. Such files will not be well formed and thus will
fail to parse.
You are saying that the omission of the charset parameter is a problem,
and that incorrect encoding PIs are not problems. I do not know why.
Many Japanese users have failed to specify correct encoding PIs, and many
Japanese programmers have failed to generate them correctly. I also heard
that users in developing countries copy ISO-8859-1 HTML files and mistakenly
put incorrect meta tags. The same thing will happen to XML. In-band encoding
is not free from errors.
I think that you are not paying attention to other textual formats.
Oh I am, but not on this list where it is off topic.
In RFC 2318 (text/css) you co-authored, the charset parameter is described
as below:
The syntax of CSS is expressed in US-ASCII, but a CSS file can
contain strings which may use any Unicode character. Any charset
that is a superset of US-ASCII may be used; US-ASCII, iso-8859-X
and utf-8 are recommended.
RFC 2616 (HTTP) is a draft standard and defines the default as below:
The "charset" parameter is used with some media types to define the
character set (section 3.4) of the data. When no explicit charset
parameter is provided by the sender, media subtypes of the "text"
type are defined to have a default charset value of "ISO-8859-1" when
received via HTTP. Data in character sets other than "ISO-8859-1" or
its subsets MUST be labeled with an appropriate charset value. See
section 3.4.1 for compatibility problems.
Thus, the default value of the charset parameter of text/css is
ISO-8859-1. I know that CSS recommendations are different. But
in the realm of IETF, the default value is ISO-8859-1.
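
As a rough sketch of that reading of RFC 2616 (my illustration, not text
from either RFC), a receiver could resolve the charset of an HTTP entity
as below. The function name effective_charset and the constant
HTTP_TEXT_DEFAULT are hypothetical; only Python's standard email.message
module is assumed.

```python
from email.message import Message

# Default quoted above from RFC 2616 for "text" subtypes received via HTTP.
HTTP_TEXT_DEFAULT = "iso-8859-1"


def effective_charset(content_type):
    """Return the charset a receiver would assume for a Content-Type value.

    An explicit charset parameter wins. Without one, "text" subtypes fall
    back to the HTTP default. Other types get None, meaning this sketch
    has no opinion about them.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    explicit = msg.get_content_charset()  # lower-cased, or None if absent
    if explicit is not None:
        return explicit
    if msg.get_content_maintype() == "text":
        return HTTP_TEXT_DEFAULT
    return None


print(effective_charset("text/css"))
print(effective_charset("text/css; charset=UTF-8"))
```

Under this reading, an unlabeled text/css entity is treated as ISO-8859-1
regardless of what the CSS recommendations say.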
I would
like XML to be a good citizen of the WWW and to establish a good practice
As would I. I don't consider the propagation of known faults to be "good
practice".
Sorry, but I have to trust W3C I18N WG, etc.
The charset parameter
is not a historical requirement. Rather, it is the right solution,
which is just about to take off. I think that we are wasting our
limited resources by repeating old discussions rather than producing more
implementations.
You consistently fail to address the issue of file system processing of
XML, and instead characterise all opposition to your proposal as "time
wasting". I will be happy to characterise it as that once you have given a
satisfactory response to the questions I pose.
The long-term goal is to have the file systems of operating systems
provide the charset parameter. Encoding declarations are tentative
solutions.
I am not insisting on my proposal. I am insisting on the rough
consensus achieved in the past. Since the I18N WG asked the XML Syntax WG
not to change the precedence of the charset parameter, I am extremely
reluctant to make such changes. Up to now, the only change I can support
is to mandate the charset parameter of text/xml.
Since XML processors support UTF-8 and UTF-16, transcoding from Unicode to
legacy encodings does not look very attractive.
I agree that such transcoding is unattractive, but you seem to want to bias
the XML MIME specification to supporting such transcoding whatever the cost
to other sorts of processing.
The only "other sorts of processing" I can imagine is to provide the charset
parameter. I understand that it is not very easy at present, but WWW servers
are getting better. You think that the cost of developing and using XML-aware
transcoders and the cost of inventing a different in-band encoding mechanism
for each textual format are not a big deal. I do not agree.
However, something that converts an XML file from 8859-1 to UTF-8 and
leaves the encoding declaration saying 8859-1 is not useful. It has not
generated XML. It has made a thing which will fail to parse.
Since the charset parameter is now authoritative, such documents will parse.
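
To illustrate the precedence I mean, here is a minimal sketch (my own,
not a conforming XML processor) in which an external charset parameter
overrides the encoding declaration. The names decode_xml and _DECL are
hypothetical, byte-order-mark autodetection is deliberately ignored, and
the fallback to UTF-8 when nothing is declared is a simplification.

```python
import re
from typing import Optional

# Very loose match for encoding="..." inside a leading XML declaration.
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def decode_xml(data: bytes, charset: Optional[str] = None) -> str:
    """Decode XML bytes, treating an external charset as authoritative."""
    if charset is not None:
        # The charset parameter wins; the in-band declaration is not consulted.
        return data.decode(charset)
    m = _DECL.match(data)
    declared = m.group(1).decode("ascii") if m else "utf-8"
    return data.decode(declared)


# A document transcoded to UTF-8 whose declaration still says ISO-8859-1.
text = '<?xml version="1.0" encoding="ISO-8859-1"?><p>\u00e9</p>'
data = text.encode("utf-8")

print(decode_xml(data, "utf-8") == text)  # charset parameter supplied
print(decode_xml(data) == text)           # stale declaration trusted
```

With the charset parameter supplied, the transcoded bytes are read back
correctly; without it, the stale declaration is trusted and the non-ASCII
character is misread.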
----
MURATA Makoto muraw3c(_at_)attglobal(_dot_)net