ietf-xml-mime

Re: Some text that may be useful for the update of RFC 2376

2000-03-23 02:21:17
In message "Re: Some text that may be useful for the update of RFC 2376",
Chris Lilley wrote...


Unfortunately, we do not have "fairly good state of encoding declaration
of XML files".  People generate XML documents by XSLT or their own programs,
and fail to specify the correct charset.

That is not a problem. Such files will not be well formed and thus will
fail to parse.

You are saying that the omission of the charset parameter is a problem, 
and that incorrect encoding PIs are not problems.  I do not know why.  

Many Japanese users have failed to specify correct encoding PIs, and many   
Japanese programmers have failed to generate them correctly.  I also heard 
that users in developing countries copy ISO-8859-1 HTML files and mistakenly 
put incorrect meta tags.  The same thing will happen to XML.  In-band encoding 
is not free from errors.

I think that you are not paying attention to other textual formats.

Oh I am, but not on this list where it is off topic.

In RFC 2318 (text/css) you co-authored, the charset parameter is described 
as below:

       The syntax of CSS is expressed in US-ASCII, but a CSS file can
       contain strings which may use any Unicode character. Any charset
       that is a superset of US-ASCII may be used; US-ASCII, iso-8859-X
       and utf-8 are recommended.

RFC 2616 (HTTP) is a draft standard and defines the default as below:

   The "charset" parameter is used with some media types to define the
   character set (section 3.4) of the data. When no explicit charset
   parameter is provided by the sender, media subtypes of the "text"
   type are defined to have a default charset value of "ISO-8859-1" when
   received via HTTP. Data in character sets other than "ISO-8859-1" or
   its subsets MUST be labeled with an appropriate charset value. See
   section 3.4.1 for compatibility problems.

Thus, when text/css is received via HTTP without a charset parameter,
the default value is ISO-8859-1.  I know that the CSS recommendations say
otherwise.  But in the realm of the IETF, the default value is ISO-8859-1.
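
A minimal sketch of the defaulting rule quoted above from RFC 2616, as a
receiver might apply it to a Content-Type header value.  The function name
effective_charset is hypothetical, and the sketch ignores any in-band
information; it only shows the explicit parameter winning and the
ISO-8859-1 fallback for "text" subtypes.

```python
from email.message import Message


def effective_charset(content_type_header):
    """Return the charset a receiver would assume for an HTTP entity.

    Hypothetical helper: an explicit charset parameter wins; otherwise
    subtypes of "text" fall back to ISO-8859-1 (the RFC 2616 default),
    and other media types get no default from this rule.
    """
    msg = Message()
    msg["Content-Type"] = content_type_header
    charset = msg.get_param("charset")
    if charset is not None:
        return str(charset).lower()
    if msg.get_content_maintype() == "text":
        return "iso-8859-1"
    return None


print(effective_charset("text/css"))
print(effective_charset("text/xml; charset=UTF-8"))
print(effective_charset("application/xml"))
```

Under this rule, an unlabeled text/css entity is treated as ISO-8859-1
regardless of what the CSS recommendations say.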

I would
like XML to be a good citizen of the WWW and to establish a good practise

As would I. I don't consider the propagation of known faults to be "good
practice".

Sorry, but I have to trust the W3C I18N WG, etc.


 The charset parameter
is not a historical requirement.  Rather, it is the right solution,
which is just about to take off.  I think that we are wasting our
limited resources by repeating old discussions rather than doing more
implementations.

You consistently fail to address the issue of file system processing of
XML, and instead characterise all opposition to your proposal as "time
wasting". I will be happy to characterise it as that once you have given a
satisfactory response to the questions I pose.

The long-term goal is for the file systems of operating systems to
provide the charset parameter.  Encoding declarations are an interim
solution.

I am not insisting on my proposal.  I am insisting on the rough
consensus achieved in the past.  Since the I18N WG asked the XML Syntax WG
not to change the precedence of the charset parameter, I am extremely
reluctant to make such changes.  Up to now, the only change I can support
is to mandate the charset parameter for text/xml.

Since XML processors support UTF-8 and UTF-16, transcoding from Unicode to
legacy encodings does not look very attractive. 

I agree that such transcoding is unattractive, but you seem to want to bias
the XML MIME specification to supporting such transcoding whatever the cost
to other sorts of processing.

The only "other sorts of processing" I can imagine is providing the charset
parameter.  I understand that it is not very easy at present, but WWW servers
are getting better.  You think that the cost of developing and using XML-aware
transcoders and the cost of inventing a different in-band encoding mechanism
for each textual format are not a big deal.  I do not agree.


However, something that converts an XML file from 8859-1 to UTF-8 and
leaves the encoding declaration saying 8859-1 is not useful. It has not
generated XML. It has made a thing which will fail to parse.

Since the charset parameter is now authoritative, such documents will parse.
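
To illustrate, here is a rough sketch of a receiver that treats the
charset parameter as authoritative and falls back to the encoding
declaration only when no parameter is supplied.  The names
declared_encoding and decode_xml are hypothetical, and the sketch assumes
an ASCII-compatible encoding with no byte order mark; it is not a
conforming XML processor.

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration at the
# start of the byte stream (ASCII-compatible encodings only).
DECL_RE = re.compile(
    rb"<\?xml[^>]*?encoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']"
)


def declared_encoding(data):
    """Return the in-band encoding name, or None if none is declared."""
    match = DECL_RE.match(data)
    return match.group(1).decode("ascii") if match else None


def decode_xml(data, charset_param=None):
    """Decode XML bytes, letting the charset parameter take precedence."""
    encoding = charset_param or declared_encoding(data) or "utf-8"
    return data.decode(encoding)


# A document transcoded from ISO-8859-1 to UTF-8 whose declaration was
# left untouched, as in the scenario described above.
original = '<?xml version="1.0" encoding="ISO-8859-1"?><p>caf\u00e9</p>'
transcoded = original.encode("utf-8")

# With the charset parameter, the stale declaration does no harm.
print(decode_xml(transcoded, charset_param="utf-8") == original)

# Relying on the stale declaration alone mis-decodes the content.
print(decode_xml(transcoded) == original)
```

In this sketch the transcoder never has to understand XML: it only has to
update the charset parameter, and the receiver recovers the original text.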

----
MURATA Makoto  muraw3c(_at_)attglobal(_dot_)net
