In message "Re: Some text that may be useful for the update of RFC 2376",
Chris Lilley wrote...
MURATA Makoto wrote:
In message "Re: Some text that may be useful for the update of RFC 2376",
Martin J. Duerst wrote...
>CSS is served as text/css. XSL is XML. VBScript and JavaScript may be
>served as application/... If they don't have a 'charset' parameter,
>and they don't have any internal way to indicate the encoding,
>that's the problem of these registrations, not our problem.
Are you saying that each format should invent its own rules for
indicating the charset? My understanding was (and still is) that
you, as an I18N guy at W3C, are promoting a single generalized solution
for all textual formats.
Are you saying that each transport protocol (which formally includes direct
filesystem access) should have its own, sometimes contradictory,
overrides, defaults, and assumptions?
RFC 2130 clearly recommends the use of the MIME header and its charset
parameter.
Or that we should take the current, lowest-common-denominator charset
parameter, which fails far more often than it works, of two particular
protocols (each of which has a different default, and neither of which
is implemented consistently)
As for text/xml and application/xml, the default does not depend on
the protocol.
and attempt to
stretch this, making loose and woolly the current, fairly good state of
encoding declaration of XML files?
Unfortunately, we do not have a "fairly good state of encoding declaration
of XML files". People generate XML documents with XSLT or their own programs,
and fail to specify the correct charset. Encoding PIs are not bad when
the MIME header is absent. But mistakes do happen.
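As an aside, such mistakes are avoidable when the serializer itself writes the
declaration. A minimal sketch in Python (the element name "doc" and the choice
of ISO-8859-1 are just illustrative) showing how a generator can keep the
encoding declaration and the actual bytes in sync:

```python
import io
import xml.etree.ElementTree as ET

# Illustrative document with non-ASCII content.
root = ET.Element("doc")
root.text = "caf\u00e9"

# Letting the serializer pick both the byte encoding and the declaration
# guarantees they agree -- the mismatch happens when a program hand-writes
# one and produces the other.
buf = io.BytesIO()
ET.ElementTree(root).write(buf, encoding="iso-8859-1", xml_declaration=True)
data = buf.getvalue()

print(data)  # declaration says iso-8859-1, bytes are iso-8859-1
```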
Several people have pointed out that I am focussing on XML here. I would
refer them to the name and scope of the mailing list.
I think that you are not paying attention to other textual formats. I would
like XML to be a good citizen of the WWW and to establish a good practice
for every textual format.
Incidentally, XML is probably not best described as a textual format. It is
a data format, which can among other things be used to describe
international text. I am aware that the text/* media types have some
historical requirements regarding 'character set'; these are sufficient
that, in my opinion, text/* should not be used for XML in general.
Application/xml has no such problems (though it seems that people propose
to propagate these problems there).
I think that many XML documents are readable for casual users and that
the top-level type "text" is most appropriate. The charset parameter
is not a historical requirement. Rather, it is the right solution,
which is just about to take off. I think that we are wasting our
limited resources by repeating old discussions rather than doing more
implementations.
It is possible, for example, to take a payload of image/svg-xml and alter it
from UTF-16 to ISO-8859-15 (this would entail rewriting the encoding
declaration and inserting numeric character references (NCRs) for any
characters outside the repertoire of 8859-15). I would be most upset, as
would every decoder on the planet,
if the same conversion was performed on image/png.
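The conversion described above can be sketched in a few lines of Python.
This is only an illustration, not a conforming transcoder (the function name
and the naive regex rewrite of the declaration are my own; a real tool would
parse the XML declaration properly):

```python
import re

def transcode_to_latin9(utf16_bytes: bytes) -> bytes:
    """Sketch: decode UTF-16, rewrite the encoding declaration, and emit
    numeric character references for characters outside ISO-8859-15."""
    text = utf16_bytes.decode("utf-16")
    # Naively rewrite the encoding pseudo-attribute in the XML declaration.
    text = re.sub(r'encoding="[^"]*"', 'encoding="ISO-8859-15"', text, count=1)
    # 'xmlcharrefreplace' substitutes NCRs for unmappable characters.
    return text.encode("iso-8859-15", errors="xmlcharrefreplace")

# Greek letters are outside Latin-9, so they come out as NCRs.
src = '<?xml version="1.0" encoding="UTF-16"?><p>\u0395\u03bb</p>'.encode("utf-16")
print(transcode_to_latin9(src))
# b'<?xml version="1.0" encoding="ISO-8859-15"?><p>&#917;&#955;</p>'
```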
Since XML processors support UTF-8 and UTF-16, transcoding from Unicode to
legacy encodings does not look very attractive. What is needed is the
other way around: conversion from legacy encodings to Unicode. Such
transcoders do not need numeric character references.
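To illustrate why the reverse direction is simpler (the Shift_JIS sample
document is hypothetical): since UTF-8 covers the full repertoire, a legacy
document becomes Unicode by straight decoding, with no character references
needed.

```python
# A legacy-encoded document (Shift_JIS here, purely as an example).
shift_jis_xml = (
    '<?xml version="1.0" encoding="Shift_JIS"?><p>\u65e5\u672c\u8a9e</p>'
).encode("shift_jis")

# Decode, update the declaration, re-encode.  Every character survives
# as itself -- no NCR machinery is required in this direction.
text = shift_jis_xml.decode("shift_jis")
text = text.replace('encoding="Shift_JIS"', 'encoding="UTF-8"')
utf8_xml = text.encode("utf-8")

print(utf8_xml.decode("utf-8"))
```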
----
MURATA Makoto muraw3c(_at_)attglobal(_dot_)net