In message "Re: Some text that may be useful for the update of RFC 2376",
Chris Lilley wrote...
MURATA Makoto wrote:
In message "Re: Some text that may be useful for the update of RFC 2376",
Martin J. Duerst wrote...
>CSS is served as text/css. XSL is XML. VBScript and JavaScript may be
>served as application/... If they don't have a 'charset' parameter,
>and they don't have any internal way to indicate the encoding,
>that's the problem of these registrations, not our problem.
Are you saying that each format should invent its own rules for
indicating the charset? My understanding was (and still is) that
you, as an I18N guy at W3C, are promoting a single generalized solution
for all textual formats.
Are you saying that each transport protocol (which formally includes direct
filesystem access) should have its own, sometimes contradictory,
overrides, defaults, and assumptions?
RFC 2130 clearly recommends the use of the MIME header and its charset
parameter.
Or that we should take the current, lowest-common-denominator charset
parameter, which fails far more often than it works, of two particular
protocols (each of which has a different default, and neither of which
is implemented consistently)
As for text/xml and application/xml, the default does not depend on
the protocol.
and attempt to
stretch this, making loose and woolly the current, fairly good state of
encoding declaration of XML files?
Unfortunately, we do not have a "fairly good state of encoding declaration
of XML files". People generate XML documents with XSLT or their own programs,
and fail to specify the correct charset. Encoding PIs are not bad when
the MIME header is absent. But mistakes do happen.
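As an aside, such mistakes are avoidable when the serializer itself writes the
declaration. A minimal sketch in Python (the element name "doc" and the choice
of ISO-8859-1 are just illustrative) showing how a generator can keep the
encoding declaration and the actual bytes in sync:

```python
import io
import xml.etree.ElementTree as ET

# Illustrative document with non-ASCII content.
root = ET.Element("doc")
root.text = "caf\u00e9"

# Letting the serializer pick both the byte encoding and the declaration
# guarantees they agree -- the mismatch happens when a program hand-writes
# one and produces the other.
buf = io.BytesIO()
ET.ElementTree(root).write(buf, encoding="iso-8859-1", xml_declaration=True)
data = buf.getvalue()

print(data)  # declaration says iso-8859-1, bytes are iso-8859-1
```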
Several people have pointed out that I am focussing on XML here. I would
refer them to the name and scope of the mailing list.
I think that you are not paying attention to other textual formats. I would
like XML to be a good citizen of the WWW and to establish a good practice
for every textual format.
Incidentally, XML is probably not best described as a textual format. It is
a data format, which can among other things be used to describe
international text. I am aware that the text/* media types have some
historical requirements regarding 'character set'; these are sufficient
that, in my opinion, text/* should not be used for XML in general.
Application/xml has no such problems (though it seems that people propose
to propagate these problems there).
I think that many XML documents are readable for casual users and that
the top-level type "text" is most appropriate. The charset parameter
is not a historical requirement. Rather, it is the right solution,
which is just about to take off. I think that we are wasting our
limited resources by repeating old discussions rather than doing more
implementations.
It is possible, for example, to take a payload of image/svg-xml and alter it
from UTF-16 to ISO-8859-15 (this would entail rewriting the encoding
declaration and inserting numeric character references (NCRs) for any
characters outside the repertoire of 8859-15). I would be most upset, as
would every decoder on the planet,
if the same conversion was performed on image/png.
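The conversion described above can be sketched in a few lines of Python.
This is only an illustration, not a conforming transcoder (the function name
and the naive regex rewrite of the declaration are my own; a real tool would
parse the XML declaration properly):

```python
import re

def transcode_to_latin9(utf16_bytes: bytes) -> bytes:
    """Sketch: decode UTF-16, rewrite the encoding declaration, and emit
    numeric character references for characters outside ISO-8859-15."""
    text = utf16_bytes.decode("utf-16")
    # Naively rewrite the encoding pseudo-attribute in the XML declaration.
    text = re.sub(r'encoding="[^"]*"', 'encoding="ISO-8859-15"', text, count=1)
    # 'xmlcharrefreplace' substitutes NCRs for unmappable characters.
    return text.encode("iso-8859-15", errors="xmlcharrefreplace")

# Greek letters are outside Latin-9, so they come out as NCRs.
src = '<?xml version="1.0" encoding="UTF-16"?><p>\u0395\u03bb</p>'.encode("utf-16")
print(transcode_to_latin9(src))
# b'<?xml version="1.0" encoding="ISO-8859-15"?><p>&#917;&#955;</p>'
```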
Since XML processors support UTF-8 and UTF-16, transcoding from Unicode to
legacy encodings does not look very attractive. What is needed is the
other way around: conversion from legacy encodings to Unicode. Such
transcoders do not need numeric character references.
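To illustrate why the reverse direction is simpler (the Shift_JIS sample
document is hypothetical): since UTF-8 covers the full repertoire, a legacy
document becomes Unicode by straight decoding, with no character references
needed.

```python
# A legacy-encoded document (Shift_JIS here, purely as an example).
shift_jis_xml = (
    '<?xml version="1.0" encoding="Shift_JIS"?><p>\u65e5\u672c\u8a9e</p>'
).encode("shift_jis")

# Decode, update the declaration, re-encode.  Every character survives
# as itself -- no NCR machinery is required in this direction.
text = shift_jis_xml.decode("shift_jis")
text = text.replace('encoding="Shift_JIS"', 'encoding="UTF-8"')
utf8_xml = text.encode("utf-8")

print(utf8_xml.decode("utf-8"))
```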
----
MURATA Makoto muraw3c(_at_)attglobal(_dot_)net