Larry Masinter wrote:
Why is transcoding nearly certain to be wrong with XML?
First, transcoding XML is technically difficult, because you can't just
do a dumb byte-level job; you also have to take care of the BOM and the
encoding declaration. Second, given the wide range of
encodings supported by popular deployed XML software, it is rarely the
case that transcoding offers any benefit. Something that is technically
difficult (i.e. easy to get wrong) and offers little practical benefit
is nearly certain to be wrong.
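
To make the first point concrete, here is a minimal sketch in Python of what a transcoder has to do beyond re-encoding the bytes. The function names and the regular expression are illustrative assumptions, not taken from any particular XML toolkit, and the sketch only handles a document that already carries an encoding declaration.

```python
import re

# Matches the encoding pseudo-attribute in a leading XML declaration.
# Illustrative pattern only; a real transcoder should use a proper parser.
_DECL = re.compile(
    r'^(<\?xml[^>]*?encoding\s*=\s*)(["\'])[A-Za-z][A-Za-z0-9._-]*\2'
)


def naive_transcode(data: bytes, source: str, target: str) -> bytes:
    """The 'dumb byte-level job': re-encode and touch nothing else."""
    return data.decode(source).encode(target)


def transcode_xml(data: bytes, source: str, target: str) -> bytes:
    """Re-encode and keep the BOM and encoding declaration consistent."""
    text = data.decode(source)
    # Drop a leading BOM that the decoder passed through as U+FEFF.
    if text.startswith("\ufeff"):
        text = text[1:]
    # Rewrite the declared encoding so it names the new encoding.
    text = _DECL.sub(
        lambda m: f"{m.group(1)}{m.group(2)}{target}{m.group(2)}",
        text,
        count=1,
    )
    return text.encode(target)


doc = '<?xml version="1.0" encoding="ISO-8859-1"?><a>\u00e9</a>'.encode("latin-1")
broken = naive_transcode(doc, "iso-8859-1", "utf-8")  # still claims ISO-8859-1
fixed = transcode_xml(doc, "iso-8859-1", "utf-8")
```

The naive result is UTF-8 bytes that still declare ISO-8859-1, so a conforming parser will misread every non-ASCII character. Even the careful version is incomplete: it does not add a declaration when one is missing and the target is neither UTF-8 nor UTF-16.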
Or, to put it another way, why not limit the use
of text/xml to XML instances for which transcoding
is certain not to be wrong, and for which US-ASCII
is acceptable (because the XML uses numeric character
references or character entities, is only used to
code a limited schema with numeric data, etc.?)
This is plausible, I guess, except that if you avoid text/*, you simply
don't need to worry about whether they used entities the right way or what
kind of data it is. Essentially, application/* with no charset is
highly robust. Using text/* or adding a charset reduces that robustness
and is just bad practice.
Limiting the scope is less radical than deprecating.
True, but practices that cause problems while offering no practical
benefits should be deprecated.
--
Cheers, Tim Bray
(ongoing fragmented essay: http://www.tbray.org/ongoing/)