Re: transcoding nearly certainly wrong?


Larry Masinter wrote:

Why is transcoding nearly certain to be wrong with XML?

First, transcoding XML is technically difficult, because you can't just do a dumb byte-level job, you have to be careful to also take care of the BOM & encoding declaration. Second, given the wide range of encodings supported by popular deployed XML software, it is rarely the case that transcoding offers any benefit. Something that is technically difficult (i.e. easy to get wrong) and offers little practical benefit is nearly certain to be wrong.
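To make the first point concrete, here is a minimal Python sketch. The function names and the sample document are made up for illustration, and the regex only copes with a simple, well-formed XML declaration at the very start of the document. The dumb transcoder changes the bytes but leaves the encoding declaration alone, so the declaration now lies about the content; the careful one also drops any leading BOM and rewrites the declaration.

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration at the very
# start of the document (assumption: simple, well-formed declaration).
DECL_ENCODING = re.compile(
    r'^(<\?xml[^>]*?encoding=)(["\'])[A-Za-z][A-Za-z0-9._-]*\2'
)

def naive_transcode(data: bytes, src: str, dst: str) -> bytes:
    """Dumb transcode: re-encode the characters, ignore the declaration."""
    return data.decode(src).encode(dst)

def careful_transcode(data: bytes, src: str, dst: str) -> bytes:
    """Re-encode and keep the XML declaration consistent with the new bytes."""
    text = data.decode(src)
    # Drop a leading BOM so a stale one is not carried into the output.
    if text.startswith("\ufeff"):
        text = text[1:]
    # Rewrite the declared encoding to name the target encoding.
    text = DECL_ENCODING.sub(
        lambda m: f"{m.group(1)}{m.group(2)}{dst}{m.group(2)}",
        text,
        count=1,
    )
    return text.encode(dst)

original = '<?xml version="1.0" encoding="ISO-8859-1"?><p>caf\u00e9</p>'.encode(
    "iso-8859-1"
)

naive = naive_transcode(original, "iso-8859-1", "utf-8")
careful = careful_transcode(original, "iso-8859-1", "utf-8")

# A receiver that trusts the stale declaration decodes the naive output as
# ISO-8859-1 and sees mojibake instead of the original text.
misread = naive.decode("iso-8859-1")
```

Even the careful version is incomplete: it does nothing for a document that has no declaration and is being moved to an encoding that requires one.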

Or, to put it another way, why not limit the use
of text/xml to XML instances for which transcoding
is certain not to be wrong, and for which US-ASCII
is acceptable (because the XML uses numeric character
references or character entities, is only used to
code a limited schema with numeric data, etc.?)

This is plausible I guess, except that if you avoid text/*, you just don't need to worry whether they used entities the right way or what kind of data it is. Essentially, application/* with no charset is highly robust. Using text/* or adding a charset reduces that robustness and is just bad practice.
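A small sketch of why the no-charset case is robust, using Python's standard xml.etree.ElementTree as a stand-in for the receiving software (the sample documents are made up): when the parser is handed the raw bytes with no external charset, it works out the encoding from the document itself.

```python
import xml.etree.ElementTree as ET

# Raw bytes as they would arrive in an application/xml body with no charset
# parameter; the only encoding information is inside the document.
latin1_doc = '<?xml version="1.0" encoding="ISO-8859-1"?><p>caf\u00e9</p>'.encode(
    "iso-8859-1"
)
utf8_doc = "<p>caf\u00e9</p>".encode("utf-8")  # no declaration, so UTF-8 applies

# The parser reads the declaration (or falls back to the XML default) itself.
latin1_text = ET.fromstring(latin1_doc).text
utf8_text = ET.fromstring(utf8_doc).text
```

Both documents parse to the same text without anyone outside the parser having to say what the encoding is.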

Limiting the scope is less radical than deprecating.

True, but practices that cause problems while offering no practical benefits should be deprecated.

--
Cheers, Tim Bray
        (ongoing fragmented essay: http://www.tbray.org/ongoing/)