ietf-xml-mime

Re: Some text that may be useful for the update of RFC 2376

2000-03-22 12:54:50
MURATA Makoto wrote:

> UTF-16le and UTF-16be cannot be used for XML.  XML mandates
> the BOM for utf-16.  Meanwhile, utf-16le and utf-16be cannot
> have the BOM.  More about this, see RFC 2781.

I do not understand this from the text of XML 1.0.  Clause 4.3.3 only says
that if there is no encoding declaration, then either:

        a BOM is present, and the encoding is UTF-16, or

        no BOM is present, and the encoding is UTF-8.

If a proper encoding declaration is present, then any charset may be
used; however, parsers are only required to handle UTF-8 and UTF-16.
(In practice, all parsers known to me also accept US-ASCII and ISO-8859-1.)
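
The reading of clause 4.3.3 given above can be sketched in a few lines of
Python.  This is only an illustration, not the full autodetection procedure
of Appendix F: the function name guess_xml_encoding is hypothetical, the
declaration is only looked for in ASCII-compatible byte streams, and a
UTF-8 byte order mark is not handled.

```python
import re

# Hypothetical helper: a simplified reading of XML 1.0 clause 4.3.3.
# It matches an encoding declaration only when the bytes are ASCII-compatible.
_DECL = re.compile(
    rb"""^<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)

def guess_xml_encoding(data: bytes) -> str:
    # A UTF-16 byte order mark, in either byte order, means UTF-16.
    if data[:2] in (b"\xfe\xff", b"\xff\xfe"):
        return "UTF-16"
    # Otherwise trust a proper encoding declaration if one is present.
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii")
    # No BOM and no declaration: the encoding is UTF-8.
    return "UTF-8"
```

Under these assumptions, a document with neither a BOM nor a declaration is
reported as UTF-8, and a declared charset is returned as written, whether or
not the parser has a conversion table for it.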

For example, a file beginning with the characters

        <?xml version='1.0' encoding='x-focs'?>

encoded in Finagle's Own Character Set is perfectly legal, and will be parsed
successfully by any parser with an x-focs conversion table.  This is true even
if x-focs is a multi-byte character set.

I see absolutely no reason why UTF-16BE and UTF-16LE should be excluded from
the list of acceptable charsets.  It is true that Appendix F claims that a text 
beginning with the bytes 00 3C 00 3F or 3C 00 3F 00 is "strictly speaking,
in error", but Appendix F is marked "non-normative", and this text is
qualified in E44 anyhow.
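
The two byte patterns in question are easy to reproduce.  The sketch below
uses Python's "utf-16-be" and "utf-16-le" codecs as stand-ins for the
UTF-16BE and UTF-16LE charsets of RFC 2781 (an assumption made for
illustration; the variable names are hypothetical).  Neither codec writes a
BOM, whereas Python's plain "utf-16" codec prepends one.

```python
# Illustrative only: Python codec names stand in for the RFC 2781 charsets.
decl = "<?xml version='1.0' encoding='UTF-16BE'?>"

big_endian = decl.encode("utf-16-be")      # no BOM is written
little_endian = decl.encode("utf-16-le")   # no BOM is written
with_bom = decl.encode("utf-16")           # BOM prepended, native byte order

# The first two characters "<?" give the patterns named in Appendix F.
first_be = big_endian[:4].hex().upper()
first_le = little_endian[:4].hex().upper()
print(first_be)   # 003C003F
print(first_le)   # 3C003F00

# The BOM accounts for exactly two extra bytes at the front.
has_bom = with_bom[:2] in (b"\xfe\xff", b"\xff\xfe")
```

A text that starts with either four-byte pattern carries no BOM, so a parser
has to read the encoding declaration to learn which charset is in use.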

> I see no reasons for preserving byte sequences.  We only have to
> preserve XML information sets.

Almost, since strictly speaking the charset is part of the information set.

> Existing programming languages do not support Unicode very well, as
> I see it.

Except Java, JavaScript, Ada 95, Dylan ....
 
-- 

Schlingt dreifach einen Kreis um dies! || John Cowan <jcowan(_at_)reutershealth(_dot_)com>
Schliesst euer Aug vor heiliger Schau,  || http://www.reutershealth.com
Denn er genoss vom Honig-Tau,           || http://www.ccil.org/~cowan
Und trank die Milch vom Paradies.            -- Coleridge (tr. Politzer)
