xsl-list

Re: change stylesheet encoding

2004-05-14 15:46:18
Mike Ferrando wrote:

What is weird is that a friend of mine tried to change the encoding
in his editor of one of my stylesheets. But it would not let him. The
editor said that the stylesheet had characters in it that violated
the encoding choice. My question comes from his experience with my
stylesheet.

Many, if not most, non-Unicode encodings do not cover the full Unicode character set. This means that a Unicode document may contain characters that do not exist in a given non-Unicode target encoding. That was probably the case here.
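As a rough illustration of that coverage gap, here is a minimal Python sketch. The helper name can_encode and the sample string are made up for this example, and an editor would likely perform a similar check in its own way when asked to re-save a file in a different encoding.

```python
# Hypothetical helper: report whether every character in `text`
# exists in the target encoding.
def can_encode(text, encoding):
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# Sample text containing U+20AC EURO SIGN, which ISO-8859-1 lacks.
sample = "price: \u20ac5"

utf8_ok = can_encode(sample, "utf-8")         # UTF-8 covers all of Unicode
latin1_ok = can_encode(sample, "iso-8859-1")  # no euro sign in ISO-8859-1
```

Under these assumptions, a literal euro sign in the stylesheet would block a conversion to ISO-8859-1 but not to UTF-8.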

However, the numeric character references should not have caused this.

That is, at the level of the encoding of the *file* the numeric character references are interpreted as the characters "&", "#", "x", etc., not the (abstract) character they represent in XML land. These characters are all within the base ASCII range and therefore should be in every encoding you might want to use.

It is only in the XML parser that the reference is converted from the sequence of characters "&", "#", "x", etc. to a single *Unicode* character.
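The two levels can be seen side by side in a small Python sketch. It uses the standard xml.etree.ElementTree parser purely as a stand-in for "an XML parser"; the element name p and the choice of U+20AC are assumptions for the example.

```python
import xml.etree.ElementTree as ET

# At the file level the reference is just eight ASCII characters.
source = "<p>&#x20AC;</p>"

# Encoding to plain ASCII succeeds, because "&", "#", "x", the hex
# digits and ";" are all in the ASCII range.
raw_bytes = source.encode("ascii")

# Only the parser turns the reference into one Unicode character.
parsed_text = ET.fromstring(raw_bytes).text
```

Here raw_bytes contains nothing outside ASCII, while parsed_text is the single character U+20AC. That is why the references themselves should be safe in any ASCII-compatible encoding.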

It's important to remember that, regardless of how the bytes of the XML file are written to disk (the character encoding of the file), once parsed, all XML documents are, by definition, sequences of Unicode characters. Thus, even if I defined "Eliot's personal encoding" and put all my XML files in it and hacked my favorite parser to understand it, once parsed, the XML data provided to other applications by the XML parser would be Unicode characters.
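To make that concrete, the following Python sketch writes the same small document in two different file encodings and parses both. The document content is invented for the example, and xml.etree.ElementTree again stands in for whatever parser is actually in use.

```python
import xml.etree.ElementTree as ET

# The same logical document, serialized in two different encodings.
# U+00E9 is one byte in ISO-8859-1 and two bytes in UTF-8.
latin1_doc = '<?xml version="1.0" encoding="ISO-8859-1"?><p>caf\u00e9</p>'.encode("iso-8859-1")
utf8_doc = '<?xml version="1.0" encoding="UTF-8"?><p>caf\u00e9</p>'.encode("utf-8")

# The bytes on disk differ...
bytes_differ = latin1_doc != utf8_doc

# ...but once parsed, both yield the same sequence of Unicode characters.
latin1_text = ET.fromstring(latin1_doc).text
utf8_text = ET.fromstring(utf8_doc).text
```

Both parses hand the application the same four-character Unicode string, regardless of how the file was stored.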

Cheers,

Eliot
--
W. Eliot Kimber
Professional Services
Innodata Isogen
9030 Research Blvd, #410
Austin, TX 78758
(512) 372-8122

eliot(_at_)innodata-isogen(_dot_)com
www.innodata-isogen.com


