xsl-list

RE: [xsl] Re: [xsl] Problems transforming currency values using British pound (£) and Euro (€) signs

2003-10-09 08:57:34
I would be grateful if someone could help me with my problem 
transforming pound signs and Euro signs.

First point: currency symbols are not legal in the picture of
format-number(), though some processors may accept them.
(format-number() was defined by reference to the DecimalFormatSymbols
class of JDK 1.1, and currency symbols were added to the JDK in a later
version.)
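
One possible workaround, sketched here in Python rather than XSLT, is to
keep the currency symbol out of the picture altogether: split the stored
format into a literal prefix and a plain numeric picture, pass only the
picture to format-number(), and concatenate the prefix onto the result.
The helper name split_currency_prefix is hypothetical, and the sketch
assumes the symbol only ever appears before the first digit placeholder.

```python
def split_currency_prefix(fmt):
    """Split a format such as '£#,##0' into ('£', '#,##0').

    Assumption: everything before the first '#' or '0' is a literal
    prefix (for example a currency symbol); the rest is the picture.
    """
    for i, ch in enumerate(fmt):
        if ch in "#0":
            return fmt[:i], fmt[i:]
    # No digit placeholder found: treat the whole string as a literal.
    return fmt, ""


prefix, picture = split_currency_prefix("\u00a3#,##0")
print(ascii(prefix), picture)
```

In a stylesheet the same split could probably be done with substring()
on the Format attribute, for example
concat(substring(@Format, 1, 1), format-number(., substring(@Format, 2))),
assuming the symbol is always exactly one character long.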

Scenario:
A format string (£#,##0.00) is used to format a number, using 
the built-in format-number() XSLT function, into a currency 
value. The format value is obtained from the data XML file 
that is being transformed, and looks a little like this:

<Data TotalRows="4">
  <ColumnHeadings>
    <ColumnHeading Number="1">Value</ColumnHeading>
    <ColumnHeading Number="2">Label</ColumnHeading>
  </ColumnHeadings>
  <Rows>
    <Row Number="1" GUID="{F553FCD9-10C1-D511-91D9-000629864A98}">
      <Column Number="1" Type="Number" Format="£#,##0">12</Column>
      <Column Number="2" Type="Standard">A &amp; F GRANT LTD</Column>
    </Row>
    <Row Number="2" GUID="{CD4B980A-9ADC-D511-91EB-000629864A98}">
      <Column Number="1" Type="Number" Format="£#,##0">6</Column>
      <Column Number="2" Type="Standard">ACME Products</Column>
    </Row>
    <Row Number="3" GUID="{C18BEED0-956B-D611-9211-000629864A98}">
      <Column Number="1" Type="Number" Format="£#,##0">87</Column>
      <Column Number="2" Type="Standard">ABAC SERVICES LTD</Column>
    </Row>
    <Row Number="4" GUID="{3C2A26E3-5D51-D611-920B-000629864A98}">
      <Column Number="1" Type="Number" Format="£#,##0">1</Column>
      <Column Number="2" Type="Standard">ABACUS SERVICES LTD</Column>
    </Row>
  </Rows>
</Data>


Using this data, I generate an SVG bar chart, which is 
essentially another XML document.


Problem:
The resulting transformed XML document does not parse 
properly, complaining of illegal characters, when viewed in 
IE. (Error: An invalid character was found in text content. 
Error processing resource bla..)

It should not be possible to produce output from an XML transformation
that can't be parsed by an XML parser. So the question is, how did you
output the result of the transformation, and what did you do to it
before parsing it?

(I might also ask, why did you serialize the output, if all you wanted
to do was to parse it again? Why didn't you just write the output of the
transformation directly to a DOM?)


Theory:
This is as a direct result of the pound (£) sign, since using 
a dollar ($) sign works fine. Research has shown this 
character to be a part of the Latin-1 (extended Latin) 
character set, which is not part of ASCII, the default 
character set used by Windows. (Please correct me if my facts 
aren't accurate.) This means that, while the XML above looks 
well formed, the pound character needs to have been converted 
to a Unicode character which, I believe, looks something like 
this: Â£ (that is an accented letter A together with the 
pound sign), although in some editors it still appears as one 
character.

There are many inaccuracies in the above. Firstly, "the default
character set used by Windows" is not ASCII; in fact, there is no
default. It all depends on how you have configured Windows and which
software you are using.

Secondly, a Unicode £ sign looks like "£" if you display it correctly
using software that knows it is Unicode. It only looks like "Â£" if you
(a) encode it using the UTF-8 encoding of Unicode, and (b) display it
using software that doesn't understand UTF-8 encoding.
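
The effect is easy to reproduce. The following is a minimal Python
sketch (not anything MSXML-specific) that encodes a pound sign as UTF-8
and then deliberately decodes those bytes as iso-8859-1; the variable
names are only illustrative.

```python
pound = "\u00a3"  # the single Unicode character POUND SIGN

utf8_bytes = pound.encode("utf-8")          # two bytes: 0xC2 0xA3
latin1_bytes = pound.encode("iso-8859-1")   # one byte: 0xA3

# Reading the UTF-8 bytes as if they were iso-8859-1 turns each byte
# into its own character: an accented A followed by the pound sign.
misread = utf8_bytes.decode("iso-8859-1")

print(utf8_bytes, latin1_bytes, ascii(misread))
```

So the two-character form is a symptom of a decoding mismatch, not
what a Unicode pound sign "looks like".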


1st Attempt:
So, armed with this theory, I attempted to create the data 
file by allowing the MSXML4 parser to convert the values to 
Unicode for me (at least that's what I hoped for), by setting 
the XML encoding. To do this I tried to set the document 
encoding (<?xml version="1.0" encoding="iso-8859-1"?>) prior 
to building the rest of the document, in the hope that the 
parser would understand that I'm trying to input my values in 
a character set other than ASCII: Latin-1, or iso-8859-1.

If the encoding of the file is iso-8859-1, which it probably is if you
have a Western European version of Windows, and a run-of-the-mill text
editor, and if you avoid the special characters that Microsoft has added
to 8859-1, then you should put this XML declaration at the start of the
file. If it isn't, then you shouldn't. 
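
To illustrate why the declaration has to match the bytes actually in
the file, here is a small sketch using Python's xml.etree.ElementTree
(an expat-based parser, so its error messages will differ from
MSXML's). The byte strings stand in for the contents of a file saved
as iso-8859-1.

```python
import xml.etree.ElementTree as ET

# A pound sign stored as the single iso-8859-1 byte 0xA3.
body = b"<Column>\xa3#,##0</Column>"

# With a matching declaration the parser decodes the byte correctly.
labelled = b'<?xml version="1.0" encoding="iso-8859-1"?>' + body
text_ok = ET.fromstring(labelled).text

# Without a declaration the parser assumes UTF-8, where a lone 0xA3
# byte is not a valid sequence, so the document is rejected.
try:
    ET.fromstring(body)
    rejected = False
except ET.ParseError:
    rejected = True

print(ascii(text_ok), rejected)
```

A rejection of this kind is one plausible source of an "invalid
character" error from a parser that was given no encoding, or the
wrong one.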

I used the VB syntax:

  objDOM.appendChild objDOM.createProcessingInstruction("xml", _
      "version=""1.0"" encoding=""iso-8859-1""")

Firstly, the XML declaration is not a processing instruction. Secondly,
character encoding applies only to an unparsed document. Once the data
is in a DOM, it consists of characters not bytes, and the encoding is
none of your concern any more. Once the data has been parsed, if the
encoding was wrongly labelled then there is no way of undoing the
damage.
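
The "characters, not bytes" point can be sketched as follows, again
with Python's xml.etree.ElementTree standing in for a DOM (an
assumption for illustration; the MSXML API differs). The tree is built
from characters, and an encoding only comes into play when it is
serialized.

```python
import xml.etree.ElementTree as ET

# Build the tree from characters; no encoding is involved at this point.
column = ET.Element("Column")
column.text = "\u00a3#,##0"

# The encoding is chosen only when the tree is serialized to bytes.
as_latin1 = ET.tostring(column, encoding="iso-8859-1")
as_utf8 = ET.tostring(column, encoding="utf-8")

# The byte streams differ, yet each parses back to the same characters.
same = (ET.fromstring(as_latin1).text
        == ET.fromstring(as_utf8).text
        == column.text)

print(as_latin1, as_utf8, same)
```

Here the serializer itself writes an iso-8859-1 declaration when asked
for that encoding, which is why nothing needs to be added to the tree
by hand.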

2nd Attempt:
The next thing I attempted was to create the data XML file 
without changing the encoding and, before consuming the XML 
contents, append the declaration to the front of the XML string.

  strXML = "<?xml version=""1.0"" encoding=""iso-8859-1""?>" & objDOM.xml

Again, the results of the transform failed, but this time the 
data file contents were visible in IE without causing error.

You shouldn't be mucking around with the output of the serializer. It's
the serializer's job to output an XML declaration that reflects the
encoding it is actually using.
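
As a rough illustration of what can go wrong, the Python sketch below
serializes a tree as UTF-8 and then prepends a declaration claiming
iso-8859-1. This is only an analogy for the VB code above: exactly
which bytes MSXML ends up with depends on how the string is later
written out.

```python
import xml.etree.ElementTree as ET

column = ET.Element("Column")
column.text = "\u00a3#,##0"

# Serialize as UTF-8; ElementTree writes no declaration for UTF-8
# by default.
utf8_body = ET.tostring(column, encoding="utf-8")

# Prepend a declaration that claims a different encoding.
mislabelled = b'<?xml version="1.0" encoding="iso-8859-1"?>' + utf8_body

# The parser trusts the label and decodes each UTF-8 byte as one
# character, so the text is silently damaged rather than rejected.
damaged = ET.fromstring(mislabelled).text

print(ascii(damaged))
```

The document still parses, which matches the observation that the data
file was viewable without error, but the pound sign is no longer the
character that was put in.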


3rd Attempt:
Using the method described in the 1st attempt, I then called 
the save method of the DOM to save the contents to a file. 
This gave me the same results as mentioned in my 2nd attempt, 
in that the data was visible in IE without causing error, but 
the results of the transform still failed.

Either your original XML file is incorrectly encoded, or you are doing
something odd to the output of the transformation before passing it back
into an XML parser.

Michael Kay


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list