Hi Bridger,
You may be able to use xsl:character-map to map characters that are not
transforming correctly into their proper Unicode code points.
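As a rough sketch of what such a map could look like: the stylesheet below assumes (this is a guess, not something confirmed from your files) that the input contains Windows-1252 punctuation bytes such as 0x92, which a parser honouring the iso-8859-1 declaration reads as the C1 control characters U+0080 to U+009F. The map name fix-cp1252 and the choice of characters are illustrative only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <!-- use-character-maps applies the named map during serialization -->
  <xsl:output encoding="UTF-8" indent="yes" use-character-maps="fix-cp1252"/>

  <!-- Assumption: these C1 code points are really Windows-1252 punctuation -->
  <xsl:character-map name="fix-cp1252">
    <xsl:output-character character="&#x91;" string="&#x2018;"/> <!-- left single quote -->
    <xsl:output-character character="&#x92;" string="&#x2019;"/> <!-- right single quote -->
    <xsl:output-character character="&#x93;" string="&#x201C;"/> <!-- left double quote -->
    <xsl:output-character character="&#x94;" string="&#x201D;"/> <!-- right double quote -->
    <xsl:output-character character="&#x96;" string="&#x2013;"/> <!-- en dash -->
    <xsl:output-character character="&#x97;" string="&#x2014;"/> <!-- em dash -->
  </xsl:character-map>

  <xsl:template match="/">
    <xsl:copy-of select="/"/>
  </xsl:template>
</xsl:stylesheet>
```

Note that a character map replaces single characters only, and only at serialization time. If the damage is a multi-character sequence, something like replace() in a text() template would be needed instead.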
I’ve seen plenty of instances where the input files declare one character
encoding but actually contain characters with a different encoding. If this is
what you’re facing, it can be helpful to start by doing an analysis of
character occurrences in the set of input files. You can eliminate characters
in the ISO646-US range straight off, then eliminate other character codes that
transform correctly, and then focus on creating a mapping for the remaining
character codes or character code sequences.
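One way to do that analysis without leaving XSLT is sketched below, under the assumption that each input file parses as declared. It lists every code point above the ISO646-US range (above 127) found in text nodes and attribute values, with the character itself and an occurrence count, one tab-separated line per code point. It handles a single input document; running it over the whole set is left to whatever batch mechanism you already use.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output method="text" encoding="UTF-8"/>

  <xsl:template match="/">
    <!-- Collect all code points from text and attributes, dropping ASCII -->
    <xsl:for-each-group
        select="string-to-codepoints(string-join((//text(), //@*), ''))[. gt 127]"
        group-by=".">
      <xsl:sort select="."/>
      <!-- Columns: decimal code point, the character, number of occurrences -->
      <xsl:value-of
          select="current-grouping-key(),
                  codepoints-to-string(current-grouping-key()),
                  count(current-group())"
          separator="&#9;"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
```

Code points in the 128 to 159 range, or runs such as 226 followed by 128, are usually the ones worth a closer look, since they rarely occur in genuine ISO-8859-1 text.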
Some Perl modules that can be helpful when dealing with unexpected character
encodings are Encoding::FixLatin, Encode::Guess, and Text::FixEOL.
Cheers,
Vincent
From: Bridger Dyson-Smith bdysonsmith(_at_)gmail(_dot_)com
[mailto:xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com]
Sent: Tuesday, October 11, 2016 3:09 PM
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: [xsl] Character encoding/representation from ISO-8859-1 to UTF-8
Hi all,
I'm struggling with a character encoding issue (or a character representation
issue maybe?): I have input XML that looks like this
input.xml
<?xml version="1.0" encoding="iso-8859-1"?>
<documents>
<document>The reality of the effect of natural ventilation in a
residential attic cavity has been the topic of many debates and scholarly
reports since the 1930’s.</document>
</documents>
and I would like to get it to a point where the characters are represented
properly, i.e.
output.xml
<?xml version="1.0" encoding="UTF-8"?>
<documents>
<document>The reality of the effect of natural ventilation in a
residential attic cavity has been the topic of many debates and scholarly
reports since the 1930’s.</document>
</documents>
Thanks to Liam's help on irc and reading through the list archives, it seems
like an identity transform should be the right step towards getting the
representation corrected, but something isn't working (or I have a
misunderstanding somewhere).
If I apply the following identity transform with Saxon HE 9.6.0.7 in oXygen 18:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="2.0">
<xsl:output encoding="UTF-8" indent="yes"/>
<xsl:template match="/"><xsl:copy-of select="/"/></xsl:template>
</xsl:stylesheet>
I get the following result:
<?xml version="1.0" encoding="UTF-8"?>
<documents>
<document>The reality of the effect of natural ventilation in a
residential attic cavity has been the topic of many debates and scholarly
reports since the 1930’s.</document>
</documents>
Could someone provide some insight into what I've done wrong here? Any help
would be greatly appreciated.
Best,
Bridger
--~----------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
EasyUnsubscribe: http://lists.mulberrytech.com/unsub/xsl-list/1167547
or by email: xsl-list-unsub(_at_)lists(_dot_)mulberrytech(_dot_)com
--~--