At 2010-05-12 13:49 -0400, David wrote:
I'm writing an XSLT that has to translate XML to plain ASCII
text. The XML contains Unicode characters, possibly any of them. I
cannot control the authoring so I must handle whatever is thrown at me.
I have a few dozen specially known character translations for things
like the 1/4 and degrees Unicode symbols.
But I need to "catch all" characters that are not mapped
explicitly (rather than explicitly map the entire Unicode set) and
translate them into something like "<UNKNOWN CHARACTER>".
Any suggestions on how to do this? I could trivially write a
post-processor to do this (maybe a dozen lines of C or Java) but if
there's a feature directly in XSLT I'd love to try that.
Any ideas welcome!
In XSLT 2.0 you could try a general match on all text nodes and then
use Unicode code points to accept only ASCII text between code points 32
and 126 (or 127, depending on your need). I've included the code point
in the placeholder as a diagnostic, since that might help the reader:
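<xsl:template match="text()">
  <!-- look at each character of the text node by its code point -->
  <xsl:for-each select="string-to-codepoints(.)">
    <!-- the angle brackets must be escaped inside the attribute value -->
    <xsl:value-of select="if ( . ge 32 and . le 127 )
                          then codepoints-to-string(.)
                          else concat('&lt;UNKNOWN CHARACTER-',.,'&gt;')"/>
  </xsl:for-each>
</xsl:template>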
It could be slow, but I think it will be faster than using substring().
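If the per-character loop does turn out to be too slow, or you want your
known translations handled in the same pass, one possible variation is
sketched below. Treat it as a sketch under assumptions rather than a
finished answer: it needs an XSLT 2.0 processor, the "m" namespace URI and
the $known table with its two entries are made up for illustration, and I
have assumed you want tab, line feed and carriage return passed through
(the template above would flag them as unknown). The idea is that
xsl:analyze-string copies runs of acceptable characters in one step and
only inspects the individual characters that fall outside the range.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:m="urn:x-example:known-chars"
    exclude-result-prefixes="m">

  <xsl:output method="text" encoding="US-ASCII"/>

  <!-- Hypothetical lookup table: decimal code point to ASCII replacement.
       Extend it with your own few dozen known translations. -->
  <xsl:variable name="known" as="element(m:char)*">
    <m:char cp="188">1/4</m:char>        <!-- U+00BC VULGAR FRACTION ONE QUARTER -->
    <m:char cp="176"> degrees</m:char>   <!-- U+00B0 DEGREE SIGN -->
  </xsl:variable>

  <xsl:template match="text()">
    <!-- The regex matches one character that is NOT tab, LF, CR
         or in the printable ASCII range U+0020 to U+007E. -->
    <xsl:analyze-string select="."
        regex="[^&#x9;&#xA;&#xD;&#x20;-&#x7E;]">
      <xsl:matching-substring>
        <!-- exactly one character matched, so this is one integer -->
        <xsl:variable name="cp" select="string-to-codepoints(.)"/>
        <xsl:variable name="hit" select="$known[@cp = $cp]"/>
        <xsl:value-of select="if (exists($hit))
                              then string($hit[1])
                              else concat('&lt;UNKNOWN CHARACTER-', $cp, '&gt;')"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <!-- a run of acceptable characters, copied unchanged -->
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

Because the known translations are looked up before the catch-all applies,
anything you have not listed still ends up as a placeholder carrying its
decimal code point. I have not measured whether this is actually faster
than the loop on any particular processor.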
Remember that there is an ISO DSDL standard for validating exactly
this: the use of Unicode characters in an XML document. It is
called CREPDL, for "Character Repertoire Description Language":
http://www.iso.org/iso/catalogue_detail.htm?csnumber=51085
http://www.asahi-net.or.jp/~eb2m-mrt/crepdl/ns/structure/1.0/index.xml
http://www.assembla.com/spaces/CrepdlValidatorInFsharp
I understand you are implementing a transformation and
character-level validation doesn't apply, but since you have such a
requirement for using only a subset of characters, there may be a
role for CREPDL in your information/validation flow in addition to
what you are asking for in this post.
I hope this helps.
. . . . . . . . . . . Ken
--
Crane Softwrights Ltd. http://www.CraneSoftwrights.com/s/
G. Ken Holman mailto:gkholman(_at_)CraneSoftwrights(_dot_)com
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list