At 2009-11-15 21:34 -0800, Mark Wilson wrote:
> I need to render Czech language strings
> containing diacritics into strings with the
> diacritics removed. The Czech alphabet has 16
> lower case diacritics and a somewhat smaller set
> of upper case diacritics. The strings are expressed in UTF-8.

The encoding is irrelevant to XSLT ... it is
relevant to the XML processor inside your XSLT
processor in order to know what the Unicode
characters are, but XSLT just sees them as
Unicode characters without an encoding.

> I do not need to retain case, but I must locate and replace all diacritics.
> My only plan so far is to construct a gigantic
> <xsl:choose> to find strings containing at least
> one diacritic. Then I would need a gigantic
> <xsl:if> to change each diacritic into its unaccented counterpart.
> I wonder if there is a simpler method for
> turning, for example, a word like "Šafařík" [Š,
> ř, í] into "Safarik"? Any ideas or suggestions?

If you are using XSLT 2.0, have you tried:

  normalize-unicode( $yourString, "NFC" )

... which will return fully formed (precomposed) characters in place of
base characters followed by combining diacritics?
For example, this converts U+0065 U+0301 (e followed by a combining
acute accent) into U+00E9 (é).
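
Note that NFC composes characters; it does not remove anything. To actually drop the diacritics, one possible approach is to go the other way: decompose with "NFD" and then delete the combining marks. The following is only a minimal sketch under some assumptions: an XSLT 2.0 processor that supports the "NFD" form (processors are only required to support "NFC"), and a helper function my:strip-diacritics in the namespace urn:example:strip-diacritics, both names being made up for illustration.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="urn:example:strip-diacritics"
    exclude-result-prefixes="xs my">

  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Hypothetical helper (name and namespace are assumptions):
       1. normalize-unicode(..., 'NFD') splits each accented letter into
          a base letter followed by one or more combining marks;
       2. replace(..., '\p{Mn}', '') deletes every non-spacing mark,
          leaving only the base letters. -->
  <xsl:function name="my:strip-diacritics" as="xs:string">
    <xsl:param name="text" as="xs:string?"/>
    <xsl:sequence
        select="replace(normalize-unicode($text, 'NFD'), '\p{Mn}', '')"/>
  </xsl:function>

  <!-- Named entry point so the sketch can run without an input document -->
  <xsl:template name="main">
    <xsl:value-of select="my:strip-diacritics('Šafařík')"/>
  </xsl:template>

</xsl:stylesheet>
```

Invoked with "main" as the initial template, this should output "Safarik". Case is preserved, and no list of Czech letters is needed. In XSLT 1.0, where neither function is available, the usual fallback is translate() with two parallel character lists, for example translate($yourString, 'Ššřří', 'Ssrri') extended to the full set of accented letters.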
I hope this helps.
. . . . . . . . Ken
--
G. Ken Holman mailto:gkholman(_at_)CraneSoftwrights(_dot_)com