xsl-list

RE: [xsl] Does XSLT contain an easy means of determining if a string contains a diacritic?

2009-11-16 04:23:54

Yes, there is a better way. You can use normalize-unicode() to turn the
string into decomposed normal form, in which all the diacritics become
separate characters, and then you can use replace() to get rid of the
diacritics:

replace(normalize-unicode($in, 'NFD'), '\p{IsCombiningDiacriticalMarks}', '')
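
For anyone who wants to try this, here is a minimal sketch of how the expression might be wrapped in an XSLT 2.0 stylesheet. The function names my:strip-diacritics and my:has-diacritic, the my namespace URI, and the "main" template are illustrative assumptions, not part of any library. It also assumes the processor supports the 'NFD' form, which the specification leaves optional (only 'NFC' is required). The second function addresses the question in the subject line by using matches() instead of replace().

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="http://example.com/my"
    exclude-result-prefixes="xs my">

  <xsl:output method="text"/>

  <!-- Hypothetical helper: decompose to NFD, then delete every character
       in the Combining Diacritical Marks block (U+0300 to U+036F). -->
  <xsl:function name="my:strip-diacritics" as="xs:string">
    <xsl:param name="in" as="xs:string?"/>
    <xsl:sequence select="replace(normalize-unicode($in, 'NFD'),
                                  '\p{IsCombiningDiacriticalMarks}', '')"/>
  </xsl:function>

  <!-- Hypothetical helper: true if the decomposed string contains
       at least one combining diacritical mark. -->
  <xsl:function name="my:has-diacritic" as="xs:boolean">
    <xsl:param name="in" as="xs:string?"/>
    <xsl:sequence select="matches(normalize-unicode($in, 'NFD'),
                                  '\p{IsCombiningDiacriticalMarks}')"/>
  </xsl:function>

  <!-- Named entry template (an assumption, so no source document is needed).
       The literal is "Šafařík" written with character references. -->
  <xsl:template name="main">
    <xsl:value-of select="my:strip-diacritics('&#352;afa&#345;&#237;k')"/>
  </xsl:template>

</xsl:stylesheet>
```

Invoked with "main" as the initial template, this should output Safarik, since the caron and the acute accent both decompose into marks from that block.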

Regards,

Michael Kay
http://www.saxonica.com/
http://twitter.com/michaelhkay  

-----Original Message-----
From: Mark Wilson [mailto:mark(_at_)knihtisk(_dot_)org] 
Sent: 16 November 2009 05:35
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: [xsl] Does XSLT contain an easy means of determining 
if a string contains a diacritic?

Hi,
I need to render Czech language strings containing diacritics 
into strings with the diacritics removed. The Czech alphabet 
has 16 lower case diacritics and a somewhat smaller set of 
upper case diacritics. The strings are expressed in UTF-8. I 
do not need to retain case, but I must locate and replace all 
diacritics.

My only plan so far is to construct a gigantic <xsl:choose> to 
find strings containing at least one diacritic. Then I would 
need a gigantic <xsl:if> to change each diacritic into its 
unaccented counterpart.

I wonder if there is a simpler method for turning, for 
example, a word like "Šafařík" [Š, ř, í] into "Safarik"? Any 
ideas or suggestions? Thanks, Mark 



--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--


