Vladimir,
I missed the original post, but the effect of diacritics differs considerably
from language to language. Sometimes it is "only" an accent; in other
languages it changes the sound and sometimes the meaning of a character. What
is a "Western" language? If you are thinking of European languages, some of
them do not use ASCII letters at all (Cyrillic, Greek), and your method will
not work for them.
So I would just drop them or replace them with an underscore. That saves a lot
of energy :-)
- Michael Müller-Hillebrand
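
The "drop or replace with an underscore" suggestion above can be sketched in a few lines. This is only an illustration in Python, not something posted in the thread; the function name `to_identifier` and its `replacement` parameter are made up for the example.

```python
import re


def to_identifier(text: str, replacement: str = "_") -> str:
    """Replace every character outside [A-Za-z0-9] with `replacement`.

    Pass replacement="" to drop the characters instead.
    (Hypothetical helper, named only for this sketch.)
    """
    return re.sub(r"[^A-Za-z0-9]", replacement, text)


print(to_identifier("M\u00fcller-Hillebrand"))      # M_ller_Hillebrand
print(to_identifier("M\u00fcller-Hillebrand", ""))  # MllerHillebrand
```

The trade-off is visible in the output: the result is always a safe name, but accented letters are lost rather than mapped to their base letter.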
On 07.01.2010 at 14:27, Vladimir Nesterovsky wrote:
Hello!
Following up on my original question.
Is there a way to decompose characters like:
æ 'LATIN SMALL LETTER AE' (U+00E6)
into separate letters?
Are there many such characters derived from Latin (I'll be calling replace()
if it's only one or two)?
Thanks.
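
For what it is worth, U+00E6 has no decomposition mapping in the Unicode character database, so normalization alone leaves it untouched and an explicit replacement table is needed. The Python sketch below illustrates this. The table `SPECIAL_LETTERS` and the function `expand_special_letters` are hypothetical names, and the table is a small illustrative selection, not a complete list of such letters.

```python
import unicodedata

# Illustrative, NOT exhaustive: a few Latin-derived letters that have no
# Unicode decomposition and therefore need an explicit replacement.
SPECIAL_LETTERS = str.maketrans({
    "\u00e6": "ae",  # LATIN SMALL LETTER AE
    "\u00c6": "AE",  # LATIN CAPITAL LETTER AE
    "\u0153": "oe",  # LATIN SMALL LIGATURE OE
    "\u0152": "OE",  # LATIN CAPITAL LIGATURE OE
    "\u00df": "ss",  # LATIN SMALL LETTER SHARP S
    "\u00f8": "o",   # LATIN SMALL LETTER O WITH STROKE
    "\u00d8": "O",   # LATIN CAPITAL LETTER O WITH STROKE
})


def expand_special_letters(text: str) -> str:
    """Apply the explicit replacement table above."""
    return text.translate(SPECIAL_LETTERS)


# An empty decomposition string means normalization cannot split the letter.
print(unicodedata.decomposition("\u00e6") == "")
print(expand_special_letters("\u00e6on"))
print(expand_special_letters("Stra\u00dfe"))
```

So a handful of `replace()` calls covers the common cases, but the list is longer than one or two characters, and which replacement is "right" depends on the language.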
--
Vladimir Nesterovsky
http://www.nesterovsky-bros.com/
I need to convert a string into an identifier.
Earlier I was using the following function:
Now I have to build a name containing only [A-Za-z0-9].
My problem is that I often see characters with modifiers like
00E0 à LATIN SMALL LETTER A WITH GRAVE
00E1 á LATIN SMALL LETTER A WITH ACUTE
00E2 â LATIN SMALL LETTER A WITH CIRCUMFLEX
00E3 ã LATIN SMALL LETTER A WITH TILDE
00E4 ä LATIN SMALL LETTER A WITH DIAERESIS
...
My questions:
is it acceptable, from the perspective of a Western language, to replace
those characters with the same character without the modifier;
is there a way to do this in XSLT;
is there any better option?
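
One common way to approach the second question is to decompose the string into base letters plus combining marks (Unicode form NFD), drop the marks, and then discard whatever is still outside [A-Za-z0-9]. The sketch below shows that idea in Python rather than XSLT, and the names `strip_diacritics` and `to_ascii_name` are invented for the example. In XSLT 2.0 the analogous building blocks would presumably be `normalize-unicode()` with 'NFD' and `replace()` with the `\p{M}` category, but that variant is not shown or tested here.

```python
import re
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose to NFD, then drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def to_ascii_name(text: str) -> str:
    """Strip diacritics, then drop anything still outside [A-Za-z0-9]."""
    return re.sub(r"[^A-Za-z0-9]", "", strip_diacritics(text))


print(strip_diacritics("\u00e0\u00e1\u00e2\u00e3\u00e4"))  # aaaaa
print(to_ascii_name("Cr\u00e8me br\u00fbl\u00e9e 2010"))   # Cremebrulee2010
```

Letters that have no decomposition (such as U+00E6) and non-Latin scripts pass through `strip_diacritics` unchanged and are then simply dropped by `to_ascii_name`, so an input written entirely in Cyrillic or Greek yields an empty name.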