xsl-list

Re: [xsl] re: Generate identifier

2010-01-07 14:17:59
On Thu, 2010-01-07 at 05:27 -0800, Vladimir Nesterovsky wrote:

> Is there a way to decompose characters like:
> æ 'LATIN SMALL LETTER AE' (U+00E6)
>
> into separate letters?
> Are there many such characters derived from Latin (I'll be calling
> replace() if it's only one or two)?

The primary ones are OE and AE (and oe and ae) and I usually
special-case them, as you can turn them into either the two letters
or just an e, depending on whether you favour "mediaeval" or "medieval",
"foetus" or "fetus" and so on.

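As a rough illustration of that special-casing, here is a minimal Python
sketch. The helper name expand_ae_oe and its keep_both_letters flag are
invented for this example and do not come from any library; in XSLT 2.0
the same mapping could presumably be written with replace() (or
translate() for the single-letter case).

```python
# Hypothetical helper: expand the AE/OE ligatures either to two letters
# or to a bare "e", depending on the spelling convention you favour.
TWO_LETTERS = {
    "\u00c6": "AE",  # LATIN CAPITAL LETTER AE
    "\u00e6": "ae",  # LATIN SMALL LETTER AE
    "\u0152": "OE",  # LATIN CAPITAL LIGATURE OE
    "\u0153": "oe",  # LATIN SMALL LIGATURE OE
}
SINGLE_E = {
    "\u00c6": "E",
    "\u00e6": "e",
    "\u0152": "E",
    "\u0153": "e",
}


def expand_ae_oe(text, keep_both_letters=True):
    """Replace AE/OE ligatures using one of the two tables above."""
    table = TWO_LETTERS if keep_both_letters else SINGLE_E
    return text.translate(str.maketrans(table))


print(expand_ae_oe("medi\u00e6val"))         # mediaeval
print(expand_ae_oe("medi\u00e6val", False))  # medieval
print(expand_ae_oe("f\u0153tus", False))     # fetus
```
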
There are quite a few others, though, e.g. Ĳ Œ Ҥ Ҵ Ӕ (Cyrillic), և
(Armenian), װ (Hebrew), ۖ (Arabic), ﬀ ﬁ ﬂ ﬃ ﬆ

A search for "ligature" in the Unicode database, or in a character map
utility such as gucharmap on Linux/GNOME, will find them.
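
The same search can be sketched in Python with the standard unicodedata
module, which exposes the character names from the Unicode database.
The function name find_ligatures is made up for this example, and the
result depends on the Unicode version bundled with the Python
interpreter.

```python
import sys
import unicodedata


def find_ligatures():
    """Return (character, name) pairs whose Unicode name contains 'LIGATURE'."""
    found = []
    for code_point in range(sys.maxunicode + 1):
        ch = chr(code_point)
        # Unassigned code points and surrogates have no name; use "" for them.
        name = unicodedata.name(ch, "")
        if "LIGATURE" in name:
            found.append((ch, name))
    return found


ligatures = find_ligatures()
print(len(ligatures), "characters found")
for ch, name in ligatures[:10]:
    print("U+%04X %s" % (ord(ch), name))
```

Note that a name search does not catch everything: æ is named
LATIN SMALL LETTER AE, without the word "ligature", so it still needs
its own special case.
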
For my purposes (e.g. making filenames and URIs from dictionary
headwords) I turn sequences of one or more non-letters into "-",
after handling accents and ligatures.
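
A minimal Python sketch of that pipeline follows. It assumes NFKD
normalization for the accents and compatibility ligatures, plus an
explicit table for AE/OE, which normalization leaves alone; the name
headword_to_slug and the table are invented for this example. In
XSLT 2.0 the corresponding functions would presumably be
normalize-unicode() and replace().

```python
import unicodedata
from itertools import groupby

# Assumed special cases: these letters have no Unicode decomposition,
# so normalization will not split them.
SPECIAL = str.maketrans({
    "\u00c6": "AE", "\u00e6": "ae",
    "\u0152": "OE", "\u0153": "oe",
})


def headword_to_slug(headword):
    """Turn a headword into a string of letters separated by '-'."""
    text = headword.translate(SPECIAL)
    # NFKD splits accented letters into base letter + combining mark and
    # expands compatibility ligatures such as U+FB01 into "fi".
    text = unicodedata.normalize("NFKD", text)
    # Drop the combining marks (the accents).
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Collapse each run of one or more non-letters into a single "-".
    pieces = []
    for is_letter, run in groupby(text, key=str.isalpha):
        pieces.append("".join(run) if is_letter else "-")
    return "".join(pieces)


print(headword_to_slug("Encyclop\u00e6dia Britannica"))  # Encyclopaedia-Britannica
print(headword_to_slug("caf\u00e9 au lait"))             # cafe-au-lait
print(headword_to_slug("\ufb01sh & chips"))              # fish-chips
```

Leading or trailing non-letters would leave a "-" at the start or end
of the result; whether to trim those is left open here.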

Liam

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org www.advogato.org

