perl-unicode

Re: possible regexp feature for 5.6: "ignore diacritics"

1999-10-18 02:18:07

    Peter> Hand coded diacritic variant "classes" can still be individually
    Peter> enclosed in /[]/ for the case where selective diacritic matching
    Peter> needs to be done (e.g. /[eë]/).

This is fine for small sets of variants, but when you need to specify all of
them, the expression grows quite large.  Searching an on-line version of the
Koran is one example.  Enumerating all the possible consonant and vowel
combinations is a lot more work than just ignoring non-spacing characters.
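
To make "ignoring non-spacing characters" concrete, here is a minimal sketch.
It is written in Python rather than Perl purely for illustration, and the
helper names strip_nonspacing and search_ignoring_diacritics are hypothetical,
not part of any proposed regexp feature.  It assumes that "non-spacing" means
Unicode general category Mn and that the pattern is a plain literal string.

```python
import re
import unicodedata


def strip_nonspacing(text):
    """Canonically decompose text (NFD) and drop non-spacing marks (Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def search_ignoring_diacritics(pattern, text):
    """Search after stripping marks from both sides.

    Assumption: pattern is a literal string, so it is escaped before use.
    The returned match offsets refer to the stripped text, not the original.
    """
    return re.search(re.escape(strip_nonspacing(pattern)), strip_nonspacing(text))


# An unvocalized Arabic word matches its fully vocalized spelling.
vocalized = "\u0643\u0650\u062a\u064e\u0627\u0628"  # kaf+kasra, ta+fatha, alef, ba
bare = "\u0643\u062a\u0627\u0628"                   # same letters, no vowel marks
found = search_ignoring_diacritics(bare, vocalized) is not None
```

One pattern covers every vocalization of the word, where a hand-written class
would need an alternative for each mark after each consonant.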

When teaching a language, it is often useful to ignore diacritics and locate
all words that share the same underlying pattern.  For instance, this makes it
easier to learn the differences between Vietnamese words that are
distinguished only by their diacritics.
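
A small sketch of that teaching use, again in Python for illustration only:
it groups a word list by each word's base form with the non-spacing marks
removed.  The names base_form and group_by_base and the sample word list are
hypothetical.

```python
import unicodedata
from collections import defaultdict


def base_form(word):
    """Return word with non-spacing marks (category Mn) removed after NFD."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def group_by_base(words):
    """Map each base form to the list of words that reduce to it."""
    groups = defaultdict(list)
    for word in words:
        groups[base_form(word)].append(word)
    return dict(groups)


# Sample Vietnamese words that differ only in their tone marks.
words = ["ma", "má", "mà", "mả", "mã", "mạ", "bàn", "bán", "bạn"]
groups = group_by_base(words)
# Letters with no canonical decomposition, such as U+0111 (d with stroke),
# are left unchanged by this approach.
```

Each key in groups is an unmarked skeleton, and its list holds the variants a
student would need to tell apart.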
-----------------------------------------------------------------------------
Mark Leisher
Computing Research Lab            The first virtue is to restrain the tongue;
New Mexico State University       he approaches nearest to the gods who knows
Box 30001, Dept. 3CRL             how to be silent, even though he is in the
Las Cruces, NM  88003             right.    -- Cato the Younger (95-46 B.C.E.)