perl-unicode

Re: possible regexp feature for 5.6: "ignore diacritics"

1999-10-18 02:25:21
Jarkko Hietaniemi writes:
This concept is handy when matching for diacritic-laden variants of
non-ASCII encodings.  For example finding "bär" when matching with
"bar" would often be most convenient.  The concept is not limited to
Western alphabets; it also works for Cyrillic/Greek/Hebrew/Arabic/...
alphabets.

What do you mean by this?  Is [=a=] going to stand for \N{cyrillic:a}
and \N{arabic:alef}?  Or do you mean [=\N{arabic:alef}=] to stand for
\N{ARABIC LETTER ALEF WITH HAMZA ABOVE}?

Note that the latter interpretation does not work well for Cyrillic.
There are *some* languages whose letters differ from one another only
by small marks, but at least for Russian it is very hard to justify.
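
To make the objection concrete, here is a rough sketch of what a
decomposition-based "ignore diacritics" might do.  It is Python rather
than Perl and is not anything proposed in this thread; strip_marks is a
hypothetical helper, and the approach (normalize to NFD, then drop
nonspacing marks) is only an assumption about how such a feature could
be built.

```python
import unicodedata

def strip_marks(text):
    """Hypothetical helper: decompose to NFD, then drop nonspacing marks (Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

# Latin: a-umlaut folds to plain "a", which is the convenient case.
print(strip_marks("b\u00e4r") == "bar")

# Cyrillic: SHORT I (U+0439) decomposes to I (U+0438) plus a combining breve,
# so the same folding conflates two letters that Russian treats as distinct.
print(strip_marks("\u0439") == "\u0438")

# Arabic: ALEF WITH HAMZA ABOVE (U+0623) folds to plain ALEF (U+0627).
print(strip_marks("\u0623") == "\u0627")
```

All three comparisons hold under this scheme, so a purely mechanical
folding cannot tell the welcome Latin case from the unwelcome Russian
one.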

Say, \N{cyrillic:i} is a vowel, but \N{cyrillic:short i} is a
semiconsonant (though in writing the second looks like the first with
a "checkish" mark on top).  There is no direct relationship between them.

Ilya