Re: possible regexp feature for 5.6: "ignore diacritics"

perl-unicode

1999-10-18 02:25:21

from [Ilya Zakharevich]

Jarkko Hietaniemi writes:

> This concept is handy when matching for diacritic-laden variants of
> non-ASCII encodings.  For example finding "bär" when matching with
> "bar" would often be most convenient.  The concept is not limited for
> Western alphabets, it works also on Cyrillic/Greek/Hebrew/Arabic/...
> alphabets.


What do you mean by this?  Is [=a=] going to stand for \N{cyrillic:a}
and \N{arabic:alef}?  Or do you mean [=\N{arabic:alef}=] to stand for
\N{ARABIC LETTER ALEF WITH HAMZA ABOVE}?

Note that the latter concept is not that good for Cyrillic.  Well,
there are *some* languages where there are tiny changes in chars, but
at least for Russian it is very hard to justify.

Say, \N{cyrillic:i} is a vowel, but \N{cyrillic:short i} is a
semiconsonant (though in writing one looks like the other with a
"checkish" mark).  There is no direct relationship between them.
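
To make the second reading concrete, here is a minimal sketch of one
way "ignore diacritics" could be approximated: decompose the text
canonically (NFD) and drop the combining marks.  This is an
illustration only, not the mechanism proposed in this thread; it uses
Python's standard unicodedata module rather than Perl, and the helper
names strip_marks and matches_ignoring_marks are hypothetical.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Decompose to NFD, then drop every combining mark (category M*).

    Assumption: "diacritic" is taken to mean "combining mark left over
    after canonical decomposition", which is only one possible definition.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(
        ch for ch in decomposed
        if not unicodedata.category(ch).startswith("M")
    )


def matches_ignoring_marks(needle: str, haystack: str) -> bool:
    """Substring search after stripping marks from both sides."""
    return strip_marks(needle) in strip_marks(haystack)


if __name__ == "__main__":
    samples = [
        "b\u00e4r",   # LATIN SMALL LETTER A WITH DIAERESIS inside "bär"
        "\u0623",     # ARABIC LETTER ALEF WITH HAMZA ABOVE
        "\u0439",     # CYRILLIC SMALL LETTER SHORT I
    ]
    for s in samples:
        # Show code points before and after stripping.
        before = " ".join(f"U+{ord(c):04X}" for c in s)
        after = " ".join(f"U+{ord(c):04X}" for c in strip_marks(s))
        print(f"{before} -> {after}")
```

Under this definition "bär" matches "bar" and alef with hamza above
collapses to plain alef, but Cyrillic short i (U+0439) also collapses
to i (U+0438), because its canonical decomposition is U+0438 plus a
combining breve.  That is exactly the conflation that is hard to
justify for Russian.  Note also that Latin "a" and Cyrillic "а" stay
distinct: decomposition never bridges scripts, so the first reading
would need a separate mechanism.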

Ilya
