Re: possible regexp feature for 5.6: "ignore diacritics"


Jarkko Hietaniemi wrote:

The notation is [=c=], where c is a character. (Equivalence classes have
previously been mentioned in the context of the POSIX regexp extensions in
general, such as the recently implemented [:class:] extension.)
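For readers unfamiliar with the notation: [=e=] stands for every character in the same equivalence class as e, so in a Latin-1 locale it would typically cover e, è, é, ê and ë. The sketch below is in Python rather than Perl and is only an approximation. It expands such a class into an explicit bracket expression by treating two characters as equivalent when their Unicode canonical (NFD) decomposition starts with the same base letter. The name `equivalence_class` is hypothetical. Real POSIX equivalence classes come from the locale's collation data, which may group characters differently.

```python
import re
import unicodedata

# Every Latin-1 character from space (0x20) up to U+00FF.
LATIN1 = "".join(chr(cp) for cp in range(0x20, 0x100))

def equivalence_class(base, candidates=LATIN1):
    """Approximate POSIX [=base=] as an explicit bracket expression.

    Assumption: a character is "equivalent" to `base` when its canonical
    (NFD) decomposition starts with `base`. POSIX itself defines these
    classes through the locale's collation rules, which may differ.
    Assumes `base` is a letter, so no bracket-special characters appear.
    """
    members = [ch for ch in candidates
               if unicodedata.normalize("NFD", ch)[0] == base]
    return "[" + "".join(members) + "]"

e_class = equivalence_class("e")
print(e_class)                                 # [eèéêë]
print(bool(re.fullmatch(e_class, "\u00eb")))   # True
print(bool(re.fullmatch(e_class, "E")))        # False
```

Because the class is built once and then used as an ordinary bracket expression, the regex engine itself needs no new machinery. That is roughly the situation [=c=] would be in if it were expanded at compile time.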


[snip]

In addition to the POSIX 1003.2 notation, [=c=], I think we could allow
for a new regexp flag to turn on "ignore diacritics", just as we
have "ignore case".  Maybe /d?  (/e would have been nice, but s///e
preëmpted us.)
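One way to picture such a flag, by analogy with /i folding case, is as a fold applied to both pattern and subject before matching. Here is a minimal Python sketch, assuming "ignore diacritics" means "decompose to NFD and drop the combining marks". The names `strip_diacritics` and `search_ignoring_diacritics` are hypothetical and are not an existing Perl or Python API.

```python
import re
import unicodedata

def strip_diacritics(s):
    """Decompose to NFD and drop combining marks (e.g. 'ë' -> 'e')."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def search_ignoring_diacritics(pattern, text, flags=0):
    """Rough stand-in for a hypothetical /d flag: fold both sides, then search.

    Only literal accented characters in `pattern` are folded; an escape
    such as \\xEB is left untouched. Returns True/False rather than a
    match object, because offsets in the folded text need not line up
    with offsets in the original text.
    """
    folded = re.compile(strip_diacritics(pattern), flags)
    return folded.search(strip_diacritics(text)) is not None

print(search_ignoring_diacritics("pree", "pre\u00ebmpted"))        # True
print(search_ignoring_diacritics("PREE", "pre\u00ebmpted", re.I))  # True
print(re.search("pree", "pre\u00ebmpted") is not None)             # False
```

Folding the subject up front keeps the engine unchanged, at the cost of losing exact match positions. A built-in flag could avoid that by comparing folded characters inside the engine.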

I am not emotionally *that* deeply attached to the feature, mostly
because I'm really low on tuits, and will be for some time.  But I
know it's a useful concept, I have a fair idea of how it could be
done, and I wanted to throw the idea onto the table.


Perhaps I can save you some tuits by pointing out that I don't even
see Perl needing the suggested //d flag, at least in simple
cases.  Isn't it already the case that, if you want to match only a
plain e (one without a diacritic) in text that may contain ë, you just
match it with /e/ or /\145/, not with /\xEB/, and certainly not with the
rather verbose /[=e=]/d (or somesuch)?
Hand-coded diacritic variant "classes" can still be enclosed
individually in /[]/ for cases where selective diacritic matching is
needed (e.g. /[e\xEB]/).
In more complex patterns, perhaps (?d:[=c=]) would be useful, hence a /d
flag might need to be implemented eventually.  But I think [=c=] is
pretty useful even without a //d-like "ignore diacritics" flag.
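The point about simple cases is easy to check. Below is a small illustration in Python, whose re module happens to accept the same \145, \xEB and [e\xEB] escapes used in the patterns above. It shows that plain /e/ (or its octal spelling) never matches ë, and that a hand-coded class matches exactly the variants listed in it.

```python
import re

text = "e \u00eb"  # a plain 'e' and an 'ë' (U+00EB), separated by a space

# /e/ and the octal escape /\145/ both match only the plain letter.
plain = re.findall(r"e", text)
octal = re.findall(r"\145", text)

# /\xEB/ matches only the accented letter.
accented = re.findall(r"\xeb", text)

# A hand-coded "variant class" selects exactly the variants you list.
either = re.findall(r"[e\xeb]", text)

print(plain, octal, accented, either)  # ['e'] ['e'] ['ë'] ['e', 'ë']
```

The trade-off is that each hand-coded class must list its variants explicitly, which is where [=c=] would save typing once many accented forms are involved.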

Peter Prymmer
