Jarkko Hietaniemi wrote:
The notation is [=c=], where c is a character (the context in which
equivalence classes have previously been mentioned is the POSIX regexp
extensions in general, such as the recently implemented [:class:]
extension).
[snip]
In addition to the POSIX 1003.2 notation, [=c=], I think we could allow
for a new regexp flag to turn on "ignore diacritics", just like we
have "ignore case". Maybe /d? (/e would have been nice but s///e
preëmpted us.)
I am not emotionally *that* deeply attached to the feature, mostly
because I'm really low on tuits, and will be for some time. But I
know it's a useful concept, and have a fair idea of how it could be
done, and I wanted to throw the idea on the table.
Perhaps I can save you some tuits by pointing out that I don't even
see Perl having a need for the suggested //d flag, at least in simple
cases. Is it not already the case that if you want to match only a
non-diacritic e in a text that may contain \xEB, you just match
with /e/ or /\145/, not with /\xEB/ and certainly not with the
rather verbose /[=e=]/d (or somesuch)?
Hand-coded diacritic variant "classes" can still be individually
enclosed in /[]/ for the cases where selective diacritic matching needs
to be done (e.g. /[e\xEB]/).
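
As a minimal sketch of the point above, the following compares the plain
/e/, its octal spelling /\145/, and the hand-coded class /[e\xEB]/. The
sample words and variable names are made up for illustration, and the
strings are assumed to be Latin-1, where \xEB is e with a diaeresis.

```perl
use strict;
use warnings;

# Illustrative Latin-1 sample words; \xEB is e with a diaeresis.
my @words = ("vet", "No\xEBl", "cat");

# /e/ matches only the plain, unaccented e.
my @plain = grep { /e/ } @words;

# /\145/ is the octal escape for the same character, so it behaves like /e/.
my @octal = grep { /\145/ } @words;

# A hand-coded class matches either the plain or the accented variant.
my @either = grep { /[e\xEB]/ } @words;

# Print how many words each pattern matched.
print scalar(@plain), " ", scalar(@octal), " ", scalar(@either), "\n";
```

Only the bracketed class picks up the word containing \xEB; the two plain
patterns skip it, which is the selective behaviour described above.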
In more complex patterns perhaps (?d:[=c=]) would be useful, and hence a
/d flag might eventually need to be implemented. But I think the utility
of [=c=] is pretty high even without a //d-like "ignore diacritics" flag.
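
For comparison, here is one way an "ignore diacritics" match might be
approximated without any new flag. This is not the mechanism proposed in
this thread; it is a sketch that assumes the Unicode::Normalize module is
available and that stripping combining marks is an acceptable stand-in for
equivalence classes. The helper name strip_diacritics is hypothetical.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical helper: decompose the text and drop nonspacing marks,
# so that e with a diaeresis compares as a plain e.
sub strip_diacritics {
    my ($text) = @_;
    my $decomposed = NFD($text);    # U+00EB becomes "e" followed by U+0308
    $decomposed =~ s/\p{Mn}//g;     # remove the combining marks
    return $decomposed;
}

my $word = "No\xEBl";

# The unaccented pattern now matches the accented word.
print "matched\n" if strip_diacritics($word) =~ /Noel/;
```

Unlike [=c=], this folds every accented character in the text at once, so
it cannot express the selective, per-character matching discussed above.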
Peter Prymmer