Peter Prymmer writes:
He he. While you're on a roll, have you seen this? :)
http://www.unicode.org/unicode/reports/tr16/
Gaaaargh. ("EBCDIC-Friendly Unicode (or UCS) Transformation Format")
There's a lot of interesting stuff on www.unicode.org:
http://www.unicode.org/unicode/reports/tr18/, "Unicode Regular
Expression Guidelines". Some of the things we have, some we don't.
Maybe the most notable idea is *subtraction* of categories (these
used to be called character classes). The syntax is neat, too, but I
have doubts about backward compatibility. Basically, '^' toggles the
"polarity" of a category: [A^B^C] = category A - category B + category C.
Here they also talk about "word characters" and equivalence classes ([=c=]
in POSIX), and about collation characters ([.c.] in POSIX).
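To make the polarity rule concrete, here is a minimal sketch in Python
(not Perl, and not the tr18 syntax itself) that treats each category as
a plain set and flips between adding and removing at every '^'. The
function name toggle_combine and the sample sets are made up for
illustration, and a leading '^' is deliberately not handled.

```python
def toggle_combine(categories):
    """Combine sets left to right, flipping add/remove at each '^'.

    Models [A^B^C] = A - B + C from the description above; the name and
    the list-of-sets interface are hypothetical, for illustration only.
    """
    result = set()
    adding = True  # polarity starts out positive
    for members in categories:
        if adding:
            result |= members   # positive polarity: add the category
        else:
            result -= members   # negative polarity: subtract it
        adding = not adding     # every '^' toggles the polarity
    return result


letters = set("abcdef")  # stands in for category A
vowels = set("aeiou")    # stands in for category B
extra = set("u")         # stands in for category C

combined = toggle_combine([letters, vowels, extra])
print(sorted(combined))  # ['b', 'c', 'd', 'f', 'u']
```

Note that the order matters: C is added back after B has been removed,
so 'u' survives even though it is a vowel.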
http://www.unicode.org/unicode/reports/tr10/index.html, "UNICODE
COLLATION ALGORITHM". How to cmp.
http://www.unicode.org/unicode/reports/tr21/, "Case Mappings"
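One reason case mappings matter for regexen is that they are not always
one character to one character. A minimal sketch in Python (not Perl),
relying only on Python's built-in string methods rather than on
anything taken from tr21 itself:

```python
# U+00DF (LATIN SMALL LETTER SHARP S) uppercases to two characters.
word = "Stra\u00dfe"

upper = word.upper()        # full uppercase mapping
folded = word.casefold()    # case folding, meant for caseless comparison

print(upper)                    # STRASSE
print(len(word), len(upper))    # 6 7
print(folded)                   # strasse
```

So a case-insensitive match cannot assume that a string keeps its
length when its case changes.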
Summary: there's still a lot of work before Perl regexen are fully
Unicode-aware.
--
$jhi++; # http://www.iki.fi/jhi/
# There is this special biologist word we use for 'stable'.
# It is 'dead'. -- Jack Cohen