Re: [[:foobar:]]


Peter Prymmer writes:

He he.  While you're on a roll, have you seen this? :)

   http://www.unicode.org/unicode/reports/tr16/


Gaaaargh.  ("EBCDIC-Friendly Unicode (or UCS) Transformation Format")

There's a lot of interesting stuff at www.unicode.org:

http://www.unicode.org/unicode/reports/tr18/, "Unicode Regular
Expression Guidelines".  Some of these things we have, some we don't.
Maybe the most notable idea is *subtraction* of categories (these
used to be called character classes).  The syntax is neat, too, but I
have doubts about backward compatibility.  Basically, '^' toggles the
"polarity" of a category: [A^B^C] = category A - category B + category C.
The report also talks about "word characters" and equivalence classes
([=c=] in POSIX), and about collating symbols ([.c.] in POSIX).
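
To make the polarity rule concrete, here is a rough sketch of how I read
[A^B^C], using plain Python sets instead of any real regex syntax.  The
function name combine() and the three sample categories are made up for
illustration; nothing here is taken from TR18 itself or from Perl.

```python
import string


def combine(categories):
    """Fold categories left to right, flipping polarity at each '^'.

    The first category is added, the second subtracted, the third
    added again, and so on: [A^B^C] = A - B + C.
    """
    result = set()
    adding = True  # polarity starts out positive
    for category in categories:
        if adding:
            result |= category
        else:
            result -= category
        adding = not adding  # each '^' toggles the polarity
    return result


# Hypothetical sample categories, ASCII only to keep the sketch small.
letters = set(string.ascii_lowercase)
vowels = set("aeiou")
digits = set(string.digits)

# Reads as [letters^vowels^digits]: consonants, plus digits.
consonants_and_digits = combine([letters, vowels, digits])
print("".join(sorted(consonants_and_digits)))
```

Note that order matters under this reading: [A^B^C] removes B only from
A, so anything in both B and C ends up back in the result.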

http://www.unicode.org/unicode/reports/tr10/index.html, "UNICODE
COLLATION ALGORITHM".  How to cmp.

http://www.unicode.org/unicode/reports/tr21/, "Case Mappings"

Summary: there's still a lot of work before Perl regexen are fully
Unicode-aware.

-- 
$jhi++; # http://www.iki.fi/jhi/
        # There is this special biologist word we use for 'stable'.
        # It is 'dead'. -- Jack Cohen