I just happened to notice that the perlre man page describes the
POSIX "[:punct:]" character class as being equivalent to the Unicode
"\p{IsPunct}" character class.
I haven't tried to track down the respective standards documents for
POSIX and Unicode to see whether these classes are _supposed_ to be
equivalent over the printable ASCII character set, but when I test them
in Perl 5.8.1, they are _not_ equivalent, as the following snippet will
demonstrate:
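    use strict;
    use warnings;

    # Test every printable ASCII character against both classes.
    for my $x ( 0x20 .. 0x7e ) {
        my $c = chr( $x );
        my $res = ( $c =~ /[[:punct:]]/ ) ? "matches :punct:" : "is not a :punct:";
        $res .= ( $c =~ /\p{IsPunct}/ ) ? " matches {IsPunct}" : " fails on {IsPunct}";
        # Report only characters that at least one of the classes accepts.
        printf( " 0x%x (%3d.) %s %s\n", $x, $x, $c, $res ) if $res =~ /matches/;
    }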
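To isolate just the disagreements rather than reading them off the full listing, a minimal variant might look like the sketch below. The helper name punct_differences is my own invention for illustration, not anything from perlre, and the result may depend on the Perl version in use.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Perl): collect the printable ASCII
# characters on which [:punct:] and \p{IsPunct} disagree.
sub punct_differences {
    my @diff;
    for my $x ( 0x20 .. 0x7e ) {
        my $c       = chr( $x );
        my $posix   = ( $c =~ /[[:punct:]]/ ) ? 1 : 0;
        my $unicode = ( $c =~ /\p{IsPunct}/ ) ? 1 : 0;
        push @diff, $c if $posix != $unicode;
    }
    return @diff;
}

# Print the disagreeing characters on one line, separated by spaces.
print join( " ", punct_differences() ), "\n";
```

Comparing the two match results as 0/1 flags keeps the loop short and makes it easy to extend the range beyond ASCII later.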
The differences involve these nine characters, each of which matches
[:punct:] but not \p{IsPunct}: $ + < = > ^ ` | ~
Except for the back-tick (`), I wouldn't be surprised if POSIX and
Unicode are supposed to differ on these points, so maybe it's just a
matter of fixing the perlre man page. (I'm not sure yet what the
behavior of [:punct:] is supposed to be on non-ASCII punctuation
characters in Unicode -- maybe the man page should clarify this too.)
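One way to probe the non-ASCII question empirically is sketched below. The four sample code points and the helper name classify are my own choices for illustration. I have deliberately not listed expected results, since the [:punct:] outcome for code points in the 0x80 .. 0xFF range in particular may vary with the Perl version and with how the string is stored internally.

```perl
use strict;
use warnings;

# Sample non-ASCII punctuation code points (an arbitrary selection):
# U+00A1 inverted exclamation mark, U+00AB left-pointing guillemet,
# U+2014 em dash, U+3001 ideographic comma.
my @samples = ( 0xA1, 0xAB, 0x2014, 0x3001 );

# Hypothetical helper: return [ posix_flag, unicode_flag ] for one code point.
sub classify {
    my ($cp) = @_;
    my $c = chr($cp);
    return [
        ( $c =~ /[[:punct:]]/ ) ? 1 : 0,
        ( $c =~ /\p{IsPunct}/ ) ? 1 : 0,
    ];
}

# Print code points numerically to avoid "Wide character" warnings.
for my $cp (@samples) {
    my ( $posix, $unicode ) = @{ classify($cp) };
    printf( "U+%04X  [:punct:]=%d  {IsPunct}=%d\n", $cp, $posix, $unicode );
}
```

Running this on a few different Perl builds would at least show whether the behavior is consistent enough for the man page to document.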
Dave Graff