perl-unicode

Re: 5.8.1 perlre man page: [:punct:] vs. \p{IsPunct}

2003-11-02 10:30:05
I just happened to notice that the perlre man page describes the
POSIX "[:punct:]" character class as being equivalent to the Unicode
"\p{IsPunct}" character class.

I haven't tried to track down the respective standards documents for
POSIX and Unicode to see whether these classes are _supposed_ to be
equivalent over the printable ASCII character set, but when I test them

AFAIK there are currently no standards defining those
equivalences.  There has been some discussion about this on the Unicode
Consortium mailing lists, but there are doubts about the wisdom of
stating anything about such equivalences (because the C standards
where the :foo: classes originate frankly have no clue about the
more complex property structure of Unicode).

The closest upcoming standard is the proposed update to TR18:
http://www.unicode.org/reports/tr18/tr18-8.html, see Annex C.

If you say :punct: on non-Unicode data, you are doing an _operating_
_system_ _dependent_ AND _locale_ _dependent_ operation.  On Unicode
data, :punct: and \p{Punct} are (supposed to be) equivalent.
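
For anyone who wants to check this on their own perl, here is a minimal
sketch that compares the two classes over printable ASCII.  It is only an
illustration, not the snippet from the original message; the variable
names are made up, and the result may well differ between perl versions.
It does not "use locale", so it says nothing about the locale-dependent
case.

```perl
use strict;
use warnings;

# Collect the printable ASCII characters on which the two classes disagree.
my @differ;
for my $code (0x21 .. 0x7E) {
    my $ch    = chr $code;
    my $posix = $ch =~ /[[:punct:]]/ ? 1 : 0;   # POSIX-style class
    my $prop  = $ch =~ /\p{IsPunct}/ ? 1 : 0;   # Unicode property
    push @differ, $ch if $posix != $prop;
}

print @differ
    ? "classes differ on: @differ\n"
    : "no differences in printable ASCII\n";
```

An empty list would mean the two classes agree on printable ASCII for
that perl; anything printed is a character matched by one class only.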

in Perl 5.8.1, they are _not_ equivalent, as the following snippet will
demonstrate:

-- 
Jarkko Hietaniemi <jhi(_at_)iki(_dot_)fi> http://www.iki.fi/jhi/
"There is this special biologist word we use for 'stable'.  It is 'dead'."
-- Jack Cohen
