perl-unicode

Re: \p{IsBogus} vs. exception

2006-03-06 12:20:02
Dan Kogai <dankogai(_at_)dan(_dot_)co(_dot_)jp> wrote:
:Porters,
:
:One of my blog friends came across this.  When you feed perl a
:bogus Unicode property, it raises an exception like 'Can't find
:Unicode property definition "Bogus" at uniprop.pl line 17.'
:But this does not always happen.  Here is some sample code.
:
:#!/usr/local/bin/perl
:my $count = 0;
:sub p { print $count++, " : ", (map {s/\n//g; $_ } @_), "\n" } # for convenience
:p eval{ ""    =~ /\p{IsBogus}/      }, $@; # no exception
:p eval{ "str" =~ /\p{IsBogus}/      }, $@; # exception
:p eval{ "str" =~ /^\p{IsBogus}/     }, $@; # exception
:p eval{ "str" =~ /\p{IsBogus}$/     }, $@; # exception
:p eval{ "str" =~ /\A\p{IsBogus}/    }, $@; # exception
:p eval{ "str" =~ /\p{IsBogus}\Z/    }, $@; # exception
:p eval{ "str" =~ /\b\p{IsBogus}/    }, $@; # exception
:p eval{ "str" =~ /\w\p{IsBogus}/    }, $@; # exception
:p eval{ "str" =~ /A\p{IsBogus}/     }, $@; # no exception
:p eval{ "str" =~ /[A-Z]\p{IsBogus}/ }, $@; # no exception
:__END__
:
:Seems like any flavor of perl 5.8 and above behaves that way.

This all looks perfectly consistent to me: the expensive work of
looking up the property is not done until matching actually gets
to that point.

In the three "no exception" examples, the optimiser rejects the
match before the matcher runs at all; a similar test:
  p eval{ "str" =~ /s(?=[A-Z])\p{IsBogus}/ }, $@;
shows that behaviour is the same if we hit the runtime matcher,
but never reach the bogus property.
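
To try this on your own perl, something like the following sketch may
help. match_error is a hypothetical helper, not anything from the core,
and it compiles each pattern at run time from a string, which is not
quite the same as the literal patterns in the original script. Which
patterns raise an exception may differ between perl versions, so no
expected output is given.

```perl
use strict;
use warnings;

# Hypothetical helper: run one match inside eval and return the
# exception text, or '' if the match completed without dying.
sub match_error {
    my ($string, $pattern) = @_;
    my $ok = eval { my $matched = $string =~ /$pattern/; 1 };
    return '' if $ok;
    my $err = $@ || 'unknown error';
    $err =~ s/\n//g;
    return $err;
}

# One pattern that reaches the property, and two that can fail
# before the property is ever consulted.
for my $pattern ('\p{IsBogus}', 'A\p{IsBogus}', 's(?=[A-Z])\p{IsBogus}') {
    my $err = match_error('str', $pattern);
    printf "%-24s %s\n", $pattern,
        $err eq '' ? 'no exception' : "exception: $err";
}
```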

I would not call this a bug (not sure if you were suggesting that it
is) - if you need to check whether a property is bogus, your example
has_unicode_property is fine. But it would not be unreasonable for
utf8.pm (or something) to provide a function that delivers the same
information.
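
As a rough illustration of such a check (a sketch only, not the
has_unicode_property from the earlier message and not an existing
utf8.pm function): the hypothetical property_is_known below matches
against a one-character string, so that the matcher actually has to
consult the property, and reports whether that died. It assumes the
name is trusted input, since it is interpolated into a pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: true if perl can resolve \p{NAME}, false if
# looking it up raises an exception.
sub property_is_known {
    my ($name) = @_;
    my $pattern = '\p{' . $name . '}';
    # "a" gives the matcher a character to test, so the property
    # lookup cannot be skipped by the optimiser.
    my $ok = eval { my $matched = "a" =~ /$pattern/; 1 };
    return $ok ? 1 : 0;
}

for my $name ('Lu', 'IsBogus') {
    printf "%-8s %s\n", $name,
        property_is_known($name) ? 'known' : 'unknown';
}
```

Whether the string matches is irrelevant here; only the presence or
absence of an exception is used.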

Hugo
