perl-unicode

Re: Pattern matching with Unicode (5.6.1)

2002-08-15 08:02:32
On Wed, Aug 14, 2002 at 03:53:45PM -0400, David Gray wrote:
I've (sort of) made it work by doing:
 # strip BOM and trailing nulls and carriage returns
 s/^..// if $. == 1 and s/\0//g;
 s/[\0\r]//g;

Are you working with UTF-16, or Microsoftish UTF8+BOM?  I'm not
aware that 5.6.1 supports either of them.
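Which of the two it is can usually be told from the first few bytes of the file. Here is a minimal sketch, assuming a reasonably modern Perl with 3-arg `open` and the `:raw` layer. The sub name `sniff_bom` is hypothetical and not part of any module:

```perl
use strict;
use warnings;

# Hypothetical helper: guess the encoding from a leading byte-order
# mark. Returns (encoding name, BOM length in bytes), or
# ('unknown', 0) when no recognised BOM is present.
# Note: a UTF-32LE BOM (FF FE 00 00) would be reported as UTF-16LE.
sub sniff_bom {
    my ($bytes) = @_;
    return ('UTF-8',    3) if $bytes =~ /^\xEF\xBB\xBF/;
    return ('UTF-16LE', 2) if $bytes =~ /^\xFF\xFE/;
    return ('UTF-16BE', 2) if $bytes =~ /^\xFE\xFF/;
    return ('unknown',  0);
}

# Usage: perl sniff.pl somefile.txt
if (@ARGV) {
    # Read the first bytes undecoded so the BOM is seen as raw octets.
    open my $fh, '<:raw', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    defined(read($fh, my $head, 4)) or die "Cannot read $ARGV[0]: $!";
    my ($enc, $bom_len) = sniff_bom($head);
    print "$ARGV[0]: $enc (BOM is $bom_len bytes)\n";
}
```

The null bytes that the s/\0//g above removes point toward UTF-16, where every ASCII character is followed (LE) or preceded (BE) by a zero byte.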

Anyway, according to 5.6.1's perlunicode.pod:

       Regular Expressions
           The existing regular expression compiler does not produce
           polymorphic opcodes.  This means that the determination on
           whether to match Unicode characters is made when the pattern
           is compiled, based on whether the pattern contains Unicode
           characters, and not when the matching happens at run time.
           This needs to be changed to adaptively match Unicode if the
           string to be matched is Unicode.

So maybe you need to arbitrarily insert a Unicode character into your
regex (so that the pattern is compiled as a Unicode one), or upgrade to 5.8.
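With 5.8, one option is to decode the raw bytes into a character string before matching, instead of stripping BOMs and nulls by hand. A minimal sketch, assuming Perl 5.8 or later (where Encode ships with the core) and input that begins with a UTF-16 BOM; the sub name `decode_utf16_lines` is hypothetical:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper. decode('UTF-16', ...) uses the BOM to choose
# the byte order and does not include the BOM in the result, so no
# s/^..// or s/\0//g is needed afterwards. It dies if no BOM is found.
sub decode_utf16_lines {
    my ($bytes) = @_;
    my $text = decode('UTF-16', $bytes);
    $text =~ s/\r\n/\n/g;              # normalise DOS line endings
    return split /\n/, $text;
}

# "caf\x{e9}\r\n" encoded as UTF-16LE with a BOM (FF FE).
my @lines = decode_utf16_lines("\xFF\xFE" . "c\0a\0f\0\xE9\0\r\0\n\0");

# The result is a character string, so the pattern matches the
# character U+00E9 rather than individual bytes.
print "matched\n" if $lines[0] =~ /^caf\x{e9}$/;
```

Decoding up front means every later regex sees characters, which sidesteps the compile-time Unicode decision described in the perlunicode excerpt above.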

/Autrijus/

