perl-unicode

How does this matching work?

2003-04-09 11:30:05

I've encountered some odd behavior for which I found a workaround, but I
don't know why the workaround works. If somebody could give me a hint, I
would appreciate it...

This is for a BBS system. I'm parsing the title of an incoming BBS
posting, which arrives from Apache::Request encoded in EUC-JP. I do the
following processing to collapse all the leading "Re: Re: Re: ..."
prefixes (and any existing "Re(n):" prefixes) into a single "Re(n):":

   sub fixtitle
   {
        my $title = shift;
        my $res = ($title =~ s/(?:^|\G)(Re:\s+)//g);
        $title =~ s/(?:^|\G)(?:Re\((\d+)\):\s+)/$res += $1; ''/eg;
        if($res > 1) {
            $title = sprintf( "Re(%d): %s", $res, $title );
        } elsif($res == 1) {
            $title = "Re: $title";
        }
        return $title;
   }
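To see what fixtitle() does with typical titles, here is a small self-contained harness around the routine above. The routine's logic is copied unchanged, with comments added; the sample titles are made up for illustration, and the expected output is shown in the trailing comments.

```perl
use strict;
use warnings;

# fixtitle() as posted, logic unchanged, comments added.
sub fixtitle
{
    my $title = shift;
    # Strip consecutive leading "Re: " prefixes; s///g returns how many were removed
    # (or a false value that is numerically 0 if there were none).
    my $res = ($title =~ s/(?:^|\G)(Re:\s+)//g);
    # Strip consecutive leading "Re(n): " prefixes, adding each n to the count.
    $title =~ s/(?:^|\G)(?:Re\((\d+)\):\s+)/$res += $1; ''/eg;
    if($res > 1) {
        $title = sprintf( "Re(%d): %s", $res, $title );
    } elsif($res == 1) {
        $title = "Re: $title";
    }
    return $title;
}

for my $t ("hello", "Re: hello", "Re: Re: Re: hello", "Re(2): Re(3): hello") {
    print "$t => ", fixtitle($t), "\n";
}
# hello => hello
# Re: hello => Re: hello
# Re: Re: Re: hello => Re(3): hello
# Re(2): Re(3): hello => Re(5): hello
```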

When this routine was put into the production mod_perl environment, it
started causing subsequent regular expressions to behave oddly:

   my $fixed = fixtitle($original_title);
   if( $fixed =~ /[<>]/ ) {
      die "smells like markup (be safe and die)";
   }

Under some circumstances, this regular expression started to ALWAYS
match, regardless of whether the title contained "<" or ">".
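Part of what makes this surprising: in EUC-JP every byte of a multibyte (non-ASCII) character has its high bit set, so the ASCII bytes for "<" (0x3C) and ">" (0x3E) can only appear as themselves. A byte-oriented /[<>]/ on the raw EUC-JP title should therefore match only genuine markup characters. The sketch below checks this with the core Encode module on made-up sample titles, assuming Perl 5.8 or later.

```perl
use strict;
use warnings;
use Encode ();

# Made-up sample titles, encoded to EUC-JP bytes.
# "\x{3053}\x{3093}\x{306B}\x{3061}\x{306F}" is the hiragana greeting "konnichiwa".
my $plain  = Encode::encode('euc-jp', "Re: \x{3053}\x{3093}\x{306B}\x{3061}\x{306F}");
my $markup = Encode::encode('euc-jp', "Re: <b>\x{3053}\x{3093}</b>");

# Every byte of the Japanese part is >= 0x80, so none can be '<' or '>'.
my @high = grep { ord($_) >= 0x80 } split //, substr($plain, 4);

# The markup guard from the post, applied to the raw byte strings.
my $plain_hit  = ($plain  =~ /[<>]/) ? 1 : 0;
my $markup_hit = ($markup =~ /[<>]/) ? 1 : 0;

printf "plain:  %d bytes, %d high bytes, matches=%d\n",
    length($plain), scalar(@high), $plain_hit;   # plain:  14 bytes, 10 high bytes, matches=0
printf "markup: matches=%d\n", $markup_hit;      # markup: matches=1
```

This only shows the expected behavior on a plain byte string; it does not reproduce the failure seen under mod_perl.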

After some hacking, I found that for some odd reason, if I put

   Encode::_utf8_off($title);
   return $title;

at the end of fixtitle(), the problem goes away and the subsequent match
works as expected.
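For context on what the workaround actually does: Encode::_utf8_off() performs no conversion. It only clears Perl's internal UTF-8 flag on the scalar, so the bytes Perl stores internally are afterwards treated as a byte string. A small sketch, assuming Perl 5.8 or later and the core Encode module:

```perl
use strict;
use warnings;
use Encode ();

# A one-character string: HIRAGANA LETTER A (U+3042).
# Perl stores it internally as 3 UTF-8 bytes with the UTF-8 flag on.
my $s = "\x{3042}";
my ($len_before, $flag_before) = (length($s), Encode::is_utf8($s) ? 1 : 0);

# _utf8_off() leaves the stored bytes alone and only clears the flag,
# so Perl now sees those 3 bytes as 3 separate byte characters.
Encode::_utf8_off($s);
my ($len_after, $flag_after) = (length($s), Encode::is_utf8($s) ? 1 : 0);

print "before: length=$len_before utf8=$flag_before\n";  # before: length=1 utf8=1
print "after:  length=$len_after utf8=$flag_after\n";    # after:  length=3 utf8=0
printf "bytes:  %vX\n", $s;                               # bytes:  E3.81.82
```

Because _utf8_off() changes nothing on a string whose flag is already off, the fact that it fixes the problem suggests that the string returned by fixtitle() had the UTF-8 flag set, at least in the cases that misbehaved.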

So I have a workaround. But why does it work? I'm confused.

--d

  • How does this matching work?, Daisuke Maki