I've run into some odd behavior. I found a workaround for it, but I
don't know why the workaround works. If somebody could give me a hint,
I would appreciate it.

This is for a BBS. I'm parsing the title of an incoming posting, which
arrives as euc-jp via Apache::Request. The following routine merges all
the leading "Re: Re: Re: ..." prefixes into a single "Re(n):":
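
    sub fixtitle
    {
        my $title = shift;
        # Strip leading "Re: " prefixes; s///g returns how many were removed.
        my $res = ($title =~ s/(?:^|\G)(Re:\s+)//g);
        # Fold any leading "Re(n): " prefixes into the running count.
        $title =~ s/(?:^|\G)(?:Re\((\d+)\):\s+)/$res += $1; ''/eg;
        if($res > 1) {
            $title = sprintf( "Re(%d): %s", $res, $title );
        } elsif($res == 1) {
            $title = "Re: $title";
        }
        return $title;
    }
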
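To make the intent concrete, here is a small standalone sketch that runs the routine on a few made-up titles. The sample titles are illustrative only (plain ASCII, not real postings), and the routine is repeated so the script runs on its own.

```perl
use strict;
use warnings;

# Same routine as in the post, repeated so this script is self-contained.
sub fixtitle {
    my $title = shift;
    my $res = ($title =~ s/(?:^|\G)(Re:\s+)//g);
    $title =~ s/(?:^|\G)(?:Re\((\d+)\):\s+)/$res += $1; ''/eg;
    if ($res > 1) {
        $title = sprintf("Re(%d): %s", $res, $title);
    } elsif ($res == 1) {
        $title = "Re: $title";
    }
    return $title;
}

# Hypothetical sample titles.
my @samples = (
    "hello",               # no prefix: left alone
    "Re: hello",           # single prefix: left as "Re: hello"
    "Re: Re: Re: hello",   # three prefixes: becomes "Re(3): hello"
    "Re: Re(2): hello",    # 1 + 2: becomes "Re(3): hello"
);

my %fixed = map { $_ => fixtitle($_) } @samples;
print "$_ => $fixed{$_}\n" for @samples;
```
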
This routine, when put into a production mod_perl environment, started
causing subsequent regular expressions to act oddly:

    my $fixed = fixtitle($original_title);
    if( $fixed =~ /[<>]/ ) {
        die "smells like markup (be safe and die)";
    }

Under some circumstances, the regular expression started to ALWAYS
match, regardless of whether the title contained "<" or ">".
After some hacking, I found that for some odd reason, if I put

    Encode::_utf8_off($title);
    return $title;

in fixtitle(), this problem goes away and the subsequent match works as
expected.
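
In case it helps anyone reproduce this, here is a minimal sketch of how the flag can be inspected with Encode::is_utf8 before and after the workaround. The sample bytes are made up, and forcing the flag on with Encode::_utf8_on is only my assumption about how a byte string might end up flagged. I have not confirmed that this is what actually happens under mod_perl.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical title: ASCII prefix plus made-up high-bit bytes standing in
# for euc-jp text. Written with \x escapes, so it starts as a byte string.
my $title = "Re: \xa4\xa2\xa4\xa4";
my $before = Encode::is_utf8($title) ? 1 : 0;    # flag is off here

# ASSUMPTION: mimic a string whose internal UTF-8 flag got switched on
# without the bytes being decoded. This is for illustration only.
my $flagged = $title;
Encode::_utf8_on($flagged);
my $during = Encode::is_utf8($flagged) ? 1 : 0;  # flag is on now

# The workaround from the post: clear the flag again.
Encode::_utf8_off($flagged);
my $after = Encode::is_utf8($flagged) ? 1 : 0;   # flag is off again

# The underlying bytes were never touched by either call.
my $same_bytes = ($flagged eq $title) ? 1 : 0;

print "before=$before during=$during after=$after same_bytes=$same_bytes\n";
```

Note that Encode::_utf8_off only clears the flag. It does not re-encode anything, which is why the bytes compare equal afterwards.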
So I have a workaround. But why does it work? I'm confused.
--d