On Thu, Aug 03, 2000 at 02:49:11AM -0400, Owen Taylor wrote:
The output of -Dr makes it pretty clear what is going on:
Compiling REx `^\C\C(c)'
size 10 first at 2
rarest char c at 0
1: BOL(2)
2: SANY(3)
3: SANY(4)
4: OPEN1(6)
6: EXACT <c>(8)
8: CLOSE1(10)
10: END(0)
anchored `c' at 2 (checking anchored) anchored(BOL) minlen 3
[...]
Guessing start of match, REx `^\C\C(c)' against `École'...
String not equal...
Match rejected by optimizer
For regexes compiled with 'use utf8', the anchor position
is in chars, not bytes, and the re optimizer (study_chunk)
thinks that \C counts as one char.
Fixing this looks decidedly unfun.
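The byte/char mismatch can be modelled in a few lines of Python. This is a simplified sketch, not Perl's study_chunk logic. It assumes the subject string in the trace is "École" encoded as UTF-8, and the helper names literal_at_byte and literal_at_char are hypothetical. The point is that "anchored `c' at 2" is true in bytes, which is what \C consumes, but false in chars. So a check done in chars rejects a string that really does match.

```python
# Simplified model (hypothetical helpers, not Perl internals) of why
# ^\C\C(c) fails against "École" under 'use utf8'.
# \C consumes one *byte*; study_chunk records "anchored `c' at 2",
# but that offset is then interpreted as a *character* offset.

def literal_at_byte(subject: str, literal: str, offset: int) -> bool:
    """True if `literal` occurs at the given UTF-8 byte offset (\\C semantics)."""
    raw = subject.encode("utf-8")
    lit = literal.encode("utf-8")
    return raw[offset:offset + len(lit)] == lit

def literal_at_char(subject: str, literal: str, offset: int) -> bool:
    """True if `literal` occurs at the given character offset (optimizer's view)."""
    return subject[offset:offset + len(literal)] == literal

subject = "\u00c9cole"  # "École": U+00C9 is the two UTF-8 bytes C3 89

print(subject.encode("utf-8"))           # b'\xc3\x89cole'
print(literal_at_byte(subject, "c", 2))  # True: \C\C really is followed by 'c'
print(literal_at_char(subject, "c", 2))  # False: char 2 is 'o', hence
                                         # "Match rejected by optimizer"
```

In char terms the 'c' sits at offset 1, because the two bytes matched by \C\C form a single char.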
I have now submitted a perlbug report on this so that the bug (which
unfortunately still seems to be present) won't be forgotten.
Regards,
Owen
--
$jhi++; # http://www.iki.fi/jhi/
# There is this special biologist word we use for 'stable'.
# It is 'dead'. -- Jack Cohen