"Don't use the \C escape in regexes" - taken from Juerd's Unicode Advice page:
http://juerd.nl/site.plp/perluniadvice
Why not?
------ perldoc perlre:
\C Match a single C char (octet) even under Unicode.
NOTE: breaks up characters into their UTF-8 bytes,
so you may end up with malformed pieces of UTF-8.
Unsupported in lookbehind.
------ URI::Escape
sub escape_char {
    return join '', @URI::Escape::escapes{$_[0] =~ /(\C)/g};
}
The regular expression disassembles an incoming text string into individual
bytes, and the resulting list is then used as the keys of a hash slice. That
is a legitimate use case, and the means seem to do the job. So what is the
problem with the \C escape?
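
For comparison, here is a minimal sketch of the same disassembly done without
\C, assuming the intent is to percent-escape the UTF-8 encoding of the text.
The %escapes table and the escape_bytes name are hypothetical stand-ins of
mine, not part of URI::Escape. The string is encoded to octets explicitly and
then split into single bytes:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical stand-in for %URI::Escape::escapes: maps every byte to "%XX".
my %escapes = map { chr($_) => sprintf('%%%02X', $_) } 0 .. 255;

# Hypothetical helper: turn the text into UTF-8 octets first, then split the
# octet string into single bytes and look each one up via a hash slice.
sub escape_bytes {
    my ($text) = @_;
    my $octets = encode('UTF-8', $text);
    return join '', @escapes{ split //, $octets };
}

print escape_bytes("\x{20AC}"), "\n";   # EURO SIGN, prints %E2%82%AC
```

Here the step from characters to bytes is spelled out rather than left to the
regex engine, which may or may not be what the advice is getting at.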
--
Michael.Ludwig (#) XING.com