Don't use the \C escape in regexes

"Don't use the \C escape in regexes" - taken from Juerd's Unicode Advice page:

  http://juerd.nl/site.plp/perluniadvice

Why not?

------ perldoc perlre:
\C  Match a single C char (octet) even under Unicode.
    NOTE: breaks up characters into their UTF-8 bytes,
    so you may end up with malformed pieces of UTF-8.
    Unsupported in lookbehind.

------ URI::Escape
sub escape_char {
    return join '', @URI::Escape::escapes{$_[0] =~ /(\C)/g};
}

The regular expression splits an incoming text string into individual bytes,
and the resulting list is then used in a hash slice. That is a legitimate use
case, and \C seems to do the job. So what is the problem with the \C escape?
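
For comparison, the same byte list can presumably be had without \C by encoding the string to UTF-8 explicitly and then matching with a plain dot. This is only a minimal sketch: %escapes is a hypothetical stand-in for %URI::Escape::escapes, escape_bytes is a made-up name, and the input is assumed to be a character string that should be escaped as UTF-8.

```perl
use strict;
use warnings;

# Hypothetical stand-in for %URI::Escape::escapes: maps each byte to "%XX".
my %escapes = map { chr($_) => sprintf('%%%02X', $_) } 0 .. 255;

# Made-up helper: escape every octet of the UTF-8 encoding of $text.
sub escape_bytes {
    my ($text) = @_;
    utf8::encode($text);    # in place, on our copy: characters become UTF-8 octets
    # After encoding, every "character" in $text is a single octet, so a
    # plain dot (with /s so that newline matches too) yields one byte each.
    return join '', @escapes{ $text =~ /(.)/gs };
}

print escape_bytes("\x{263A}"), "\n";    # prints %E2%98%BA
```

The difference from the \C version is that the conversion from characters to bytes is an explicit step, so the regex never has to look inside a character's internal representation.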

-- 
Michael.Ludwig (#) XING.com
