perl-unicode

Re: /\w/ match with 'use locale' misses letters in utf8 locale

2008-07-11 02:16:33
On Fri, 11/07/2008 at 09:00 +0200, Juerd Waalboer wrote:
> Peter Volkov wrote 2008-07-11 10:10 (+0400):
> > The problem is that in Linux (Gentoo and Debian I've tried) /\w/ does
> > not match Russian letters even though I use locale and LC_COLLATE is
> > set to ru_RU.UTF-8.
>
> \w should match Cyrillic letters even without "use locale". You might be
> running into an annoying bug which makes \w lose its Unicode support
> depending on the *internal* state of a value.
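
A minimal sketch of the kind of internal-state dependence described above. It assumes a Perl run without "use locale", without "use feature 'unicode_strings'" and without the /u regex modifier. The character U+00E9 is only an illustrative choice, not something taken from the original report.

```perl
use strict;
use warnings;

# U+00E9 (e with acute accent), stored in Perl's native 8-bit form.
my $s = "\xE9";

# Internally not flagged as UTF-8, so \w falls back to ASCII-only rules.
my $flag_before = utf8::is_utf8($s) ? 1 : 0;
my $before      = ($s =~ /\w/)      ? 1 : 0;

# Change only the internal representation; the character stays the same.
utf8::upgrade($s);

my $flag_after = utf8::is_utf8($s) ? 1 : 0;
my $after      = ($s =~ /\w/)      ? 1 : 0;

print "before: utf8 flag=$flag_before, \\w matches=$before\n";
print "after:  utf8 flag=$flag_after, \\w matches=$after\n";
```

The same logical character matches \w only after utf8::upgrade, which is why the result can look random from the outside.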

This behavior is reproducible with cp1251 encoding too. So...

> Despite the above there's a slightly more important issue here. You're
> opening a text file but you don't specify the character encoding.

seems to be the answer I was looking for. But this makes me wonder why
"use locale" exists, then. I thought it would take the "default"
(unspecified) encoding from the environment... And it is really puzzling
why everything works on FreeBSD.
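
For the record, a small sketch of what specifying the encoding on open looks like, assuming a UTF-8 file and no "use locale", "use open" pragma or PERL_UNICODE setting in effect. The temporary file and the sample letter (U+0416) are hypothetical stand-ins for the real input, which is not shown in this thread.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample data: one line holding the Cyrillic letter U+0416.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode($fh, ':encoding(UTF-8)');
print {$fh} "\x{416}\n";
close $fh or die "close: $!";

# No encoding layer: Perl sees two separate bytes, not one letter.
open my $raw, '<', $path or die "open: $!";
my $bytes = <$raw>;
close $raw;
chomp $bytes;

# Explicit encoding layer: the bytes are decoded into one character.
open my $in, '<:encoding(UTF-8)', $path or die "open: $!";
my $text = <$in>;
close $in;
chomp $text;

my $raw_len         = length $bytes;
my $text_len        = length $text;
my $raw_matches     = ($bytes =~ /^\w+$/) ? 1 : 0;
my $decoded_matches = ($text  =~ /^\w+$/) ? 1 : 0;

print "raw:     length=$raw_len, \\w matches=$raw_matches\n";
print "decoded: length=$text_len, \\w matches=$decoded_matches\n";
```

With the layer in place the letter is a single character and /\w/ matches it; for cp1251 data the layer would presumably be ':encoding(cp1251)' instead.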

In any case, thank you, Juerd, for the very fast answer.

-- 
Peter.
