perl-unicode

Re: use encoding 'utf8' and \x{00e4} notation

2010-02-03 04:45:58
On 03.02.2010 at 08:55, Aristotle Pagaltzis wrote:

* Michael Ludwig <michael(_dot_)ludwig(_at_)xing(_dot_)com> [2010-02-02 17:35]:
 use encoding 'utf8';

The `encoding` pragma is broken. Do not use it.

You want

   use open ':encoding(UTF-8)', ':std';

Thanks, Aristotle - that works correctly!
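A minimal sketch of the recommended setup, assuming a Perl 5 script whose source is plain ASCII. If the source file itself contained literal non-ASCII characters, `use utf8;` would also be needed so that Perl decodes those literals. With the `encoding` pragma out of the picture, the `\x{00e4}` escape yields the single code point U+00E4, and the `:std` part of the `open` pragma adds a UTF-8 encoding layer to STDIN, STDOUT and STDERR:

```perl
use strict;
use warnings;

# Apply a UTF-8 encoding layer to STDIN, STDOUT and STDERR (':std')
# and make it the default for handles opened later in this scope.
use open ':encoding(UTF-8)', ':std';

# Without the encoding pragma, \x{...} escapes denote code points directly.
my $a_umlaut = "\x{00e4}";    # U+00E4 LATIN SMALL LETTER A WITH DIAERESIS

my $length = length $a_umlaut;
my $code   = ord $a_umlaut;

printf "length=%d code point=U+%04X\n", $length, $code;
print "$a_umlaut\n";          # written to STDOUT as the UTF-8 bytes C3 A4
```

Running it should print `length=1 code point=U+00E4` followed by the character itself, with no "Wide character" warning, because the output handle now does the encoding.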

I'm also including a pointer to another of your postings,
which explains in more detail how it is broken and
confirms my observation.

# Why do my Perl tests fail with `use encoding 'utf8'`?
http://stackoverflow.com/questions/492838/

I think the manpage should say that \x escapes do not work
for code points 0x80 through 0xFF when "use encoding 'utf8'"
is in effect. It looks like they're replaced by code point
U+FFFD, which is the Unicode REPLACEMENT CHARACTER.

http://www.fileformat.info/info/unicode/char/FFFD/index.htm
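To spot this symptom in a test, one can check strings for U+FFFD. The helper name `has_replacement_char` is hypothetical, and the `$bad` string simply stands in for what the escape turned into under `use encoding 'utf8'`; the sketch does not reproduce the pragma's behaviour itself:

```perl
use strict;
use warnings;

# Hypothetical helper: return 1 if the string contains U+FFFD,
# the REPLACEMENT CHARACTER, and 0 otherwise.
sub has_replacement_char {
    my ($string) = @_;
    return $string =~ /\x{FFFD}/ ? 1 : 0;
}

my $good = "\x{00e4}";    # U+00E4, as intended
my $bad  = "\x{FFFD}";    # stand-in for the broken result described above

my $good_flag = has_replacement_char($good);
my $bad_flag  = has_replacement_char($bad);

print "good: $good_flag, bad: $bad_flag\n";
```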

The manpage says something about that character range (under
CAVEATS > DO NOT MIX MULTIPLE ENCODINGS), but I don't quite
understand what it says.

-- 
Michael.Ludwig (#) XING.com
