On Jan 31, 2005, at 18:19, Martin Duerst wrote:
I started with some very simple (I thought) tests, but got
completely confused very quickly. Here is the short program
that I was using:
>>>> test.pl
use utf8;
use URI;
use URI::Escape;
print (uri_escape("\xFD")
[snip]
With this, on perl, v5.6.1 built for MSWin32-x86-multi-thread
(with 1 registered patch, see perl -V for more detail), I get
>>>>
%FD
%C3%BD
[snip]
However, on perl, v5.8.4 built for i386-linux-thread-multi,
I get:
>>>>
%FD
[snip]
Nothing seems to work anymore, although (or because?) 5.8
has better Unicode support.
The (easiest|new canonical) way to go is to use uri_escape_utf8()
instead of uri_escape(). Note that as of version 3.28,
uri_escape_utf8() is NOT exported by default, so you have to
import it explicitly:
% perl -MURI::Escape -le 'print uri_escape("\xFD")'
%FD
% perl -MURI::Escape=uri_escape_utf8 -le 'print uri_escape_utf8("\xFD")'
%C3%BD
perldoc URI::Escape
    uri_escape_utf8( $string )
    uri_escape_utf8( $string, $unsafe )
        Works like uri_escape(), but will encode chars as UTF-8 before
        escaping them. This makes this function able to deal with
        characters with code above 255 in $string. Note that chars in
        the 128 .. 255 range will be escaped differently by this
        function compared to what uri_escape() would. For chars in the
        0 .. 127 range there is no difference.

        The call:

            $uri = uri_escape_utf8($string);

        will be the same as:

            use Encode qw(encode);
            $uri = uri_escape(encode("UTF-8", $string));

        but will even work for perl-5.6 for chars in the 128 .. 255
        range.
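
If you want to convince yourself of that equivalence, here is a minimal
sketch, assuming perl 5.8 or later with Encode available. The helper
name escape_both() is made up for this illustration and is not part of
URI::Escape. It escapes one character from each of the three ranges the
documentation mentions and checks that both spellings agree.

```perl
use strict;
use warnings;
use Encode qw(encode);
# uri_escape_utf8 is imported explicitly, as in the one-liner above.
use URI::Escape qw(uri_escape uri_escape_utf8);

# Hypothetical helper: returns the result of uri_escape_utf8() and of
# the "encode to UTF-8 first, then uri_escape()" spelling.
sub escape_both {
    my ($string) = @_;
    my $direct  = uri_escape_utf8($string);
    my $via_enc = uri_escape(encode("UTF-8", $string));
    return ($direct, $via_enc);
}

# One sample per range: 0 .. 127, 128 .. 255, and above 255.
for my $string ("abc", "\xFD", "\x{263A}") {
    my ($direct, $via_enc) = escape_both($string);
    my $verdict = $direct eq $via_enc ? "same" : "DIFFERENT";
    print "$direct $via_enc $verdict\n";
}
```

Note that the sketch never passes "\x{263A}" to plain uri_escape()
directly; only the already-encoded bytes go through it.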
Dan the Encode Maintainer