perl-unicode

Re: IRI support in URI and URI::Escape modules

2005-01-31 03:04:54
On Jan 31, 2005, at 18:19, Martin Duerst wrote:
I started with some very simple (I thought) tests, but got
completely confused very quickly. Here is the short program
that I was using:

>>>> test.pl
use utf8;
use URI;
use URI::Escape;

print (uri_escape("\xFD")
[snip]

With this, on perl, v5.6.1 built for MSWin32-x86-multi-thread
(with 1 registered patch, see perl -V for more detail), I get

>>>>
%FD
%C3%BD

[snip]
However, on perl, v5.8.4 built for i386-linux-thread-multi,
I get:

>>>>
%FD
[snip]
Nothing seems to work anymore, although (or because?) 5.8
has better Unicode support.

The (easiest|new canonical) way to go is to use uri_escape_utf8() instead of uri_escape(). Note that as of version 3.28, uri_escape_utf8() is NOT exported by default, so you have to import it explicitly.

% perl -MURI::Escape -le 'print uri_escape("\xFD")'
%FD
% perl -MURI::Escape=uri_escape_utf8 -le 'print uri_escape_utf8("\xFD")'
%C3%BD

perldoc URI::Escape
       uri_escape_utf8( $string )
       uri_escape_utf8( $string, $unsafe )
           Works like uri_escape(), but will encode chars as UTF-8 before
           escaping them. This makes this function able to deal with
           characters with code above 255 in $string. Note that chars in
           the 128 .. 255 range will be escaped differently by this function
           compared to what uri_escape() would. For chars in the 0 .. 127
           range there is no difference.

           The call:

               $uri = uri_escape_utf8($string);

           will be the same as:

               use Encode qw(encode);
               $uri = uri_escape(encode("UTF-8", $string));

           but will even work for perl-5.6 for chars in the 128 .. 255 range.
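
To see where the two different outputs come from, here is a minimal sketch that uses only the core Encode module. The helper pct_escape() is hypothetical and is not how URI::Escape is implemented; its set of unescaped characters is an assumption chosen for illustration. It only shows which bytes end up percent-escaped in each case.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper, NOT part of URI::Escape: percent-escape every byte
# outside a small set of unreserved characters (an assumed set, for illustration).
# Expects a string of bytes, i.e. no characters above 255.
sub pct_escape {
    my ($bytes) = @_;
    $bytes =~ s/([^A-Za-z0-9\-_.!~*'()])/sprintf("%%%02X", ord($1))/ge;
    return $bytes;
}

my $char = "\xFD";    # U+00FD, LATIN SMALL LETTER Y WITH ACUTE

# Escape the character as the single byte 0xFD (the uri_escape() result above).
my $latin1 = pct_escape($char);

# Encode to UTF-8 first (bytes 0xC3 0xBD), then escape
# (the uri_escape_utf8() result above).
my $utf8 = pct_escape(encode("UTF-8", $char));

print "$latin1\n";    # %FD
print "$utf8\n";      # %C3%BD
```

The same character gives one escaped byte or two depending on whether it is encoded to UTF-8 before escaping, which is the whole difference between the two functions for chars in the 128 .. 255 range.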

Dan the Encode Maintainer

