perl-unicode

Re: UTF-8 encoding & decoding

2016-05-12 09:48:30
On Friday 06 May 2016 09:24:01 Karl Williamson wrote:
On 05/05/2016 08:37 AM, Pali Rohár wrote:
Hi!

I thought I understood the UTF-8 encoding/decoding done in perl until I
looked into the source code of the Encode package... (specifically sub encode_utf8)

Before that, I had only read the description of the Encode package (not the source code):
https://metacpan.org/pod/Encode#UTF-8-vs.-utf8-vs.-UTF8

I tried to find more information (ideally something that answers my
questions) but without success. Can you help me? My questions are:

1. What is the difference between these two calls?

 utf8::encode($str);

and

 $str = Encode::encode('utf8', $str);

2. What is the difference between these?

 utf8::decode($str);
 $str = Encode::decode_utf8($str);

Each pair of functions is supposed to do essentially the same thing. I have
not studied them to know what subtle differences there may be.
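
For readers comparing the two pairs, here is a minimal sketch of the one
difference that is visible at the call site: the utf8:: functions modify
their argument in place, while the Encode functions return a new string.
The sample character and the variable names are illustrative only, and the
sketch does not try to cover the subtler differences (such as how malformed
or non-Unicode input is handled).

```perl
use strict;
use warnings;
use Encode ();

# One character, U+263A WHITE SMILING FACE (illustrative input).
my $chars = "\x{263A}";

# utf8::encode modifies its argument in place; it has no useful return value.
my $in_place = $chars;
utf8::encode($in_place);

# Encode::encode returns the octets and, with the default CHECK,
# leaves the source string untouched.
my $octets = Encode::encode('utf8', $chars);

printf "same octets: %s\n", ($in_place eq $octets ? "yes" : "no");
printf "lengths: chars=%d octets=%d\n", length($chars), length($octets);

# Decoding mirrors this: utf8::decode works in place,
# Encode::decode_utf8 returns the decoded string.
my $decoded_in_place = $octets;
utf8::decode($decoded_in_place);
my $decoded_copy = Encode::decode_utf8($octets);

printf "round trip: %s\n",
    ($decoded_in_place eq $chars && $decoded_copy eq $chars ? "ok" : "mismatch");
```

For well-formed input like this, both routes should produce the same three
octets and decode back to the same single character.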

If both functions should do the same thing, why do we have the duplication? And which
one is preferred?

3. Where is the implementation of the utf8::encode/decode functions? It is not
in utf8.pm, nor in utf8_heavy.pl, and also not in unicore/Heavy.pl. And
what do those functions do?

The implementation is in universal.c.  But these are just wrappers for
sv_utf8_encode and sv_utf8_decode, which are implemented in sv.c.  Their
documentation is in perlapi.  It should match the documentation of
utf8::decode and utf8::encode, whose documentation is in utf8.pm.  (I myself
have a hard time mapping the names chosen for these operations to what
they actually do.)
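
As a rough illustration of what the documentation in utf8.pm describes, the
sketch below shows the observable effect of the two built-ins: utf8::encode
turns characters into UTF-8 octets and clears the UTF8 flag, and
utf8::decode returns a boolean saying whether the octets were well-formed,
leaving the string unchanged on failure. The sample values and variable
names are assumptions made for the example; it does not show anything about
the internals in sv.c.

```perl
use strict;
use warnings;

# One character, U+00E9 (illustrative input).
my $text = "\x{E9}";

# Encode in place: the string now holds the two octets 0xC3 0xA9.
utf8::encode($text);
my $encoded_len       = length($text);
my $flag_after_encode = utf8::is_utf8($text) ? 1 : 0;

# Decode in place: true on success, and the string is one character again.
my $ok          = utf8::decode($text) ? 1 : 0;
my $decoded_len = length($text);

# A lone 0xFF octet is not well-formed UTF-8, so decoding fails
# and the string is left as it was.
my $bad    = "\xFF";
my $bad_ok = utf8::decode($bad) ? 1 : 0;

print "encoded length: $encoded_len, UTF8 flag: $flag_after_encode\n";
print "decode ok: $ok, decoded length: $decoded_len\n";
print "malformed decode ok: $bad_ok\n";
```

Checking the return value of utf8::decode is the only way this interface
reports malformed input, which is one reason to treat the two pairs of
functions as similar rather than interchangeable.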

Ok, thank you!

-- 
Pali Rohár
pali(_dot_)rohar(_at_)gmail(_dot_)com
