perl-unicode

Re: UTF-8 encoding & decoding

2016-05-06 10:24:37
On 05/05/2016 08:37 AM, Pali Rohár wrote:
Hi!

I thought I understood the UTF-8 encoding/decoding done in perl until I
looked into the source code of the Encode package... (specifically sub encode_utf8)

Before that, I had only read the description of the Encode package (not the source code):
https://metacpan.org/pod/Encode#UTF-8-vs.-utf8-vs.-UTF8

I tried to find more information (ideally something that answers my
questions) but without success. Can you help me? My questions are:

1. What is the difference between these two calls?

  utf8::encode($str);

and

  $str = Encode::encode('utf8', $str);

2. What is the difference between these?

  utf8::decode($str);
  $str = Encode::decode_utf8($str);

Each pair of functions is supposed to do essentially the same thing. I have not studied them closely enough to know what subtle differences there may be.
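
For illustration, here is a minimal sketch of the most visible difference in calling convention: the utf8:: functions modify their argument in place, while the Encode functions return a new string. The variable names are made up for the example, and it only covers well-formed input plus one truncated sequence. It does not try to catalogue the subtler differences.

```perl
use strict;
use warnings;
use Encode ();

# One character, U+00E9, held in a character string.
my $chars = "\x{e9}";

# utf8::encode rewrites its argument in place.
my $in_place = $chars;
utf8::encode($in_place);

# Encode::encode returns the bytes; with no CHECK argument,
# $chars itself is left untouched.
my $returned = Encode::encode('utf8', $chars);

# Show both results as hex bytes.
print unpack('H*', $in_place), ' ', unpack('H*', $returned), "\n";

# The reverse direction, starting from the two bytes C3 A9.
my $bytes = "\xC3\xA9";

# utf8::decode also works in place and returns a success flag.
my $decoded_in_place = $bytes;
my $ok = utf8::decode($decoded_in_place);

# Encode::decode_utf8 returns the decoded string.
my $decoded_copy = Encode::decode_utf8($bytes);

print length($decoded_in_place), ' ', length($decoded_copy), "\n";

# A truncated sequence: utf8::decode reports failure via its return value.
my $bad    = "\xC3";
my $bad_ok = utf8::decode($bad);
print(($bad_ok ? 'decoded' : 'not decoded'), "\n");
```

Because the utf8:: functions work in place, copy the string first if the original is still needed.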

3. Where is the implementation of the utf8::encode/decode functions? It is not
in utf8.pm, nor in utf8_heavy.pl, nor in unicore/Heavy.pl. And
what do those functions do?

The implementation is in universal.c, but these are just wrappers for sv_utf8_encode and sv_utf8_decode, which are implemented in sv.c. Those two are documented in perlapi, and that documentation should match the documentation of utf8::decode and utf8::encode in utf8.pm. (I myself have a hard time mapping the names chosen for these operations to what they actually do.)
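
One way to see from Perl that these functions live in the interpreter rather than in a .pm file is the sketch below. It assumes the core B module is available and uses its XSUB method, which should be non-zero for subs implemented in C. No expected output is given, since the exact values depend on the perl build.

```perl
use strict;
use warnings;
use B ();

# Note: no "use utf8" here. The functions are defined by the
# interpreter itself, so they exist without loading utf8.pm.
my $encode_defined = defined(&utf8::encode) ? 1 : 0;
my $decode_defined = defined(&utf8::decode) ? 1 : 0;

# Ask B whether utf8::encode is backed by C code (an XSUB)
# rather than by Perl source.
my $is_xsub = B::svref_2object(\&utf8::encode)->XSUB ? 1 : 0;

print "utf8::encode defined: $encode_defined\n";
print "utf8::decode defined: $decode_defined\n";
print "utf8::encode is an XSUB: $is_xsub\n";
```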


