Hello,
I tried to make my Perl 5 code Unicode-compliant after reading a post on
Stack Overflow[1].
As suggested in the post:
“always run incoming stuff through NFD and outbound stuff from NFC.”
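As a minimal sketch of that advice, assuming the data crossing the program boundary is UTF-8, input can be decoded and decomposed to NFD, and output recomposed to NFC before encoding, using the core Encode and Unicode::Normalize modules. The helper names from_outside and to_outside are hypothetical:

#+begin_src perl
#!/usr/bin/env perl
use strict;
use warnings;
use Encode qw(decode encode);
use Unicode::Normalize qw(NFD NFC);

# Hypothetical boundary helpers (assumption: external data is UTF-8).
# Incoming bytes: decode, then decompose to NFD.
sub from_outside { my ($bytes) = @_; return NFD(decode('UTF-8', $bytes)) }
# Outgoing text: recompose to NFC, then encode.
sub to_outside   { my ($text)  = @_; return encode('UTF-8', NFC($text)) }

# "chaîne" as raw UTF-8 bytes, with a precomposed î (U+00EE)
my $in_bytes = "cha\xC3\xAEne";

my $text = from_outside($in_bytes);
# Internally the î is now "i" followed by U+0302 COMBINING CIRCUMFLEX ACCENT
printf "internal length: %d\n", length $text;   # prints 7, not 6

my $out_bytes = to_outside($text);
print $out_bytes eq $in_bytes ? "round trip ok\n" : "mismatch\n";
#+end_src

The catch is that every string compared against internal data must have gone through the same normalization, including literals in the source file.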
I had a hard time finding out why my Test::More test was failing while
displaying exactly the same strings for “got” and “expected”.
I finally checked how string literals in my UTF-8 source are handled, and
found that they are in NFC form. I ran the following script:
#+begin_src perl
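#!/usr/bin/env perl
use strict;
use warnings;
use utf8;    # decode string literals in this file from UTF-8
use Unicode::Normalize qw(NFD NFC NFKD NFKC);

my $unistring = 'C’est une chaîne unicode';

# Map each form name to its normalizer (no symbolic refs under strict)
my %normalizer = (
    NFD  => \&NFD,
    NFC  => \&NFC,
    NFKD => \&NFKD,
    NFKC => \&NFKC,
);

for my $form (qw(NFD NFC NFKD NFKC)) {
    # The string is already in a form if normalizing it changes nothing
    if ($unistring eq $normalizer{$form}->($unistring)) {
        print "UTF-8 source is in form '$form'\n";
    }
}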
#+end_src
and got:
#+begin_src
UTF-8 source is in form 'NFC'
UTF-8 source is in form 'NFKC'
#+end_src
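The strings look the same because NFC and NFD spell the same text with different code points. A small helper that lists code points (codepoints is a hypothetical name, not part of Test::More) makes the difference visible:

#+begin_src perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;
use Unicode::Normalize qw(NFC NFD);

# List a string's code points so canonically equivalent but
# differently normalized strings can be told apart.
sub codepoints { join ' ', map { sprintf 'U+%04X', ord($_) } split //, $_[0] }

my $nfc = NFC('chaîne');
my $nfd = NFD('chaîne');

print $nfc eq $nfd ? "equal\n" : "not equal\n";   # not equal
print codepoints($nfc), "\n";   # U+0063 U+0068 U+0061 U+00EE U+006E U+0065
print codepoints($nfd), "\n";   # U+0063 U+0068 U+0061 U+0069 U+0302 U+006E U+0065
#+end_src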
So Test::More's is_deeply was comparing an input in NFD with an expected
string in NFC.
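One way to make such a test pass, sketched here under the assumption that both sides only contain plain scalars, array refs and hash refs, is to normalize “got” and “expected” to the same form before calling is_deeply. nfc_deep is a hypothetical helper, not part of Test::More or Unicode::Normalize:

#+begin_src perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;
use Test::More;
use Unicode::Normalize qw(NFC NFD);

# Hypothetical helper: recursively NFC-normalize every non-reference
# scalar (and every hash key) in a structure of array and hash refs.
sub nfc_deep {
    my ($data) = @_;
    my $ref = ref $data;
    return [ map { nfc_deep($_) } @$data ] if $ref eq 'ARRAY';
    return +{ map {; NFC($_) => nfc_deep($data->{$_}) } keys %$data }
        if $ref eq 'HASH';
    return defined $data && !$ref ? NFC($data) : $data;
}

my $got      = { title => NFD('C’est une chaîne unicode') };  # e.g. normalized input
my $expected = { title => 'C’est une chaîne unicode' };       # NFC source literal

isnt($got->{title}, $expected->{title}, 'raw strings differ');
is_deeply(nfc_deep($got), nfc_deep($expected), 'equal after NFC on both sides');

done_testing();
#+end_src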
My own code can use Unicode::Collate, but for all the code I did not write,
I wonder whether there is a clean way to handle this.
Or maybe I'm doing something wrong?
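For comparisons in code I control, Unicode::Collate's eq method compares by collation rather than by code point, and the default collator normalizes to NFD first, so NFC and NFD spellings of the same text compare equal. This is a sketch with default settings only; collation equality is looser than canonical equivalence (with level => 1, for instance, accents are ignored entirely):

#+begin_src perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;
use Unicode::Collate;
use Unicode::Normalize qw(NFC NFD);

# Default collator: input is normalized (NFD) before sort keys are built
my $collator = Unicode::Collate->new();

my $nfc = NFC('chaîne');
my $nfd = NFD('chaîne');

print $nfc eq $nfd              ? "eq: same\n"       : "eq: different\n";
print $collator->eq($nfc, $nfd) ? "collator: same\n" : "collator: different\n";
#+end_src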
Regards.
Footnotes:
[1]
https://stackoverflow.com/questions/6162484/why-does-modern-perl-avoid-utf-8-by-default
--
Daniel Dehennin
Retrieve my GPG key: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF