Jean-Michel Hiver <jhiver(_at_)mkdoc(_dot_)com> writes:
How can I calculate the MD5 message digest of a Unicode string in Perl
5.8? The MD5 hash algorithm naturally expects a sequence of bytes as its
input, and I have a string with a sequence of characters. I tried
$ perl -e 'use Digest::MD5 qw(md5_hex); print md5_hex("\x{20ac}");'
Wide character in subroutine entry at -e line 1.
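I'd do something like this:
use Encode;
use Digest::MD5 ();   # no import, so the wrapper below cannot call itself
sub md5_hex
{
    my $string = shift;
    # Clear the UTF-8 flag so the internal bytes are hashed as they are.
    Encode::_utf8_off ($string);
    return Digest::MD5::md5_hex ($string);
}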
I would argue that it is much better to write it as:
md5_hex(Encode::encode_utf8($string))
Playing with _utf8_{on,off} is ugly for good reason and will break if
the internal representation changes. Calling encode_utf8() should be
almost as efficient and is future-proof.
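
A minimal sketch of the encode_utf8() approach follows. The helper name md5_hex_utf8 is made up for illustration and is not part of Digest::MD5. The second half assumes the usual Perl behaviour that a string can hold the same characters in either a byte or an upgraded internal form. It checks that the digest is the same in both cases, which the _utf8_off variant would not guarantee.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: digest the UTF-8 encoding of a character string.
sub md5_hex_utf8 {
    my ($string) = @_;
    # encode_utf8() returns a byte string, which is what MD5 expects.
    return md5_hex(encode_utf8($string));
}

# The string from the original question no longer triggers
# "Wide character in subroutine entry".
print md5_hex_utf8("\x{20ac}"), "\n";

# The same one-character string (U+00E9) in two internal representations.
my $native   = "\x{e9}";
my $upgraded = "\x{e9}";
utf8::upgrade($upgraded);    # force the UTF-8 internal form

print "same string\n" if $native eq $upgraded;
print "same digest\n" if md5_hex_utf8($native) eq md5_hex_utf8($upgraded);
```

Because the digest is computed over an explicit encoding, it depends only on the characters in the string and not on how Perl happens to store them.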
Regards,
Gisle