perl-unicode

Re: MD5 digest of UTF-8 string in Perl 5.8

2002-10-23 08:30:09
Jean-Michel Hiver <jhiver(_at_)mkdoc(_dot_)com> writes:

How can I calculate the MD5 message digest of a Unicode string in Perl
5.8? The MD5 hash algorithm naturally expects a sequence of bytes as its
input, and I have a string with a sequence of characters. I tried

  $ perl -e 'use Digest::MD5 qw(md5_hex); print md5_hex("\x{20ac}");'
  Wide character in subroutine entry at -e line 1.

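I'd do something like this:

  use Encode;
  use Digest::MD5 ();

  sub md5_hex
  {
    my $string = shift;
    Encode::_utf8_off ($string);
    # Call the Digest::MD5 function by its full name; importing md5_hex
    # and then defining a sub with the same name would recurse forever.
    return Digest::MD5::md5_hex ($string);
  }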

I would argue that it is much better to write it as:

    md5_hex(Encode::encode_utf8($string))
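
Spelled out as a complete script, that might look like the sketch below. The wrapper name utf8_md5_hex is made up for illustration and is not part of Digest::MD5; encode_utf8() and md5_hex() are the real Encode and Digest::MD5 functions.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);
use Digest::MD5 qw(md5_hex);

# Hypothetical helper name (an assumption for this sketch, not a
# Digest::MD5 function): digest the UTF-8 encoding of a character string.
sub utf8_md5_hex {
    my ($string) = @_;
    # encode_utf8() turns characters into UTF-8 octets, so md5_hex()
    # only ever sees bytes and never a wide character.
    return md5_hex(encode_utf8($string));
}

my $euro  = "\x{20ac}";           # EURO SIGN, a single character
my $bytes = encode_utf8($euro);   # its UTF-8 form, octets E2 82 AC

printf "%d character(s), %d octet(s)\n", length($euro), length($bytes);
print utf8_md5_hex($euro), "\n";  # no "Wide character" error here
```

Note that this fixes the digest to the UTF-8 encoding of the text; anyone comparing digests has to agree on that encoding.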

Playing with _utf8_{on,off} is ugly for good reason and will break if
the internal representation changes.  Calling encode_utf8() should be
almost as efficient and is future-proof.
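
To illustrate the fragility, here is a small sketch. It assumes a non-EBCDIC perl, where "\xe9" is normally stored as a single byte until it is upgraded. The two strings compare equal, yet clearing the flag yields different bytes and therefore different digests, while encode_utf8() gives the same digest for both.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);
use Digest::MD5 qw(md5_hex);

my $native   = "\xe9";      # e-acute, normally held as one byte internally
my $upgraded = "\xe9";
utf8::upgrade($upgraded);   # same character, now held as UTF-8 internally

print "equal as strings\n" if $native eq $upgraded;

# Clearing the flag on copies exposes whatever bytes happen to be stored.
my ($raw_a, $raw_b) = ($native, $upgraded);
Encode::_utf8_off($raw_a);
Encode::_utf8_off($raw_b);
print "_utf8_off digests differ\n"
    if md5_hex($raw_a) ne md5_hex($raw_b);

# encode_utf8() depends only on the characters, so the digests agree.
print "encode_utf8 digests agree\n"
    if md5_hex(encode_utf8($native)) eq md5_hex(encode_utf8($upgraded));
```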

Regards,
Gisle
