perl-unicode

CGI::Util unescape() after escape() loses utf8 flag

2005-09-27 15:55:27
CGI::Util has a couple of functions, escape() and unescape(), which
URL-encode and URL-decode strings.  Unfortunately I lose the utf8 flag
on my scalar when I encode and then decode it using those functions
(see below).  Should unescape() be setting the utf8 flag? Or
is there no way for unescape() to know that it should set the
utf8 flag?

Thanks,
Stephen

$ perl -MCGI::Util -MEncode -MDevel::Peek -d -e 1
  DB<1> $foo = CGI::Util::unescape(CGI::Util::escape(Encode::decode_utf8("asdf bsdf 77\xc2\xb0")))

  DB<2> Dump $foo
SV = PVMG(0x8d5848) at 0x8b89dc
  REFCNT = 1
  FLAGS = (POK,pPOK)
  IV = 0
  NV = 0
  PV = 0x8db570 "asdf bsdf 77\302\260"\0
  CUR = 14
  LEN = 15

Note that UTF8 is missing from FLAGS above.

For those unfamiliar with Devel::Peek, this is what its Dump gives for a Perl
scalar with utf8 characters and the utf8 flag set:

  DB<3> $foo = Encode::decode_utf8("asdf bsdf 77\xc2\xb0")

  DB<4> Dump $foo
SV = PVMG(0x8d5848) at 0x8b89dc
  REFCNT = 1
  FLAGS = (POK,pPOK,UTF8)
  IV = 0
  NV = 0
  PV = 0x8d7c78 "asdf bsdf 77\302\260"\0 [UTF8 "asdf bsdf 77\x{b0}"]
  CUR = 14
  LEN = 15
  MAGIC = 0x8d2368
    MG_VIRTUAL = &PL_vtbl_utf8
    MG_TYPE = PERL_MAGIC_utf8(w)
    MG_LEN = 13
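
A minimal sketch of one possible workaround, assuming unescape() hands
back the UTF-8 encoded bytes (as the first dump above suggests) and that
the caller knows the data was UTF-8 to begin with. The variable names are
only illustrative, and this is not a claim about what CGI::Util itself
ought to do:

```perl
use strict;
use warnings;
use CGI::Util ();
use Encode ();

# Character string with the utf8 flag set, as in the session above.
my $chars = Encode::decode_utf8("asdf bsdf 77\xc2\xb0");

# Round trip through escape()/unescape().
my $escaped = CGI::Util::escape($chars);
my $bytes   = CGI::Util::unescape($escaped);

# Assumption: the unescaped data is UTF-8, so decode it explicitly
# to get a character string (with the utf8 flag) back.
my $round_trip = Encode::decode_utf8($bytes);

printf "after unescape: utf8 flag %s\n", utf8::is_utf8($bytes)      ? "on" : "off";
printf "after decode:   utf8 flag %s\n", utf8::is_utf8($round_trip) ? "on" : "off";
```

utf8::is_utf8() is used here only as a lighter-weight way to check the
flag than a full Devel::Peek Dump.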