perl-unicode

Re: CGI::Util unescape() after escape() loses utf8 flag

2005-09-27 20:41:24
David Graff wrote:
Looking at the source for CGI::Util, it appears that disabling the 
utf8 flag is intended as a feature, not a bug:

I think that is just due to the low-level nature of escape(), because
escape() doesn't care about characters, only about byte sequences.
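To make the byte-level point concrete, here is a minimal sketch. The helpers byte_escape() and byte_unescape() are hypothetical stand-ins written for illustration, similar in spirit to CGI::Util's escape()/unescape() but not their actual implementation. Percent-encoding operates on bytes, so the round trip hands back octets, and only an explicit Encode::decode() turns them back into characters with the utf8 flag set.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper (not CGI::Util): assumes $s is a character string,
# encodes it to UTF-8 bytes, then percent-encodes each unsafe byte.
sub byte_escape {
    my ($s) = @_;
    my $bytes = encode('UTF-8', $s);
    $bytes =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord $1)/ge;
    return $bytes;
}

# Hypothetical helper (not CGI::Util): undoes the percent-encoding.
# It returns raw bytes; nothing here decodes them as UTF-8.
sub byte_unescape {
    my ($s) = @_;
    $s =~ tr/+/ /;
    $s =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    return $s;
}

my $name    = "caf\x{e9}";                # 4 characters
my $escaped = byte_escape($name);         # "caf%C3%A9"
my $bytes   = byte_unescape($escaped);    # "caf\xC3\xA9", 5 bytes
my $chars   = decode('UTF-8', $bytes);    # "caf\x{e9}" again, 4 characters

printf "bytes: %d, flag %s; chars: %d, flag %s\n",
    length $bytes, (utf8::is_utf8($bytes) ? 'on' : 'off'),
    length $chars, (utf8::is_utf8($chars) ? 'on' : 'off');
```

The decode step is the one piece the escaping layer cannot do on its own, because it has to pick an encoding.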

I would guess that there is no way for "unescape" to "know" when a given input string
should be decoded as utf8 data.  Only the calling app can know that,
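A short sketch of why only the caller can decide: the same unescaped bytes produce different character strings depending on which encoding is assumed. The variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Bytes as they might come back from a byte-level unescape of "caf%C3%A9".
my $bytes = "caf\xC3\xA9";

# The byte string alone does not record which encoding produced it;
# the caller has to choose one.
my $as_utf8   = decode('UTF-8',      $bytes);   # "caf\x{e9}", 4 characters
my $as_latin1 = decode('ISO-8859-1', $bytes);   # "caf\x{c3}\x{a9}", 5 characters

printf "UTF-8: %d chars, Latin-1: %d chars\n",
    length $as_utf8, length $as_latin1;
```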

I'd pretty much agree.  So is there some way that I have missed to
tell CGI.pm how the query string should be decoded?  Or some obvious
way to extend CGI.pm that I haven't thought of?

I'd like a way to tell CGI.pm how to interpret characters once it has
reconstructed the byte sequence.  Then I'd like $query->param('name')
to always give me a scalar with utf8 characters and the utf8 flag set.
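One way to picture the behavior being asked for is a sketch under stated assumptions: decoded_params() below is a hypothetical helper, not part of CGI.pm. It splits a query string, undoes the percent-encoding at the byte level, and then decodes every name and value with an encoding supplied by the calling application, so each returned value is a character string with the utf8 flag set. In a CGI.pm script the equivalent step would be applying Encode::decode() to whatever param() returns.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of CGI.pm): parse a query string and decode
# every name and value with the encoding chosen by the caller.
sub decoded_params {
    my ($query_string, $encoding) = @_;
    my %params;
    for my $pair (split /[&;]/, $query_string) {
        next unless length $pair;               # skip empty fields like "a=1&&b=2"
        my ($name, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        for ($name, $value) {                   # aliases both variables
            tr/+/ /;
            s/%([0-9A-Fa-f]{2})/chr hex $1/ge;  # still bytes at this point
            $_ = decode($encoding, $_);         # now characters, flag on
        }
        push @{ $params{$name} }, $value;       # keep repeated names
    }
    return \%params;
}

my $p = decoded_params('name=caf%C3%A9&name=na%C3%AFve', 'UTF-8');
print scalar @{ $p->{name} }, " values for 'name'\n";
```

Passing the encoding in explicitly keeps the low-level unescaping byte-oriented while still letting the application, which is the only party that knows the encoding, get character strings back.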
A string in a scalar which doesn't have the utf8 flag set isn't that
useful _in perl_, right?  (That is, unless it's ASCII-only.)  Perl
wouldn't know how to interpret characters.  The application might, but
to Perl it's unstructured binary data.
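A small illustration of the difference between the two kinds of string: the same text held as UTF-8 octets versus as a decoded character string. length() counts bytes in the first case and characters in the second, and only the decoded string has the utf8 flag on.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $octets = encode('UTF-8', "\x{263A}");   # WHITE SMILING FACE as 3 bytes
my $chars  = decode('UTF-8', $octets);      # the same text as 1 character

printf "octets: length %d, utf8 flag %s\n",
    length $octets, utf8::is_utf8($octets) ? 'on' : 'off';
printf "chars:  length %d, utf8 flag %s\n",
    length $chars,  utf8::is_utf8($chars)  ? 'on' : 'off';
```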

-Stephen