perl-unicode

Re: CGI and UTF

2003-01-05 09:30:04
or implicitly (locale) command them not be.

Not fine without a warning. This is 'action at a distance' (this is the
same reason un'local'ized usage of the 'special' variables is nearly
always a Bad Idea (tm)). It causes breakage that can be hard to find the
cause of. Perl needs a mandatory warning if the locale changes my
filehandles to text mode and I haven't made some kind of _explicit_
declaration that I want that behavior to happen.

On that we can agree, kind of-- I find the *whole* locale system to be
a Bad Idea (tm) (not just any UTF-8 parts of it).  Locales are *all*
about action-at-a-distance.
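
For illustration only, and not something proposed in this thread: one
form an explicit declaration can take is to name the layer in the open
call itself, so the handle stays byte-oriented whatever the environment
says. The helper name slurp_bytes is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: read a whole file as raw bytes. The ':raw' layer
# is stated explicitly, so no locale-derived default layer is applied
# to this handle.
sub slurp_bytes {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    local $/;                 # undef record separator: read everything
    my $data = <$fh>;
    close($fh);
    return $data;
}
```

Calling binmode($fh) right after a plain open should have a similar
effect on a handle that was opened elsewhere.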

The change is of a bad 'type': An incompatible change in Perl semantics
without so much as a warning being issued by either the compiler or the
runtime - except to make the code fall over dead many lines away from the
actual breakage. If the string is invalid UTF8, why didn't Perl complain
_when I read it_ instead of dozens of lines away when I tried to use that
string for something else? That is _broken_.
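
A minimal sketch of what complaining "when I read it" could look like
if done by hand, assuming the Encode module and its FB_CROAK check
mode. The handle is read as bytes and each line is decoded strictly,
so a bad sequence dies at the read with the line number. The helper
name read_utf8_line is hypothetical, and this is not a description of
what the :utf8 layer does.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper: read one line of bytes from $fh and decode it
# as strict UTF-8, dying at the point of the read if it is malformed.
sub read_utf8_line {
    my ($fh, $name) = @_;
    my $bytes = <$fh>;
    return undef unless defined $bytes;          # end of file
    # decode() croaks on malformed input when CHECK is FB_CROAK
    my $text = eval { decode('UTF-8', $bytes, FB_CROAK) };
    die "Invalid UTF-8 in $name at line $.: $@" unless defined $text;
    return $text;
}
```

The cost is one validation pass per read, which is the speed concern
that any check at the point of reading has to deal with.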

See below.

If you try to push Unicode (data marked as UTF-8, such as characters
beyond 255) on such a filehandle, you'll get a 'Wide character' warning.
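
The output-side warning can be reproduced with a short sketch like the
following, which assumes an in-memory filehandle with no encoding layer
and collects warnings through $SIG{__WARN__}. The helper name
warnings_when_printing is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: print $string to a handle that has no encoding
# layer and return the warnings Perl emitted while doing so.
sub warnings_when_printing {
    my ($string) = @_;
    my @seen;
    local $SIG{__WARN__} = sub { push @seen, $_[0] };

    my $buffer = '';
    open(my $out, '>', \$buffer) or die "Cannot open in-memory handle: $!";
    print {$out} $string;
    close($out);
    return @seen;
}

# U+263A is a character beyond 255, so this should collect a
# 'Wide character' warning; a plain ASCII string should collect none.
my @wide  = warnings_when_printing(chr(0x263A));
my @plain = warnings_when_printing("abc");
```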

But it _reads_ binary data through a UTF8 layer silently. No warnings. Try
the code I posted on an actual jpg file with a UTF-8 locale set in the
environment. The first complaint is when the code falls over dead in the
'jpegsize' sub - many lines of code away from the <fh> read.

I think now I reached your page.  I have to think more about this,
though, not to make the checking at the point of reading for example
unreasonably slow.  And I'll be rather Internet connectivity
challenged in the coming weeks, so please be patient.

-- 
Jarkko Hietaniemi <jhi(_at_)iki(_dot_)fi> http://www.iki.fi/jhi/
"There is this special biologist word we use for 'stable'.  It is
'dead'." -- Jack Cohen
