Re: CGI and UTF

On January 5, 2003 at 05:42, Jarkko Hietaniemi wrote:

This is Bad Juju (tm). It _guarantees_ script breakage (potentially
silently!) for Unix people doing _anything_ but ASCII text manipulation.


I repeat: I don't think you can do "more than ASCII" by hanging tooth
and nail to the "everything is bytes" credo.


This statement assumes someone is working with characters.  It is
common for many to use regexes and other operators (substr, index,
et al.) on binary data directly.
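
As a minimal sketch of the kind of byte-oriented code meant here (the
record layout and variable names are invented for illustration, not
taken from anyone's script), the following treats a string purely as
bytes:

```perl
use strict;
use warnings;

# Hypothetical binary record: 4-byte magic, 2-byte big-endian length,
# then the payload. None of this is text in any character set.
my $record = "\x89PNG" . pack('n', 5) . "\xC3\xA9\xFF\x00\x01";

# index, substr and unpack all work on bytes here, because the string
# was never decoded and holds no code points above 255.
my $magic_ok = index($record, "\x89PNG") == 0 ? 1 : 0;
my $len      = unpack('n', substr($record, 4, 2));
my $payload  = substr($record, 6, $len);

# A regex character class used to count bytes with the high bit set.
my $high_bytes = () = $payload =~ /[\x80-\xFF]/g;

printf "magic=%d len=%d payload_bytes=%d high=%d\n",
    $magic_ok, $len, length($payload), $high_bytes;
```

Code like this only keeps working if nothing reinterprets the data as
characters behind the script's back, which is the concern being raised.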

I repeat: all your filehandles are still 'binary' unless you either
explicitly (binmode) or implicitly (locale) command them not to be.
If you try to push Unicode (data marked as UTF-8, such as characters
beyond 255) on such a filehandle, you'll get a 'Wide character' warning.
If you do not like the implicit locale switching, reset your locale
to something without /utf-?8/i in it before running the script.
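
The difference between a default filehandle and one given an explicit
layer can be sketched as follows. This is only an illustration: it uses
in-memory filehandles so it is self-contained, and the variable names
are made up for the example.

```perl
use strict;
use warnings;

# Collect warnings instead of printing them, so they can be counted.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

my $smiley = "\x{263A}";   # one character with a code point above 255

# 1. Default handle: no layer requested, so it is a plain byte stream.
open(my $raw, '>', \my $raw_buf) or die "open failed: $!";
print {$raw} $smiley;      # expected to warn: "Wide character in print"
close $raw;

# 2. Explicit layer: the handle is told to encode characters as UTF-8.
open(my $enc, '>', \my $enc_buf) or die "open failed: $!";
binmode($enc, ':encoding(UTF-8)');
print {$enc} $smiley;      # no warning expected
close $enc;

printf "warnings collected: %d\n", scalar @warnings;
printf "raw bytes: %d, encoded bytes: %d\n",
    length($raw_buf), length($enc_buf);
```

In both cases the same UTF-8 bytes end up in the buffer; what differs is
whether the script asked for that explicitly or got it with a warning.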


I think this reasoning is flawed since it assumes the author of
the script has complete control over the environment.  For example,
the script can be used by others in environments the author does not
control.  Therefore, older programs can quietly break or behave
differently.
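
One defensive approach, sketched here under stated assumptions, is for
the script to say what it wants rather than inherit it. The helper
utf8_locale_vars and the %other_env hash are hypothetical, standing in
for %ENV on a machine the author has never seen; the pattern is the one
quoted above.

```perl
use strict;
use warnings;

# Return the names of the locale variables that mention UTF-8.
sub utf8_locale_vars {
    my (%env) = @_;
    return grep { defined $env{$_} && $env{$_} =~ /utf-?8/i }
           qw(LC_ALL LC_CTYPE LANG);
}

# Hypothetical environment of some other user running the script.
my %other_env = (LANG => 'en_US.UTF-8', LC_CTYPE => 'C');
my @hits = utf8_locale_vars(%other_env);
print "UTF-8 locale set via: @hits\n";

# State the handle's mode explicitly instead of relying on whatever
# the environment implies. binmode with no layer asks for raw bytes.
open(my $out, '>', \my $buf) or die "open failed: $!";
binmode($out);
print {$out} "\xE9\xFF";             # two raw bytes, not characters
close $out;
print "bytes written: ", length($buf), "\n";
```

This does not help existing scripts that were written before any
implicit switching existed, which is the point of the objection.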

According to the perllocale manpage, locale should have no effect
unless the 'use locale' pragma is specified.  It appears from
Benjamin's script that he is not using the pragma, so even if the
environment has a utf-8 locale, the script should be unaffected.

--ewh
