perl-unicode

Re: should a non-breaking space character be treated as whitespace in perl source?

2005-10-25 08:56:28

Hi,

Nicholas Clark wrote:
| On Wed, Oct 05, 2005 at 05:20:34PM -0400, khadrin(_at_)columbus(_dot_)rr(_dot_)com wrote:
|
|>Should a non-breaking space character be treated as whitespace in
|>perl source code?  It doesn't appear to be:
|
|
| As far as I know code points outside the range 0-127 are invalid, except as
| quotes for q, qq, etc, by default. Under use utf8; Unicode word characters
| can also be used in identifiers.

The classification of characters (locale category LC_CTYPE) is
locale-dependent and therefore, unfortunately, system-dependent.  In most
(if not all) locales defined in current GNU libc versions, the no-break
space is classified as punctuation, graphical, and printable.  See
http://sourceware.org/cgi-bin/cvsweb.cgi/libc/localedata/locales/i18n?rev=1.23&content-type=text/x-cvsweb-markup&cvsroot=glibc
and search for "<U00A0>".
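
To check how a particular system classifies the no-break space, one could
ask the C library directly.  The following is only a sketch: it assumes that
wchar_t values are Unicode code points (which holds for GNU libc but is not
guaranteed elsewhere), and the names ctype_classes, classify, and
report_nbsp are made up for illustration.  The result depends on the locale
in effect, so no output is shown here.

```c
#include <locale.h>
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper type: LC_CTYPE classes reported for one character. */
struct ctype_classes {
    int space;
    int punct;
    int graph;
    int print;
};

/* Ask the current LC_CTYPE locale which classes wc belongs to. */
struct ctype_classes classify(wint_t wc)
{
    struct ctype_classes c;
    c.space = iswspace(wc) != 0;
    c.punct = iswpunct(wc) != 0;
    c.graph = iswgraph(wc) != 0;
    c.print = iswprint(wc) != 0;
    return c;
}

/* Switch LC_CTYPE to the environment's locale and report on U+00A0.
 * Assumption: wchar_t 0x00A0 denotes NO-BREAK SPACE on this platform. */
void report_nbsp(void)
{
    const char *name = setlocale(LC_CTYPE, "");
    struct ctype_classes c = classify(0x00A0);

    printf("locale: %s\n", name ? name : "(setlocale failed)");
    printf("U+00A0: space=%d punct=%d graph=%d print=%d\n",
           c.space, c.punct, c.graph, c.print);
}
```

Calling report_nbsp() from main() under different LC_ALL or LC_CTYPE
settings shows whether the classification differs between the locales
installed on a given system.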

| I doubt that this will change in perl 5, because the parser is written in C,
| and so it would be very hard work to replace it with something that was fully
| Unicode aware.

I don't see that this has anything to do with C; it has to do with the
locale definitions used in the system libc.  But in fact, the whole purpose
of the no-break space is to provide a blank character that is _not_
interpreted as a space.

Ciao,
Guido
--
Imperia AG, Development
Leyboldstr. 10 - D-50354 Hürth - http://www.imperia.net/
