perl-unicode

Re: Use of UTF-8 under Perl and Unix

1999-11-04 07:00:38
Larry Wall wrote on 1999-11-04 00:44 UTC:
> Anyway, we're currently thinking that a "use utf8" declaration will
> tell Perl to start expecting UTF-8.  It's also possible we could
> automatically switch to UTF-8 processing if we see UTF-8 sequences, but
> that's more problematic, and we haven't thought it through yet.

Another idea worth thinking through is to deliberately use non-ASCII
characters in some parts of the Perl syntax. This would allow the Perl
parser to decide automatically which encoding string constants are in.
For instance, I could imagine using «string» or even

  U+201C  LEFT DOUBLE QUOTATION MARK
  U+201D  RIGHT DOUBLE QUOTATION MARK

as very neat quotation marks. If you use «string», these characters are
also available in most of the ISO 8859-* sets (plus CP437, etc.), so the
Perl code would survive character set conversion. If you just write "use
utf8", an automatic character set converter will not rewrite this
declaration line to "use 8-bit", and the declaration becomes wrong after
conversion. However, if you use something like «» as an indicator for
the non-ASCII convention, then the indicator itself is converted
automatically along with the rest of the file. The Perl parser would
have to know the byte sequences for «» in the most widely used encodings
and switch to the appropriate decoder on encountering the first one.
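
To make the detection step concrete, here is a minimal sketch of how a
parser might pick a decoder from the first «» it encounters. The table
@MARKERS and the function detect_encoding are hypothetical names, not
part of Perl, and the table covers only three encodings for
illustration. The conversion at the end assumes the Encode module of
later Perl releases, used here only to simulate a character set
converter.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical table: the bytes that encode the pair «» (U+00AB, U+00BB)
# in a few encodings. ISO-8859-1 stands in for the ISO 8859-* sets that
# place these two characters at the same code points.
my @MARKERS = (
    [ 'UTF-8',      "\xC2\xAB\xC2\xBB" ],
    [ 'ISO-8859-1', "\xAB\xBB"         ],
    [ 'CP437',      "\xAE\xAF"         ],
);

# Return the name of the encoding whose marker occurs earliest in the
# raw source bytes, or undef if no marker is present (plain ASCII).
sub detect_encoding {
    my ($bytes) = @_;
    my ($best, $best_pos);
    for my $m (@MARKERS) {
        my ($name, $seq) = @$m;
        my $pos = index($bytes, $seq);
        next if $pos < 0;
        ($best, $best_pos) = ($name, $pos)
            if !defined $best_pos || $pos < $best_pos;
    }
    return $best;
}

# A script stored as Latin-1 bytes, with «» as the indicator.
my $latin1_source = "#!/local/bin/perl\n\xAB\xBB;\nwhile (<>) { }\n";
print detect_encoding($latin1_source), "\n";    # prints ISO-8859-1

# Simulate an automatic converter: the indicator is re-encoded together
# with everything else, so detection still gives the right answer.
my $utf8_source = encode('UTF-8', decode('ISO-8859-1', $latin1_source));
print detect_encoding($utf8_source), "\n";      # prints UTF-8
```

A real parser would need a longer table and a rule for byte sequences
that are plausible in more than one encoding; this sketch simply takes
the earliest match.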

Just an idea to think about. « and » are mapped onto AltGr-Z and AltGr-X
on many XFree86 keyboards, and users can easily remap their keyboards
however they like. (It would even be useful to motivate people to learn
how to enter non-ASCII characters conveniently via their keyboard.)

The indicator does not necessarily have to be the string delimiters. It
could also be some piece of Latin-1 artwork in one of the first lines,
where you would otherwise write "use utf8":

#!/local/bin/perl
«»;  # means: this file contains non-ASCII characters and signals the encoding
while (<>) {
  ...
}

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>