perl-unicode

Re: good name for characters matching [^\0-\377]?

2007-10-18 16:29:20
E R wrote on 2007-10-18 16:21 (-0500):
> I should have added that in my presentation I am attempting to present
> Perl strings from a character set agnostic perspective.

That is silly, because Perl itself is not at all character set agnostic.

It has Unicode strings and it has binary strings, and those are the
tools you have to work with.
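
To make the two kinds of string concrete, here is a minimal sketch using
the core Encode module. The variable names are only for illustration, and
the sample character (U+00E9) is an arbitrary choice.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# A binary string: the two bytes that encode U+00E9 in UTF-8.
my $bytes = "\xC3\xA9";

# Decoding turns those bytes into a text string of one character.
my $text = decode('UTF-8', $bytes);

printf "bytes: length %d\n", length($bytes);    # length counts bytes here
printf "text:  length %d, code point U+%04X\n", length($text), ord($text);

# Encoding goes the other way, back to bytes for output or storage.
my $again = encode('UTF-8', $text);
print "round trip ok\n" if $again eq $bytes;
```

Decode on input, work on the text string, encode on output. Which of the
two a given string is remains something the programmer has to keep track
of.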

> So, even though there is a strong bias for Perl to treat character
> ordinals > 255 as Unicode code-points

Er, no, all character ordinals, including 0..255, are Unicode
code points.

255 is Unicode just like 256. There is no actual barrier in between!
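
A small sketch of that point, using only chr(), ord(), length() and the
character class from the subject line. The variable names are made up
for the example.

```perl
use strict;
use warnings;

my $low  = chr(255);    # U+00FF, LATIN SMALL LETTER Y WITH DIAERESIS
my $high = chr(256);    # U+0100, LATIN CAPITAL LETTER A WITH MACRON

# Both are one character long, and ord() returns the code point for each.
for my $c ($low, $high) {
    printf "U+%04X length=%d\n", ord($c), length($c);
}

# The class from the subject line matches only code points above 255.
# That is a property of the class, not a sign of a different kind of string.
my $low_matches  = $low  =~ /[^\0-\377]/ ? 1 : 0;
my $high_matches = $high =~ /[^\0-\377]/ ? 1 : 0;
print "255 matches: $low_matches, 256 matches: $high_matches\n";

# Concatenating them gives one ordinary two-character string.
my $both = $low . $high;
printf "combined length=%d\n", length($both);
```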

> I don't want people to automatically think Unicode when encountering
> one of these "non-legacy characters".

If they don't automatically think of Unicode, they won't be using Perl's
functionality in the most efficient and time-saving way. I hope that is
not your goal.

To be honest, I'm not sure you know enough about Perl's string model to
be giving a presentation about Unicode in Perl. You just learnt some very
important aspects, and from the things you write, I'd say you still have
some other important aspects to learn or accept. No offense meant.

> I'm just wondering if there is an established term. Perhaps
> "extended/large character ordinal"?

The established term for a character ordinal is "code point".

> It would help as in the sentence: "If your string contains a ___, Perl
> will assume your string represents Unicode code-points."

"If you use your string for text operations, Perl will assume your string
is a Unicode string."

Note that there is a bug in uppercasing/lowercasing, and in some
built-in regular expression character classes, that causes Perl to look
at the string's internal encoding, so the result can depend on how the
string happens to be stored. This is a leak in the Unicode abstraction,
and will probably be fixed with Perl 5.12.

It is very simple (and future-proof) to work around this problem by
using the Unicode::Semantics module's up() function, or the built-in
utf8::upgrade().
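
A minimal sketch of the workaround with the built-in utf8::upgrade(),
which is available without loading any module. The sample character
(U+00E9) and the variable names are only for illustration. Without the
upgrade, what uc() and \w do with this character depends on the Perl
version and the pragmas in effect, so the sketch shows only the upgraded
case.

```perl
use strict;
use warnings;

my $str = "\xE9";       # U+00E9, LATIN SMALL LETTER E WITH ACUTE

# Change only the internal representation; the characters stay the same.
utf8::upgrade($str);

# With the string upgraded, case mapping and \w use Unicode rules.
my $upper   = uc $str;
my $is_word = $str =~ /\w/ ? 1 : 0;

printf "uc -> U+%04X, \\w match: %d\n", ord($upper), $is_word;
```

The upgraded string still compares equal to the original and still has
length 1, which is why this is safe to apply to any text string.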
-- 
Met vriendelijke groet,  Kind regards,  Korajn salutojn,

  Juerd Waalboer:  Perl hacker  <#####(_at_)juerd(_dot_)nl>  <http://juerd.nl/sig>
  Convolution:     ICT solutions and consultancy <sales(_at_)convolution(_dot_)nl>