perl-unicode

Re: good name for characters matching [^\0-\377]?

2007-10-18 15:33:24
At 02:24 PM 10/18/2007, John Delacour wrote:
Juerd Waalboer wrote:
E R skribis 2007-10-18  9:50 (-0500):
I'm preparing a presentation about Perl and Unicode support, and I'd
like to give a name for characters with ordinals above 255. Is there a
good name for that class?
They are "characters outside the latin-1 range".
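The class in the subject line is easy to show outside Perl too. Below is a minimal sketch in Python, chosen only for illustration. The names `ABOVE_255`, `chars_above_255` and `is_above_255` are hypothetical. Perl's `[^\0-\377]` carries over to Python's `re` unchanged, because inside a character class `\0` and `\377` are octal escapes for code points 0 and 255. Matching the class is therefore the same test as `ord(ch) > 255`.

```python
import re
from typing import List

# Perl's [^\0-\377] transcribed into Python's re syntax. Inside a character
# class, \0 and \377 are octal escapes for code points 0 and 255, so the
# negated class matches any character whose ordinal is above 255.
ABOVE_255 = re.compile(r"[^\0-\377]")


def chars_above_255(text: str) -> List[str]:
    """Return every character in text whose code point exceeds 255."""
    return ABOVE_255.findall(text)


def is_above_255(ch: str) -> bool:
    """The same test without a regex: compare the ordinal directly."""
    return ord(ch) > 0xFF


if __name__ == "__main__":
    # e-acute (U+00E9) is inside the range; a-macron (U+0101),
    # the euro sign (U+20AC) and U+4E2D are outside it.
    sample = "caf\u00e9 \u0101 \u20ac \u4e2d"
    print([f"U+{ord(c):04X}" for c in chars_above_255(sample)])
    # prints ['U+0101', 'U+20AC', 'U+4E2D']
    # The regex and the ordinal comparison agree on every character.
    assert all(is_above_255(c) == bool(ABOVE_255.fullmatch(c)) for c in sample)
```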

Latin-1 has nothing to do with it. There are countless legacy character
sets that use the code points from 32 to 255, and besides, what
masquerades as Latin-1 in various environments is rarely strict ISO-8859-1.
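The point about impostors can be made concrete with windows-1252. It is one common example of an encoding that gets labeled or treated as Latin-1; the message itself does not name a specific one. This sketch assumes Python's built-in `latin-1` and `cp1252` codecs, and `decode_both` and `code_point` are hypothetical helper names. Strict ISO-8859-1 maps bytes 0x80-0x9F to C1 control characters. cp1252 maps most of those bytes to printable characters, some of which lie above U+00FF.

```python
from typing import Optional, Tuple


def decode_both(byte_value: int) -> Tuple[str, Optional[str]]:
    """Decode one byte as strict ISO-8859-1 and as windows-1252 (cp1252)."""
    raw = bytes([byte_value])
    strict = raw.decode("latin-1")  # every byte 0-255 decodes; 0x80-0x9F are C1 controls
    try:
        windows = raw.decode("cp1252")
    except UnicodeDecodeError:
        windows = None  # a few bytes, such as 0x81, are unassigned in cp1252
    return strict, windows


def code_point(ch: Optional[str]) -> str:
    """Format a decoded character as U+XXXX, or 'undefined' if decoding failed."""
    return "undefined" if ch is None else f"U+{ord(ch):04X}"


if __name__ == "__main__":
    for b in (0x41, 0xE9, 0x80, 0x93, 0x81):
        strict, windows = decode_both(b)
        print(f"0x{b:02X}: latin-1 {code_point(strict)}, cp1252 {code_point(windows)}")
```

Run as a script, it reports identical results for 0x41 and 0xE9. For 0x80, strict Latin-1 gives U+0080 where cp1252 gives U+20AC (the euro sign). For 0x93 the pair is U+0093 versus U+201C, and 0x81 has no cp1252 mapping at all. So text that "is Latin-1" by label can still contain characters matching `[^\0-\377]` once it is decoded correctly.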

How about "extended characters"???
Bad name, because it would suggest an actual barrier, which in Unicode
isn't there.

Bad name also because the legacy character sets are often referred to as
extensions to ASCII up to 255 or below.

Above that they are multi-byte characters, but that doesn't mean they're necessarily Unicode, since the CJK legacy character sets are also multi-byte.

Looking in an older version of the Unicode standard (v3.0)...

7.1 Latin
Unicode follows ISO 8859-1 in the layout of Latin letters up to U+00FF.
U+0041 - U+007A    Basic Latin
U+00C0 - U+00FF    Latin-1 Supplement
U+0100 - U+017F    Latin Extended-A
 :   :   :   :   :
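The excerpt lists only the letter subranges, and the blocks themselves are wider. This sketch assumes the block boundaries from the Unicode Character Database file Blocks.txt: Basic Latin U+0000-U+007F, Latin-1 Supplement U+0080-U+00FF, and Latin Extended-A U+0100-U+017F. A lookup over them shows that the 255/256 line drawn by `[^\0-\377]` coincides with the end of Latin-1 Supplement. `LATIN_BLOCKS` and `latin_block` are hypothetical names, and only these three blocks are covered.

```python
from typing import Optional

# Block boundaries as published in the Unicode Character Database (Blocks.txt).
# Only the three blocks named in the excerpt are included.
LATIN_BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0100, 0x017F, "Latin Extended-A"),
]


def latin_block(ch: str) -> Optional[str]:
    """Return the Latin block containing ch, or None if it lies outside these three."""
    cp = ord(ch)
    for start, end, name in LATIN_BLOCKS:
        if start <= cp <= end:
            return name
    return None


if __name__ == "__main__":
    for ch in ("A", "\u00ff", "\u0100", "\u0180"):
        print(f"U+{ord(ch):04X}", latin_block(ch))
```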

2.4 Unicode Allocation
Allocation Areas
For convenience, the Unicode Standard codespace is divided into several areas, which are then subdivided into character blocks: General Scripts Area, Symbols Area, ....

The allocation of characters into areas reflects the evolution of the Unicode Standard and is not intended to define the usage ....

Codespace Assignment for Graphic Characters
The predominant characteristics of ....
   :   :   :
* The first 256 codes follow precisely the arrangement of ISO/IEC 8859-1 (Latin 1), of which 7-bit ASCII (ISO/IEC 646 IRV) accounts for the first 128 code positions.
   :   :   :
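The claim in the last bullet can be checked mechanically. This small sketch assumes Python's built-in `latin-1` and `ascii` codecs, and the function names are hypothetical. Decoding each byte 0-255 as ISO-8859-1 yields the code point with the same number, and ASCII does the same for 0-127. A character can be encoded in ISO-8859-1 exactly when its ordinal is at most 255. That is why "ordinal above 255" and "not representable in ISO-8859-1" describe the same set of characters.

```python
def latin1_matches_first_256() -> bool:
    """Check that each byte 0-255 decoded as ISO-8859-1 is the code point with the same number."""
    return all(bytes([b]).decode("latin-1") == chr(b) for b in range(256))


def ascii_matches_first_128() -> bool:
    """Check the same identity for 7-bit ASCII over bytes 0-127."""
    return all(bytes([b]).decode("ascii") == chr(b) for b in range(128))


def fits_in_latin1(ch: str) -> bool:
    """True if ch can be encoded in ISO-8859-1, i.e. ord(ch) <= 255."""
    try:
        ch.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    print(latin1_matches_first_256(), ascii_matches_first_128())
    print(fits_in_latin1("\u00ff"), fits_in_latin1("\u0100"))
```

Run as a script, it prints `True True` and then `True False`. U+00FF is the last character ISO-8859-1 can hold, and U+0100 is the first one the class `[^\0-\377]` matches.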


Seems to me the best answer so far is "beyond Latin-1, into the world"