At 02:24 PM 10/18/2007, John Delacour wrote:
Juerd Waalboer wrote:
E R wrote 2007-10-18 9:50 (-0500):
I'm preparing a presentation about Perl and Unicode support, and I'd
like to give a name for characters with ordinals above 255. Is there a
good name for that class?
They are "characters outside the latin-1 range".
Latin-1 has nothing to do with it. There are countless legacy character
sets that use the code points from 32 to 255, and besides, what
masquerades as Latin-1 in various environments rarely is strict iso-8859-1.
How about "extended characters"???
Bad name, because it would suggest an actual barrier, which in Unicode
isn't there.
Bad name also because the legacy character sets are often referred to as
extensions to ASCII up to 255 or below.
Above that they are multi-byte characters, but that doesn't mean
they're necessarily Unicode, since the CJK legacy character sets are
also multi-byte.
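
To make the multi-byte point concrete, here is a minimal sketch in Python (not Perl, purely for illustration). The helper name encoded_lengths and the sample characters are my own choices, not anything from the thread. It reports how many bytes a character takes in UTF-8 and in Shift_JIS, one of the legacy CJK encodings, and it also shows that in UTF-8 the multi-byte boundary is at 128, not 256.

```python
# Illustrative sketch: helper name and sample characters are assumptions.
def encoded_lengths(ch):
    """Return (code point, UTF-8 byte length, Shift_JIS byte length or None)."""
    try:
        sjis_len = len(ch.encode("shift_jis"))
    except UnicodeEncodeError:
        # The character has no representation in this legacy character set.
        sjis_len = None
    return ord(ch), len(ch.encode("utf-8")), sjis_len


samples = [
    "A",       # U+0041, plain ASCII
    "\u00e9",  # U+00E9, inside the 128..255 range
    "\u65e5",  # U+65E5, a CJK ideograph, ordinal above 255
]

for ch in samples:
    code_point, utf8_len, sjis_len = encoded_lengths(ch)
    print(f"U+{code_point:04X} utf-8={utf8_len} shift_jis={sjis_len}")
```

Note that U+00E9 already needs two bytes in UTF-8 even though its ordinal is below 256, so "multi-byte" and "above 255" are not the same class.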
Looking in an older version of the Unicode standard (v3.0)...
7.1 Latin
Unicode follows ISO 8859-1 in the layout of Latin letters up to U+00FF.
U+0041 - U+007A Basic Latin
U+00C0 - U+00FF Latin-1 Supplement
U+0100 - U+017F Latin Extended A
: : : : :
2.4 Unicode Allocation
Allocation Areas
For convenience, the Unicode Standard codespace is divided into
several areas, which are then subdivided into character
blocks: General Scripts Area, Symbols Area, ....
The allocation of characters into areas reflects the evolution of the
Unicode Standard and is not intended to define the usage ....
Codespace Assignment for Graphic Characters
The predominant characteristics of ....
: : :
* The first 256 codes follow precisely the arrangement of ISO/IEC
8859-1 (Latin 1), of which 7-bit ASCII (ISO/IEC 646 IRV) accounts for
the first 128 code positions.
: : :
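
The quoted claim is easy to check. The following is a rough sketch in Python (again not Perl, and the function name latin1_matches_code_points is made up for this example). It decodes every byte value 0..255 as ISO 8859-1 and compares the resulting code point with the byte value. The last lines use Windows-1252 as one example of an encoding that commonly passes for Latin-1 but is not strict ISO 8859-1.

```python
# Illustrative sketch: the function name is an assumption for this example.
def latin1_matches_code_points():
    """Check that ISO 8859-1 byte N decodes to Unicode code point N for N in 0..255."""
    all_bytes = bytes(range(256))
    text = all_bytes.decode("latin-1")
    return all(ord(ch) == byte for ch, byte in zip(text, all_bytes))


print(latin1_matches_code_points())

# Windows-1252 assigns printable characters to part of the 0x80..0x9F range,
# so the same byte can land far outside the first 256 code points.
euro = b"\x80".decode("cp1252")
print(f"cp1252 byte 0x80 -> U+{ord(euro):04X}")
```

So a byte that looks like "Latin-1" in one environment can turn into a character with an ordinal well above 255 once it is decoded properly.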
Seems to me the best answer so far is "beyond Latin-1, into the world"