Here is an update on the Unicode/10646 merger that has been posted
to several newsgroups.
---------
Date: Fri, 18 Oct 91 14:59:06 EDT
From: schein(_at_)TOROLAB5(_dot_)vnet(_dot_)ibm(_dot_)com
Subject: Universal Code Set (UCS) update
Update on the Unicode/10646 merger:
Significant progress was made during the recent ISO working group
(SC2/WG2) meetings in Geneva (August 91) and Paris (October 91).
Interest was very high, with 30 people (representing 13 countries)
attending the meetings. A broad consensus was reached on all technical
issues (see details below).
The ISO SC2 plenary (Rennes, October 91) unanimously authorized WG2 to
issue a new DIS 10646 in January 1992 for a 4-month vote. The 2nd DIS is
expected to be approved, and the International Standard (IS 10646) is
expected to be completed by 3Q 92.
2nd DIS 10646 (UCS) contents:
=============================
Architecture:
> 4-byte canonical form (UCS-4)
> 2-byte Base Multilingual Plane on Plane 0 (UCS-2)
- No characters are currently defined on any other planes
- No 'swapping' of the other planes is defined
- Compaction methods 1, 3, and 5 are eliminated
- Single Graphic Character Introducer (SGCI) is eliminated
> Graphic characters are coded in the C0/C1 area (except row 0, which
is identical to 8859/1)
> Two implementation levels are defined for the 'combining' characters
(previously called 'non-spacing marks' or 'floating diacritics'):
- Implementation level 1 does not allow combining marks
- Implementation level 2 allows both combining marks and precomposed
characters
> The Unified Set of the ideographic characters is defined on the BMP
> Formatting characters are included to control text in bidirectional
data streams (for Arabic and Hebrew scripts)
> The UCS Transformation Format (UTF) is defined in an informative
annex to specify a variable-length encoding of the data that avoids C0,
C1, NUL and SPACE octets
> A claim of conformance should identify the form (UCS-2 or UCS-4), the
implementation level, and the identified subset of characters
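As an illustration of the two forms and the two implementation levels
listed above, here is a minimal Python sketch. It is not taken from the
DIS text. The function names are hypothetical, most-significant-octet-first
ordering is assumed, the two leading zero octets of the UCS-4 form of a
BMP character are assumed, and 'combining character' is approximated by
the modern Unicode general category Mark (M*) from Python's unicodedata
module rather than by the draft's own list.

```python
import unicodedata


def ucs2_octets(code: int) -> bytes:
    """Two-octet form (row octet, then cell octet); BMP characters only."""
    if not 0 <= code <= 0xFFFF:
        raise ValueError("code is not on the Base Multilingual Plane")
    # Assumption: most significant octet first.
    return code.to_bytes(2, "big")


def ucs4_octets(code: int) -> bytes:
    """Four-octet canonical form.

    Assumption: a BMP character is its UCS-2 value preceded by two
    zero octets.
    """
    if not 0 <= code <= 0x7FFFFFFF:
        raise ValueError("code is out of range for a four-octet form")
    return code.to_bytes(4, "big")


def fits_level_1(text: str) -> bool:
    """True if the text contains no combining marks.

    Approximation: a character counts as 'combining' when its modern
    Unicode general category starts with 'M'.
    """
    return not any(unicodedata.category(ch).startswith("M") for ch in text)
```

Under these assumptions a precomposed letter such as U+00E9 passes the
level 1 check, while the sequence "e" followed by U+0301 (combining
acute accent) is acceptable only at level 2.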
Structure of the UCS-2:
     00                           FF
    |-------------------------------|
  00|                               |  Alphabetics, Symbols,
    |   A-zone (19968 positions)    |  CJK auxiliary, Hangul,...
    |                               |
    |                               |
    |-------------------------------|
  4E|                               |
    |   I-zone (20992 positions)    |  Unified Ideographic
    |                               |
    |-------------------------------|
  A0|                               |
    |   O-zone (16384 positions)    |  Reserved for future use
    |                               |
    |-------------------------------|
  E0|                               |  Private Use (6K), Compatibility
    |   R-zone (8192 positions)     |  Area, Arabic presentation
    |                               |  forms, Arabic ligatures, ...
    |-------------------------------|
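The zone boundaries in the diagram can be checked with a short Python
sketch. The starting rows (00, 4E, A0, E0) and the position counts come
from the diagram. The ending row of each zone is inferred as one less
than the starting row of the next zone, and the names ZONES, zone_of and
zone_size are hypothetical.

```python
# (zone name, first row, last row); each row holds 256 cells (00..FF).
# Last rows are inferred from the next zone's first row.
ZONES = [
    ("A", 0x00, 0x4D),  # Alphabetics, symbols, CJK auxiliary, Hangul, ...
    ("I", 0x4E, 0x9F),  # Unified ideographic
    ("O", 0xA0, 0xDF),  # Reserved for future use
    ("R", 0xE0, 0xFF),  # Private use, compatibility area, Arabic forms, ...
]


def zone_of(code: int) -> str:
    """Return the zone letter for a two-octet (UCS-2) code value."""
    if not 0 <= code <= 0xFFFF:
        raise ValueError("code is not a two-octet value")
    row = code >> 8  # high octet selects the row
    for name, first_row, last_row in ZONES:
        if first_row <= row <= last_row:
            return name
    raise AssertionError("the zones cover every row, so this is unreachable")


def zone_size(name: str) -> int:
    """Number of code positions in the named zone."""
    for zone_name, first_row, last_row in ZONES:
        if zone_name == name:
            return (last_row - first_row + 1) * 256
    raise KeyError(name)
```

With these boundaries the computed sizes agree with the figures in the
diagram (19968, 20992, 16384 and 8192 positions, 65536 in total), and
zone_of(0x4E00) returns "I".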
Unicode status:
===============
The Unicode 1.0 book containing the non-ideographic part is complete and
available. It is published by Addison-Wesley and will be in
bookstores by November 91.
Although 10646 (UCS-2) is based on Unicode 1.0, some differences exist.
The Unicode Technical Committee (UTC) has decided to incorporate all
adjustments needed to match UCS-2 into Unicode 1.1, after DIS 10646 is
approved by the ISO ballot.
+----------------------------------------------------------------------+
| Isai Scheinberg                        A3/979/895/TOR                |
| IBM Canada, Inc.                                                     |
| phone: (416) 448-2260                  895 Don Mills Road            |
| fax:   (416) 448-2114                  North York, Ontario M3C 1W3   |
| email: schein(_at_)torolab5(_dot_)vnet(_dot_)ibm(_dot_)com  CANADA   |
+----------------------------------------------------------------------+