
Universal Code Set (UCS) update

1991-10-22 07:30:51
Here is an update on the Unicode/10646 merger that has been posted
to several news groups.
---------
Date: Fri, 18 Oct 91 14:59:06 EDT
From: schein(_at_)TOROLAB5(_dot_)vnet(_dot_)ibm(_dot_)com
Subject: Universal Code Set (UCS) update



Update on the Unicode/10646 merger:

Significant progress was made during the recent ISO working group
(SC2/WG2) meetings in Geneva (August 91) and Paris (October 91).
Interest was very high, with 30 people (representing 13 countries)
attending the meetings. A broad consensus was reached on all technical
issues (see details below).

The ISO SC2 plenary (Rennes, October 91) unanimously authorized WG2 to
issue a new DIS 10646 in January 1992 for a 4-month vote. It is expected
that the 2nd DIS will be approved and the International Standard
(IS 10646) will be completed by 3Q 92.


2nd DIS 10646 (UCS) contents:
=============================


Architecture:

  > 4-byte canonical form (UCS-4)

  > 2-byte Basic Multilingual Plane (BMP) on Plane 0 (UCS-2)

    - No characters are currently defined on any other planes
    - No 'swapping' of the other planes is defined
    - Compaction methods 1, 3, and 5 are eliminated
    - Single Graphic Character Introducer (SGCI) is eliminated

  > Graphic characters are coded in the C0/C1 area (except row 0, which
    is identical to 8859/1)

  > Two implementation levels are defined for the 'combining' characters
    (previously called 'non-spacing marks' or 'floating diacritics'):

    - Implementation level 1 does not allow combining marks
    - Implementation level 2 allows both combining marks and precomposed
      characters

  > The unified set of ideographic characters is defined on the BMP

  > Formatting characters are included to control text in
    bidirectional data streams (for Arabic and Hebrew scripts)

  > The UCS Transformation Format (UTF) is defined in an informative
    annex; it specifies a variable-length encoding of the data that
    avoids C0, C1, NUL and SPACE octets

  > A claim of conformance should identify the form (UCS-2 or UCS-4), the
    implementation level, and the identified subset of characters
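
As a rough illustration of the two canonical forms and the two
implementation levels listed above, here is a small Python sketch. It
assumes the most significant octet comes first, and that a BMP character
in UCS-4 simply gets two zero octets in front of its UCS-2 form. It also
approximates "combining character" with the mark categories in Python's
unicodedata module, which reflects today's Unicode data rather than the
1991 draft. All function names are hypothetical.

```python
import struct
import unicodedata


def ucs2_octets(code_position: int) -> bytes:
    """2-octet form of a Plane 0 code position (big-endian is an assumption)."""
    if not 0 <= code_position <= 0xFFFF:
        raise ValueError("UCS-2 can only represent positions on Plane 0")
    return struct.pack(">H", code_position)


def ucs4_octets(code_position: int) -> bytes:
    """4-octet form of a Plane 0 code position.

    Assumption: the two leading octets are zero for every BMP character,
    since no characters are defined on any other plane.
    """
    return b"\x00\x00" + ucs2_octets(code_position)


def uses_combining_marks(text: str) -> bool:
    """Approximation: treat any character in a mark category (M*) as combining."""
    return any(unicodedata.category(ch).startswith("M") for ch in text)


def allowed_at_level(text: str, level: int) -> bool:
    """Level 1 forbids combining marks; level 2 accepts both styles."""
    if level == 1:
        return not uses_combining_marks(text)
    if level == 2:
        return True
    raise ValueError("only implementation levels 1 and 2 are described")


print(ucs2_octets(0x00E9).hex())        # 00e9
print(ucs4_octets(0x00E9).hex())        # 000000e9
print(allowed_at_level("\u00e9", 1))    # True  (precomposed e-acute)
print(allowed_at_level("e\u0301", 1))   # False (e + combining acute)
print(allowed_at_level("e\u0301", 2))   # True
```

The level check is kept separate from the octet encoding because, per
the list above, a conformance claim names the form and the level
independently.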


Structure of the UCS-2:


      00                            FF
      |-------------------------------|
    00|                               |   Alphabetics, Symbols,
      |   A-zone (19968 positions)    |   CJK auxiliary, Hangul,...
      |                               |
      |                               |
      |-------------------------------|
    4E|                               |
      |   I-zone (20992 positions)    |   Unified Ideographic
      |                               |
      |-------------------------------|
    A0|                               |
      |   O-zone (16384 positions)    |   Reserved for future use
      |                               |
      |-------------------------------|
    E0|                               |   Private Use (6K), Compatibility
      |   R-zone (8192 positions)     |   Area, Arabic presentation
      |                               |   forms, Arabic ligatures, ...
      |-------------------------------|
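
The position counts in the diagram follow from the row boundaries: each
row holds 256 cells, so a zone's size is its row count times 256. The
Python sketch below checks that arithmetic and maps a 16-bit code
position to its zone. The last row of each zone is inferred from the
first row of the next one, which is an assumption read off the diagram;
the names ZONES, zone_of and zone_size are hypothetical.

```python
# (zone name, first row, last row); last rows are inferred from the diagram.
ZONES = [
    ("A", 0x00, 0x4D),
    ("I", 0x4E, 0x9F),
    ("O", 0xA0, 0xDF),
    ("R", 0xE0, 0xFF),
]


def zone_of(code_position: int) -> str:
    """Return the zone letter for a UCS-2 code position."""
    if not 0 <= code_position <= 0xFFFF:
        raise ValueError("not a UCS-2 code position")
    row = code_position >> 8  # high octet selects the row
    for name, first_row, last_row in ZONES:
        if first_row <= row <= last_row:
            return name
    raise AssertionError("zones cover every row, so this is unreachable")


def zone_size(name: str) -> int:
    """Number of code positions in a zone: rows * 256 cells per row."""
    for zone, first_row, last_row in ZONES:
        if zone == name:
            return (last_row - first_row + 1) * 256
    raise KeyError(name)


for name, _, _ in ZONES:
    print(name, zone_size(name))
# A 19968
# I 20992
# O 16384
# R 8192

print(zone_of(0x0041))  # A
print(zone_of(0x4E00))  # I
print(zone_of(0xE000))  # R
```

The four sizes sum to 65536, so the zones account for every position on
the plane.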



Unicode status:
===============

The Unicode 1.0 book, containing the non-ideographic part, is complete
and available. It is published by Addison-Wesley and will be in
bookstores by November 91.

Although 10646 (UCS-2) is based on Unicode 1.0, some differences exist.
The Unicode Technical Committee (UTC) has decided to incorporate all
adjustments needed to align with UCS-2 in Unicode 1.1, after DIS 10646
is approved by the ISO ballot.



 +----------------------------------------------------------------------+
 |  Isai Scheinberg                         A3/979/895/TOR              |
 |                                          IBM Canada, Inc.            |
 |  phone: (416) 448-2260                   895 Don Mills Road          |
 |  fax:   (416) 448-2114                   North York, Ontario M3C 1W3 |
 |  email: schein(_at_)torolab5(_dot_)vnet(_dot_)ibm(_dot_)com    CANADA|
 +----------------------------------------------------------------------+


