perl-unicode

Re: Source data for perl encodings

2001-01-08 02:24:33
Keld Jørn Simonsen <keld(_at_)dkuug(_dot_)dk> writes:
On Sun, Jan 07, 2001 at 10:46:02AM +0000, nick(_at_)ing-simmons(_dot_)net wrote:
Keld,

As you may be aware, we are adding support for UTF-8 encoded Unicode
to perl5. This is finally coming together, so now we need a mechanism
to translate other encodings into and out of Unicode.

I was not aware of that. Could you give me a pointer to the spec?

The spec is a little sketchy, but the main documentation we have
to date can be found at:

http://www.perldoc.com/perl5.7/pod/perlunicode.html

Do you mean Unicode or do you mean ISO 10646?

I am not an expert on the differences. Perl characters are now "logically"
values of up to at least 32 bits, held internally as UTF-8 encoded strings.
The language-visible properties (case, alpha-ness, digit-ness, ...)
are derived from the tables at ftp.unicode.org, version 3.0.1.
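
To illustrate how a wide "logical" character value ends up as a UTF-8
byte string, here is a minimal sketch in Python (not Perl's actual
implementation). It assumes the original variable-length UTF-8 scheme,
which covers values up to 31 bits in at most six bytes; the function
name utf8_encode is hypothetical.

```python
def utf8_encode(cp):
    """Encode one non-negative integer below 2**31 as UTF-8-style bytes."""
    if cp < 0:
        raise ValueError("negative value")
    if cp < 0x80:
        return bytes([cp])  # 7-bit values are stored as a single byte
    # (total bytes, exclusive upper limit, lead-byte marker)
    layouts = (
        (2, 0x800, 0xC0),
        (3, 0x10000, 0xE0),
        (4, 0x200000, 0xF0),
        (5, 0x4000000, 0xF8),
        (6, 0x80000000, 0xFC),
    )
    for nbytes, limit, lead in layouts:
        if cp < limit:
            # Lead byte carries the top bits of the value.
            out = [lead | (cp >> (6 * (nbytes - 1)))]
            # Each continuation byte carries 6 bits, most significant first.
            for i in range(nbytes - 2, -1, -1):
                out.append(0x80 | ((cp >> (6 * i)) & 0x3F))
            return bytes(out)
    raise ValueError("value does not fit in 31 bits")
```

For values that are valid Unicode code points, the result should agree
with Python's built-in chr(cp).encode("utf-8").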


The tables there seem to be suitable for my/our purposes.
So I have a few questions:

0. Is use/redistribution of these tables in Open Source projects
   permitted?

Yes, it is.

Excellent.


1. Is the format formally defined anywhere?
   It seems straightforward enough.

The format is defined in the POSIX-2 standard ISO/IEC 9945-2:1993.
(Aka IEEE 1003.2).

2. Are the data actively maintained?

Yes, by me, with the submissions I get. I am a little slow at times, though.

3. Are the cultreg and i18n charmaps "identical"?

No, the i18n ones are more up to date, but the cultreg ones are official ISO.
They are closely synchronized, however.

There is also (I discovered via another web page) a WG15 tree.
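
For the charmap format asked about in question 1, the following Python
sketch shows roughly how such a table could be read into a mapping from
encoded bytes to code points. It is an illustration under assumptions,
not a full POSIX charmap parser: it assumes each body line has the
encoding as its second field and contains a <Uxxxx> token, it ignores
ranges and WIDTH sections, and the SAMPLE text is invented for this
example rather than copied from a real file. The name parse_charmap is
hypothetical.

```python
import re

# Invented excerpt for illustration only; real charmap files are much longer.
SAMPLE = """\
<code_set_name> ISO_8859-1
<comment_char> %
<escape_char> /
% illustrative excerpt
CHARMAP
<A>        /x41   <U0041> LATIN CAPITAL LETTER A
<o-stroke> /xf8   <U00F8> LATIN SMALL LETTER O WITH STROKE
END CHARMAP
"""

def parse_charmap(text):
    """Return {encoded bytes: code point} for the simple lines described above."""
    escape, comment = "\\", "#"  # defaults when the header does not override them
    table = {}
    in_body = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(comment):
            continue
        fields = line.split()
        if not in_body:
            # Header: pick up the declared escape and comment characters.
            if fields[0] == "<escape_char>" and len(fields) > 1:
                escape = fields[1]
            elif fields[0] == "<comment_char>" and len(fields) > 1:
                comment = fields[1]
            elif line == "CHARMAP":
                in_body = True
            continue
        if line == "END CHARMAP":
            break
        if len(fields) < 2:
            continue
        # Encoding field: one or more <escape>xHH byte values, concatenated.
        enc = re.findall(re.escape(escape) + r"x([0-9A-Fa-f]{2})", fields[1])
        ucs = re.search(r"<U([0-9A-Fa-f]{4,8})>", line)
        if enc and ucs:
            table[bytes(int(h, 16) for h in enc)] = int(ucs.group(1), 16)
    return table
```

Calling parse_charmap(SAMPLE) yields a two-entry table; translating into
Unicode is then a lookup per encoded byte sequence, and the reverse
direction only needs the inverted dictionary.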

-- 
Nick Ing-Simmons <nik(_at_)tiuk(_dot_)ti(_dot_)com>
Via, but not speaking for: Texas Instruments Ltd.