Keld Jørn Simonsen <keld(_at_)dkuug(_dot_)dk> writes:
On Sun, Jan 07, 2001 at 10:46:02AM +0000, nick(_at_)ing-simmons(_dot_)net
wrote:
Keld,
As you may be aware we are adding support for UTF-8 encoded Unicode
to perl5. This is finally coming together. So now we need a mechanism
to translate other encodings into and out of Unicode.
I was not aware of that. Could you give me a pointer to the spec?
The spec is a little sketchy but the main documentation we have
to date can be found as:
http://www.perldoc.com/perl5.7/pod/perlunicode.html
Do you mean Unicode or do you mean ISO 10646?
I am not an expert on the differences. Perl characters are now "logically"
(up to at least) 32-bit values held internally as UTF-8 encoded strings.
The language-visible properties (case, alpha-ness, digit-ness, ...)
are derived from the tables at ftp.unicode.org - the 3.0.1 version.
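To illustrate the general idea of a character being "logically" an integer while being stored as a variable-length UTF-8 byte sequence, here is a minimal sketch in Python. It is not perl5's implementation; the helper name `utf8_bytes` is made up for this example, and Python's `chr` only accepts code points up to U+10FFFF, so it does not cover the full 32-bit range mentioned above.

```python
def utf8_bytes(code_point):
    """Return the UTF-8 byte sequence for one code point (hypothetical helper)."""
    return chr(code_point).encode("utf-8")

# One logical character each, but 1 to 4 bytes of storage.
for cp in (0x41, 0xF8, 0x20AC, 0x10000):
    encoded = utf8_bytes(cp)
    hex_form = " ".join(f"{b:02x}" for b in encoded)
    print(f"U+{cp:04X} -> {len(encoded)} byte(s): {hex_form}")
```

The loop shows that the stored length depends on the code point, so code that works on characters has to decode rather than index raw bytes.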
The tables there seem to be suitable for my/our purposes.
So I have a few questions:
0. Is use/redistribution of these tables in OpenSource projects
permitted?
Yes, they are.
Excellent.
1. Is the format formally defined anywhere?
It seems straightforward enough.
The format is defined in the POSIX-2 standard ISO/IEC 9945-2:1993.
(Aka IEEE 1003.2).
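As a rough illustration of how such a charmap could drive translation into Unicode, here is a minimal Python sketch. It assumes entries whose symbolic name has the form `<Uxxxx>` and whose encoding is given as hex escapes; the sample text is hand-written for this example and not copied from a real charmap file, and the names `SAMPLE`, `parse_charmap` and `decode` are hypothetical. Ranges, other symbolic names, decimal and octal escapes, and multi-byte decoding are deliberately ignored.

```python
import re

# Hand-written sample in the POSIX charmap layout (an assumption, not real data).
SAMPLE = """\
<code_set_name> ISO_8859-1
<comment_char> %
<escape_char> /
% this line is a comment
CHARMAP
<U0041>     /x41         LATIN CAPITAL LETTER A
<U00E6>     /xe6         LATIN SMALL LETTER AE
<U00F8>     /xf8         LATIN SMALL LETTER O WITH STROKE
END CHARMAP
"""

def parse_charmap(text):
    """Return {encoded bytes: Unicode code point} for <Uxxxx> entries."""
    escape, comment = "\\", "#"  # POSIX defaults until the header overrides them
    table = {}
    in_map = False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(comment):
            continue
        fields = line.split()
        if not in_map:
            # Header section: pick up the declarations this sketch cares about.
            if fields[0] == "<escape_char>" and len(fields) > 1:
                escape = fields[1]
            elif fields[0] == "<comment_char>" and len(fields) > 1:
                comment = fields[1]
            elif fields[0] == "CHARMAP":
                in_map = True
            continue
        if line == "END CHARMAP":
            break
        if len(fields) < 2:
            continue
        name = re.fullmatch(r"<U([0-9A-Fa-f]{4,8})>", fields[0])
        hex_bytes = re.findall(re.escape(escape) + r"x([0-9A-Fa-f]{2})", fields[1])
        if name and hex_bytes:
            table[bytes(int(h, 16) for h in hex_bytes)] = int(name.group(1), 16)
    return table

def decode(data, table):
    """Decode single-byte encoded data to a str using the parsed table."""
    return "".join(chr(table[bytes([b])]) for b in data)

table = parse_charmap(SAMPLE)
print(decode(b"A\xe6\xf8", table))
```

Keying the table on the encoded bytes gives the "into Unicode" direction; inverting the dictionary would give the "out of Unicode" direction for the same entries.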
2. Are the data actively maintained?
Yes, by me, and submissions I get. I am a little slow at times, tho.
3. Are the cultreg and i18n charmaps "identical"?
No, i18n are more up to date. But cultreg are official ISO.
They are closely synchronized, however.
There is also (I discovered via another web page) a WG15 tree.
--
Nick Ing-Simmons <nik(_at_)tiuk(_dot_)ti(_dot_)com>
Via, but not speaking for: Texas Instruments Ltd.