perl-unicode

Re: Source data for perl encodings

2001-01-07 18:53:53
On Sun, Jan 07, 2001 at 10:46:02AM +0000, nick(_at_)ing-simmons(_dot_)net wrote:
Keld,

As you may be aware, we are adding support for UTF-8 encoded Unicode
to perl5. This is finally coming together. So now we need a mechanism
to translate other encodings into and out of Unicode.

I was not aware of that. Could you give me a pointer to the spec?
Do you mean Unicode or do you mean ISO 10646?

Initially I just grabbed what Sun/Scriptics/Ajuba/... had used for Tcl
(because it was to hand). I have also looked at GNU iconv, IBM ICU
and XFree86 4.*.
None so far has been ideal for embedding in perl itself. Either
the origin is not documented, they come with extra things we do not
need, or they are monolithic.

I have a prototype of our own "engine" which can translate one
single/multi-byte encoding to another, but it needs good tables
to drive it.
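
As a rough illustration of what a table-driven translation engine of this kind might look like (this is not the prototype mentioned above, whose design is not described here), the sketch below converts bytes from one encoding to another by going through Unicode code points. The function name `transcode` and the two tiny tables are hypothetical, and the sketch assumes the source table is prefix-free, i.e. no lead byte of a multi-byte sequence is also a complete character.

```python
def transcode(data, src_table, dst_table):
    """Translate `data` between two table-described encodings.

    src_table: {bytes sequence: Unicode code point}  (assumed prefix-free)
    dst_table: {Unicode code point: bytes sequence}
    Both tables are hypothetical stand-ins for data loaded from charmap files.
    """
    max_len = max(len(seq) for seq in src_table)
    out = bytearray()
    i = 0
    while i < len(data):
        # Try the longest candidate sequence first, then shorter ones.
        for n in range(min(max_len, len(data) - i), 0, -1):
            code_point = src_table.get(data[i:i + n])
            if code_point is not None:
                if code_point not in dst_table:
                    raise ValueError(f"U+{code_point:04X} not in target table")
                out += dst_table[code_point]
                i += n
                break
        else:
            raise ValueError(f"unmapped byte 0x{data[i]:02x} at offset {i}")
    return bytes(out)


# Illustrative two-entry tables: a Latin-1 style source and a UTF-8 style target.
SRC = {b"A": 0x41, b"\xe9": 0xE9}
DST = {0x41: b"A", 0xE9: b"\xc3\xa9"}

result = transcode(b"A\xe9", SRC, DST)
```

Keeping the tables as plain data separate from the loop is what lets one small engine serve many encodings; a real implementation would also need a policy for unmappable characters rather than simply raising an error.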

So I have been looking for "authoritative" tables and, starting
a web search from your name in RFC 1345, came across:

ftp://dkuug.dk/cultreg
in particular
and then
ftp://dkuug.dk/i18n

The tables there seem to be suitable for my/our purposes.
So I have a few questions:

0. Is use/redistribution of these tables in OpenSource projects
   permitted?

Yes, it is.

1. Is the format formally defined anywhere?
   It seems straightforward enough.

The format is defined in the POSIX-2 standard, ISO/IEC 9945-2:1993
(also known as IEEE 1003.2).
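
For readers unfamiliar with that format, here is a minimal sketch of reading a POSIX-style charmap into a dictionary. It assumes the common layout of optional `<comment_char>` and `<escape_char>` header lines followed by a `CHARMAP` ... `END CHARMAP` body in which each line gives a symbolic name and its byte encoding. The function names and the sample text are made up for illustration and are not taken from the dkuug.dk files; range declarations (names containing `...`) and other sections are simply skipped.

```python
def decode_bytes(encoding, escape_char):
    """Turn e.g. '/x82/xa0' into b'\\x82\\xa0' (hex, decimal or octal constants)."""
    out = bytearray()
    for part in encoding.split(escape_char)[1:]:
        if not part:
            continue
        if part[0] == "x":
            out.append(int(part[1:], 16))   # hexadecimal constant
        elif part[0] == "d":
            out.append(int(part[1:], 10))   # decimal constant
        else:
            out.append(int(part, 8))        # octal constant
    return bytes(out)


def parse_charmap(text):
    """Return {symbolic name: bytes} for the CHARMAP body of `text`."""
    comment_char = "#"    # POSIX defaults, may be overridden in the header
    escape_char = "\\"
    mapping = {}
    in_body = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(comment_char):
            continue
        fields = line.split()
        if not in_body:
            if fields[0] == "<comment_char>" and len(fields) > 1:
                comment_char = fields[1]
            elif fields[0] == "<escape_char>" and len(fields) > 1:
                escape_char = fields[1]
            elif line == "CHARMAP":
                in_body = True
            continue
        if line == "END CHARMAP":
            break
        if len(fields) < 2 or "..." in fields[0]:
            continue  # range declarations are not handled in this sketch
        mapping[fields[0]] = decode_bytes(fields[1], escape_char)
    return mapping


# Hypothetical sample input, for illustration only.
SAMPLE = """\
<code_set_name> EXAMPLE
<comment_char> %
<escape_char> /
% two illustrative entries
CHARMAP
<U0041>     /x41         LATIN CAPITAL LETTER A
<U3042>     /x82/xa0     HIRAGANA LETTER A
END CHARMAP
"""

charmap = parse_charmap(SAMPLE)
```

The result maps each symbolic name to its byte sequence; inverting the dictionary gives a decoding table for use by a translation engine.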

2. Are the data actively maintained?

Yes, by me, using the submissions I receive. I am a little slow at times, though.

3. Are the charmaps in cultreg and i18n "identical"?

No, the i18n ones are more up to date, but the cultreg ones are the official ISO versions.
They are closely synchronized, however.

I also welcome suggestions as to other resources that may be
available - particularly for Asian encodings and IPA.

I do not have a good suggestion for Asian encodings and IPA.

Kind regards
keld