Re: Inverse of /\p{script}/

On Fri, 2003-08-29 at 03:07, Nick Ing-Simmons wrote:

Jarkko Hietaniemi <jhi(_at_)iki(_dot_)fi> writes:

On Thu, Aug 28, 2003 at 03:16:20PM +0100, nick(_at_)ing-simmons(_dot_)net wrote:


Does the existing perl5.8.* Unicode support have a way to efficiently 
determine which script(s) or block (in the Unicode sense) a code point
belongs to?


    use Unicode::UCD qw(charscript charblock);
    print charscript(0x0388), "\n";   # script name for U+0388
    print charblock(0x30a0), "\n";    # block name for U+30A0


Great.

It seems to make sense to have a hash which maps script names to 
probable (font) encodings:

 (Hiragana | Katakana | Han) => 'jisx0208.1990-0'
 (Greek)                     => 'iso8859-7',
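
For illustration, such a table could be sketched in Perl roughly as
follows. This is only a sketch: the hash name %encoding_for_script and
the helper encoding_for() are hypothetical, the table holds just the
two encodings mentioned above, and it assumes charscript() returns the
script names spelled as shown.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charscript);

# Hypothetical table: script name (as returned by charscript)
# mapped to a probable X font encoding.
my %encoding_for_script = (
    Hiragana => 'jisx0208.1990-0',
    Katakana => 'jisx0208.1990-0',
    Han      => 'jisx0208.1990-0',
    Greek    => 'iso8859-7',
);

# Return the probable encoding for a code point, or undef when the
# code point's script has no entry in the table.
sub encoding_for {
    my ($code) = @_;
    my $script = charscript($code);
    return undef unless defined $script;
    return $encoding_for_script{$script};
}

for my $code (0x0388, 0x30A2, 0x0041) {
    my $enc = encoding_for($code);
    $enc = '(no entry)' unless defined $enc;
    printf "U+%04X => %s\n", $code, $enc;
}
```

A script-level table like this is coarse; a script such as Han is
covered by several national encodings, so a single value per script
can only be a first guess.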


I dunno about script->font mappings...


That is Tk's (i.e. my) problem.
XFree86 has the font encodings bundled, so I think I can pre-analyze 
them.


You might want to look at what we did for Pango - see 
pango/modules/basic/tables-big.i in
ftp://ftp.gtk.org/pub/gtk/v2.2/pango-1.2.5.tar.gz.

There is a big map there that lists, for each Unicode code point, the
possible font encodings, stored with a moderately clever packing scheme
to save memory. Then based on the current language tag (either from 
the program or from the current locale setting), there is an order
in which to try encodings.
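
The general idea (candidate encodings, then a language-dependent order
in which to try them) might be sketched in Perl like this. It is not
the Pango code or its data: the sketch works per script rather than
per code point, and the names %candidates_for_script, %order_for_lang
and pick_encoding() as well as the table contents are made up for
illustration.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charscript);

# Illustrative data only, not taken from Pango's tables:
# which encodings could cover a given script.
my %candidates_for_script = (
    Han   => [ 'jisx0208.1990-0', 'gb2312.1980-0', 'ksc5601.1987-0' ],
    Greek => [ 'iso8859-7' ],
);

# Illustrative preference order per language tag.
my %order_for_lang = (
    ja => [ 'jisx0208.1990-0', 'gb2312.1980-0', 'ksc5601.1987-0' ],
    zh => [ 'gb2312.1980-0', 'jisx0208.1990-0', 'ksc5601.1987-0' ],
    ko => [ 'ksc5601.1987-0', 'jisx0208.1990-0', 'gb2312.1980-0' ],
);

# Pick the first encoding preferred for $lang that is also a candidate
# for the code point's script; fall back to the first candidate.
sub pick_encoding {
    my ($code, $lang) = @_;
    my $script = charscript($code);
    return undef unless defined $script;
    my $candidates = $candidates_for_script{$script} or return undef;
    my %is_candidate = map { $_ => 1 } @$candidates;
    for my $enc (@{ $order_for_lang{$lang} || [] }) {
        return $enc if $is_candidate{$enc};
    }
    return $candidates->[0];
}

# U+4E00 is a Han ideograph; the choice depends on the language tag.
print pick_encoding(0x4E00, $_), "\n" for qw(ja zh ko);
```

The language tag matters mostly for unified Han characters, where the
same code point is present in Japanese, Chinese and Korean encodings
and the preferred glyph style depends on the language.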

We're dropping support for this code and for core X fonts
in the next release of Pango, but if you find it useful, feel
free to borrow the techniques, tables, generation tools, 
or table lookup code and use it under whatever license you
want.

Regards,
                                        Owen