Re: Inverse of /\p{script}/

On Friday, Aug 29, 2003, at 16:07 Asia/Tokyo, Nick Ing-Simmons wrote:

Jarkko Hietaniemi <jhi(_at_)iki(_dot_)fi> writes:

On Thu, Aug 28, 2003 at 03:16:20PM +0100, nick(_at_)ing-simmons(_dot_)net wrote:

Does the existing perl5.8.* Unicode support have a way to efficiently
determine which script(s) or block (in the Unicode sense) a code point belongs
to?


        use Unicode::UCD qw(charscript charblock);
        print charscript(0x0388), "\n";   # script of U+0388
        print charblock (0x30a0), "\n";   # block containing U+30A0
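
As a rough sketch of how the lookup above might be applied to a whole string rather than a single code point, the following collects the distinct scripts in a piece of text. The helper name scripts_in() is hypothetical and not part of Unicode::UCD; only charscript() comes from the module.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charscript);

# Hypothetical helper: return the distinct scripts used in $text,
# in order of first appearance.
sub scripts_in {
    my ($text) = @_;
    my (%seen, @scripts);
    for my $cp (map { ord } split //, $text) {
        my $script = charscript($cp);
        next unless defined $script;    # older perls return undef for unassigned code points
        push @scripts, $script unless $seen{$script}++;
    }
    return @scripts;
}

binmode STDOUT, ':utf8';
print join(', ', scripts_in("\x{0388}\x{30a2}\x{5c0f}")), "\n";
```

This answers "which script?" per character, which is the inverse of matching with /\p{script}/, but it says nothing about which legacy character set contains the character.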


Great.


But that is not good enough for cases like the one below because...

 (Hiragana | Katakana | Han) => 'jisx0208.1990-0'

This is very wrong because jisx0208.1990-0 contains only the \p{Han} characters that appear in Japanese (those in JIS X 0208, to be exact). On the other hand, jisx0208.1990-0 does contain the Greek and Cyrillic alphabets.
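
One way to see the mismatch is to ask the charset table itself instead of the script property. The sketch below assumes that Encode::JP exposes the bare JIS X 0208 table under the name 'jis0208-raw'; in_jisx0208() is a hypothetical helper, and the sample code points are chosen only for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Encode::JP;

# Hypothetical helper: true if $char can be encoded in JIS X 0208.
# Assumption: 'jis0208-raw' is Encode::JP's name for the bare table.
sub in_jisx0208 {
    my ($char) = @_;
    my $copy = $char;    # encode() may modify its input when CHECK is set
    return eval { encode('jis0208-raw', $copy, Encode::FB_CROAK()); 1 } ? 1 : 0;
}

# Greek alpha, Cyrillic a, a Han character used in Japanese,
# and a Han character used only in simplified Chinese.
for my $cp (0x03B1, 0x0430, 0x5C0F, 0x8BA9) {
    printf "U+%04X %s\n", $cp,
        in_jisx0208(chr $cp) ? 'in JIS X 0208' : 'not in JIS X 0208';
}
```

A mapping keyed on the charset's actual repertoire, rather than on (Hiragana | Katakana | Han), would pick up the Greek and Cyrillic characters and reject Han characters that JIS X 0208 lacks.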

One of so many reasons why Han Unification was a bad idea. When it comes to Han ideographs, Unicode's sense of charscript is almost useless.


\x{5c0f}\x{98fc} \x{5f3e}
