perl-unicode

Re: Japanese text search problem

2001-08-09 23:46:51
At 12:17 01/08/08 -0700, Benjamin Franz wrote:

Oh, yeah. I forgot about that since I don't normally keep stuff in
JIS/SJIS/EUC-JP once I've acquired it. I always make my working store
UTF8. In UTF8 the 'frame' problem doesn't exist because character start
bytes _ALWAYS_ have bit eight set to 0 while continuation bytes _ALWAYS_
have bit eight set to 1. 'quotemeta' works fine if you use UTF8 as your
working encoding.

Small correction: start bytes have the most significant bit as 0 or
the two most significant bits as 11. Continuation bytes have the two
most significant bits as 10.

Regards,   Martin.
