perl-unicode

Re: Japanese text search problem

2001-08-10 09:28:48
On Fri, 10 Aug 2001, Martin Duerst wrote:

At 12:17 01/08/08 -0700, Benjamin Franz wrote:

In UTF8 the 'frame' problem doesn't exist because character start
bytes _ALWAYS_ have bit eight set to 0 while continuation bytes _ALWAYS_
have bit eight set to 1. 'quotemeta' works fine if you use UTF8 as your
working encoding.

Small correction: start bytes have the most significant bit as 0 or
the two most significant bits as 11. Continuation bytes have the two
most significant bits as 10.

Right. I got sloppy (fortunately not while actually writing code) - I
blame fatigue. :)

The self-framing property remains valid.
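
As a rough illustration of the corrected bit patterns and of why they make
UTF-8 self-framing, here is a minimal sketch. It is in Python rather than
Perl purely for illustration, the helper names byte_kind and
is_char_boundary are made up for this example, and it only looks at the top
bits as described above; it does not validate UTF-8.

```python
def byte_kind(b: int) -> str:
    """Classify a single byte of UTF-8 data by its most significant bits."""
    if b & 0x80 == 0x00:
        return "start"          # 0xxxxxxx: a one-byte (ASCII) character
    if b & 0xC0 == 0x80:
        return "continuation"   # 10xxxxxx: never begins a character
    return "start"              # 11xxxxxx: lead byte of a multi-byte character


def is_char_boundary(data: bytes, i: int) -> bool:
    """True if offset i falls at the start of a character (or at the end)."""
    return i == len(data) or byte_kind(data[i]) == "start"


# The same katakana letter in two encodings.
utf8 = "ア".encode("utf-8")        # bytes E3 82 A2
sjis = "ア".encode("shift_jis")    # bytes 83 41; the trail byte is ASCII 'A'

print([byte_kind(b) for b in utf8])  # ['start', 'continuation', 'continuation']
print(b"A" in sjis)                  # True: a false hit inside a character
print(b"A" in utf8)                  # False: no ASCII byte inside a UTF-8 sequence

# A byte-level search for a whole encoded string lands on a character boundary.
haystack = "日本語テキスト".encode("utf-8")
needle = "テキ".encode("utf-8")
pos = haystack.find(needle)
print(pos, is_char_boundary(haystack, pos))  # 9 True
```

The Shift_JIS case shows the 'frame' problem: a plain byte search for "A"
matches the second byte of a two-byte character. In UTF-8 that cannot
happen, because a needle that starts with a start byte can never line up
with a continuation byte.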

-- 
Benjamin Franz

  Programs must be written for people to read, and only
  incidentally for machines to execute.
                             ---Abelson and Sussman
