Re: Japanese text search problem

on 01.8.8 1:14 AM, Benjamin Franz at snowhare(_at_)nihongo(_dot_)org wrote:

> On Tue, 7 Aug 2001, Ashutosh Salgarkar wrote:
>
> my $safe_key = quotemeta($key1);
> $searchStr =~ m/$safe_key/;
>
> is probably what you want. I am presuming you are trying to use m// to
> search for exact string matches rather than exploiting the full regex
> facilities.


  No.  quotemeta would not cut it.  It depends on which character set is fed
to the regexes, but in most (virtually all) cases you convert strings to
either EUC-jp or utf8.  In neither encoding do the bytes of a Japanese (or
Korean or Chinese) character include regex metacharacters, since every byte
has its high bit set.  The problem is a bit deeper.
  The problem is that before perl 5.6.x, characters and bytes were
interchangeable, and a Japanese character (Kanji hereafter) takes 2 bytes in
EUC (and 3 bytes in utf8).

  For example,

  /\xd1\xf1/ and print; # I want to find a line that contains 'to bore'

  not only matches the desired character but also 'camel', which is
represented by two Kanji (4 bytes): the pattern fits the second byte of the
first Kanji followed by the first byte of the second.

\xb4\xc1 \xbb\xfa
-------- --------
<RAKU>   <DA>     = a camel
    ---------
    <TEKI>        = to bore
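
  The false hit is easy to reproduce with raw bytes.  A minimal sketch,
assuming the byte values shown in the diagram above; the variable names are
made up for illustration:

```perl
use strict;
use warnings;

# Two 2-byte EUC characters, as in the diagram: \xb4\xc1 followed by \xbb\xfa.
my $text = "\xb4\xc1\xbb\xfa";

# A different 2-byte character whose bytes equal the second byte of the
# first character plus the first byte of the second one.
my $needle = "\xc1\xbb";

# A byte-oriented match cannot see the character boundary, so it succeeds.
# \Q...\E is the inline form of quotemeta, and it does not help here.
my $hit    = ($text =~ /\Q$needle\E/) ? 1 : 0;
my $offset = index($text, $needle);

print "matched: $hit at byte offset $offset\n";   # matched: 1 at byte offset 1
```

  The odd byte offset is the giveaway: in a string made only of 2-byte
characters, a genuine match always starts at an even offset.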

  There are ways to overcome this character boundary problem with EUC, like
inserting a delimiter character (such as a beep or a tab) between each Kanji,
but that is way too counter-intuitive, not to mention slow.
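
  Another way, sketched here and not the delimiter trick described above, is
to anchor the match at the start of the string and consume whole EUC
characters before the pattern, so a hit can only begin on a character
boundary.  The names $euc_char and euc_index are hypothetical, and the byte
ranges are a loose description of EUC-jp that assumes well-formed input:

```perl
use strict;
use warnings;

# One EUC-jp character: ASCII, a 3-byte character prefixed by \x8f, or a
# 2-byte character (the \x8e half-width katakana prefix is lumped in here).
my $euc_char = qr/[\x00-\x7f]|\x8f[\xa1-\xfe][\xa1-\xfe]|[\x8e\xa1-\xfe][\xa1-\xfe]/;

# Return the byte offset of $needle in $text, or -1 if it is not found.
# Only matches that start on a character boundary are accepted.
sub euc_index {
    my ($text, $needle) = @_;
    return $text =~ /\A((?:$euc_char)*?)\Q$needle\E/ ? length($1) : -1;
}

my $text = "\xb4\xc1\xbb\xfa";               # two 2-byte characters
print euc_index($text, "\xc1\xbb"), "\n";    # -1, straddles the boundary
print euc_index($text, "\xbb\xfa"), "\n";    # 2, the second character
```

  This still walks the string one character at a time, so it is not fast
either, and malformed input simply fails to match.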

Dan the Man with Too Many Character Sets to Fiddle
