namazu-users-en

[Namazu-users-en] Re: mknmz not working for Japanese language documents?

2006-06-29 18:51:23
What we can find is dependent on the character encoding setting used by 
the browser doing the search.
The documents we built the index from are very likely to be using 
several different Japanese character encodings (e.g. Shift_JIS, EUC-JP).

I've not used the Perl modules, but I can tell you what I do on a site
that isn't native EUC.

For indexing an English UTF8 site I use:
  mknmz --indexing-lang=en.UTF-8 -e ...

For indexing a Japanese UTF8 site I use (the -k means use kakasi):
  mknmz --indexing-lang=ja.UTF-8 -k -e ...

For searching (I'm using the PHP module, by the way) I convert the search
keywords from UTF-8 to EUC-JP:
  $kw_euc=mb_convert_encoding($kw,"EUC-JP","UTF8");

Then do the search, then for each search hit I convert the result back
from EUC to UTF8 ready for display, e.g.:
  $title=mb_convert_encoding(
        nmz_result_field($hlist,$n,'subject'),
        'UTF8','EUC-JP');
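
The round trip above can be sketched outside PHP to show what happens to
the bytes. This is only an illustration in Python, not the PHP module's
code: convert_encoding is a hypothetical helper that mimics the argument
order of mb_convert_encoding, and the keyword is a made-up example. No
Namazu call is made; the EUC-JP bytes simply stand in for a field that
would come back from a search hit.

```python
def convert_encoding(data: bytes, to_enc: str, from_enc: str) -> bytes:
    """Hypothetical helper: re-encode bytes, same argument order as
    PHP's mb_convert_encoding(string, to, from)."""
    return data.decode(from_enc).encode(to_enc)


# Keyword as it arrives from a UTF-8 page (example value, not from the post).
kw = "検索".encode("utf-8")

# Step 1: convert the keyword to EUC-JP before handing it to the search.
kw_euc = convert_encoding(kw, "euc_jp", "utf-8")

# Step 2: a result field comes back in EUC-JP; here we reuse kw_euc as a
# stand-in and convert it back to UTF-8 for display.
title = convert_encoding(kw_euc, "utf-8", "euc_jp")

print(len(kw), len(kw_euc), title.decode("utf-8"))
```

Each of these two kanji takes three bytes in UTF-8 and two in EUC-JP, so
the byte strings differ even though the text is the same. That is why a
keyword passed through unconverted would not match.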

Darren
_______________________________________________
Namazu-users-en mailing list
Namazu-users-en(_at_)namazu(_dot_)org
http://www.namazu.org/cgi-bin/mailman/listinfo/namazu-users-en