perl-unicode
Re: filtering out non-Japanese
2004-12-15 03:30:10

At 10:22 am +0100 15/12/04, Marco Baroni wrote:

> I have a long text ostensibly in utf-8, and I would like to get rid of
> all the lines that contain anything BUT kanji, katakana or hiragana
> (thus, throwing away Latin, but also digits, punctuation, etc.)

There's probably a better way to do it, but here I print only characters in the hiragana and katakana range (U+3041 to U+30FF) or the 0-9 range:

    use encoding "UTF-8";
    $line = "123_.latin,\x{30AA}fran\x{00E7}ais";
    for (split //, $line) {
        m~[\x{3041}-\x{30ff}]|[0-9]~ and print;
    }

JD
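
The snippet in the reply filters individual characters, while the question asks about dropping whole lines. A minimal sketch of a line-level filter follows, assuming Perl 5.10 or later and using the Unicode script properties \p{Han}, \p{Hiragana} and \p{Katakana}. The helper name is_japanese_only and the sample lines are hypothetical, not from the thread. The prolonged sound mark U+30FC is listed explicitly because whether \p{Katakana} matches it depends on the Perl version.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns 1 if the line, minus
# its trailing newline, consists only of kanji, hiragana or katakana.
sub is_japanese_only {
    my ($line) = @_;
    chomp(my $text = $line);
    # U+30FC (prolonged sound mark) is added explicitly as an assumption
    # about what should count as kana.
    return $text =~ /\A[\p{Han}\p{Hiragana}\p{Katakana}\x{30FC}]+\z/ ? 1 : 0;
}

binmode STDOUT, ':encoding(UTF-8)';

# Sample data for illustration only.
my @lines = (
    "\x{65E5}\x{672C}\x{8A9E}\n",          # kanji only
    "abc\x{30AA}\n",                       # Latin mixed with katakana
    "\x{3072}\x{3089}\x{304C}\x{306A}\n",  # hiragana only
    "123\n",                               # digits only
);

# Keeps the first and third sample lines, drops the other two.
print grep { is_japanese_only($_) } @lines;
```

For a real file the input would need to be decoded first, for example by opening it with the ':encoding(UTF-8)' layer, so that the properties are matched against characters and not raw bytes.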