Re: filtering out non-Japanese

perl-unicode

2004-12-15 03:30:10

from [John Delacour]

At 10:22 am +0100 15/12/04, Marco Baroni wrote:

I have a long text ostensibly in utf-8, and I would like to get rid of all the lines that contain anything BUT kanji, katakana or hiragana (thus, throwing away Latin, but also digits, punctuation, etc.)

There's probably a better way to do it, but here I print only characters in the hiragana and katakana range (U+3041 to U+30FF) or the 0-9 range:

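use strict;
use warnings;
binmode STDOUT, ':encoding(UTF-8)';   # emit UTF-8 on STDOUT

my $line = "123_.latin,\x{30AA}fran\x{00E7}ais";
for (split //, $line) {
        # keep kana (U+3041..U+30FF) and ASCII digits, drop everything else
        m~[\x{3041}-\x{30ff}]|[0-9]~ and print;
}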
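
The snippet above filters character by character, whereas the original question asks to drop whole lines. A minimal sketch of that line-level filter follows, assuming the input has already been decoded to Perl characters. The helper name is_japanese_only is hypothetical, and the choice of the Unicode Script properties for Han, Hiragana and Katakana is one possible reading of "kanji, katakana or hiragana", not something stated in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns 1 when every character
# of the line is kanji (Script=Han), hiragana or katakana, otherwise 0.
# Assumes $line holds decoded characters, not raw UTF-8 bytes.
sub is_japanese_only {
    my ($line) = @_;
    $line =~ s/\R\z//;    # ignore a trailing line terminator
    return $line =~ /\A[\p{Script=Han}\p{Script=Hiragana}\p{Script=Katakana}]+\z/ ? 1 : 0;
}

binmode STDOUT, ':encoding(UTF-8)';

# Sample lines written with escapes so the source file stays ASCII.
my @lines = (
    "\x{3072}\x{3089}\x{304C}\x{306A}\n",                  # hiragana only
    "\x{30AB}\x{30BF}\x{30AB}\x{30CA}\x{6F22}\x{5B57}\n",  # katakana + kanji
    "abc\x{65E5}\x{672C}\x{8A9E}\n",                       # Latin + kanji
    "123\n",                                               # digits
);

# Print only the lines made up entirely of Japanese script characters.
for my $line (@lines) {
    print $line if is_japanese_only($line);
}
```

For a real file, opening it with the ':encoding(UTF-8)' layer (open my $fh, '<:encoding(UTF-8)', $path) gives decoded lines that can be passed straight to the helper. One thing to watch: the prolonged sound mark U+30FC has Script=Common, so a katakana word containing it is rejected unless that code point is added to the character class.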

JD
