procmail

Re: Auto-identifying languages - samples needed.

1997-10-07 02:28:12
[Re: French Word List for Email exclusion]

ftp://ftp.cnam.fr/pub/ABU

But since they are only public-domain texts, they are more
representative of the written French of a century ago than of the
informal, "spoken" French of the typical email message.

One way might be to gather a large number of known French newsgroup
texts, etc., and run a Perl script over them to accumulate a frequency
table.
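A minimal sketch of that frequency-table step, written in Python rather than Perl: it lowercases each text, splits it into runs of letters (so accented words like "très" stay whole), and accumulates the counts in a `Counter`. The function name `word_frequencies` and the sample sentences are hypothetical; in practice the texts would be read from a collection of known-French newsgroup posts.

```python
import re
from collections import Counter
from typing import Iterable

# Letters only (Unicode-aware), so accented words like "très" stay whole;
# elisions such as "c'est" split into "c" and "est".
WORD_RE = re.compile(r"[^\W\d_]+")


def word_frequencies(texts: Iterable[str]) -> Counter:
    """Accumulate a lowercase word-frequency table over a corpus of texts."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(WORD_RE.findall(text.lower()))
    return counts


if __name__ == "__main__":
    # Hypothetical sample; a real run would feed in thousands of posts.
    sample = [
        "Je pense que c'est très bien.",
        "Il est très content, et je le suis aussi.",
    ]
    table = word_frequencies(sample)
    print(table.most_common(5))
```

The table only becomes useful with a large corpus; with two sentences it mostly shows the mechanics.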

I think the important thing is to pick common French words that will
NOT turn up in messages written in other languages...

I suppose building an exclusion list by comparing the tables from
various languages would work. Perl again : )
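One way to build that exclusion list, again sketched in Python rather than Perl: keep the most common words from the French table whose relative frequency in every other language's table stays at or below a threshold, then call a message French if it contains enough of those marker words. The function names, the `top_n`, `max_foreign_rate`, and `min_hits` parameters, and the tiny sample sentences are all hypothetical; real tables would come from large per-language corpora, where a threshold of exactly zero is probably too strict.

```python
import re
from collections import Counter
from typing import Dict, List

# Letters only (Unicode-aware), same tokenizer as the frequency-table step.
WORD_RE = re.compile(r"[^\W\d_]+")


def frequency_table(text: str) -> Counter:
    """Lowercase word counts for one language's sample text."""
    return Counter(WORD_RE.findall(text.lower()))


def relative_rates(counts: Counter) -> Dict[str, float]:
    """Convert raw counts into each word's share of all tokens."""
    total = sum(counts.values()) or 1
    return {word: n / total for word, n in counts.items()}


def marker_words(target: Counter, others: List[Counter],
                 top_n: int = 20, max_foreign_rate: float = 0.0) -> List[str]:
    """Most common target-language words that are rare (or absent) elsewhere.

    A word is kept only if its relative frequency in every other table is
    at or below max_foreign_rate (a hypothetical tuning threshold).
    """
    foreign = [relative_rates(t) for t in others]
    markers: List[str] = []
    for word, _ in target.most_common():
        if all(rates.get(word, 0.0) <= max_foreign_rate for rates in foreign):
            markers.append(word)
            if len(markers) == top_n:
                break
    return markers


def looks_like(text: str, markers: List[str], min_hits: int = 2) -> bool:
    """Guess the language by counting distinct marker words in a message."""
    words = set(WORD_RE.findall(text.lower()))
    return len(words.intersection(markers)) >= min_hits


if __name__ == "__main__":
    french = frequency_table("le chat est sur la table et le chien est dans la maison")
    english = frequency_table("the cat is on the table and the dog is in the house")
    german = frequency_table("die Katze ist auf dem Tisch und der Hund ist im Haus")
    markers = marker_words(french, [english, german], top_n=5)
    print(markers)  # "table" is excluded because it also occurs in the English sample
    print(looks_like("Le chien est dans le jardin", markers))
    print(looks_like("The dog is in the garden", markers))
```

Adding a Spanish table would knock out words such as "la" and "de", which are frequent in both languages; that overlap is exactly what the exclusion step is meant to remove.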



-
_____________________________________________________
[   Rockland Gate Systems   ]
[ == * Paul  A. Castro * == ]
[==* http://rgs.aub.com * ==]
