On Monday 6 October 97, at 15 h 24, the keyboard of
wwgrol(_at_)sparc01(_dot_)fw(_dot_)hac(_dot_)com (W. Wesley Groleau x4923)
wrote:
> I would like to test a method to differentiate English, Spanish, German,
> and French. Anyone who is able to provide a good-sized "typical"
> collection of messages in Spanish, French, or German in
There are a lot of complete dictionaries in many languages in every Crack
archive :-)
For French, if you want samples of real text, you can use the ABU
library:
ftp://ftp.cnam.fr/pub/ABU
But since the collection contains only public-domain texts, it is more
representative of the written French of a century ago than of the
"spoken" French of a typical email message.
You can also check the archives of a daily newspaper such as l'Humanité:
ftp://ftp.internatif.org/humanite/archives/
> The method is based on letter frequencies. It is very inefficient, but
Why not use an algorithm based on word frequencies?
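A word-frequency approach can be sketched very simply: count how many words of the message belong to a short list of very common words for each language, and pick the language with the highest count. This is only a minimal sketch, not anyone's actual method: the word lists in `COMMON_WORDS` are hypothetical, hand-picked examples, and the helper names `tokenize`, `score_languages` and `guess_language` are made up for illustration. A real system would derive the lists from word counts over a corpus such as the archives mentioned above.

```python
import re
from collections import Counter

# Hypothetical, hand-picked lists of very common function words.
# Illustrative only: a real system would build them from word
# frequency counts over a corpus for each language.
COMMON_WORDS = {
    "english": {"the", "and", "of", "to", "is", "in", "that", "it",
                "you", "for", "with", "not", "this", "are", "was"},
    "spanish": {"el", "la", "de", "que", "y", "en", "los", "las",
                "es", "por", "con", "una", "para", "no", "se"},
    "french":  {"le", "la", "les", "de", "et", "est", "un", "une",
                "que", "pour", "dans", "pas", "je", "vous", "ne"},
    "german":  {"der", "die", "und", "ist", "das", "nicht", "ich", "zu",
                "den", "mit", "ein", "eine", "sie", "auf", "es"},
}

def tokenize(text):
    """Lowercase the text and split it into runs of letters (accents included)."""
    return re.findall(r"[^\W\d_]+", text.lower())

def score_languages(text):
    """For each language, count the tokens of `text` found in its word list."""
    counts = Counter(tokenize(text))
    return {lang: sum(counts[w] for w in words)
            for lang, words in COMMON_WORDS.items()}

def guess_language(text):
    """Return the best-scoring language, or None if no listed word occurs."""
    scores = score_languages(text)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

if __name__ == "__main__":
    print(guess_language("Le chat est sur la table et il ne dort pas"))
```

Some short words ("la", "de", "que") appear in more than one list, so the decision relies on the total count over the whole message rather than on any single word; even a few lines of text usually contain several of these function words.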