Marco,
If you need to convert HTML to plain ASCII you could always use Lynx, but
there are a couple of problems. Lynx is awkward to use as a filter, so
you'll have to do something like
lynx -dump file.html > file.txt
which could be wrapped in a shell script in order to accept standard
input. The second and worse problem is that Lynx indents text, which
is probably not what you want.
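
A wrapper of that kind might look roughly like the sketch below. The
function name html2ascii and the LYNX override variable are made up for
illustration, and mktemp is assumed to be available; -dump and
-force_html are ordinary Lynx options.

```shell
#!/bin/sh
# html2ascii: hypothetical wrapper so that "lynx -dump" can sit in a pipeline.
# LYNX is an illustrative override; it defaults to the lynx found on PATH.
html2ascii() {
    tmp=$(mktemp "${TMPDIR:-/tmp}/html2ascii.XXXXXX") || return 1
    cat > "$tmp"                                # save standard input to a file
    "${LYNX:-lynx}" -dump -force_html "$tmp"    # -force_html: no .html suffix needed
    status=$?
    rm -f "$tmp"                                # clean up the temporary file
    return "$status"
}
```

It would be used as "html2ascii < file.html > file.txt". Note that this
only solves the first problem; the text still comes out indented.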
There are other solutions, like sed and perl scripts. My favourite is
a small C program though, html2txt, written by Wolfgang Ortmann:
http://pandora.inf.uni-jena.de/p/d/noo/html2txt.html
The web page is helpfully written in German :-) but there are English
comments in the source file. There is also a Linux binary included in
the package. I suggest that you compile it yourself, however, because
then you can easily adapt the lex source file (html2txt.l) to handle
è, ì, ò and other Italian letters. Just look at the source, copy the
lines that deal with some German umlauts, then paste, adjust and enjoy.
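
For comparison, the sed route mentioned above can be as simple as the
following sketch. It is not taken from html2txt; the function name
strip_html is made up, the entity list is only a sample, and it copes
only with tags that open and close on the same line.

```shell
#!/bin/sh
# strip_html: crude sed-based HTML-to-text filter (illustrative only).
# Tags are removed first, then a handful of entities are translated.
# &amp; is handled last so that "&amp;lt;" ends up as "&lt;", not "<".
strip_html() {
    sed -e 's/<[^>]*>//g' \
        -e 's/&egrave;/è/g' \
        -e 's/&igrave;/ì/g' \
        -e 's/&ograve;/ò/g' \
        -e 's/&nbsp;/ /g' \
        -e 's/&lt;/</g' \
        -e 's/&gt;/>/g' \
        -e 's/&amp;/\&/g'
}
```

It reads standard input and writes standard output, so it can be dropped
into a pipeline as is. The accented letters come out in whatever
encoding the script itself is saved in.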
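And this is what I did with my $HOME/.procmailrc:
HTML2TXT=/usr/local/bin/html2txt
# FORMAIL is assumed to be set earlier in the rcfile (path to formail).
# Flags: B = match against the body, f = treat the pipe as a filter,
# b = feed only the body to it, W = wait for the filter to finish.
:0 BfbW
* ^(<html>|<!doctype html)
|${HTML2TXT}|${FORMAIL} -f -A "X-Converted-To-Plain-Text: by html2txt"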
Hope this helps. At least it works for me. Drop me a mail if you need
any help adapting the html2txt source file.
Regards,
Per
--
Per Sandström, Stockholm <psand(_at_)telia(_dot_)com> PGP and GnuPG available
http://www.keyserver.net:11371/pks/lookup?op=index&search=psand(_at_)telia
_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail