procmail

Re: About HTML email recipe in pm-tips

2002-05-12 01:24:06
Marco,

If you need to convert HTML to ASCII you could always use Lynx, but 
there are a couple of problems. Lynx does not work as a filter, so 
you'll have to do something like:

lynx -dump file.html > file.txt

which could be wrapped in a shell script so that it accepts standard
input. The second, and worse, problem is that Lynx indents the text, 
which is probably not what you want.
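
A minimal sketch of such a wrapper, assuming lynx and mktemp are installed.
The function name html2text_lynx and the LYNX override variable are made up
for illustration. The sed step strips up to three leading spaces, on the
assumption that this is the margin your Lynx adds; adjust it if yours
differs. Some Lynx builds also accept a -stdin option, but that is not
assumed here.

```shell
# html2text_lynx: hypothetical wrapper that lets "lynx -dump" act as a filter.
# Reads HTML on standard input and writes plain text to standard output.
html2text_lynx() {
    # Keep the .html suffix so that Lynx treats the temporary file as HTML.
    tmpdir=$(mktemp -d) || return 1
    cat > "$tmpdir/in.html"
    # Dump the page, then remove up to three leading spaces of indentation.
    "${LYNX:-lynx}" -dump "$tmpdir/in.html" | sed 's/^ \{1,3\}//'
    rm -rf "$tmpdir"
}
```

Once the function is defined, `html2text_lynx < file.html > file.txt` should
behave like a filter.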

There are other solutions, such as sed and perl scripts. My favourite, 
though, is a small C program, html2txt, written by Wolfgang Ortmann:

http://pandora.inf.uni-jena.de/p/d/noo/html2txt.html

The webpage is helpfully written in German :-) but there are English 
comments in the source file. There is also a Linux binary included in
the package. I suggest that you compile it yourself, however, because 
then you can easily adapt the lex source file (html2txt.l) to handle 
è, ì, ò and other Italian letters. Just look at the source, copy some 
of the German umlaut rules, then paste, adjust and enjoy.
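
For comparison, the sed route mentioned earlier could look roughly like the
sketch below. It is not how html2txt works, and the function name strip_html
is made up. It only removes tags that open and close on the same line and
decodes a handful of entities, including the Italian ones named above.

```shell
# strip_html: crude sed-based tag stripper (hypothetical helper, not html2txt).
# Limitation: a tag that spans several lines is not removed.
strip_html() {
    sed -e 's/<[^>]*>//g' \
        -e 's/&egrave;/è/g' -e 's/&igrave;/ì/g' -e 's/&ograve;/ò/g' \
        -e 's/&lt;/</g' -e 's/&gt;/>/g' \
        -e 's/&amp;/\&/g'   # decode &amp; last so nothing is decoded twice
}
```

It reads standard input, so it can be used as `strip_html < file.html`.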

This is what I put in my $HOME/.procmailrc:

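HTML2TXT=/usr/local/bin/html2txt
FORMAIL=/usr/bin/formail   # assumed path; needed unless it is set elsewhere

# B = match against the body, f = treat the pipe as a filter,
# b = feed only the body to the pipe (so formail also sees only the body),
# W = wait for the filter and check its exit code without failure messages
:0 BfbW
* ^(<html>|<!doctype html)
| ${HTML2TXT} | ${FORMAIL} -f -A "X-Converted-To-Plain-Text: by html2txt"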

Hope this helps. At least it works for me. Drop me a mail if you need 
any help in adapting the html2txt source file.

Regards,
Per

--
Per Sandström, Stockholm <psand(_at_)telia(_dot_)com>    PGP and GnuPG available
http://www.keyserver.net:11371/pks/lookup?op=index&search=psand(_at_)telia
