procmail

Re: HTML to ASCII recipe?

1997-01-16 21:46:27
On Wed, 15 Jan 1997, era eriksson wrote:

On Tue, 14 Jan 1997 20:38:58 -0500 (EST), Wotan <wotan(_at_)netcom(_dot_)com>
wrote on Procmail-L:
 > On Thu, 9 Jan 1997, Timothy J Luoma wrote:
 >> Someone has recently begun sending me email with text as HTML  
 >> (using Mozilla 4.0b1 (Win95; I))

 >> B) check message for HTML and convert them to plain ol ASCII
 > untested, so someone will spot a bug or two.  :)

Is it enough to remove just the HTML tags? Those messages will contain
a complete attachment with a second copy of the entire message. You
want to zap the whole attachment, no?
  By the way, you should probably be using sed -e 's/<[^>]*>//g' instead.
Finally, I believe the wildcard before "content-type" is redundant.
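
To illustrate the difference between the two patterns, here is a minimal sketch. The sample line is made up for the example, and the variable names are only illustrative:

```shell
#!/bin/sh
# Sample line with two tags and text between them (invented for illustration).
line='<b>Hello</b> world'

# Greedy: ".*" runs from the first "<" to the last ">" on the line,
# so the text between the tags is deleted along with the tags.
greedy=$(printf '%s\n' "$line" | sed -e 's/<.*>//g')

# Bounded: "[^>]*" stops at the first ">", so only the tags are removed.
bounded=$(printf '%s\n' "$line" | sed -e 's/<[^>]*>//g')

printf 'greedy:  [%s]\n' "$greedy"
printf 'bounded: [%s]\n' "$bounded"
```

Note that neither form handles a tag that is split across two lines, since sed works one line at a time.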

 > :0
 > * .*content-type: text/html
 > {
 >    :0 B
 >    | sed -e 's/<.*>//g'
 > 
 >    :0
 >    Wherever
 > }

Maybe something like:

    :0fbw
    * content-type: text/html
    | sed -e "/^[Cc]ontent-[Tt]ype: text\/html/,/^--/d"

This will still leave some traces of the attachment header and footer,
but remove all of the body and most of the headers. If you want a
complete solution, I'd write up a Perl script or something. 
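
As a rough sketch of what that sed range does, the snippet below runs it over a hypothetical multipart body. The boundary string "XYZ" and the message text are invented for the example; real mailers use their own boundaries and may add more part headers.

```shell
#!/bin/sh
# Hypothetical multipart body; the boundary string "XYZ" is invented.
body='--XYZ
Content-Type: text/plain

Hello in plain text.
--XYZ
Content-Type: text/html

<html><b>Hello</b> in HTML.</html>
--XYZ--'

# Delete from the HTML part's Content-Type line through the next line
# that starts with "--". The "/" in text/html is escaped so that it
# does not end the address pattern early.
cleaned=$(printf '%s\n' "$body" |
  sed -e '/^[Cc]ontent-[Tt]ype: text\/html/,/^--/d')

printf '%s\n' "$cleaned"
```

With this sample, the plain-text part survives, but the boundary line that opened the HTML part is left behind and the closing boundary is deleted along with the HTML. Those are the traces mentioned above.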

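Filtering through Perl or a sed script would probably be best.  :-)  I've
seen the content-type line in the actual headers of the e-mail, not just
in an attachment. In that case the body has no attachment header for the
sed range to start on, so your solution might not catch everything.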
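
A small sketch of that case, assuming a message whose main header carries the text/html content type so that the body is bare HTML (the sample body is invented):

```shell
#!/bin/sh
# Hypothetical body of a message that is HTML throughout: the
# Content-Type line lives in the main header, so it never appears here.
body='<html><body>
<p>Hello</p>
</body></html>'

# The range only starts on a Content-Type line, which this body lacks,
# so sed passes every line through untouched.
out=$(printf '%s\n' "$body" |
  sed -e '/^[Cc]ontent-[Tt]ype: text\/html/,/^--/d')

if [ "$out" = "$body" ]; then
  echo "body unchanged: the range never matched"
fi
```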

For simplicity, I'm just going to bounce all mail with HTML in it and
include instructions for eliminating this nonsense.

-- 
God must love the Common Man; He made so many of them.
