procmail

Re: HTML to ASCII recipe?

1997-01-15 04:08:25
On Tue, 14 Jan 1997 20:38:58 -0500 (EST), Wotan <wotan(_at_)netcom(_dot_)com>
wrote on Procmail-L:
On Thu, 9 Jan 1997, Timothy J Luoma wrote:
Someone has recently begun sending me email with text as HTML  
(using Mozilla 4.0b1 (Win95; I))

Just like the very clueful unsubscribe sent to the list not much later
than Wotan's message. I love it! Humorous, aesthetical, less filling!

B) check message for HTML and convert it to plain ol' ASCII
untested, so someone will spot a bug or two.  :)

Is it enough to remove just the HTML tags? Those messages will contain
a complete attachment with a second copy of the entire message. You
want to zap the whole attachment, no?
  By the way, you should probably be using sed -e 's/<[^>]*>//g' instead,
since <.*> is greedy and eats everything from the first < to the last > on
a line. Finally, I believe the wildcard before "content-type" is redundant.
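
To see the difference between the two patterns, here is a small sketch using sed's s (substitute) command on a made-up sample line. The variable names and the sample text are only for illustration. Note that sed works line by line, so a tag that is split across two lines would not be caught by either pattern.

```shell
# Hypothetical one-line sample; any text with two or more tags will do.
line='<p>Hello, <b>world</b>!</p>'

# Greedy: <.*> matches from the first "<" to the last ">" on the line.
greedy=$(printf '%s\n' "$line" | sed -e 's/<.*>//g')

# Bounded: <[^>]*> stops at the first ">" and removes one tag at a time.
bounded=$(printf '%s\n' "$line" | sed -e 's/<[^>]*>//g')

printf 'greedy:  [%s]\n' "$greedy"
printf 'bounded: [%s]\n' "$bounded"
```

On this sample the greedy form leaves nothing at all, while the bounded form leaves "Hello, world!".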

:0
* .*content-type: text/html
{
     :0 B
     | sed -e 's/<.*>//g'

     :0
     Wherever
}

Maybe something like:

    # B: test the condition against the body, where the MIME part header lives
    :0 Bfbw
    * content-type: text/html
    | sed -e '/^[Cc]ontent-[Tt]ype: text\/html/,/^--/d'

This will still leave some traces of the attachment header and footer,
but remove all of the body and most of the headers. If you want a
complete solution, I'd write up a Perl script or something. 
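
The traces mentioned above can be seen by running the sed range delete on a sample body outside of procmail. This is only a sketch: the boundary string "XYZ" and the message text are made up, and the slash in text/html is escaped so that sed does not read it as the end of the address.

```shell
# Hypothetical body of a multipart/alternative message; the boundary
# string "XYZ" and the text are invented for illustration.
body='--XYZ
Content-Type: text/plain; charset=us-ascii

Hello there.
--XYZ
Content-Type: text/html; charset=us-ascii
Content-Transfer-Encoding: 7bit

<HTML><B>Hello there.</B></HTML>
--XYZ--'

# Delete from the HTML part's Content-Type line through the next line
# that starts with "--" (the closing boundary in this sample).
result=$(printf '%s\n' "$body" | sed -e '/^[Cc]ontent-[Tt]ype: text\/html/,/^--/d')

printf '%s\n' "$result"
```

In this sample the plain-text part survives, but the boundary line that opened the HTML part is left behind and the closing boundary is deleted along with the HTML. Also, if the HTML itself contains a line starting with "--", the range would end there and the rest of the HTML would stay in the message.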

/* era */

-- 
See <http://www.ling.helsinki.fi/~reriksso/> for mantra, disclaimer, etc.
* If you enjoy getting spam, I'd appreciate it if you'd register yourself
  at the following URL:  <http://www.ling.helsinki.fi/~reriksso/spam.html>
