Stripping text/html

2003-02-17 00:04:32

# This recipe strips the text/html portion of a multipart message before
# passing it to the rest of the procmailrc for filtering and spam checks.
# It was tested against a body of only 17 multipart messages (all I have)
# so please sandbox it.  A possible improvement might be to strip the
# text/plain instead and pipe the html portion through lynx -dump,
# since many plain text portions consist of "Your mail reader doesn't
# understand text/html" type messages.

# Some spam checks have been done previously, including discarding
# text/html messages with no alternates and single-body-block base64 messages.

:0
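# Capture the boundary up to, but not including, its closing quote, so
# that any parameters following it on the line do not end up in $MATCH.
* ^Content-type:(.*\<)multipart.*boundary="\/[^"]+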
{
  BOUNDARY=$MATCH
  :0 Bfw
  * ^Content-type:(.*\<)text/html
  | sed -e "/Content-Type: text\/html;/,/$BOUNDARY/d"

# I probably don't need to nest this anymore. I did it initially
# so that I could put some logging in this portion, and I left it
# because it makes syntactic sense to group these together anyway.

  :0 A
  {
        :0 fw
        | formail -i"Content-Type: text/plain" \
                  -i"X-HTML: Altered text/html to text/plain"

# The secondary check for the $BOUNDARY is simply to catch any
# spurious $BOUNDARY matches left behind.  I'm not sure it should
# ever trip on a well-formatted message, but I did have one message
# that left behind a $BOUNDARY without this check.

        :0 fw
        | sed -e "/$BOUNDARY/,/Transfer-Encoding/d" \
              -e "/$BOUNDARY/d" \
              -e "/This is a multipart message in MIME format/d"
  }
}
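
For anyone who wants to sandbox the first sed filter outside of procmail, here is a minimal shell sketch. The sample message and the boundary value XYZ123 are made up for illustration, and the function names are mine, not part of the recipe. It only shows what the range delete does to one well-formed message; it says nothing about how the filter behaves on real-world mail.

```shell
#!/bin/sh
# Hypothetical boundary; in the recipe this comes from $MATCH.
BOUNDARY='XYZ123'

# A made-up multipart/alternative message used only for this demonstration.
sample_message() {
cat <<'EOF'
Content-Type: multipart/alternative; boundary="XYZ123"

This is a multipart message in MIME format
--XYZ123
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

plain body
--XYZ123
Content-Type: text/html; charset=us-ascii
Content-Transfer-Encoding: 7bit

<html><b>html body</b></html>
--XYZ123--
EOF
}

# Same sed expression as the first filter in the recipe: delete from the
# text/html part header through the next line that contains the boundary.
strip_html_part() {
  sed -e "/Content-Type: text\/html;/,/$BOUNDARY/d"
}

stripped=$(sample_message | strip_html_part)
printf '%s\n' "$stripped"
```

On this sample the HTML part and the closing delimiter are removed, while the delimiter line that preceded the HTML part is left in place. That leftover line is the kind of thing the later sed pass in the recipe cleans up. Note that the boundary is interpolated into the sed pattern as a regular expression, so a boundary containing characters such as "/" or "." would need escaping first.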

# Rest of procmailrc sorts the message into its normal destination.
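
The improvement floated in the opening comment (keep the HTML part and render it with lynx -dump) would first need the HTML part pulled out on its own. A rough sketch of that extraction step in shell follows. The helper name extract_html_part, the sample message and the boundary XYZ123 are all hypothetical, and the sketch assumes the part is not base64 or quoted-printable encoded, since it does no decoding.

```shell
#!/bin/sh
# Print the body of any text/html part of a multipart message read from
# stdin. $1 is the boundary string. Hypothetical helper, not from the recipe.
extract_html_part() {
  awk -v delim="--$1" '
    # A delimiter line starts the headers of a new part (or ends the message).
    index($0, delim) == 1 { state = "headers"; html = 0; next }
    # Remember whether the part being read declares itself as text/html.
    state == "headers" && tolower($0) ~ /^content-type:.*text\/html/ { html = 1 }
    # A blank line ends the part headers; the part body follows.
    state == "headers" && $0 == "" { state = "body"; next }
    # Print body lines only for HTML parts.
    state == "body" && html { print }
  '
}

# Made-up sample message for demonstration.
sample_message() {
cat <<'EOF'
Content-Type: multipart/alternative; boundary="XYZ123"

This is a multipart message in MIME format
--XYZ123
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

plain body
--XYZ123
Content-Type: text/html; charset=us-ascii
Content-Transfer-Encoding: 7bit

<html><b>html body</b></html>
--XYZ123--
EOF
}

html_only=$(sample_message | extract_html_part XYZ123)
printf '%s\n' "$html_only"
```

With lynx installed, the output could then be rendered with something like `extract_html_part "$BOUNDARY" | lynx -dump -stdin -force_html`. Wiring that into a procmail filter recipe is left untested here, so please sandbox it.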

I don't know why I spent as much time on this as I did, especially since under normal circumstances in my recipes this filter will *NEVER* get triggered. Still, it was interesting and maybe it will be of use to someone else.

--
Man is born free, but is everywhere in chains.

