# This recipe strips the text/html portion of a multipart message before
# passing it to the rest of the procmailrc for filtering and spam checks.
# It was tested against a body of only 17 multipart messages (all I have),
# so please sandbox it. A possible improvement might be to strip the
# text/plain portion instead and pipe the html portion through lynx -dump,
# since many plain text portions consist of "Your mail reader doesn't
# understand text/html" type messages.
# Some spam checks have been done previously, including discarding
# text/html messages with no alternates and single body block base64.
:0
* ^Content-type:(.*\<)multipart.*boundary="\/[^"]+
{
BOUNDARY=$MATCH
:0 Bfw
* ^Content-type:(.*\<)text/html
| sed -e "/Content-Type: text\/html;/,/$BOUNDARY/d"
# I probably don't need to nest this anymore. I did it initially
# so that I could put some logging in this portion, and I left it
# because it makes syntactic sense to group these together anyway.
:0 A
{
:0 fw
| formail -i"Content-Type: text/plain" \
-i"X-HTML: Altered text/html to text/plain"
# The secondary check for the $BOUNDARY is simply to catch any
# spurious $BOUNDARY matches left behind. I'm not sure it should
# ever trip on a well-formatted message, but I did have one message
# that left behind a $BOUNDARY without this check
:0 fw
| sed -e "/$BOUNDARY/,/Transfer-Encoding/d" \
-e "/$BOUNDARY/d" \
-e "/This is a multipart message in MIME format/d"
}
}
# Rest of procmailrc sorts the message into its normal destination.
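
One caveat about the sed calls above: $BOUNDARY is interpolated into the
sed address as a regular expression, and MIME boundaries may legally
contain characters such as "/" or "." that sed treats specially. A minimal
sketch of one way to escape the string first, assuming a POSIX sh and sed;
the function name escape_boundary is made up for illustration and is not
part of the recipe:

```shell
#!/bin/sh
# Hypothetical helper (not part of the recipe): backslash-escape the
# characters that are special in a sed /address/ so that the boundary
# is matched literally.
escape_boundary() {
    # Uses | as the s-command delimiter so that / can sit in the class.
    printf '%s\n' "$1" | sed 's|[][\.*^$/]|\\&|g'
}

# Made-up boundary containing a slash and dots.
escaped=$(escape_boundary '=_Part/0012.ABC')

# Only the line that really contains the boundary is deleted.
printf '%s\n' 'keep me' '--=_Part/0012.ABC--' | sed "/$escaped/d"
```

Wiring this into the recipe itself (for example through a backtick
assignment) is left untested here.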
**
I don't know why I spent as much time on this as I did, especially
since under normal circumstances in my recipes this filter will *NEVER*
get triggered. Still, it was interesting and maybe it will be of use
to someone else.
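
For anyone who wants to sandbox the sed portion before trusting it with
real mail, here is a minimal sketch that runs the same two sed passes
outside procmail. The function names, the boundary XYZ123, and the sample
message are all made up for illustration. It also feeds only a message
body through the filters, whereas the recipe's "fw" filters see the whole
message, headers included, so treat it as a rough check only.

```shell
#!/bin/sh
# Hypothetical sandbox harness; none of these names come from the recipe.

# Apply the recipe's two sed passes to text read from stdin.
# $1 is the MIME boundary, interpolated as a regex just as in the recipe.
strip_html_part() {
    boundary=$1
    sed -e "/Content-Type: text\/html;/,/$boundary/d" |
    sed -e "/$boundary/,/Transfer-Encoding/d" \
        -e "/$boundary/d" \
        -e "/This is a multipart message in MIME format/d"
}

# A made-up multipart/alternative body with one plain and one html part.
sample_body() {
    cat <<'EOF'
This is a multipart message in MIME format.
--XYZ123
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Hello in plain text.
--XYZ123
Content-Type: text/html; charset=us-ascii
Content-Transfer-Encoding: 7bit

<html><body>Hello in HTML.</body></html>
--XYZ123--
EOF
}

# Print what would be left after both passes.
sample_body | strip_html_part XYZ123
```

On this sample the html part, the boundary lines, and the MIME preamble
are removed and the plain text line survives. Real messages vary a lot
more than this (header case, part order, missing Transfer-Encoding
lines), so it is worth feeding saved copies of actual mail through it
as well.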
--
Man is born free, but is everywhere in chains.