I should add that I had this at the top of the procmail recipe:
DEFAULT=|
Without that, procmail delivers all the cleaned-up spam to your mailbox
instead of writing it to stdout. <g>
Thus, the complete recipe:
# ---------------- remove_sa_markup.rc -----------------
DEFAULT=|
SPACE=" "
# TAB must hold one literal tab character (it may display as a space)
TAB="	"
WS="$SPACE$TAB"
:0 B
* $ H ?? ^Content-Type:[$WS]+multipart/mixed;
* $ ^Content-Type:[$WS]+message/rfc822;[$WS]+x-spam-type=original
* $ ^Content-Description:[$WS]+original message before SpamAssassin
* $ ^Content-Disposition:[$WS]+attachment
{
# SA markup is present, pick up the boundary
:0
* $ ^Content-Type:[$WS]+multipart/mixed;[$WS]+boundary=\"\/[^\"]*
{ BOUNDARY = "--$MATCH" }
# sed script to pull out the original message (yikes)
# Technically, '.'s inside boundary should be escaped,
# but we assume that the rest of the chars. in the boundary
# string are sufficiently unique to ensure a correct
# match.
SED_GET_MSG_PART='
1{h;d}
2,/^'"$BOUNDARY"'$/d
/^'"$BOUNDARY"'$/,/^'"$BOUNDARY"'--$/{
/^'"$BOUNDARY"'$/,/^$/d
/^'"$BOUNDARY"'--$/d
H;d}
$!d
/^$/!H
x'
# Run it through sed to remove the markup.
:0 hbfw
| sed -e "$SED_GET_MSG_PART"
}
The recipe can be run against an mbox as follows:
formail -s procmail $HOME/scripts/remove_sa_markup.rc \
    < spam.mbox > spam_no_markup.mbox
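
If you want to see what the sed script does without involving procmail at all, something like the following sketch might help. It feeds a made-up SpamAssassin-style message through the same sed commands. The boundary string, the addresses and the message text are all invented for the demonstration, and I have added a ";" before each closing "}" on the assumption that this keeps non-GNU seds happy. Note that the script only prints once it reaches the last input line, so the sample ends with a blank line after the closing boundary, as a message in an mbox normally would.

```shell
#!/bin/sh
# Hypothetical boundary; in the recipe procmail builds this from $MATCH.
BOUNDARY='------------=_DEMO0001'

# Same commands as in the recipe, with ";" added before each "}".
SED_GET_MSG_PART='
1{h;d;}
2,/^'"$BOUNDARY"'$/d
/^'"$BOUNDARY"'$/,/^'"$BOUNDARY"'--$/{
/^'"$BOUNDARY"'$/,/^$/d
/^'"$BOUNDARY"'--$/d
H;d;}
$!d
/^$/!H
x'

# Invented sample message, shaped like SA "report safe" markup.
sample_message() {
cat <<'EOF'
From sender@example.com Mon Jan  1 00:00:00 2001
Received: from localhost by mail.example.com
Subject: *****SPAM***** Buy now
Content-Type: multipart/mixed; boundary="----------=_DEMO0001"

This is a multi-part message in MIME format.

------------=_DEMO0001
Content-Type: text/plain

Spam detection software has identified this mail as spam.

------------=_DEMO0001
Content-Type: message/rfc822; x-spam-type=original
Content-Description: original message before SpamAssassin
Content-Disposition: attachment

Subject: Buy now
From: spammer@example.net

Cheap stuff here.

------------=_DEMO0001--

EOF
}

# Keep the "From " line, drop the wrapper, keep the attached original.
cleaned=$(sample_message | sed -e "$SED_GET_MSG_PART")
printf '%s\n' "$cleaned"
```

What should come out is the "From " line followed by the headers and body of the attached original, with the wrapper headers, the SA report part and the boundary lines gone.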