procmail

RE: how to strip SA's message markup?

2004-02-22 09:02:23

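I should add that I had this at the top of the procmail recipe file:
DEFAULT=|

Without that line, all the cleaned-up spam will be delivered to your mailbox. <g> A bare "|" with no program tells procmail to write the message to standard output instead.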

So, here is the complete recipe:

# ---------------- remove_sa_markup.rc -----------------
DEFAULT=|
SPACE=" "
# The quotes below must contain one literal tab character.
TAB="	"
WS="$SPACE$TAB"

:0 B
* $ H ?? ^Content-Type:[$WS]+multipart/mixed;
* $ ^Content-Type:[$WS]+message/rfc822;[$WS]+x-spam-type=original
* $ ^Content-Description:[$WS]+original message before SpamAssassin
* $ ^Content-Disposition:[$WS]+attachment
{
# SA markup is present, pick up the boundary
:0
* $ ^Content-Type:[$WS]+multipart/mixed;[$WS]+boundary=\"\/[^\"]*
{ BOUNDARY = "--$MATCH" }

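# sed script to pull out the original message (yikes):
#  - line 1 (the mbox "From " line) is saved in the hold space;
#  - line 2 through the first boundary line (outer headers and
#    MIME preamble) are deleted;
#  - from the next boundary line to the closing boundary, the
#    boundary lines and the part headers up to the first blank
#    line are deleted, and every other line is appended to the
#    hold space;
#  - all remaining lines except the last are deleted (this drops
#    the SpamAssassin report part);
#  - the last line is appended if it is not blank, and the hold
#    space (From_ line plus original message) is printed.
# This assumes at least one line, normally a blank one, follows
# the closing boundary; otherwise nothing is printed.
# Technically, '.'s inside the boundary should be escaped,
# but we assume that the rest of the characters in the boundary
# string are sufficiently unique to ensure a correct match.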
SED_GET_MSG_PART='
1{h;d}
2,/^'"$BOUNDARY"'$/d
/^'"$BOUNDARY"'$/,/^'"$BOUNDARY"'--$/{
 /^'"$BOUNDARY"'$/,/^$/d
 /^'"$BOUNDARY"'--$/d
 H;d}
$!d
/^$/!H
x'

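# Run it through sed to remove the markup, but only if a
# boundary was actually found above; with an empty BOUNDARY
# the sed script would mangle the message.
:0 hbfw
* BOUNDARY ?? .
| sed -e "$SED_GET_MSG_PART"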
}


This can be run against an mbox as follows:
  formail -s procmail $HOME/scripts/remove_sa_markup.rc < spam.mbox > spam_no_markup.mbox
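To see what the sed program does on its own, outside procmail, it can be fed a small hand-made message. This is only a sketch: the sample message, its boundary (=_SAMPLE.1234) and the shell function names strip_sa_markup and sample_message are made up for illustration, and a real SpamAssassin report has more headers and text. The sed program is the one from the recipe, except that each closing brace is preceded by a semicolon, which some non-GNU seds (BSD sed, for instance) require.

```shell
#!/bin/sh
# Sketch: run the recipe's sed program on a hypothetical sample message.

# strip_sa_markup DELIM
#   DELIM is the full delimiter line, i.e. "--" followed by the
#   boundary parameter (what the recipe stores in $BOUNDARY).
#   Reads one message on stdin, writes the unwrapped message to stdout.
strip_sa_markup() {
  BOUNDARY=$1
  SED_GET_MSG_PART='
1{h;d;}
2,/^'"$BOUNDARY"'$/d
/^'"$BOUNDARY"'$/,/^'"$BOUNDARY"'--$/{
 /^'"$BOUNDARY"'$/,/^$/d
 /^'"$BOUNDARY"'--$/d
 H;d;}
$!d
/^$/!H
x'
  sed -e "$SED_GET_MSG_PART"
}

# A minimal, made-up SA-marked message (boundary "=_SAMPLE.1234").
# Note the blank line after the closing boundary, which the sed
# program relies on.
sample_message() {
  cat <<'EOF'
From spammer@example.com Sun Feb 22 09:02:23 2004
Subject: *****SPAM***** Buy now
Content-Type: multipart/mixed; boundary="=_SAMPLE.1234"

This is a multi-part message in MIME format.

--=_SAMPLE.1234
Content-Type: text/plain

Spam detection software has identified this message as spam.

--=_SAMPLE.1234
Content-Type: message/rfc822; x-spam-type=original
Content-Description: original message before SpamAssassin
Content-Disposition: attachment

Subject: Buy now
From: spammer@example.com

Original body.

--=_SAMPLE.1234--

EOF
}

# Usage: pass the delimiter the same way the recipe does ("--$MATCH").
sample_message | strip_sa_markup '--=_SAMPLE.1234'
```

Running it should print only the outer From_ line followed by the original headers and body; the report part, the part headers and all boundary lines are gone.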



_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail