Okay, here's the script that I came up with. This became mainly a 'sed' and
shell quoting problem. The basics of this could be adapted to pull out
only the text part of a multipart message, for example, or to do other
simple extractions.
The sed script works as follows: (1) it puts the leading From_ line into the
hold buffer; (2) it appends the attachment part that holds the original
message to the hold buffer, minus the initial Content-* headers of that part
and the trailing termination boundary; (3) it deletes all other lines, except
that the last line is appended to the hold buffer if it is non-blank (the
final line in a well-formed message should be blank); (4) at the end, it swaps
the hold buffer back into the pattern buffer, which sed then outputs. The
script assumes that the first attachment part is the SA report and the second
part is the original message. This assumption is safe for SA, but not in
general.
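The four steps above can be tried outside procmail with a plain shell test.
What follows is only a sketch: the boundary string and the sample message are
made up for illustration, and the sed commands are the same ones used in the
recipe below, except that each closing brace sits on its own line, which
stricter sed implementations may require.

```shell
#!/bin/sh
# Hypothetical boundary, in the same "--$MATCH" form the recipe builds.
BOUNDARY='--XYZ_demo_123'

# Same commands as the recipe's sed script, one command per line.
SED_GET_MSG_PART='
1{
h
d
}
2,/^'"$BOUNDARY"'$/d
/^'"$BOUNDARY"'$/,/^'"$BOUNDARY"'--$/{
/^'"$BOUNDARY"'$/,/^$/d
/^'"$BOUNDARY"'--$/d
H
d
}
$!d
/^$/!H
x'

# Made-up message shaped like SA markup: report part first, original second.
sample_message() {
cat <<'EOF'
From sender@example.com Mon Jan 13 10:00:00 2003
Subject: *****SPAM***** Buy now
Content-Type: multipart/mixed; boundary="XYZ_demo_123"

This is a multi-part message in MIME format.

--XYZ_demo_123
Content-Type: text/plain

SPAM: report text
--XYZ_demo_123
Content-Type: message/rfc822; x-spam-type=original
Content-Description: original message before SpamAssassin
Content-Disposition: attachment

Subject: Buy now
From: sender@example.com

Original body.

--XYZ_demo_123--

EOF
}

# Run the sample through sed, as the filter recipe would.
RESULT=$(sample_message | sed -e "$SED_GET_MSG_PART")
printf '%s\n' "$RESULT"
```

With this sample, the output should be the From_ line followed by the headers
and body of the original message, with the SA headers, the report part and
all boundary lines gone.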
SPACE=" "
# TAB must contain a literal tab character between the quotes
TAB="	"
WS="$SPACE$TAB"
:0 B
* $ H ?? ^Content-Type:[$WS]+multipart/mixed;
* $ ^Content-Type:[$WS]+message/rfc822;[$WS]+x-spam-type=original
* $ ^Content-Description:[$WS]+original message before SpamAssassin
* $ ^Content-Disposition:[$WS]+attachment
{
# SA markup is present, pick up the boundary
:0
* $ ^Content-Type:[$WS]+multipart/mixed;[$WS]+boundary=\"\/[^\"]*
{ BOUNDARY = "--$MATCH" }
# sed script to pull out the original message (yikes)
# Technically, '.'s inside boundary should be escaped,
# but we assume that the rest of the chars. in the boundary
# string are sufficiently unique to ensure a correct
# match.
SED_GET_MSG_PART='
1{h;d}
2,/^'"$BOUNDARY"'$/d
/^'"$BOUNDARY"'$/,/^'"$BOUNDARY"'--$/{
/^'"$BOUNDARY"'$/,/^$/d
/^'"$BOUNDARY"'--$/d
H;d}
$!d
/^$/!H
x'
# Run it through sed to remove the markup.
:0 hbfw
| sed -e "$SED_GET_MSG_PART"
}
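On the comment in the recipe about '.'s inside the boundary: if one wanted to
close that gap, the boundary could be escaped before it is spliced into the
sed script. The following is a sketch only, not something the recipe above
does. BOUNDARY_RE is a name introduced here for illustration, and the sample
boundary is made up. As far as I can tell, of the characters RFC 2046 allows
in a boundary, '.' (a regex wildcard) and '/' (the sed address delimiter) are
the ones that would be misread, so only those two are escaped.

```shell
#!/bin/sh
# Hypothetical boundary containing both troublesome characters.
BOUNDARY='--=_Part.1/2'

# Put a backslash in front of every '.' and '/'.
# '|' is used as the s-command delimiter so '/' can appear in the pattern.
BOUNDARY_RE=$(printf '%s\n' "$BOUNDARY" | sed 's|[./]|\\&|g')
printf '%s\n' "$BOUNDARY_RE"

# Check: the escaped pattern matches the literal boundary line only,
# not a line where some other character stands in for the '.'.
MATCHES=$(printf '%s\n' '--=_Part.1/2' '--=_PartX1/2' |
    sed -n "/^$BOUNDARY_RE\$/p")
printf '%s\n' "$MATCHES"
```

The escaped value would then be used in place of $BOUNDARY when building
SED_GET_MSG_PART.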
_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail