Re: Follow: Recipe woes

Howdy.  Thanks for the reply.

On Sun, 1 Jun 2003, Dallman Ross wrote:

On Sat, May 31, 2003 at 11:28:33PM -0500, Justin Shore wrote:

I can't figure out a regex way to remove all occurrences of
"***SPAM***" from the Subject line.  This happens when I bounce a
piece of spam that SpamAssassin has already marked up to a spamtrap
address.  My only fix is to check for that string twice.  A
regex fix for this would be handy.


One assumes you are sending the mail to SA higher up in your rc.
There are several considerations here.  One, why don't you catch
the value of Subject: before SA ever sees the mail and changes it?
Two, there are options to stop SA from writing to the Subject:
line.  Perhaps you should use them?  Three, SA's -d option, or
the razor-report program's submit option, can strip the ***SPAM***
stuff out.  But I'd go with No. 1 here, because you have an
unaltered piece of mail, which you then alter with SA, then try
to unalter with some other tool inside procmail.  Seems like
a counterproductive way to go.


I should have said it earlier.  I'm calling SA from MIMEDefang.  I've
found a number of uses for that method (plus it's good to know how to use
it for the next time you set up a mailhub and have to do it that way).  
In case you aren't familiar with how SA works with MD, SA can't make any
changes to the message it's checking.  Best it can do is report back to MD
what the results were and let MD make the necessary changes.  MD makes
all the changes long before the LDA gets its mitts on the mail.  SA -d
doesn't strip all the changes I make in the headers.  It finds all the
X-Spam lines just fine, even though they are in a slightly different
format than what SA would write.  It doesn't however find the "***SPAM***"  
I put on the Subject line.  I believe they used 5 asterisks on each side 
of "SPAM" whereas I only use 3.  Trivial change at face value but 
unfortunately I rolled it out like that without realizing what the default 
SA string was or the consequences of doing it in a different way.  Even if 
SA -d could remove an instance of "***SPAM***", can it remove more than 
one instance of it?

So simply capture SUBJECT on all your mail up-top in your rc.  Then,
when a particular piece is spam, you already have the value to send
in your custom report action.

That said, if you really want to remove the lines, you could use sed:

      SUBJECT = "`echo "$SUBJECT" | sed 's/\*\*\*SPAM\*\*\*//g'`"


That would probably work.  I wasn't sure if procmail would let me work 
with variables in that fashion.  Do you know of a regex way to remove 
any number of occurrences of "***SPAM***" separated by a space on that line?
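
For what it's worth, one pattern that might cover it is a repeated group anchored at the start of the string.  This is only a sketch run from a plain shell, not from inside procmail, and strip_leading_tags is a name I made up for illustration:

```shell
#!/bin/sh
# strip_leading_tags is a hypothetical helper, not part of procmail or SA.
# The BRE group \( ... \)* matches zero or more "***SPAM***" tags, each
# optionally followed by spaces, anchored at the start of the subject.
strip_leading_tags() {
    printf '%s\n' "$1" | sed 's/^\(\*\*\*SPAM\*\*\* *\)*//'
}

strip_leading_tags '***SPAM*** ***SPAM*** Cheap meds'   # prints: Cheap meds
strip_leading_tags 'No tag here'                        # prints: No tag here
```

Because of the ^ anchor it only eats tags at the front of the subject, so a tag buried mid-subject (say after "Re:") would be left alone.  Dropping the anchor and adding the g flag would catch those too.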

Along the same lines I can't figure out how to apply that regex to
$SUBJECT that I initially match.  I use it later when I forward mail
to the FTC and NANAS.  Any tips on this that take into account the
problem of multiple ***SPAM*** strings would be gladly accepted.


You should be using the -d flag to SA to strip out changes to the
headers before you make reports.


MD complicates things, doesn't it? :)

# Extract subject and assign it to SUBJECT
:0
* ^Subject:[        ]*\/[^  ].*
{
        SUBJECT=$MATCH
}

## Report spam to Pyzor, Razor, the FTC, and NANAS.
:0 BH
# Hopefully this will prevent mail loops.
* $ ! ^X-Spam-Loop: $BOUNCER
* !   ^FROM_DAEMON
{
    :0f
    # Clean up the spam by removing the SA headers, Subject change
    # and other misc headers.
    | spamassassin -d \
    | sed -e "s/^Subject: \*\*\*SPAM\*\*\*/Subject:/" \
          -e "s/^Subject: \*\*\*SPAM\*\*\*/Subject:/" \


Use the g option to sed.  Anyway, you have already bothered to
extract SUBJECT above, so do your operation on that, not on a
newly extracted copy of the same line that you have to parse
the whole message again for.
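
If the filter does have to run over the whole message, a sketch of the g option confined to the header might look like the following.  It assumes the tag is literally "***SPAM***" followed by optional spaces, and untag_subject_header is a made-up name:

```shell
#!/bin/sh
# untag_subject_header is a hypothetical name used only for this sketch.
# Reads a message on stdin.  Within the header (line 1 up to the first
# empty line) it removes every "***SPAM***" tag, plus any spaces right
# after it, from lines starting with "Subject:".  Body lines are untouched.
untag_subject_header() {
    sed -e '1,/^$/{' -e '/^Subject:/s/\*\*\*SPAM\*\*\* *//g' -e '}'
}

printf 'Subject: ***SPAM*** ***SPAM*** Cheap meds\n\nbody\n' | untag_subject_header
```

This does not handle a Subject: header folded across several lines; only the first physical line would be cleaned.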


These lines are only making changes to the headers in the messages I 
directly report to *zor.  I have to make my Body changes elsewhere or *zor 
will generate invalid hashes.  Since I'm not reconstructing the message, 
only removing the SA markup and bouncing lines, I don't have an 
opportunity to use $SUBJECT yet.  That's later.

It might be preferable for you to run SA without having it change
headers at all.  That's how I run it (for some value of the infrequent
times that I end up running SA, since my own recipes catch the spam all
but maybe once a fortnight).  I think you can start by putting this in
your user_prefs file:

      rewrite_subject       0


My thinking is once I have this working on my personal box, I'll either
move the auto-reporting to an actual MTA for auto-reporting or forward
mail scoring >= X to my personal box for auto-reporting.  I like the
latter because it lets me munge the domain and IP information of my users
out of the headers entirely.  This is a plus.  That said I can't really 
run different instances of SA for different tasks on the same box.  That 
box will be running SA for everyone and I'll be gleaning the bad spam off 
the top.  What I'm running through SA now on my personal box is spamtrap 
mail so I literally report everything they receive.  I'll have to check 
the SA score on either the sending or receiving end (or both more than 
likely) when I forward mail from one of the production MTAs here.

But here is another way:


       :0 W  # this is an assignment recipe, not a delivering one
        SA_OUT=| /users/zconcept/bin/spamc -c

       :0 e  # if exit status warrants, tag as spam
        { RX = SpamAssassin }

       :0 E:  # check for possible spamc failure
        * SA_OUT ?? ^^0/0^^
        spamc_failure

Now the original message is untouched, but procmail knows it's spam.  I
have used this method for many months.  Btw, the spamc_failure check is
there simply for completion's sake; but I have never, ever seen it get
invoked.


That's a good idea.  Still I'm calling SA from MD.  I should have said 
that earlier.  I actually sent the followup before I was done with it but 
it was mostly complete so I left it as is.

I'm now forwarding mail with a score >= 10 from a production MTA to a 
single mailbox on this machine.  It took a while to make it work believe 
it or not.

:0
{
        :0c
        * ^X-Spam-Score: \*\*\*\*\*\*\*\*\*\*.*
        {
                :0c:
                $SPAM_DIR/spam-score-10/$YYYY$MM
 
                :0
                ! spamhole(_at_)domain(_dot_)net
        }
 
        :0c:   
        * ^X-Spam-Score: (\*\*\*\*\*|\*\*\*\*\*\*|\*\*\*\*\*\*\*|\*\*\*\*\*\*\*\*|\*\*\*\*\*\*\*\*\*) \(.*
        $SPAM_DIR/spam-score-5/$YYYY$MM
} 


This is from the system-wide procmailrc.  It archives mail scoring >=5 and
<10 in one place in a spool named via YYYYMM.  Ditto for mail scoring >=10.  
These archiving checks should still pass the spam on to the users.  This
was just for initial testing purposes and I never removed it.  What stumped
me was where to put the clone flag.  I finally found a FAQ example that 
tipped me off to the braces.
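
To sanity-check the two score patterns outside procmail, something like this grep -E sketch could be used.  It assumes the header looks like "X-Spam-Score: ******* (7.2)", which is my MD format and not necessarily anyone else's, and score_bucket is a made-up helper name.  As far as I know procmail's own regex syntax has no {m,n} interval, which is why the recipe spells out the alternation; grep -E does have it, so the test version is shorter:

```shell
#!/bin/sh
# score_bucket is a hypothetical helper for checking the patterns by hand.
# Prints the archive bucket a given X-Spam-Score header line would land in.
score_bucket() {
    if printf '%s\n' "$1" | grep -Eq '^X-Spam-Score: \*{10}'; then
        # ten or more asterisks: score >= 10
        echo spam-score-10
    elif printf '%s\n' "$1" | grep -Eq '^X-Spam-Score: \*{5,9} \('; then
        # five to nine asterisks, then " (": score >= 5 and < 10
        echo spam-score-5
    else
        echo none
    fi
}

score_bucket 'X-Spam-Score: ******* (7.2)'        # prints: spam-score-5
score_bucket 'X-Spam-Score: ************ (12.4)'  # prints: spam-score-10
```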

The recipient of this spam isn't doing anything with it automatically.  
Since the copy of SA running on that production MTA is old (2.43 IIRC), I 
want to verify by hand that each piece of mail scoring that high is not a 
FP.  Just in case.  I will then bounce to my local auto-reporting account 
when I'm ready.  

It's a messed up hack at the moment.  Hopefully I'll be able to hone it 
and make it all work soon.  Thanks for the reply.

Justin

