Re: Filtering bounces for auto-wording recipes

On Thu, 25 Sep 2003, Professional Software Engineering wrote:

Uhm, you're aware that some of those "random character spews" (either at 
the end of a subject, or trailing the body) are sometimes used as database 
keys?


Yes.  Not to mention the URL query strings.  Unfortunately I'm not aware 
of anything I can do to stop them.  I was wondering today if anyone had 
come up with a regex to strip off URL query strings.
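
A rough sketch of such a regex, for what it's worth.  It assumes a sed that 
supports -E (GNU sed, or a recent BSD sed), only looks at http/https URLs, 
and treats whitespace, angle brackets and double quotes as the end of a URL.  
The function name is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: strip the query string from every http(s) URL on stdin.
# Assumption: sed supports -E (extended regular expressions).
strip_query_strings() {
    # \1 keeps scheme, host and path; everything from "?" up to the next
    # whitespace, quote or angle bracket is dropped.
    sed -E 's#(https?://[^[:space:]?<>"]+)\?[^[:space:]<>"]*#\1#g'
}

# Usage: strip_query_strings < message > munged-message
```

Anything after the "?" is thrown away, so a site that only responds to the 
full URL will no longer be reachable from the munged copy.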

Well, one approach would be to take recognized mailer-daemon messages and 
file them away for potential carbon lifeform review.


That's possible.  I do believe I've archived all the mail sent to the 
catchall address.  I could go through it, but I'm not sure I would be 
able to devise regexes that work well enough for the task at hand.  I 
might give it a try though.

.. and a good reason to consider _NOT_ automating reporting mechanisms, 
since when they break, they become a big problem, often setting you 
up to be ignored.


Unfortunately there's no other feasible way to run a network of spamtraps 
that fields thousands of pieces of spam per day.  I don't have the time to 
check each piece of mail before approving it.  I'm not paid for my 
contribution.  I'm just trying to find a good solution.  If I can address 
this problem then I think I just might have it licked.

Some crappy mailers (predominantly, but not exclusively, running on 
Windows) don't send daemon messages from recognized mailers.


You might consider scanning the body for mail-type headers, which would 
typically be included in a bounce.  That's not guaranteed to catch all the 
daemon messages, but it should help to grab the ones which bounce back 
transaction headers within the body.
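
As a minimal sketch of that body scan, done in plain shell so it can be 
tried against an archived corpus outside of procmail.  The function name, 
the list of header names and the threshold of two matching lines are all 
assumptions for illustration, not something taken from a tested recipe.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the BODY of the message on stdin contains
# at least two header-looking lines, as a bounce quoting the original would.
body_has_mail_headers() {
    # sed drops everything through the first blank line (the real headers);
    # grep -c then counts header-like lines left in the body, allowing for
    # leading whitespace or ">" quoting.
    count=$(sed '1,/^$/d' |
        grep -E -i -c '^[[:space:]>]*(Received|Message-Id|Return-Path):' || true)
    [ "$count" -ge 2 ]
}

# Usage: body_has_mail_headers < message && echo "looks like a bounce"
```

Requiring two hits rather than one is only a guess at reducing false 
positives from spam that happens to mention a single header name.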


That's a good idea.  I might try that on the corpus of bounces I have
lying around here somewhere.  I really wish people wouldn't do that.  
It's such a pain in the ass.

FTR, 'H' is a default flag, so you don't need to specify it.  See 'man 
procmailrc'


It's just a place holder to make sure I don't forget.  Sort of a keep my 
sanity measure.

This certainly won't catch anything if it's commented out.


It didn't work right when it wasn't commented out either.  Neither of them 
did, which was very disheartening.

Might be a bit easier if you use a regexp, like so (note also that dots in 
the LHS of the expression are ESCAPED).  This could be optimized further by 
grouping .com/.net/.org together, but it is so not worth my time to do that 
on a munged string:

        -e 's/(munge1\.net|munge2\.org|munge3\.net|munge4\.com|munge5\.net|munge6\.net|munge7\.net|munge8\.com)/reportingdomain.com/gI' \
        -e 's/(mungeuser1|mungeduser2|mungeduser3|mungeduser4|mungeduser5|mungeduser6)/mungeduserid/gI' \


I really need to get better with regexes.  I just need to devise a way to 
practice with them I guess.  Yes, this would definitely be better.  Thanks.
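
For practice, the alternation suggested above can be tried on its own.  
This is a sketch assuming GNU sed: the I flag is a GNU extension, and the 
unescaped ( | ) grouping only acts as alternation when sed is run with -E 
(or -r on older GNU versions).  The function name is hypothetical and the 
lists are shortened versions of the placeholder names from the thread.

```shell
#!/bin/sh
# Hypothetical helper: replace trap domains and trap user names on stdin.
# Assumption: GNU sed (-E for extended regexes, I for case-insensitivity).
munge_addresses() {
    sed -E \
        -e 's/(munge1\.net|munge2\.org|munge4\.com)/reportingdomain.com/gI' \
        -e 's/(mungeuser1|mungeduser2|mungeduser3)/mungeduserid/gI'
}

# Usage: munge_addresses < message > munged-message
```

Without -E, GNU sed treats the parentheses and the bar as literal 
characters, which is one way a substitution like this can silently match 
nothing.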

# Hopefully this will prevent mail loops.
* $ ! ^X-Spam-Loop: $BOUNCER


Hopefully, if you're going to bother with all the munging, you'd consider 
checking for this BEFOREHAND, so that you don't do all that extra work if 
the message is eventually going to be ignored.


I don't know why I did it that way.  It's probably due to the way the 
script evolved from a basic munge and forward script to the mess it is 
today.  I'll fix that.  Thanks again
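
A sketch of doing the loop check up front, written as a shell function so 
the idea can be tested outside of procmail.  The function name is made up, 
and the token passed in stands for whatever $BOUNCER expands to.  Inside 
the rcfile the equivalent is simply to put the X-Spam-Loop condition on an 
earlier recipe than the munging.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the header block of the message on stdin
# already carries an X-Spam-Loop header containing the token given as $1.
has_loop_header() {
    # sed prints up to the first blank line and quits, so a matching string
    # in the body is never seen; grep -F treats the token as a fixed string.
    sed '/^$/q' | grep -i '^X-Spam-Loop:' | grep -F -q -- "$1"
}

# Usage, before any munging is done:
#   has_loop_header "$BOUNCER" < message && exit 0
```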

        | sed -e "s/munge1.net/reportingdomain.com/gI" \

[snip, but even more expressions than before]

You're munging again?


Yes.  The first was to munge just the Subject.  There is a string in the 
subject (***SPAM***) that I need removed before I submit the spam to Pyzor 
and Razor.  Now that I think about it, I really shouldn't be removing 
anything from the subject that I or my milter didn't add, so I should 
remove those other regexes and do them later.  Both the subject and body 
need munging in the end though.  And the munged subject needs to be used 
when the spam is forwarded.  I can somewhat follow my logic from what I 
did many months ago.  I need to put some more thought into it.
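
A minimal sketch of the Subject-only pass, assuming the tag is exactly 
***SPAM*** followed by optional spaces and that the Subject header is not 
folded across lines.  The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: remove the ***SPAM*** tag from the Subject header only.
strip_spam_tag() {
    # 1,/^$/ limits the edit to the header block (up to the first blank
    # line), so an identical string in the body is left alone.
    sed '1,/^$/{/^Subject:/s/\*\*\*SPAM\*\*\* *//;}'
}

# Usage: strip_spam_tag < message > message-for-pyzor-and-razor
```

Keeping this pass separate from the address munging means the copy sent to 
Pyzor and Razor differs from what arrived only by the tag that was added 
locally.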

                | $FORMAIL -I ReSent-Date: \

[snip]

If you're going to remove these headers, consider removing them BEFORE you 
do all the regexp changes, since this will reduce the volume of material 
which the other expressions must reprocess over and over.


That's a good idea as well.  I should have done that from the beginning.

I monitored my system load today during an incoming spam run.  
Spamassassin chewed up the most CPU time of all the processes involved in 
the script.  Calling spamassassin -d to remove the SA markup was 
surprisingly expensive.  I'm going to try removing the markup with sed 
when I get a chance.  Sed seemed to be much less expensive overall.  
Reporting spam via spamassassin -r was also very expensive.  I updated 
Razor and applied a patch to it from the SA folks.  It seems to be less 
CPU intensive now, although that spam run was dying down about that time.
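
A rough sketch of the cheaper markup removal, using awk rather than sed 
because folded header lines are easier to handle that way.  It only drops 
the listed X-Spam-* headers and their continuation lines; it does not undo 
a rewritten Subject or unwrap a message that SA has encapsulated, both of 
which spamassassin -d takes care of.  The function name and the exact 
header list are assumptions, and X-Spam-Loop is deliberately left alone.

```shell
#!/bin/sh
# Hypothetical helper: drop SpamAssassin's X-Spam-* headers from stdin.
strip_sa_headers() {
    awk '
        body     { print; next }             # past the headers: copy verbatim
        /^$/     { body = 1; print; next }   # first blank line ends the headers
        /^[ \t]/ { if (!skip) print; next }  # continuation follows its header
        {
            # skip is 1 for the listed SA headers, 0 for everything else
            skip = (tolower($0) ~ /^x-spam-(status|level|flag|report|checker-version):/)
            if (!skip) print
        }
    '
}

# Usage: strip_sa_headers < message > message-without-sa-headers
```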

Thanks for the ideas.  I'll see if I can find any common threads among the 
bounces that weren't picked up by FROM_DAEMON and FROM_MAILER.  Hopefully 
I can find a way to eliminate them.  Thanks again

Justin


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail