On Thu, 25 Sep 2003, Professional Software Engineering wrote:
Uhm, you're aware that some of those "random character spews" (either at
the end of a subject, or trailing the body) are sometimes used as database
keys?
Yes. Not to mention the URL query strings. Unfortunately I'm not aware
of anything I can do to stop them. I was wondering today if anyone had
come up with a regex to strip off URL query strings.
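
For what it's worth, a rough sketch of such a regex with sed could look
like the following. The function name strip_url_queries is made up for
illustration, and the pattern is only an assumption about what the spam
URLs look like: it handles lowercase http/https URLs and cuts everything
from the first '?' up to the next whitespace, quote, or angle bracket.

```shell
# Sketch only: strip the query string from http/https URLs read on stdin.
# strip_url_queries is a hypothetical helper, not part of procmail or sed.
strip_url_queries() {
    # \1 keeps the URL up to (not including) the '?'; the rest is dropped.
    sed -E 's#(https?://[^[:space:]?"<>]+)\?[^[:space:]"<>]*#\1#g'
}

printf '%s\n' 'Visit http://example.com/offer.php?id=abc123&u=victim now' \
    | strip_url_queries
```

Whether stripping the query string is safe depends on the report
recipient still being able to identify the advertised site without it.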
Well, one approach would be to take recognized mailer-daemon messages and
file them away for potential carbon lifeform review.
That's possible. I do believe I've archived all the mail sent to the
catchall address. I could go through it, but I'm not sure I would be
able to devise regexes that work well enough for the task at hand. I
might give it a try though.
.. and a good reason to consider _NOT_ automating reporting mechanisms,
since when they break, they become a big problem, often setting yourself
up to be ignored.
Unfortunately there's no other feasible way to run a network of spamtraps
that fields thousands of pieces of spam per day. I don't have the time to
check each piece of mail before approving it, and I'm not paid for my
contribution. I'm just trying to find a good solution. If I can
address this problem then I think I just might have it licked.
Some crappy mailers (predominantly, but not exclusively running on windows
OS') don't send daemon messages from recognized mailers.
You might consider scanning the body for mail-type headers, which would
typically be included in a bounce. That's not guaranteed to catch all the
daemon messages, but it should help to grab the ones which bounce back
transaction headers within the body.
That's a good idea. I might try that on the corpus of bounces I have
lying around here somewhere. I really wish people wouldn't do that.
It's such a pain in the ass.
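
One way to try the body-scanning idea against an archived corpus is
sketched below. The function name body_has_mail_headers and the list of
header names are assumptions picked for illustration; a real bounce may
include different headers or indent them differently.

```shell
# Sketch only: exit 0 if the body of the message on stdin contains lines
# that look like returned transaction headers, non-zero otherwise.
# body_has_mail_headers and the header list are illustrative guesses.
body_has_mail_headers() {
    # The first sed drops the message's own header block (line 1 through
    # the first blank line), so only the body reaches grep.
    sed '1,/^$/d' \
        | grep -Ei '^[[:space:]>]*(Received|Message-ID|Return-Path):' >/dev/null
}

# Example: report which saved messages look like bounces.
# for f in bounces/*; do body_has_mail_headers < "$f" && echo "$f"; done
```

This will also flag ordinary forwarded mail that quotes headers, so it
is probably best used to file messages for review rather than to
discard them.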
FTR, 'H' is a default flag, so you don't need to specify it. See 'man
procmailrc'
It's just a place holder to make sure I don't forget. Sort of a keep my
sanity measure.
This certainly won't catch anything if it's commented out.
It didn't work right when it wasn't commented out either. Neither of them
did which was very disheartening.
Might be a bit easier if you use a regexp, like so (note also that dots in
the LHS of the expression are ESCAPED). This could be optimized further by
grouping .com/.net/.org together, but it is so not worth my time to do that
on a munged string:
-e 's/(munge1\.net|munge2\.org|munge3\.net|munge4\.com|munge5\.net|munge6\.net|munge7\.net|munge8\.com)/reportingdomain.com/gI' \
-e 's/(mungeuser1|mungeduser2|mungeduser3|mungeduser4|mungeduser5|mungeduser6)/mungeduserid/gI' \
I really need to get better with regexes. I just need to devise a way to
practice with them, I guess. Yes, this would definitely be better. Thanks.
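
As a small practice case, the alternation quoted above can be tried on
its own. One caveat that the quoted fragment does not show: in sed,
unescaped ( | ) only act as grouping and alternation when extended
regexps are enabled, so this sketch assumes -E, and it assumes GNU sed
for the case-insensitive I flag. munge_addresses is a made-up name, and
the domains and user names are the thread's placeholders.

```shell
# Sketch only, assuming GNU sed. -E turns on extended regexps so that
# ( | ) mean grouping/alternation; the dots are escaped so they match a
# literal '.' rather than any character.
munge_addresses() {
    sed -E \
        -e 's/(munge1\.net|munge2\.org|munge3\.net|munge4\.com|munge5\.net|munge6\.net|munge7\.net|munge8\.com)/reportingdomain.com/gI' \
        -e 's/(mungeuser1|mungeduser2|mungeduser3|mungeduser4|mungeduser5|mungeduser6)/mungeduserid/gI'
}

printf '%s\n' 'To: MungedUser2@Munge4.com' | munge_addresses
# prints: To: mungeduserid@reportingdomain.com
```

Piping a few hand-written lines through a function like this is a cheap
way to check a pattern before putting it into the live recipe.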
# Hopefully this will prevent mail loops.
* $ ! ^X-Spam-Loop: $BOUNCER
Hopefully, if you're going to bother with all the munging, you'd consider
checking for this BEFOREHAND, so that you don't do all that extra work if
the message is eventually going to be ignored.
I don't know why I did it that way. It's probably due to the way the
script evolved from a basic munge and forward script to the mess it is
today. I'll fix that. Thanks again
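
The reordering can be sketched outside procmail as a plain shell
filter: test for the loop header first, and only run the sed work when
the message has not been seen before. In the procmailrc itself this
would amount to putting the X-Spam-Loop condition ahead of the sed
filter. The function names and the BOUNCER value below are placeholders
for illustration, not the values from the real script.

```shell
# Sketch only. BOUNCER stands in for whatever the real recipe uses.
BOUNCER='spamtrap@reportingdomain.com'

# Exit 0 if the header block (line 1 through the first blank line)
# already carries our loop marker. Folded headers are not handled.
already_looped() {
    sed -n '1,/^$/p' | grep -Fix "X-Spam-Loop: $BOUNCER" >/dev/null
}

process_message() {
    msg=$(cat)    # note: $(...) drops trailing newlines from the message
    if printf '%s\n' "$msg" | already_looped; then
        return 0  # seen before: skip the expensive munging entirely
    fi
    printf '%s\n' "$msg" | sed -e 's/munge1\.net/reportingdomain.com/gI'
}
```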
| sed -e "s/munge1.net/reportingdomain.com/gI" \
[snip, but even more expressions than before]
You're munging again?
Yes. The first was to munge just the Subject. There is a string in the
subject (***SPAM***) that I need removed before I submit the spam to Pyzor
and Razor. Now that I think about it, I really shouldn't be removing
anything from the subject that I or my milter didn't add, so I should
remove those other regexes and do them later. Both the subject and body
need munging in the end though, and the munged subject needs to be used
when the spam is forwarded. I can somewhat follow my logic from what I
did many months ago. I need to put some more thought into it.
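
A narrower first pass that removes only the ***SPAM*** tag, and only
from the Subject header, might look like this. It is a sketch under two
assumptions: the tag sits at the start of the subject text, and the
Subject header is not folded across lines. strip_spam_tag is a made-up
name.

```shell
# Sketch only: remove a leading ***SPAM*** tag from the Subject header.
# The address range 1,/^$/ limits the substitution to the header block,
# so a "Subject:" line quoted in the body is left untouched.
strip_spam_tag() {
    sed -E '1,/^$/ s/^(Subject:[[:space:]]*)\*\*\*SPAM\*\*\*[[:space:]]*/\1/'
}

printf 'Subject: ***SPAM*** Cheap pills\n\nbody text\n' | strip_spam_tag
```

Keeping this step separate from the domain and user munging makes it
easier to hand the otherwise unmodified message to Pyzor and Razor
first and apply the remaining substitutions afterwards.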
| $FORMAIL -I ReSent-Date: \
[snip]
If you're going to remove these headers, consider removing them BEFORE you
do all the regexp changes, since this will reduce the volume of material
which the other expressions must reprocess over and over.
That's a good idea as well. I should have done that from the beginning.
I monitored my system load today during an incoming spam run.
Spamassassin chewed up the most CPU time of all the processes involved in
the script. Calling spamassassin -d to remove the SA markup was
surprisingly expensive. I'm going to try removing the markup with sed
when I get a chance. Sed seemed to be much less expensive overall.
Reporting spam via spamassassin -r was also very expensive. I updated
Razor and applied a patch to it from the SA folks. It seems to be less
CPU intensive now, though that spam run was dying down about that time.
Thanks for the ideas. I'll see if I can find any common threads among the
bounces that weren't picked up by FROM_DAEMON and FROM_MAILER. Hopefully
I can find a way to eliminate them. Thanks again
Justin
_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail