procmail

RE: Filtering bounces for auto-wording recipes

2003-09-26 17:52:58
On Sat, 27 Sep 2003, Dallman Ross wrote:

> Bart (I think it was Bart) gave one of my answers.  I actually
> follow the SpamAssassin-and-related stuff only peripherally,
> since I find my own spam traps work better and are less likely to
> false-poz on me.  (And it's so much speedier doing all-procmail
> stuff rather than running perl, even if it is a daemon.)
> So that's all by way of explanation in case this is wrong, but:
> aren't the razor lists taking readings on headers and using them
> to feed the database?  So isn't postmaster stuff going to screw
> with that?

I believe Razor only uses the body, although I can't say so with any
certainty.  My views are similar to LuKreme's.  The likelihood that anyone
would make a typo and end up in one of my spamtraps is next to nil.  They
could fat-finger postmaster or abuse, but then again these domains don't
generate any reason for postmaster or abuse to be contacted.  The owner
address is on a non-spamtrap domain.  The only thing these addresses have
been used for is to seed spammers' lists.  No public use of them has been
made, i.e. the likelihood of legit mail ending up in my spamtrap is closer
to one in a million by my guesstimation.  It's good enough for the Razor
folks, who describe what they call a troll account in their FAQ.  That's
essentially what this is.

I'd be interested to hear more about your setup.  If I were doing this for
a provider, as I used to, I'd use the list internally.  Instead, I'm using
the spam to help everyone else by way of Razor and Pyzor.  I'm going to
add the DCC when I get time.

I'm also going to remove as much of the SpamAssassin fluff from my
procmailrc as I can.  I can strip the SA markup with sed or formail and
not use perl.  I also found out that 'spamassassin -r' not only reports
spam to Razor, Pyzor, and DCC (if they are installed and configured for
use in one of the SA config files) but also stuffs the spam into that
user's Bayes DB.  That, I would imagine, accounts for a lot of my CPU
time.  I've since removed that call and just run razor-report instead.
It turns out that SA will honor the prefs in a config file for --report.
Justin Mason suggested creating a config that enabled Razor, Pyzor, and
DCC and disabled Bayes.  That would be another usable solution.  I still
have to use some perl no matter what, though, since Razor is coded in
perl.
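
As a rough sketch of the formail-plus-razor-report idea (not the actual
procmailrc from this setup), something along these lines might work.  The
trap domain, the install path, and the exact list of X-Spam-* headers are
assumptions for illustration; formail only touches headers, so any SA
markup in the Subject or body would need separate handling.

```procmail
# Assumed install path; adjust to wherever razor-report actually lives.
RAZOR_REPORT=/usr/local/bin/razor-report

# spamtrap.example.com is a placeholder for a catchall trap domain.
:0
* ^TO_.*@spamtrap\.example\.com
{
    # Filter the header through formail; -I with a bare field name
    # deletes that field.  No perl involved in this step.
    :0 fhw
    | formail -I "X-Spam-Status:" -I "X-Spam-Level:" \
              -I "X-Spam-Flag:" -I "X-Spam-Checker-Version:"

    # Feed a copy of the cleaned message to razor-report on stdin,
    # instead of calling 'spamassassin -r' (which also trains Bayes).
    :0 c
    | $RAZOR_REPORT

    # Final delivery of the trap mail is deliberately left out here.
}
```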

> Yes; but I route 22 domains' mail through it.  And I get lots of
> weird personal mail from different places, to lots of different
> addresses -- much more so than the average email user.  So my recipes
> have to be carefully thought through to avoid FPs.

Interesting.  Sounds complex.  I have 28 domains total, 22 of which are 
used for my spamtraps.  Those 22 domains were clean (fresh out of the 
factory and never used before I got them) which makes using them as 
spamtraps easier.  I haven't seeded all of them yet.  Just a few.  They 
all end up on one box though and my virtusertable makes anything resolve 
which is handy.

> Yes, except for misaddressed mail and postmaster bounces, I suppose
> that is true.  If you stated that limitation to your method the
> first time, okay; but I read too fast and didn't see that.

Sorry.  I have a habit of not providing enough details.  I could use SA's
score to determine the spamishness of the spamtrap mail.  Anything over
3-5 could be called spam, I imagine.  One thing: if I used a pre-defined
set of spamtraps, then I could really say all mail addressed to them was
spam.  Since I use a catchall instead (and define my actual users
explicitly), typos could in fact end up getting reported.  I have a list
of 525,000 names that I used for userids @ my spamtrap domains.  I decided
that an alias file with that many entries was just too big.  That's when I
elected to go with the catchall.  ...  If I bound a second IP to this box
and bound an instance of sendmail to it exclusively for spamtraps, then I
think I'd be OK.  That would move what little personal use I have on that
box to a different IP, and the catchall wouldn't hurt anything then.  I
may do that after I finish my move.

> I was speaking generally: that you need to look harder at mail that
> you're less sure was spam, than at mail that you're very sure was
> spam.  So segregate the piles, and look harder at the stuff that
> *should* be looked harder at.  That's all I meant.

I understand and it makes sense.  I'll probably alter my script to isolate 
mail scoring under 3-5 and check it by hand for a month or so.  That would 
help determine if I get many FPs.  It's a good point.
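
One way to segregate the piles could look like the sketch below.  It
assumes SpamAssassin has already run and added an X-Spam-Level header
with one asterisk per point; the threshold of five, the trap domain, and
the folder names trap-review and trap-spam are all made up for
illustration.

```procmail
# spamtrap.example.com is a placeholder for a catchall trap domain.
:0
* ^TO_.*@spamtrap\.example\.com
{
    # Fewer than five stars in X-Spam-Level (or no such header at all):
    # the "less sure" pile, to be checked by hand.
    :0:
    * ! ^X-Spam-Level: \*\*\*\*\*
    trap-review

    # Everything else scored five or more: the "very sure" pile.
    :0:
    trap-spam
}
```

Only the trap-spam pile would then be a candidate for automatic
reporting, while trap-review gets a human look first.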

> Okay, forget for a moment what I said about style.  What about
> the problem of causing bugs in your code?  Did you know that there
> is a bug in the current procmail, such that an explicated H in
> the top line won't let you turn off H at all in later recipes?
>
> http://mailman.rwth-aachen.de/pipermail/procmail/2002-February/008355.html

No, I didn't know that.  That's good to know.  Is there going to be a
patch to fix this in the future?  I'll go back and use the # commenting on
those instances to remove the H.  I also read a while back that procmail
has a bug that occasionally causes the 'F' to be dropped from 'From' when
writing to disk.  I've seen it happen before but never knew what to
attribute it to.  Does that problem exist in 3.22, and is a patch planned
for that?  Thanks for the heads up.

> I was speaking generally, and not even referring to your code
> specifically.  But if I gave you a guilty conscience, that's okay.  :)

:)  Oddball path problems crop up on me every so often.  Usually I either 
explicitly define the full path if I'm worried or I'll create a variable 
with the full path in it, like

PYZOR=/usr/local/bin/pyzor

I do that from time to time.  As long as I can keep the oddball path
problems from biting me when I least expect it, everything is kosher. :-)
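
For illustration, a variable like that might then be used in a recipe as
sketched here.  The trap domain is a placeholder, and 'pyzor report' is
assumed to read the message on stdin.

```procmail
# Full path kept in one place, so recipes don't depend on $PATH.
PYZOR=/usr/local/bin/pyzor

# Pipe a copy of trap mail to pyzor; 'c' lets the original continue
# on to later recipes.  spamtrap.example.com is a placeholder.
:0 c
* ^TO_.*@spamtrap\.example\.com
| $PYZOR report
```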

Thanks for all the input.  I've got some things I'll have to implement as
soon as my move is over.  I sure do hate moving.  You tend to lose minor
things like keyboards and monitors in moves...

Justin


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail