procmail

Re: Filtering bounces for auto-wording recipes

2003-09-26 08:10:24
On Fri, 26 Sep 2003, Dallman Ross wrote:

> Bzzzt.  Wrong answer.  You are contaminating the razor lists.  For
> myself, I pride myself on the low false-positive rate of my private
> spamtrap recipes.  Nevertheless, a stubborn 1% continue to get in
> there.[1]  Before I send off to razor, I "mutt -f" the file, type
> "o" and then "f" to sort by From: field, then scan down the names
> quickly.  I can do several hundred in about forty seconds.  I
> hit the ">" key to page-down.

How exactly am I contaminating the razor list?  Only spam is being
submitted after all (assuming the bounce filters are working).  The
domains I'm using for spamtraps have never had any legitimate usage
before.  There shouldn't ever be any legit mail coming in to anything
except the standard role accounts and the domain owner.  Everything else
is unsolicited.

I would check all of it manually, but that's a lot of spam.  Yesterday,
for example, my spamtraps fielded 36,762 pieces of spam.  Verifying
that much takes an awfully long time.

> [1] False negatives are almost none.  Long-term rate is well under
> one in a thousand.

Sounds like you have a pretty slick system.  This is with your regular
email account though, right?

> At the very least, you ought, then, to be assigning weighted scores
> of "spamishness".  I use a five-level system.  Five is the highest,
> and gets shunted to my inbox with almost minimal further testing.
> The stuff that ends up in my spam folder is mostly in the bottom
> two (or below one, which is as sure a bet as I can make).  Sometimes
> a three slips in there.  Then I see what I can do to revise my
> heuristic!  Anyway, MAILER-DAEMON stuff should never get into a
> pile that automatically gets reported.  Even if you (your algorithms)
> think it's forged, put it somewhere separate, as Sean says.  I use
> a file called "purgatory" for stuff that tickles only one of my
> spam snaggers or has a $TRUST (my system) score of four or five.
> A few emails a day hit purgatory, and I look at them manually.

How does this apply at all?  If it arrives on a spamtrap that has never 
been used by anyone for any legitimate reason then it's spam.  I can't 
think of any reason to weight the scoring to determine a message's
spamishness since it's spam regardless (again assuming I can get the 
bounces filtered out reliably).

I'm going to manually parse that spam for the next few days and try to 
archive any bounces that don't fit FROM_DAEMON or FROM_MAILER.  Hopefully 
that will turn out useful.
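
A rough sketch of how such a bounce trap might be laid out ahead of the
reporting recipe.  The folder names are hypothetical, and the last two
conditions are only guesses at what could catch bounces that FROM_DAEMON
and FROM_MAILER miss; the Return-Path: test also assumes the MTA adds
that header before procmail sees the message.

      # Sketch only; "bounces" and "bounces-unmatched" are made-up folders.
      # Anything procmail's own macros recognize as daemon or mailer mail.
      :0:
      * ^FROM_DAEMON
      bounces

      :0:
      * ^FROM_MAILER
      bounces

      # Assumption: a null envelope sender or a delivery-status report
      # that slipped past the macros; kept apart for manual review.
      :0:
      * ^Return-Path:.*<>
      bounces-unmatched

      :0:
      * ^Content-Type:.*multipart/report
      bounces-unmatched

Anything that lands in bounces-unmatched would show which patterns the
two macros fail to cover.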

> It's stylistically a bad idea to do that.  Six months, or six years,
> from now, when you are or your successor is looking at the code, you
> or he will wonder why you stressed that.  Since it's the default,
> yet it was placed in the recipe explicitly, the programmer must want
> to call something to our attention.  What could that be?  Code
> reader now spends the next half-hour trying to figure out what he
> doesn't grasp at first glance about that recipe, that would make the
> coder wish to call something to our attention with such explicitness.

I don't see this as being a problem personally.  There won't ever be a
successor.  Just me, since these are my own personal systems that this
runs on.  It simply emphasizes the fact that I'm looking at the header
only.  I believe from a stylistic point of view that not explicitly
defining it could cause more confusion than having it.

> These are also the reasons why I do not place full paths to
> called system binaries in my .procmailrc -- that's what the $PATH
> variable was compiled in for!  If I ever *do* have an explicit
> path stated, it tells me (or the person reading along) that the
> path to that binary is not standard, and deserves special attention.

I assume you're referring to the pyzor call.  I'm not sure why I did that.
It's the only full path I can find in any of my procmailrcs.  I'll
probably change that later today.  It's not like it's in a non-standard
place.
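
A minimal sketch of the call without the full path.  The PATH value is
an assumption about where things live, and the recipe itself is a
hypothetical stand-in, since the original one isn't quoted here.

      # Assumed search path; adjust to wherever pyzor is installed.
      PATH=/bin:/usr/bin:/usr/local/bin

      # Hypothetical reporting recipe: pipe a copy of the message to
      # pyzor and let procmail find the binary through $PATH.
      :0 c
      | pyzor report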

> Anyway, back to your coders' notes to yourself: that is what
> comment fields are for, my friend!  E.g.,
>
>       :0  # remember, H is default for conditions; hb for actions

I could, but it doesn't make much difference to me.  A future revision
might do this.  Setting the flag explicitly or adding a comment like
that every time are the two ways I'd go.
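
For comparison, the two styles side by side.  These are alternatives,
not a sequence, and should behave identically since H is already the
default for conditions.  The Subject: pattern and folder name are
placeholders.

      # Style 1: state the header flag explicitly.
      :0 H:
      * ^Subject:.*test
      test-folder

      # Style 2: leave the default alone and say so in a comment.
      :0:  # H is the default for conditions
      * ^Subject:.*test
      test-folder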

> The sed stuff that was originally posted raises my uh-oh flag.
> If you are replacing $SUBJECT using sed in auto-replies, shell
> meta-chars in the Subject: can really screw you up.  Here is
> part of a thread I was participating in from three years ago
> on this list that recalls some of the issues:
>
> http://www.xray.mpe.mpg.de/mailing-lists/procmail/2000-04/threads.html#00139

I hadn't thought about the problems shell meta-chars in the Subject:
could cause.  I'll check your thread.  Thanks for the heads-up and the
other good info.
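
One way to sidestep the issue might be to drop sed from the reply path
altogether.  This is a sketch of a substitute technique, not a fix to
the recipe that was originally posted: formail sets the Subject: header
directly, and $SUBJECT only ever appears inside double quotes, so as far
as I know the shell expands it without re-interpreting its contents.
The address and autoreply.txt are placeholders.

      # Sketch only; you@example.invalid and autoreply.txt are made up.
      # Pull the incoming subject into a variable, leading space trimmed.
      SUBJECT=`formail -zxSubject:`

      # Reply to a copy of the header, skipping daemon mail and loops.
      :0 hc
      * !^FROM_DAEMON
      * !^X-Loop: you@example\.invalid
      | (formail -rt -A "X-Loop: you@example.invalid" \
           -I "Subject: Re: $SUBJECT" ; \
         cat $HOME/autoreply.txt) | $SENDMAIL -oi -t

Because the subject never passes through a sed expression, characters
such as "/", "&" and backslashes in it have nothing to break.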

Justin


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail