Re: How to use SPF to reject spam
2005-04-06 07:46:09
dejanspf(_at_)ztbclan(_dot_)com wrote:
Gentlemen,
I apologize for the long post in advance. I wanted to maintain
context, as opposed to spreading my position over a month long
thread like we did with the DNS loading thread.
Nice job. I am just implementing an MTA and, based on your work, may
improve my algorithm. A just-in-time post :)
Tnx
You're welcome.
You will need to pay attention to some subtle practicalities when you
implement your system. Here are a few I can think of:
1. The loop I proposed behaves like a PID controller
(Proportional-Integral-Derivative controller), if you are familiar with
control theory. It's not as complicated as it sounds, but not as simple
as it appears, either.
See the sci.engr.* FAQ below (read the "Special problems in realization
and implementation" section first, and then the rest):
http://www.tcnj.edu/~rgraham/PID-tuning.html
In the FAQ they describe the "Integrator Windup" phenomenon. This is one
of the things that can make the entire system unstable, and needs to be
carefully avoided. The FAQ recommends a "Tracking Anti-Windup". I like
the term "leaky integrator" but the idea is the same. Here's how it
relates to our application:
The stats database must have a "leak" method by design. For instance, if
a domain achieves a bad reputation, then cleans up their act, the stats
database should reflect it, by steadily and slowly decreasing the
domain's stats towards 'neutral' or 'not enough info available'. This is
because some small domain may have made a mistake in their SPF record
that erroneously authenticates a spammer because it's more permissive
than it should be. So, the spammer can safely forge the domain name to send
the spam, which will get through initially because of the (previous)
good reputation of the small domain. After the domain owner fixes the
SPF record, his reputation must be allowed to return to 'unknown' or
'good' over the course of some time (more analysis is needed, but I
think 6 months-1 year would not be inappropriate).
In your implementation this may be as simple as multiplying both the
ham_count and the spam_count for *all* domains in the database by
leak_factor during your daily maintenance.
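
A minimal sketch of that daily maintenance step, assuming the stats live
in an in-memory dict keyed by domain. The names `stats` and `apply_leak`
are made up for illustration; a real system would run the equivalent
update against its own database.

```python
def apply_leak(stats, leak_factor):
    """Scale every domain's counts by leak_factor, in place.

    `stats` is a hypothetical table of the form
    {domain: {"ham_count": float, "spam_count": float}}.
    """
    if not 0.0 < leak_factor <= 1.0:
        raise ValueError("leak_factor must be in (0, 1]")
    for counts in stats.values():
        counts["ham_count"] *= leak_factor
        counts["spam_count"] *= leak_factor


# Illustrative data only; the factor is exaggerated so the effect is visible.
stats = {
    "example.com": {"ham_count": 100.0, "spam_count": 1.0},
    "example.net": {"ham_count": 0.0, "spam_count": 40.0},
}
apply_leak(stats, 0.5)
```

Note that scaling both counts by the same factor leaves the ham/spam ratio
unchanged. What shrinks is the weight of old evidence relative to the new
messages counted afterwards.
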
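If you want the stats to keep 99% of their value after 1 year (that is,
to leak 1% per year), leak_factor is calculated as follows:
leak_factor = e^(ln(0.99)/365)
where e is the number e (2.71828...) and ln is the natural logarithm.
In this case, leak_factor is 0.9999725...
To leak 99% of the stats in 1 year instead, use ln(0.01) in place of
ln(0.99), which gives a leak_factor of 0.98746...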
This means that your count columns must be floating point numbers.
Otherwise, a ham/spam ratio of 100/1 will not be leaked correctly:
100/1 * leak_factor = 99.99725/0.999972, which, if you store integers,
will either be rounded to 100/1 or truncated to 99/0, so you'd either
not leak at all, or leak much faster than intended.
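
The effect can be checked with a short sketch that applies the daily
factor from the e^(ln(0.99)/365) formula for 365 days to a 100/1 entry
stored three ways. The helper name `leak_factor_for` and the variable
names are assumptions for illustration only.

```python
import math


def leak_factor_for(retained_fraction, days):
    """Daily factor f such that f ** days == retained_fraction."""
    return math.exp(math.log(retained_fraction) / days)


factor = leak_factor_for(0.99, 365)

ham_f, spam_f = 100.0, 1.0  # floating point columns
ham_r, spam_r = 100, 1      # integer columns, rounded after each update
ham_t, spam_t = 100, 1      # integer columns, truncated after each update

for _ in range(365):
    ham_f, spam_f = ham_f * factor, spam_f * factor
    ham_r, spam_r = round(ham_r * factor), round(spam_r * factor)
    ham_t, spam_t = int(ham_t * factor), int(spam_t * factor)

print("factor   :", factor)
print("float    :", ham_f, spam_f)
print("rounded  :", ham_r, spam_r)
print("truncated:", ham_t, spam_t)
```

With floats the counts shrink smoothly. With rounding they never move
from 100/1, and with truncation every nonzero count loses a whole unit
per day, so both counts reach 0 long before the year is over.
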
This brings up the issue of 0. When a count in the table is 0, report
the counts as they are and do not perform the division.
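
One possible reading of that rule, as a sketch: return the raw counts and
leave the ratio undefined whenever either count is 0. The function name
`report` and the returned dict layout are hypothetical.

```python
def report(ham_count, spam_count):
    """Return the counts plus a ham/spam ratio, or no ratio if a count is 0."""
    result = {"ham_count": ham_count, "spam_count": spam_count, "ratio": None}
    if ham_count != 0 and spam_count != 0:
        # Only divide when both counts carry information.
        result["ratio"] = ham_count / spam_count
    return result


known = report(100.0, 1.0)    # ratio is computed
no_spam = report(3.0, 0.0)    # ratio stays None, counts reported as-is
no_ham = report(0.0, 5.0)     # ratio stays None, counts reported as-is
```

The caller can then treat a missing ratio as 'not enough info available'
or look at the raw counts directly, depending on policy.
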
There are other subtleties that must be considered when actually
implementing what looks like a simple algorithm. Not considering them
means you will need to debug and tweak it later.
Regards,
Radu.