Re: How to use SPF to reject spam

dejanspf(_at_)ztbclan(_dot_)com wrote:

Gentlemen,

I apologize for the long post in advance. I wanted to maintain
context, as opposed to spreading my position over a month long
thread like we did with the DNS loading thread.



Nice job. I am just implementing an MTA and, based on your work, may improve
my algorithm. A well-timed post :)

Tnx


You're welcome.

You will need to pay attention to some subtle practicalities when you implement your system. Here are a few I can think of:

1. The loop I proposed behaves like a PID (Proportional-Integral-Derivative) controller, if you are familiar with control theory. It's not as complicated as it sounds, but not as simple as it appears, either.

See the sci.engr.* FAQ below (read the "Special problems in realization and implementation" section first, and then the rest):


http://www.tcnj.edu/~rgraham/PID-tuning.html

In the FAQ they describe the "Integrator Windup" phenomenon. This is one of the things that can make the entire system unstable, and it needs to be carefully avoided. The FAQ recommends a "Tracking Anti-Windup". I like the term "leaky integrator", but the idea is the same. Here's how it relates to our application:

The stats database must have a "leak" method by design. For instance, if a domain achieves a bad reputation, then cleans up its act, the stats database should reflect it by steadily and slowly decreasing the domain's stats towards 'neutral' or 'not enough info available'. This is because some small domain may have made a mistake in its SPF record that erroneously authenticates a spammer because the record is more permissive than it should be. The spammer can then safely forge the domain name to send the spam, which will get through initially because of the (previous) good reputation of the small domain. After the domain owner fixes the SPF record, his reputation must be allowed to return to 'unknown' or 'good' over the course of some time (more analysis is needed, but I think 6 months to 1 year would not be inappropriate).

In your implementation this may be as simple as multiplying both the ham_count and the spam_count for *all* domains in the database by leak_factor during your daily maintenance.
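A minimal sketch of that daily maintenance pass, assuming the stats live in an in-memory dict keyed by domain. The `DomainStats` class, the `leak_all` function, and the sample domains are hypothetical names for illustration; a real system would presumably run the equivalent update against its database. The factor 0.99 is an arbitrary value for the example.

```python
from dataclasses import dataclass


@dataclass
class DomainStats:
    # Counts are floats so that a fractional leak is not lost.
    ham_count: float = 0.0
    spam_count: float = 0.0


def leak_all(stats: dict[str, DomainStats], leak_factor: float) -> None:
    """Scale both counts of every domain by leak_factor, in place."""
    if not 0.0 < leak_factor <= 1.0:
        raise ValueError("leak_factor must be in (0, 1]")
    for entry in stats.values():
        entry.ham_count *= leak_factor
        entry.spam_count *= leak_factor


# Hypothetical sample data; one maintenance run with an arbitrary factor.
stats = {
    "example.org": DomainStats(ham_count=100.0, spam_count=1.0),
    "example.net": DomainStats(ham_count=0.0, spam_count=40.0),
}
leak_all(stats, 0.99)
```

Because both counts are scaled by the same factor, the ham/spam ratio of a domain is unchanged by the leak itself. What shrinks is the weight of old evidence relative to whatever new mail is counted afterwards.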

If you want the stats to leak 99% of their value in 1 year (that is, to retain 1%), leak_factor is calculated as follows:


leak_factor = e^(ln(0.01)/365)
where e is the number e (2.71828...)
ln is of course the natural logarithm.

In this case, leak_factor is 0.98746...

This means that your column counts must be floating point numbers. Otherwise, a ham/spam ratio of 100/1 will not be leaked correctly: 100/1 * leak_factor = 98.746/0.98746, which, if you store integers, will be rounded to 99/1 or truncated to 98/0. The spam count would then either never leak at all (0.98746 rounds back to 1 every day) or leak much faster than intended (it drops to 0 on the first day).
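As a rough sketch of this arithmetic, the daily factor can be derived from the fraction of each count that should survive a given period, and the effect of the storage type can be checked by simulating a year of daily leaks. The function names and the `retained_fraction` parameter are made up for illustration, and the choice of retaining 1% after 365 days is an assumption for the example.

```python
import math


def daily_leak_factor(retained_fraction: float, days: int = 365) -> float:
    """Per-day multiplier so that retained_fraction of a count is left after `days` days."""
    if not 0.0 < retained_fraction < 1.0 or days <= 0:
        raise ValueError("need 0 < retained_fraction < 1 and days > 0")
    return math.exp(math.log(retained_fraction) / days)


def decay(count, factor: float, days: int, store=float):
    """Apply the daily leak `days` times, converting with `store` after each step."""
    for _ in range(days):
        count = store(count * factor)
    return count


# Assumption for the example: keep 1% of a count after one year.
factor = daily_leak_factor(0.01, 365)

as_float = decay(100.0, factor, 365)                # floating point column
as_rounded = decay(100, factor, 365, store=round)   # integer column, rounding
as_truncated = decay(100, factor, 365, store=int)   # integer column, truncating

print(factor, as_float, as_rounded, as_truncated)
```

With floats the count lands close to the intended fraction. With rounding the count stalls once the daily loss is less than half a unit, and with truncation it runs down to zero long before the year is over.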

This brings up the issue of 0. When a count in the table is 0, report the counts as they are and do not perform the division.
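One possible way to handle this, sketched under the assumption that "0 in the table" means either count being zero. The `report` function and its return shape are hypothetical: it always returns the raw counts and only computes the ratio when both are non-zero.

```python
from typing import Optional, TypedDict


class StatsReport(TypedDict):
    ham_count: float
    spam_count: float
    ratio: Optional[float]  # None means "not computed"


def report(ham_count: float, spam_count: float) -> StatsReport:
    """Return the raw counts, plus ham/spam only when both counts are non-zero."""
    if ham_count == 0 or spam_count == 0:
        # Report the zero as-is; no division is attempted.
        return {"ham_count": ham_count, "spam_count": spam_count, "ratio": None}
    return {
        "ham_count": ham_count,
        "spam_count": spam_count,
        "ratio": ham_count / spam_count,
    }


no_spam_seen = report(100.0, 0.0)
mixed = report(100.0, 4.0)
```

The caller can then treat a missing ratio as its own case (for example 'no spam seen yet' or 'not enough info available') instead of trying to interpret an infinite or zero ratio.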

There are other subtleties that must be considered when actually implementing what looks like a simple algorithm. Not considering them means you will need to debug and tweak it later.



Regards,
Radu.