spf-discuss

Re: How to use SPF to reject spam

2005-04-06 07:46:09
dejanspf(_at_)ztbclan(_dot_)com wrote:
Gentlemen,

I apologize for the long post in advance. I wanted to maintain
context, as opposed to spreading my position over a month long
thread like we did with the DNS loading thread.


nice job. I just implementing MTA and based on your work may improve
my algorithm, just on time post :)

Tnx

You're welcome.

You will need to pay attention to some subtle practicalities when you implement your system. Here are a few I can think of:

1. The loop I proposed behaves like a PID controller (Proportional-Integral-Derivative controller), if you are familiar with control theory. It's not as complicated as it sounds, but not as simple as it appears, either.

See the sci.engr.* FAQ below (read the "Special problems in realization and implementation" section first, and then the rest):

http://www.tcnj.edu/~rgraham/PID-tuning.html

In the FAQ they describe the "Integrator Windup" phenomenon. This is one of the things that can make the entire system unstable, and it needs to be carefully avoided. The FAQ recommends a "Tracking Anti-Windup". I like the term "leaky integrator", but the idea is the same. Here's how it relates to our application:

The stats database must have a "leak" method by design. For instance, if a domain acquires a bad reputation and then cleans up its act, the stats database should reflect that by slowly and steadily moving the domain's stats towards 'neutral' or 'not enough info available'. This matters because some small domain may have made a mistake in its SPF record that erroneously authenticates a spammer, because the record is more permissive than it should be. The spammer can then safely forge the domain name to send spam, which will get through initially because of the (previous) good reputation of the small domain. After the domain owner fixes the SPF record, the domain's reputation must be allowed to return to 'unknown' or 'good' over the course of some time (more analysis is needed, but I think 6 months to 1 year would not be inappropriate).

In your implementation this may be as simple as multiplying both the ham_count and the spam_count for *all* domains in the database by leak_factor during your daily maintenance.

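If you want the stats to decay to a fraction 'remaining' of their value after 1 year of daily maintenance, leak_factor is calculated as follows:

leak_factor = e^(ln(remaining)/365)
where e is the number e (2.71828...)
ln is of course the natural logarithm.

With remaining = 0.99 (the stats leak 1% in 1 year), leak_factor is 0.9999725...
With remaining = 0.01 (the stats leak 99% in 1 year), leak_factor is 0.98746...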
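
A minimal Python sketch of this daily maintenance step, assuming the stats live in an in-memory dict keyed by domain. The names leak_factor(), apply_daily_leak(), the parameter 'remaining' (the fraction of a count left after the period) and the dict layout are illustrative assumptions, not part of any existing implementation. The 0.99 is the value used inside the logarithm above.

```python
import math


def leak_factor(remaining: float, days: int = 365) -> float:
    """Daily multiplier that shrinks a count to `remaining` of its
    value after `days` daily maintenance runs."""
    return math.exp(math.log(remaining) / days)


def apply_daily_leak(stats: dict, factor: float) -> None:
    """Multiply ham_count and spam_count of *all* domains by `factor`.

    `stats` maps domain -> (ham_count, spam_count), both floats.
    Only existing keys are reassigned, so iterating while updating is safe.
    """
    for domain, (ham_count, spam_count) in stats.items():
        stats[domain] = (ham_count * factor, spam_count * factor)


factor = leak_factor(0.99)                  # about 0.9999725
stats = {"example.com": (100.0, 1.0)}       # hypothetical sample row
apply_daily_leak(stats, factor)             # one daily maintenance run
```

In a real system the same multiplication would more likely be a single UPDATE over the stats table. The dict is only there to keep the sketch self-contained.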

This means that your count columns must be floating point numbers. Otherwise, a ham/spam ratio of 100/1 will not be leaked correctly: 100/1 * leak_factor = 99.99725/0.999972, which, if you store integers, will either be rounded back to 100/1 or truncated to 99/0. So you'd either not leak at all, or leak much faster than intended.
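
The rounding argument above can be checked with a short Python sketch. It is only an illustration under assumed names: leak_days() and its 'store' parameter (which mimics how the column would store the value) are hypothetical, and the leak factor is the 0.9999725 used in the 100/1 example.

```python
import math

LEAK = math.exp(math.log(0.99) / 365)   # about 0.9999725


def leak_days(ham, spam, days, store):
    """Apply the daily leak `days` times, passing each result through
    `store` to mimic the column type (round, int or float)."""
    for _ in range(days):
        ham, spam = store(ham * LEAK), store(spam * LEAK)
    return ham, spam


# Integer column with rounding: every day snaps back to 100/1.
rounded = leak_days(100, 1, 365, round)       # (100, 1), no leak at all

# Integer column with truncation: loses a whole unit every day.
truncated = leak_days(100, 1, 365, int)       # (0, 0), leaks far too fast

# Floating point column: decays smoothly to about 99 / 0.99.
floating = leak_days(100.0, 1.0, 365, float)
```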

This brings up the issue of 0. When a count in the table is 0, report it as such and do not perform the division.
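
One possible way to handle this, sketched in Python. The function names and the report format are assumptions made for illustration. The only point is that a zero spam_count is reported as a raw count and never used as a divisor.

```python
from typing import Optional


def ham_spam_ratio(ham_count: float, spam_count: float) -> Optional[float]:
    """Return ham/spam, or None when spam_count is 0 so that the
    caller reports the raw counts and skips the division."""
    if spam_count == 0:
        return None
    return ham_count / spam_count


def describe(ham_count: float, spam_count: float) -> str:
    """Hypothetical report line for one domain."""
    if ham_count == 0 and spam_count == 0:
        return "ham=0 spam=0 (not enough info available)"
    ratio = ham_spam_ratio(ham_count, spam_count)
    if ratio is None:
        return f"ham={ham_count} spam=0 (no ratio computed)"
    return f"ham={ham_count} spam={spam_count} ratio={ratio:.2f}"
```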

There are other subtleties that must be considered when actually implementing what looks like a simple algorithm. Not considering them means you will need to debug and tweak it later.


Regards,
Radu.