Thanks for the link, Eric. It does look interesting and I'll take a look
at that code. My greylisting is actually done as a sendmail milter and is
not directly part of SpamAssassin. However, my reputation system is (I've
already finished the initial code as a SpamAssassin plugin); the two will
interface by means of a database. The SpamAssassin plugin code will be
made public by the end of the month or sooner, since I'll need some
additional testers to help with this research.
I actually have considerably larger plans for all this, and a reputation
database of single IPs is just the first step. One of the things I ran
into is deciding what algorithm to use for calculating the mean value
in real time. The obvious mean functions are the arithmetic and harmonic
means; there is also the geometric mean, but its computational complexity
is not likely to be worth it. The algorithms I thus intend to try are:
  MA(n+1) = (MA(n)*n + S(n+1)) / (n+1)

    where MA(x) is the arithmetic mean at step x
    and S(x) is the actual score at step x

  MH(n+1) = (n+1) / (n/MH(n) + 1/S(n+1))

    where MH(x) is the harmonic mean at step x
    and S(x) is the actual score at step x
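
A minimal Python sketch of the two update rules above may make them easier
to test. The function names are hypothetical and not taken from any plugin
code. It assumes the caller stores the current mean and the count n per IP,
and that harmonic-mean scores are non-zero (the harmonic mean is undefined
otherwise).

```python
def update_arithmetic(mean, n, score):
    """Return MA(n+1) from MA(n), the sample count n, and the new score S(n+1)."""
    return (mean * n + score) / (n + 1)


def update_harmonic(mean, n, score):
    """Return MH(n+1) from MH(n), the sample count n, and the new score S(n+1).

    Assumption: every score is non-zero, since the formula divides by it.
    """
    if n == 0:
        # No history yet: the mean of a single score is the score itself.
        return score
    return (n + 1) / (n / mean + 1 / score)


# Usage: feed scores one at a time, keeping only (mean, n) between steps.
ma = mh = 0.0
for n, s in enumerate([2.0, 4.0, 4.0]):
    ma = update_arithmetic(ma, n, s)
    mh = update_harmonic(mh, n, s)
```

Only the running mean and the count need to be stored, so each update is
constant-time regardless of how many messages an IP has sent.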
I also think that some kind of multiplier should be applied so that the
latest data has a higher weight than the previous data, so this ends up being:
  MA(n+1,w) = (MA(n,w)*n + S(n+1)*w) / (n + w)

    where MA(x,w) is the arithmetic mean at step x with weight w (w>1)

  MH(n+1,w) = (n + w) / (n/MH(n,w) + w/S(n+1))

    where MH(x,w) is the harmonic mean at step x with weight w (w>1)
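
The weighted variants can be sketched the same way in Python. The function
names and the default w=2.0 are assumptions for illustration only; the text
above requires nothing more than w>1. The sketch implements the recurrences
exactly as written and leaves it to the caller to advance n by one per step.

```python
def update_weighted_arithmetic(mean, n, score, w=2.0):
    """Return MA(n+1,w): the newest score counts w times as much as one old sample."""
    return (mean * n + score * w) / (n + w)


def update_weighted_harmonic(mean, n, score, w=2.0):
    """Return MH(n+1,w) for a non-zero score, with the newest score weighted by w."""
    if n == 0:
        # No history yet: the result is the score itself for any weight.
        return score
    return (n + w) / (n / mean + w / score)


# Usage: an IP with 3 prior messages averaging 2.0 now sends one scoring 6.0.
new_ma = update_weighted_arithmetic(2.0, 3, 6.0, w=2.0)
new_mh = update_weighted_harmonic(2.0, 3, 6.0, w=2.0)
```

With w=1 both functions reduce to the unweighted updates, which gives a
simple sanity check when comparing the two approaches.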
I'm actually not good with probability theory, so if others can comment
on whether I'm on the right track, that would be good. Also, the spam score
function S(x) feels to me like a stochastic process, so something like
Bayesian score calculations should probably be used. In any case, if people
here can point me to a good book on real-time calculations and operations
with stochastic functions, I'd appreciate it.
---
William Leibzon
mailto: william(_at_)completewhois(_dot_)com
Anti-Spam and Email Security Research Worksite:
http://www.elan.net/~william/emailsecurity/
Whois & DNS Network Investigation Tools:
http://www.completewhois.com
On Thu, 9 Feb 2006, Eric A. Hall wrote:
On 1/29/2006 10:27 PM, William Leibzon wrote:
What I'm thinking for one of the next stages is to change this whitelisting
IP database into (IP,score), where score is updated and is the average of
the scores of the emails that came from the system before, i.e. it basically
is a real-time updated reputation system.
I did some work on something like this a while back by overloading the
SpamAssassin auto-whitelist database--tuples and reputation information is
already stored there, and I pass most incoming mail through SA while the
session is still active, so I get to reuse some of that info for free.
http://www.ehsco.com/misc/sagrey/ is where the SA plugin lives if you want
to look at it.
Right now I'm only using it to add an extra score for mail that appears to
be spam and originated from an unknown tuple, but what I want to do is
defer acceptance based on whether or not the rule fired (essentially
allowing me to restrict greylisting to mail that is likely spam from
unknown tuples). I couldn't do that with Postfix last time I looked
(header checks could not generate a DEFER action) and I haven't had time
to rebuild my whole mail system yet.
As to what you are pursuing, a similar approach would let you leverage the
reputation score associated with the tuple in the AWL, which seems to be
mostly what you are looking for.
--
Eric A. Hall http://www.ehsco.com/
Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg