[Asrg] alok - penny black variation

Hi,

I have been thinking about designing something like Penny Black for a long
time, but I never really had the time to do anything serious. Anyway, I'm
now motivated to work on my idea, and I would like your comments
on it. (I really don't want Microsoft designing this kind
of thing; they have done enough harm to the world.)

Before I begin, I would like to thank Marty Lamb (www.martiansoftware.com)
for all the emails we exchanged. He has developed a very
interesting anti-spam method called tarproxy, which inspired me a lot.
(You should check out his website if you have never heard of tarproxy.)

For several reasons, I think it would be good to define a new email
protocol (this way we could fix a lot of things that weren't quite right
with email, but that's a whole different discussion). The protocol would
work like this: the sender connects directly to the receiver's server and
performs a processor-intensive calculation; if it succeeds, the server
accepts the email. All this is similar to Penny Black, and presents one
major problem: what about mobile devices, PDAs, mailing lists, etc.?
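To make the "processor intensive calculation" concrete, here is a minimal
sketch of one possible puzzle, assuming a hashcash-style SHA-256 scheme
(this puzzle choice is my assumption for illustration, not necessarily what
Penny Black uses). The server sends a random challenge and a difficulty in
bits; the sender searches for a counter whose hash has that many leading
zero bits (about 2**difficulty hashes on average), while the server checks
the answer with a single hash. All function names are hypothetical.

```python
import hashlib
import os


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # bit_length() gives the significant bits of this byte;
        # the remaining high bits of the 8 are leading zeros.
        bits += 8 - byte.bit_length()
        break
    return bits


def make_challenge() -> bytes:
    """Server side: a fresh random challenge for each connection."""
    return os.urandom(16)


def solve(challenge: bytes, difficulty: int) -> int:
    """Sender side: brute-force a counter (expected ~2**difficulty hashes)."""
    counter = 0
    while True:
        digest = hashlib.sha256(challenge + counter.to_bytes(8, "big")).digest()
        if leading_zero_bits(digest) >= difficulty:
            return counter
        counter += 1


def verify(challenge: bytes, difficulty: int, counter: int) -> bool:
    """Server side: checking a proposed solution costs one hash."""
    digest = hashlib.sha256(challenge + counter.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty
```

The asymmetry is the point of the design: the work grows exponentially with
the difficulty for the sender, but verification stays cheap for the server.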

So here is my solution: when the sender connects to the receiver's server,
the server uses a Bayesian filter to score the message. This
score can be computed dynamically, meaning it can change as the
message is being received (you split the message into parts and run each
part through the filter). Depending on the score the message (or part of
the message) gets, the computation will be 'easy' (short) or 'difficult'
(long to perform).
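A rough sketch of the dynamic scoring, under several assumptions: the token
probabilities below are made up (a real filter would learn them from mail),
the tokens seen so far are combined with the usual naive Bayes rule
P = prod(p) / (prod(p) + prod(1 - p)), and the score is mapped linearly onto
a puzzle difficulty between 8 and 24 bits. The numbers and names are
illustrative only.

```python
import math
import re

# Hypothetical per-token spam probabilities (learned from a corpus in practice).
TOKEN_SPAM_PROB = {
    "viagra": 0.99,
    "propose": 0.80,
    "hi": 0.30,
    "how": 0.40,
    "are": 0.45,
    "you": 0.50,
}
UNKNOWN_PROB = 0.4  # assumed probability for tokens the filter has never seen


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def spam_score(tokens):
    """Naive Bayes combination, computed in log space to avoid underflow."""
    log_spam = log_ham = 0.0
    for t in tokens:
        p = TOKEN_SPAM_PROB.get(t, UNKNOWN_PROB)
        log_spam += math.log(p)
        log_ham += math.log(1.0 - p)
    diff = log_ham - log_spam
    # Numerically stable form of 1 / (1 + exp(diff)).
    if diff >= 0:
        e = math.exp(-diff)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(diff))


def difficulty_bits(score, easy=8, hard=24):
    """Map a spam score in [0, 1] linearly onto a puzzle difficulty in bits."""
    return easy + round(score * (hard - easy))


def score_stream(chunks):
    """Re-score the whole message received so far after every chunk."""
    seen = []
    for chunk in chunks:
        seen.extend(tokenize(chunk))
        score = spam_score(seen)
        yield score, difficulty_bits(score)
```

With the made-up probabilities above, feeding the two parts of the example
exchange gives a low score (easy puzzle) after "hi, how are you..." and a
high score (hard puzzle) once the "viagra" part arrives.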

Here is an example of what an exchange would look like:

Sender: connects to the receiver's server.
        starts sending "hi, how are you..."
Server: ok, this looks like 'legitimate' mail
Sender: "i would like to propose you this new
        viagra..."
Server: stop, that's spam. (The server sends the sender a large
        number, i.e. a difficult computation to perform.)
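The server side of this exchange could be sketched as a per-connection
session that re-scores the text after each chunk and, once a threshold is
crossed, stops the transfer and issues a hard puzzle. This is a minimal
sketch under assumptions: the class name, the 0.9 threshold and the 8/24-bit
difficulties are hypothetical, and the keyword scorer in the usage part is a
toy stand-in for a real Bayesian filter.

```python
import os
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Challenge:
    nonce: bytes      # random value the sender must work on
    difficulty: int   # puzzle hardness, e.g. required leading zero bits


class ReceivingSession:
    """Hypothetical server-side state for one incoming connection.

    The scorer is injected so any filter can be plugged in; it maps the
    text received so far to a spam score in [0, 1].
    """

    def __init__(self, scorer: Callable[[str], float],
                 threshold: float = 0.9, easy: int = 8, hard: int = 24):
        self.scorer = scorer
        self.threshold = threshold
        self.easy = easy
        self.hard = hard
        self.received: List[str] = []

    def on_chunk(self, chunk: str) -> Optional[Challenge]:
        """Return None ("ok, keep sending") or a hard Challenge ("stop")."""
        self.received.append(chunk)
        score = self.scorer("".join(self.received))
        if score >= self.threshold:
            return Challenge(os.urandom(16), self.hard)
        return None

    def on_end(self) -> Challenge:
        """The message looked legitimate throughout: issue the easy puzzle."""
        return Challenge(os.urandom(16), self.easy)


if __name__ == "__main__":
    # Toy stand-in scorer, NOT a Bayesian filter.
    toy = lambda text: 0.99 if "viagra" in text.lower() else 0.1
    session = ReceivingSession(toy)
    print(session.on_chunk("hi, how are you..."))  # None: looks legitimate
    reply = session.on_chunk("i would like to propose you this new viagra...")
    print(reply.difficulty)                        # hard puzzle: 24 bits
```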

One question I still need answered is: will we need one filter per
server, or one filter per user? If it is one filter per server, then we
must take into account that a server might have users speaking many
different languages (which is also true in the one-filter-per-user
case), and that users can be interested in very different topics
(their day-to-day emails can cover completely different subjects).

I think this method has one really good property: the filter never
creates false positives, since you will always receive the mail; a
misclassified message only costs its sender a longer computation.

We could also combine all this with whitelists (since most servers already
allow users to keep their address book online). But whitelists present
the problem of identifying the sender.
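Combining the two could be as simple as skipping the puzzle for senders in
the recipient's online address book. This hypothetical helper assumes the
address book is stored as lowercase addresses and that the filter has
already produced a difficulty; as noted above, the weak point is that the
claimed sender address is not authenticated.

```python
from typing import Set


def required_difficulty(sender: str, address_book: Set[str], filter_bits: int) -> int:
    """Return the puzzle difficulty (in bits) the sender must solve.

    Whitelisted senders skip the puzzle entirely. Caveat: `sender` is
    whatever address the client claims, so a spammer could forge a
    whitelisted address unless the sender is authenticated somehow.
    """
    if sender.strip().lower() in address_book:
        return 0
    return filter_bits
```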

Alok Menghrajani
alok(_dot_)menghrajani(_at_)epfl(_dot_)ch
alokm(_at_)cmu(_dot_)edu


_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg