Re: overall paradigm shift in email, plus rambling philosophical discuss

On Jun 21, 2004, at 7:33 AM, Meng Weng Wong wrote:

> Machine learning algorithms need to key off the combination
> of authentication-result and sender-domain.  Doing one
> without the other is like eating a sandwich without any
> bread.

That is, of course, true. And while I believe that the Bayesian learning in SA is not capable of doing that directly (but Apple's semantic network filtering in its MUA may be able to), I think that even single-token Bayesian systems will do what is needed indirectly. It just takes time for them to learn.

If a site rejects early on a fail, those messages will never reach the statistical learning system. This means that all mail FROM, say, amazon.com, that the learning system encounters will be non-spam. Likewise, all the mail FROM an SPF-using spammer (e.g., bounce3.rm04.net) will be seen as spam. The learning system, operating behind SPF filtering, will learn about amazon.com and rm04.net. But the learning system will not be making any direct use of the Received-SPF header.
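
A minimal sketch of that indirect learning, in Python. Everything here is an assumption made for illustration: the message dictionaries, the spf_gate and SingleTokenFilter names, the smoothing, and the six sample messages are all invented, and this is not how SA or any real filter is implemented. The learner only ever sees a return-path domain token and never looks at the SPF result.

```python
from collections import Counter


def spf_gate(messages):
    """Reject SPF 'fail' messages early, before the learner sees them."""
    return [m for m in messages if m["spf"] != "fail"]


class SingleTokenFilter:
    """Toy single-token learner: counts spam and ham sightings per token."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()

    def train(self, tokens, is_spam):
        target = self.spam if is_spam else self.ham
        target.update(set(tokens))

    def spamminess(self, token):
        # Laplace-smoothed fraction of this token's sightings that were spam.
        s, h = self.spam[token], self.ham[token]
        return (s + 1) / (s + h + 2)


def tokens_of(message):
    # Only the return-path domain becomes a token; the SPF result is ignored.
    return ["from:" + message["domain"]]


def train_on(messages):
    learner = SingleTokenFilter()
    for m in messages:
        learner.train(tokens_of(m), m["is_spam"])
    return learner


# Invented sample traffic: two real amazon.com mails, two forgeries that
# fail SPF, and two mails from a spammer who publishes SPF and passes.
mail = [
    {"domain": "amazon.com", "spf": "pass", "is_spam": False},
    {"domain": "amazon.com", "spf": "pass", "is_spam": False},
    {"domain": "amazon.com", "spf": "fail", "is_spam": True},
    {"domain": "amazon.com", "spf": "fail", "is_spam": True},
    {"domain": "bounce3.rm04.net", "spf": "pass", "is_spam": True},
    {"domain": "bounce3.rm04.net", "spf": "pass", "is_spam": True},
]

behind_spf = train_on(spf_gate(mail))  # learner sits behind early rejection
no_spf = train_on(mail)                # learner sees the forgeries too

print(behind_spf.spamminess("from:amazon.com"))
print(behind_spf.spamminess("from:bounce3.rm04.net"))
print(no_spf.spamminess("from:amazon.com"))
```

Behind the gate, the amazon.com token scores as ham and the rm04.net token as spam. Without the gate, the forgeries pull the amazon.com token back to an uninformative middle value, which is the sense in which SPF makes the return-path more useful to an unmodified learner.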

I am an old Unix geek, and so follow the view that each tool should do its job well and not try to do other jobs. If SPF does its job well, then the statistical systems will be able to do their job better because the return-paths that they see will be more reliable.

So we can keep the relatively simple and computationally practical statistical systems as they are. No tweaking is needed. They will work even better if we use SPF exactly for what it was designed for.

http://spf.pobox.com/slides/unified%20spf/0335.html

Nice slide! I agree fully. SPF makes it easier for statistical systems to learn about good senders and bad senders.

My point is that statistical filtering systems do not need to be adapted for SPF. And more trouble comes from humans trying to anticipate or do better than the autolearning systems.

Where SPF helps for human intervention is with manual whitelists and blacklists. Now I can manually whitelist amazon.com and blacklist rm04.net if I wish.

As an analogy, consider the "meaning" of a "none" to a spam filter. At the moment a "none" probably doesn't give any weight either way to spam or ham. But if more opt-out spammers are early adopters of SPF, then for some period "none" will be a mild indicator of non-spammishness. But once publication of SPF records becomes more widespread, then a "none" will probably start correlating weakly with spam.

But as with the good senders and bad senders, the statistical learning systems will do the right thing without having to be programmed to know about SPF semantics.
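
The drift in what a "none" means can be illustrated with a few lines of Python. The counts below are made up purely to show the direction of the effect in each of the three periods described above; they are not measurements, and the fixed 50% baseline is a simplifying assumption.

```python
def spam_fraction(spam_count, ham_count):
    """Fraction of messages carrying a token that turned out to be spam."""
    return spam_count / (spam_count + ham_count)


# Hypothetical (spam, ham) counts among messages whose SPF result is "none".
phases = {
    "today": (50, 50),
    "spammers adopt first": (40, 50),
    "widespread publication": (30, 20),
}

# Assumed overall spam fraction, held fixed across periods for simplicity.
baseline = 0.5

# Negative drift: "none" leans toward ham. Positive drift: leans toward spam.
drift = {
    name: spam_fraction(spam, ham) - baseline
    for name, (spam, ham) in phases.items()
}

for name, value in drift.items():
    print(f"{name}: {value:+.3f}")
```

A learner that simply recounts as mail arrives follows this sign change on its own, which is the point of the analogy: nobody has to tell it what "none" is supposed to mean.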


-j

--
Jeffrey Goldberg                        http://www.goldmark.org/jeff/
