Re: overall paradigm shift in email, plus rambling philosophical discussion
2004-06-21 13:04:18

On Jun 21, 2004, at 7:33 AM, Meng Weng Wong wrote:

> Machine learning algorithms need to key off the combination of
> authentication-result and sender-domain. Doing one without the other
> is like eating a sandwich without any bread.

That is, of course, true. And although I believe that the Bayesian learning in SA is not capable of doing that directly (but Apple's semantic network filtering in its MUA may be able to), I think that even single-token Bayesian systems will do what is needed indirectly. It just takes time for them to learn.

If a site rejects early on an SPF fail, those messages will never reach the statistical learning system. This means that all mail FROM, say, amazon.com that the learning system encounters will be non-spam. Likewise, all the mail FROM an SPF-using spammer (e.g., bounce3.rm04.net) will be seen as spam. The learning system, operating behind SPF filtering, will learn about amazon.com and rm04.net. But the learning system will not be making any direct use of the Received-SPF header.

I am an old Unix geek, and so follow the view that each tool should do its job well and not try to do other jobs. If SPF does its job well, then the statistical systems will be able to do their job better, because the return-paths that they see will be more reliable. So we can keep the relatively simple and computationally practical statistical systems as they are. No tweaking is needed. They will work even better if we use SPF exactly for what it was designed for.

> http://spf.pobox.com/slides/unified%20spf/0335.html

Nice slide! I agree fully. SPF makes it easier for statistical systems to learn about good senders and bad senders. My point is that statistical filtering systems do not need to be adapted for SPF. If anything, more trouble comes from humans trying to anticipate or outdo the autolearning systems.

Where SPF does help with human intervention is with manual whitelists and blacklists.
Now I can manually whitelist amazon.com and blacklist rm04.net if I wish.

As an analogy, consider the "meaning" of a "none" to a spam filter. At the moment a "none" probably doesn't give any weight either way, to spam or ham. But if more opt-out spammers are early adopters of SPF, then for some period a "none" will be a mild indicator of non-spammishness. Once publication of SPF records becomes more widespread, a "none" will probably start correlating weakly with spam. But as with the good senders and bad senders, the statistical learning systems will do the right thing without having to be programmed to know about SPF semantics.

-j

-- 
Jeffrey Goldberg                        http://www.goldmark.org/jeff/
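
The argument about a learner sitting behind SPF rejection can be illustrated with a toy sketch in Python. It is not how SA or any real filter is implemented: the TokenFilter class, the run() helper, the "from:" token format, and the six-message mail stream are all hypothetical, and the per-token score is a simplified single-token ratio. The point it tries to show is only that the learner never looks at Received-SPF, yet its opinion of a domain sharpens once forged mail is rejected before training.

```python
from collections import Counter


class TokenFilter:
    """Toy single-token learner (hypothetical, not any real filter's code)."""

    def __init__(self):
        self.spam = Counter()  # token -> number of spam messages containing it
        self.ham = Counter()   # token -> number of ham messages containing it
        self.n_spam = 0
        self.n_ham = 0

    def train(self, tokens, is_spam):
        counts = self.spam if is_spam else self.ham
        counts.update(set(tokens))
        if is_spam:
            self.n_spam += 1
        else:
            self.n_ham += 1

    def spamminess(self, token):
        """Simplified per-token spam score; 0.5 for a token never seen."""
        s = self.spam[token] / self.n_spam if self.n_spam else 0.0
        h = self.ham[token] / self.n_ham if self.n_ham else 0.0
        if s + h == 0:
            return 0.5
        return s / (s + h)


def run(stream, use_spf):
    """Train a fresh filter, optionally rejecting SPF fails before learning."""
    filt = TokenFilter()
    for spf_result, domain, is_spam in stream:
        if use_spf and spf_result == "fail":
            continue  # rejected at SMTP time; the learner never sees it
        # The only token is the return-path domain. No SPF result is used.
        filt.train(["from:" + domain], is_spam)
    return filt


# Made-up mail stream: (SPF result, return-path domain, is it spam?)
stream = [
    ("pass", "amazon.com", False),
    ("fail", "amazon.com", True),        # forgery
    ("pass", "bounce3.rm04.net", True),  # SPF-using spammer
    ("pass", "amazon.com", False),
    ("fail", "amazon.com", True),        # forgery
    ("pass", "bounce3.rm04.net", True),
]

behind_spf = run(stream, use_spf=True)
no_spf = run(stream, use_spf=False)

for name, filt in (("behind SPF", behind_spf), ("without SPF", no_spf)):
    print(name,
          round(filt.spamminess("from:amazon.com"), 3),
          round(filt.spamminess("from:bounce3.rm04.net"), 3))
```

In this made-up stream, the learner without SPF in front of it sees forged amazon.com spam mixed in with the real mail, so the amazon.com token is ambiguous. Behind SPF rejection the same unmodified learner sees only genuine amazon.com mail, while rm04.net, which passes SPF, is still learned as a spam source.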