I think I am working towards a rather broader critique of the way that
SpamBayes etc. are applied.
Naïve Bayesian learning schemes are intrinsically vulnerable to
counter-programming. They work on a small scale only because, at that scale,
counter-programming is not worth enough to an attacker.
I am reminded of the chess match between Kasparov and Deep Blue. One of the
professors at MIT who works on computer chess told me that they could have
taught Kasparov how to outwit the machine by exploiting weaknesses in the
computer's strategy.
In general, an attacker can intentionally teach any naïve learning approach to
treat a chosen characteristic as a strong indicator of spam. Once the attacker
can control the learning system's state, there is no end to the tricks that can
be played.
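
To make the point concrete, here is a minimal sketch of how one token's learned
spam probability moves once an attacker controls part of the training stream.
The smoothing, the equal-prior assumption, the token name "dkim-signature" and
all of the counts are illustrative assumptions, not SpamBayes's actual
algorithm or measured data.

```python
from collections import Counter


def spamicity(token, spam_counts, ham_counts, n_spam, n_ham):
    """Smoothed estimate of P(spam | token), assuming equal class priors.

    Counts are the number of trained messages of each class containing the token.
    """
    p_tok_spam = (spam_counts[token] + 1) / (n_spam + 2)
    p_tok_ham = (ham_counts[token] + 1) / (n_ham + 2)
    return p_tok_spam / (p_tok_spam + p_tok_ham)


# Hypothetical starting state: the token shows up mostly in legitimate mail.
spam_counts = Counter({"dkim-signature": 5})
ham_counts = Counter({"dkim-signature": 60})
n_spam, n_ham = 100, 100

before = spamicity("dkim-signature", spam_counts, ham_counts, n_spam, n_ham)

# The attacker sends 400 spam messages carrying the token, and the recipient
# dutifully trains every one of them as spam.
spam_counts["dkim-signature"] += 400
n_spam += 400

after = spamicity("dkim-signature", spam_counts, ham_counts, n_spam, n_ham)

print(f"before attack: {before:.2f}, after attack: {after:.2f}")
```

Nothing in the legitimate mail changed between the two calls; only the
attacker's contribution to the training set did.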
The common theme at the MIT conference is that the way you test an anti-spam
measure is against a static test corpus. What is left unmeasured is the
resistance to counter-programming.
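
A rough sketch of what a static-corpus test misses, under stated assumptions:
the classifier below is a toy presence-based naive Bayes with class priors
deliberately left out, and the corpus, the token names ("example-bank" is
hypothetical) and the message counts are all made up. It scores the same fixed
test corpus twice, once before and once after an attacker has fed the filter
poisoned training mail.

```python
import math
from collections import Counter


class ToyNaiveBayes:
    """Presence-based naive Bayes; only tokens present in a message are scored."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, tokens, label):
        self.counts[label].update(set(tokens))
        self.docs[label] += 1

    def is_spam(self, tokens):
        score = 0.0  # summed log-likelihood ratio, no class prior
        for tok in set(tokens):
            p_spam = (self.counts["spam"][tok] + 1) / (self.docs["spam"] + 2)
            p_ham = (self.counts["ham"][tok] + 1) / (self.docs["ham"] + 2)
            score += math.log(p_spam / p_ham)
        return score > 0


def accuracy(clf, corpus):
    hits = sum(clf.is_spam(tokens) == (label == "spam") for tokens, label in corpus)
    return hits / len(corpus)


clf = ToyNaiveBayes()
for _ in range(20):
    clf.train(["cheap", "pills"], "spam")
for _ in range(15):
    clf.train(["meeting", "agenda"], "ham")
for _ in range(5):
    clf.train(["statement", "example-bank"], "ham")

# The fixed test corpus that a conventional evaluation would use.
static_corpus = [
    (["cheap", "pills", "offer"], "spam"),
    (["pills"], "spam"),
    (["meeting", "agenda"], "ham"),
    (["statement", "example-bank"], "ham"),
]
acc_before = accuracy(clf, static_corpus)

# Counter-programming: spam that borrows the legitimate sender's tokens,
# which the recipient then trains as spam.
for _ in range(100):
    clf.train(["cheap", "pills", "statement", "example-bank"], "spam")

acc_after = accuracy(clf, static_corpus)
print(f"accuracy before attack: {acc_before:.2f}, after attack: {acc_after:.2f}")
```

The first figure is the only one a static-corpus benchmark reports. The drop to
the second figure is the resistance to counter-programming, and it only shows
up if the evaluation lets an adversary write to the training set.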
I believe that what opponents of the DKIM approach describe as a vulnerability
of DKIM is in fact an intrinsic weakness of the spam filtering techniques
described and that the DKIM exploit is merely one example of a much wider class
of attacks against those schemes.
This objection is not coming from large-scale anti-spam filtering operations;
it is coming from people who run SpamAssassin on their personal email and
take a look at the rules their system is building.
-----Original Message-----
From: ietf-dkim-bounces(_at_)mipassoc(_dot_)org
[mailto:ietf-dkim-bounces(_at_)mipassoc(_dot_)org] On Behalf Of J.D. Falk
Sent: Tuesday, August 22, 2006 7:41 PM
To: ietf-dkim(_at_)mipassoc(_dot_)org
Subject: Re: [ietf-dkim] Bayesian filters are the pits
On 2006-08-22 12:56, Hallam-Baker, Phillip wrote:
Third we need to promote the idea that you should not look for the
existence or even the validity of a DKIM header as being as important
as the domain that is claiming responsibility. If you can't correlate
the domain to some form of additional information you should ignore
the record entirely.
That's generally true in a simplistic spam / not spam
decision. If you're making a forged / not forged decision,
the record is still useful.
This has nothing to do with naive Bayes, but everything to do
with naive mail administrators looking for simple binary spam
/ not spam criteria.
--
J.D. Falk, Anti-Spam Product Manager
Yahoo! Communications Platform Team
_______________________________________________
NOTE WELL: This list operates according to
http://mipassoc.org/dkim/ietf-list-rules.html