
[Asrg] Why content filters are unacceptable

2003-03-08 10:20:33
One of the approaches to stopping spam discussed so far is
content filtering, i.e. filters which basically ignore the
SMTP connection and the headers, and focus on the body of the
message (and perhaps the Subject header line).

I'd like to point out why such systems are unacceptable and
a severe threat to privacy and freedom of speech.

To do so, let me distinguish between two fundamental types of 
such filters: those which need to reveal some kind of information about 
the message to be analyzed (e.g. the full message or just a 
hash value) to a third party, and those which don't (but instead download 
some kind of pattern set from a third party). 


The first kind (those revealing some information):

   The basic principle is to ask a third party "is this message 
   known to be spam?". As discussed, this question will not include
   the full message, but only some kind of hash value. If that third
   party has seen the full message before, it immediately knows that
   the party sending the request has received that particular message. 
   Even if the third party doesn't know the message text, it can
   easily see which people receive the same e-mail, and can therefore
   build a database of relations between persons, exactly as secret
   services have long done to undermine privacy. That's how it works
   even in the most democratic countries (USA: such databases have been
   known to exist for decades, and meanwhile there is a "Total
   Information Awareness" program. Germany: it could be used for the
   "Raster- or Zielfahndung", i.e. dragnet or targeted searches).

   If possession of a mail could in any way compromise you, you cannot
   deny having received the message, since your request with that
   specific hash value was recorded. 

   So once a government observes any kind of politically unwanted
   message, it can immediately list all the people who have received
   that particular message, simply by doing a database lookup. 
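
   To make the correlation risk concrete, here is a minimal sketch
   (Python, all names hypothetical) of such a lookup service; a plain
   SHA-256 digest stands in for whatever checksum a real service would
   use. The only thing the service has to add is a log of who asked
   about which hash:

      import hashlib
      from collections import defaultdict

      def digest(body):
          # Plain SHA-256 as a stand-in for a real service's checksum.
          return hashlib.sha256(body.encode()).hexdigest()

      class LookupService:
          """Hypothetical third party answering 'is this message known spam?'."""
          def __init__(self):
              self.known_spam = set()          # hashes reported as spam
              self.seen = defaultdict(set)     # hash -> requesters (the side effect)

          def is_spam(self, requester, msg_hash):
              self.seen[msg_hash].add(requester)
              return msg_hash in self.known_spam

          def who_received(self, msg_hash):
              # The "database lookup": everyone who ever asked about this hash.
              return self.seen[msg_hash]

      service = LookupService()
      pamphlet = "some politically unwanted message ..."
      service.is_spam("alice@example.org", digest(pamphlet))
      service.is_spam("bob@example.org", digest(pamphlet))
      print(service.who_received(digest(pamphlet)))
      # {'alice@example.org', 'bob@example.org'} -- linked without knowing the text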

   I do refuse to sacrifice all my privacy and to feed Big Brother.

   Ironically, as I pointed out in an earlier message, this kind of
   system fails to block spam, because once such systems become 
   widespread, spammers will individualize their messages to defeat
   the hash mechanism, and many of today's spam messages are already
   individualized.
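
   A toy example of why individualized spam defeats exact hashing: one
   random token per copy yields a different digest for every recipient,
   so no copy ever matches a hash that was reported before (same
   SHA-256 stand-in as above):

      import hashlib
      import uuid

      template = "Buy now! http://example.invalid/offer?id={token}"
      copies = [template.format(token=uuid.uuid4().hex) for _ in range(1000)]
      digests = {hashlib.sha256(c.encode()).hexdigest() for c in copies}

      print(len(digests))   # 1000 -- every copy hashes differently, so a
                            # lookup against known spam hashes never matches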




The second kind (not revealing any information):

   In contrast to the first kind, these systems need to 
   "learn" what counts as an undesired message. If you do not want to
   train them yourself - and that is unrealistic for most users - they 
   need to download some kind of "knowledge" from an external third
   party, usually a set of patterns. 

   Thus, a third party controls which messages make their way into
   my mailbox and which don't. After all, that's the idea. 

   Except for the intention, there is absolutely no difference 
   - and certainly no technical difference - between this and
   censorship machinery. 

   Any kind of organisation - whether a government, your own secret
   service, a hostile secret service, Scientology, large monopolistic
   companies, intellectual property/patent warriors - could spoof spam
   alerts or even fake spam in order to perform a denial-of-service
   attack: by poisoning the pattern set, it gets all messages matching
   those patterns filtered. This is not fantasy. A similar effect is
   well known from newsgroups, where unwanted messages were removed
   by forged cancel messages.

   Since those patterns will not match just a particular message,
   but a wide range of messages - e.g. "money transfer from Africa", 
   "penis enlargement devices" - blocking virtually every message
   about a certain subject can easily be achieved. 
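
   Continuing the sketch above (again with invented patterns and
   messages), a single broad pattern does not block one spam message,
   it blocks the topic; a legitimate mail about the very same subject
   is dropped just as readily:

      import re

      blocklist = [re.compile(r"money transfer", re.IGNORECASE)]  # deliberately broad

      spam  = "URGENT: money transfer from Africa, 20% for you!"
      legit = "Hi, the money transfer for your conference fee went out today."

      for body in (spam, legit):
          print(any(p.search(body) for p in blocklist))
      # True, True -- the subject is blocked, not just the spam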

   This is a perfect invitation for abuse. Since 9/11, there have been 
   serious efforts to enforce political correctness and to curtail 
   freedom of speech. Even in Germany, a few politicians are trying to 
   block websites (Bezirksregierung Düsseldorf). Their credo is: 
   "If blocking is possible, then we'll do it." Building up such a 
   pattern distribution infrastructure provides the ultimate and
   perfect tool for performing the same filtering on e-mail. If they do
   it with web pages, they won't hesitate to do the very same with
   e-mail. Once there is a third party able to control which mails 
   are passed through, they will order it to block unwanted messages 
   exactly the same way they currently order providers to block web
   sites. 

   I do refuse to open the door for Big Brother and to sacrifice my
   freedom of speech.

   And by the way, not every message from a girl asking me to have 
   wild sex with her is spam. I do not accept that any third party
   should be able to block all my e-mails matching the "wild sex"
   pattern. Especially not if Big Brother's government disapproves of 
   wild sex for religious reasons. 


   Ironically again, this kind of system is also unable to block spam.
   Why? Have a look at a mechanism of exactly the same kind that we
   have already been using for years: virus/worm filters. These filters
   always fail to block the first distribution, because of the
   recognition and pattern update delay. Such pattern-based content
   filters block the redistribution only. In contrast to viruses and
   worms, spam doesn't have a redistribution step. Spam is distributed
   only once. So how could such a pattern filter, which requires
   adaptation to become effective, ever be effective against a one-shot
   attack like spam?

   Adaptive pattern-based content filters are effective against 
   content that persists over a longer time interval or is
   redistributed, such as viruses or free speech. They are not
   effective against content distributed in one-shot campaigns, such
   as spam.
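
   A small thought experiment in code (all numbers invented) makes the
   timing argument explicit: a pattern that only becomes available some
   hours after the first sighting still catches a worm that keeps
   redistributing itself, but nothing of a campaign that is over before
   the update arrives:

      PATTERN_DELAY = 6   # hours between first sighting and pattern update (invented)

      def blocked_fraction(arrival_hours):
          """Fraction of copies arriving after the pattern became available."""
          cutoff = min(arrival_hours) + PATTERN_DELAY
          return sum(t >= cutoff for t in arrival_hours) / len(arrival_hours)

      worm = list(range(72))        # keeps arriving for three days
      spam = [0, 0.5, 1, 1.5, 2]    # one-shot run, finished within two hours

      print(blocked_fraction(worm)) # ~0.92 -- redistribution is caught
      print(blocked_fraction(spam)) # 0.0   -- campaign is over before the update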


A problem applying to both kinds:

   Even though we are still far from having a widely deployed 
   public key infrastructure, we should take care not to slam that
   door shut. Content-based spam filters would undermine the adoption
   of a common PKI. Why? 

   Imagine there were content-based spam filters that worked
   effectively. Imagine there were also a widespread PKI, and it
   were common to make people's public keys publicly available. 

   What would the next generation of spammers do? Obviously, they would 
   collect e-mail addresses _and_ public keys, and encrypt their spam,
   thus rendering the content filters ineffective. People would tend
   not to distribute their public keys, or not to have one at all. 
   Thus, content filters make it even more difficult to establish a
   common PKI.  
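
   To illustrate (a deliberately simplified sketch: symmetric Fernet
   encryption from the third-party Python "cryptography" package stands
   in for encryption to the recipient's published public key), the same
   pattern filter that catches the plaintext sees nothing but an opaque
   blob once the message is encrypted:

      import re
      from cryptography.fernet import Fernet   # pip install cryptography

      pattern = re.compile(r"penis enlargement", re.IGNORECASE)
      spam = b"Amazing penis enlargement devices, order now!"

      # Stand-in for "encrypt to the recipient's public key": a real spammer
      # would use the key published in the PKI; Fernet is used only for brevity.
      recipient_key = Fernet.generate_key()
      ciphertext = Fernet(recipient_key).encrypt(spam)

      print(bool(pattern.search(spam.decode())))              # True  -- filter hits
      print(bool(pattern.search(ciphertext.decode("ascii")))) # False -- filter is blind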



I am firmly convinced that no content filtering system effectively
blocks spam, while every such system is a severe threat to fundamental
human and constitutional rights, such as privacy and freedom of
speech. While a government would never be able to establish such a 
surveillance and censorship infrastructure for political reasons,
spam fighters are ready to establish it. (And some of them even claim
to defend the constitution.)

If even democratic countries abuse every available method to block
unwanted content, what will those countries do whose governments don't
even claim to be democratic or to care about human rights?

Imagine what a Senator McCarthy could have done with such a tool. 
Imagine how such a tool would fit into a Patriot Act III. 
Imagine what a Saddam could do with such a tool. 

I am already scared when I imagine what the Bezirksregierung Düsseldorf
could do with such a tool.



Hadmut




