Re: [Asrg] Taxonomy of anti-spam systems

My initial goal was to offer a few small suggestions for improving
your taxonomy.  However, they snowballed, and the result is a rather
drastic set of changes.

In short, I think we should be looking for sets of "orthogonal dimensions"
along which to measure anti-spam systems rather than a "taxonomy"
in the traditional sense of that word (a tree of mutually-exclusive,
jointly-exhaustive categories).

Consider the following example: an ISP that uses white- and
blacklists plus content-filtering to detect spam, but simply puts
special headers in the messages, leaving it to specially-configured
MUAs to either (at the user's discretion) quarantine spam or to bounce
back a challenge.  The point here is to illustrate that anti-spam
systems are probably best understood as mixing together choices from a
few dimensions rather than as belonging to a single-rooted hierarchy.

Here are the main dimensions of my anti-spam-system classification
scheme: How is spam detected?  What is the response to spam?  Where
does the detection and response take place?  Let's look at each:


1) Where (for both detection and response):
   (a) sending MTA, (b) receiving MTA, (c) receiving MUA

   If detection and response happen in different places, we assume
   some form of message tagging is needed -- a good place to plug in
   the thread about standardizing spam headers.
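
As a rough illustration of the tagging idea, here is a minimal Python
sketch in which detection happens at the receiving MTA and the response
is left to the MUA.  The header name, the verdict format, and both
function names are hypothetical; nothing here reflects an agreed-upon
standard.

```python
from email.message import EmailMessage

# Hypothetical header name, chosen only for this sketch.
VERDICT_HEADER = "X-Example-Spam-Verdict"


def tag_at_mta(msg: EmailMessage, is_spam: bool, reason: str) -> None:
    """Detection at the receiving MTA: record the verdict, deliver anyway."""
    # Remove any copy of the header supplied by the sender, so the MUA
    # only ever sees the MTA's own verdict.
    del msg[VERDICT_HEADER]
    label = "spam" if is_spam else "ham"
    msg[VERDICT_HEADER] = f"{label}; reason={reason}"


def respond_at_mua(msg: EmailMessage, user_policy: str) -> str:
    """Response at the MUA: apply the user's chosen policy to tagged spam."""
    verdict = msg.get(VERDICT_HEADER, "")
    if not verdict.startswith("spam"):
        return "inbox"
    return user_policy  # e.g. "quarantine" or "challenge"
```

The MTA never decides what happens to the message; it only records
what it found, which is what lets the response live somewhere else.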


2) Response:
   (a) Throw in bit-bucket
   (b) Quarantine (giving recipient ability to review)
   (c) Rate limit (obviously not appropriate at MUAs)
   (d) Issue challenge/collect response
   (e) Demand real-world payment/punishment

(e) deserves a little explanation.  This is meant to capture the idea
of collecting payments under the Bonded-Sender/Templeton-Postage
approaches, or suing for copyright infringement under the Habeas
approach, or suing for violation of ISP user agreements, or suing for
misappropriation of one's identity (forgery).  Obviously, most of
these require some type of hard-to-repudiate authentication of
real-world entities to be effective.

Another point about (e) versus the others.  ALL of (a)-(e) are there,
at least in part, as a deterrent -- to make it
painful/expensive/less-useful to send spam.  In this regard, (e)
is a little different from the rest because it probably makes sense
ONLY as a deterrent.  I'm not sure if this point deserves more than
a footnote.
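
To make the "independent choices" idea concrete, the first two
dimensions can be sketched as two Python enums whose combinations are
enumerated freely.  The class and function names are mine, and the only
combination excluded is the one noted above (rate limiting at an MUA);
other pairings may turn out to be impractical too.

```python
from enum import Enum


class Where(Enum):
    SENDING_MTA = "sending MTA"
    RECEIVING_MTA = "receiving MTA"
    RECEIVING_MUA = "receiving MUA"


class Response(Enum):
    DISCARD = "throw in bit-bucket"
    QUARANTINE = "quarantine"
    RATE_LIMIT = "rate limit"
    CHALLENGE = "issue challenge/collect response"
    REAL_WORLD = "demand real-world payment/punishment"


def is_sensible(where: Where, response: Response) -> bool:
    """Reject only the pairing the list above calls out explicitly."""
    return not (where is Where.RECEIVING_MUA
                and response is Response.RATE_LIMIT)


# Every (place, response) pair that survives the single exclusion.
combos = [(w, r) for w in Where for r in Response if is_sensible(w, r)]
```

A tree-shaped taxonomy would have to pick one of these axes as the
root and repeat the other under every branch; two independent axes
avoid that duplication.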


3) Detection:

Here again, I'm inclined to create a space of "orthogonal dimensions"
rather than a strict taxonomy:

3.1) Features considered
     (a) Sender
         (i) Untracked sender
         (ii) "Tracked" sender
              ...lots of further classification for "tracked" senders
     (c) Message content (including headers)
     (d) "Permission to send" tokens (I like to call this "Postage")
         (i) A priori (payment made for this message)
             (I) Real money (lots of subcategories)
             (II) Evidence of money spent (lots of these too, e.g., hashcash)
         (ii) A posteriori (payment promised upon misbehavior)
              ...lots of these, e.g., Habeas, Bonded Sender, Templeton Postage

3.2) Determination mechanism
     (a) Human
         (i) White/black/reputation lists for senders
             ...Such lists can be further classified according
                to how they are maintained
         (ii) Collaborative filtering for bodies
         (iii) Hand-crafted rules for senders, headers and/or bodies
             ...includes such things as "Postfix" and checking "envelope
             characteristics" as well as "Dear friend"-like body rules
     (b) System
         (i) Learning based on human spam/ham tagging
             ...much sub-classification is possible
         (ii) Bulk detection (DCC)
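
Putting the dimensions together, the ISP example from the top of this
message could be recorded as one point in the space.  This is only a
sketch: the class, its field names, and the string labels are
hypothetical, and I am assuming the ISP does its detection at the
receiving MTA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemProfile:
    detect_where: str       # dimension 1, for detection
    respond_where: str      # dimension 1, for response
    responses: frozenset    # dimension 2
    features: frozenset     # dimension 3.1
    mechanisms: frozenset   # dimension 3.2

    def needs_tagging(self) -> bool:
        """Tagging is needed when detection and response are split."""
        return self.detect_where != self.respond_where


# The ISP example: lists plus content filtering at the MTA, with the
# user's MUA choosing between quarantine and challenge.
isp_example = SystemProfile(
    detect_where="receiving MTA",   # assumption, see above
    respond_where="receiving MUA",
    responses=frozenset({"quarantine", "challenge"}),
    features=frozenset({"sender", "message content"}),
    mechanisms=frozenset({"white/black lists", "content filter"}),
)
```

Note that the example system holds several values on most axes at
once, which is exactly what a single-rooted hierarchy cannot express.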
