[Asrg] Nucleus of a draft BCP on filtering

This is something I wrote as a general, high-level set of suggestions for spam filtering. While it tends to stress current technologies (source and content filtering), it does have some applicability to other techniques.


May be useful as a starting point for a BCP on spam filtering.

Guiding General Principles:

- spam is UBE (unsolicited bulk email)

- spam is a behaviour, not content

- a spam filter is not an enforcer of your users' morals; it protects
your users' mailboxes from time-wasting mail they don't
want.

- in any but the smallest environment, some spam will get through
and some non-spam will be filtered. User expectations must be managed.
Provision must be made for detecting and dealing with false positives.

- spammers resort to the most extreme (sometimes criminal) measures to get around filtering, as well as to make the "message" as intrusive as possible.


- every spam block should be rejected inline, with a contact point for
mistakes.

- spam should not be bounced: headers are almost always forged, so
bounces go to the wrong places and often harass innocent third
parties. Mail should be rejected during the SMTP transaction to provide
feedback to legitimate senders.
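As a minimal illustration of rejecting inline with a contact point, the Python sketch below formats a permanent SMTP rejection reply using the RFC 5321 multi-line reply syntax and an RFC 3463 enhanced status code (5.7.1, "delivery not authorized, message refused"). It only builds the reply text; hooking it into a real MTA is out of scope. The function name `build_reject_reply`, its defaults and the `postmaster@example.net` contact are illustrative assumptions.

```python
def build_reject_reply(reason: str, contact: str,
                       code: int = 550, enhanced: str = "5.7.1") -> str:
    """Format a permanent SMTP rejection that names a contact point.

    Sent in place of "250 OK" (e.g. after RCPT TO or DATA), so the
    connected sending MTA, not a possibly forged envelope sender,
    learns of the rejection.
    """
    if not 500 <= code <= 599:
        raise ValueError("permanent rejections use 5xx reply codes")
    texts = [reason, f"If this is a mistake, contact {contact}"]
    lines = []
    for i, text in enumerate(texts):
        # Multi-line replies: "550-" on continuation lines, "550 " on the last.
        sep = " " if i == len(texts) - 1 else "-"
        lines.append(f"{code}{sep}{enhanced} {text}")
    return "\r\n".join(lines) + "\r\n"
```

Because the rejection happens while the sender is still connected, a legitimate sender's own MTA reports the failure (with the contact address) back to the real sender, instead of a bounce going to whatever address was forged.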


- you need to come up with a guidance metric for what level of false
positives is acceptable, because there _will_ be false positives in
any but the most trivial and/or ineffective of spam filtering environments.
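One way to make such a guidance metric concrete, sketched under the assumption that you can count legitimate mail and reported false positives over some period: compute the false positive rate and compare it against a chosen ceiling. The class, field names and the 0.05% example goal are hypothetical; the right ceiling is a local policy decision.

```python
from dataclasses import dataclass


@dataclass
class FilterStats:
    legit_total: int    # legitimate messages seen in the period
    legit_blocked: int  # of those, wrongly blocked (false positives)
    spam_total: int     # spam messages seen in the period
    spam_blocked: int   # of those, correctly blocked

    def fp_rate(self) -> float:
        return self.legit_blocked / self.legit_total if self.legit_total else 0.0

    def catch_rate(self) -> float:
        return self.spam_blocked / self.spam_total if self.spam_total else 0.0


def within_fp_goal(stats: FilterStats, max_fp_rate: float) -> bool:
    """True if the observed false positive rate is at or below the goal."""
    return stats.fp_rate() <= max_fp_rate


stats = FilterStats(legit_total=10_000, legit_blocked=3,
                    spam_total=50_000, spam_blocked=47_000)
print(f"FP rate {stats.fp_rate():.4%}, catch rate {stats.catch_rate():.1%}")
print(within_fp_goal(stats, max_fp_rate=0.0005))  # hypothetical 0.05% goal; prints True
```

Tracking the catch rate alongside the FP rate keeps the trade-off visible: tightening rules to raise one usually raises the other.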

- Environments differ: filtering that is acceptable for one may not
be acceptable for another. Hence, one-size-fits-all purchased anti-spam
services are sometimes highly sub-optimal.


General techniques:

- spam is a behaviour, not content. You're trying to identify the
sender behaving that way, not the material per se.

- Blocking on simplistic body word matches, however attractive it may appear, does not work effectively in any but the smallest environments. Even "fuck" appears in legitimate business email. Filtering techniques should attempt to detect only those things that ordinary users cannot do inadvertently and that spammers cannot bypass.


- sometimes simplistic content rules work if the rule is "unique" or
"unusual" enough. FPs must be monitored against your false positive goals.

- header patterns (of spamware), and mail sources (IP ranges and rDNS
names) are generally more effective and less FP-prone than simplistic
content filters.
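A sketch of source- and header-based checks of this kind, in Python. The rDNS regular expression (generic names that look like dial-up/DSL/cable pools, or that embed the client IP) and the `ExampleBulkMailer` header rule are hypothetical placeholders, not a recommended rule set; each site would tune its own patterns against its false positive goal.

```python
import re
from typing import List, Mapping, Optional

# Hypothetical patterns for illustration; tune against your own FP goal.
DYNAMIC_RDNS = re.compile(r"(dsl|dial(up)?|dyn(amic)?|cable|pool|ppp)[.-]", re.I)
SPAMWARE_HEADERS = [
    ("X-Mailer", re.compile(r"^ExampleBulkMailer\b")),  # hypothetical tool
]


def ip_in_rdns(ip: str, rdns: str) -> bool:
    """True if the IPv4 octets appear, in either order, inside the rDNS name."""
    octets = ip.split(".")
    for seq in (octets, octets[::-1]):
        pattern = r"(?<!\d)" + r"[.-]".join(seq) + r"(?!\d)"
        if re.search(pattern, rdns):
            return True
    return False


def classify_source(ip: str, rdns: Optional[str],
                    headers: Mapping[str, str]) -> List[str]:
    """Return the reasons (possibly none) this source/header set looks suspect."""
    reasons = []
    if not rdns:
        reasons.append("no rDNS")
    else:
        if DYNAMIC_RDNS.search(rdns):
            reasons.append("dynamic-looking rDNS")
        if ip_in_rdns(ip, rdns):
            reasons.append("IP embedded in rDNS")
    # Assumes header names are already in canonical case.
    for name, pattern in SPAMWARE_HEADERS:
        value = headers.get(name)
        if value is not None and pattern.search(value):
            reasons.append(f"spamware header {name}")
    return reasons
```

Returning reasons rather than a bare yes/no makes it easy to put the matching rule into the rejection text and to see which rule produced a given false positive.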

- Massive spammers are evolving towards "borrowing" innocent third parties for final delivery to your server: open SMTP relays, open SOCKS proxies, open HTTP proxies. It is not practical to test for these in real time, so you need to rely on third-party blacklists.
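Third-party blacklists are usually queried over DNS (a "DNSBL"): reverse the IPv4 octets, append the list's zone, and look up an A record. An answer (conventionally in 127.0.0.0/8) means "listed"; NXDOMAIN means "not listed". This convention was later written up in RFC 5782. The Python sketch below assumes a hypothetical zone `dnsbl.example.org` and uses the standard library resolver; it deliberately treats any lookup failure as "not listed" (fail open), which is a policy choice you may want to change.

```python
import ipaddress
import socket
from typing import Callable, Optional


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Reverse the IPv4 octets and append the DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for non-IPv4 input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone.rstrip('.')}"


def dnsbl_lookup(ip: str, zone: str,
                 resolve: Callable[[str], str] = socket.gethostbyname
                 ) -> Optional[str]:
    """Return the A record (e.g. "127.0.0.2") if listed, else None.

    Any resolver error, not just NXDOMAIN, is treated as "not listed",
    i.e. the check fails open.
    """
    try:
        return resolve(dnsbl_query_name(ip, zone))
    except OSError:  # socket.gaierror and socket.herror subclass OSError
        return None
```

For example, `dnsbl_query_name("192.0.2.7", "dnsbl.example.org")` yields `7.2.0.192.dnsbl.example.org`. The `resolve` parameter exists so the lookup can be tested, or routed through a local caching resolver, without changing the logic.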


- There are many blacklists, ranging from hyper-aggressive to totally
ineffective, and from professionally run to maintained out of spite and
hatred. Blacklists should be evaluated for effectiveness rate, FP rate,
and professionalism of maintenance.
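Effectiveness and FP rate can be estimated by replaying labelled samples of known-spam and known-legitimate source addresses against a candidate list. A minimal Python sketch, assuming such samples exist; the function name is hypothetical, and professionalism of maintenance still has to be judged by hand.

```python
from typing import Callable, Iterable, Tuple


def evaluate_blacklist(is_listed: Callable[[str], bool],
                       spam_sources: Iterable[str],
                       legit_sources: Iterable[str]) -> Tuple[float, float]:
    """Return (hit_rate, fp_rate) for a blacklist over labelled source IPs.

    hit_rate: share of known-spam sources the list catches (effectiveness).
    fp_rate:  share of known-legitimate sources it would wrongly block.
    """
    spam = list(spam_sources)
    legit = list(legit_sources)
    hits = sum(1 for ip in spam if is_listed(ip))
    false_pos = sum(1 for ip in legit if is_listed(ip))
    hit_rate = hits / len(spam) if spam else 0.0
    fp_rate = false_pos / len(legit) if legit else 0.0
    return hit_rate, fp_rate


# Usage with a stand-in list; in practice is_listed would wrap a DNSBL query.
listed = {"192.0.2.1", "192.0.2.2", "198.51.100.9"}
print(evaluate_blacklist(listed.__contains__,
                         ["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"],
                         ["198.51.100.9", "198.51.100.10",
                          "198.51.100.11", "198.51.100.12"]))  # (0.5, 0.25)
```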

- In some cases your policies will conflict with the blacklists, so you
need to be prepared to whitelist sources.

- You need the ability to whitelist on sender, recipient and source.
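A minimal Python sketch of a whitelist covering all three keys, consulted before any blacklist verdict is applied. Class and function names are hypothetical. Because envelope senders are easily forged, a sender-only whitelist entry is the weakest of the three.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import List, Set, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


@dataclass
class Whitelist:
    senders: Set[str] = field(default_factory=set)        # envelope senders, lower-case
    recipients: Set[str] = field(default_factory=set)     # local recipients, lower-case
    sources: List[Network] = field(default_factory=list)  # client IP ranges

    def allows(self, sender: str, recipient: str, client_ip: str) -> bool:
        ip = ipaddress.ip_address(client_ip)
        return (sender.lower() in self.senders
                or recipient.lower() in self.recipients
                or any(ip in net for net in self.sources))


def decide(wl: Whitelist, sender: str, recipient: str,
           client_ip: str, blacklisted: bool) -> str:
    """Whitelist entries override a blacklist verdict."""
    if wl.allows(sender, recipient, client_ip):
        return "accept"
    return "reject" if blacklisted else "accept"
```

A recipient whitelist is useful for addresses such as an abuse or postmaster mailbox, where people reporting problems must get through even from blacklisted sources.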

- Filtered email should be stored for a reasonable time.

- Filtered email should be available for viewing by the end users to
help detect and rectify false positives.
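A Python sketch of a quarantine that keeps a copy of each filtered message for a retention window and lets each recipient list and release their own messages. The class names are hypothetical, the 14-day retention in the usage note is an arbitrary assumption, and a real store would live on disk rather than in memory.

```python
import time
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class QuarantinedMessage:
    msg_id: str
    recipient: str    # lower-cased local recipient
    reason: str       # which rule or list caught it, shown to the user
    raw: bytes
    stored_at: float  # Unix timestamp


class Quarantine:
    """In-memory quarantine with a retention window (hypothetical design)."""

    def __init__(self, retention_seconds: float) -> None:
        self.retention = retention_seconds
        self._store: Dict[str, QuarantinedMessage] = {}

    def add(self, msg_id: str, recipient: str, reason: str, raw: bytes,
            now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._store[msg_id] = QuarantinedMessage(
            msg_id, recipient.lower(), reason, raw, now)

    def purge_expired(self, now: Optional[float] = None) -> int:
        """Drop messages older than the retention window; return how many."""
        now = time.time() if now is None else now
        expired = [k for k, m in self._store.items()
                   if now - m.stored_at > self.retention]
        for k in expired:
            del self._store[k]
        return len(expired)

    def list_for(self, recipient: str) -> List[QuarantinedMessage]:
        """What an end user sees when reviewing their filtered mail."""
        r = recipient.lower()
        return [m for m in self._store.values() if m.recipient == r]

    def release(self, msg_id: str, recipient: str) -> Optional[bytes]:
        """Remove and return a message the user flags as a false positive.

        Only the message's own recipient may release it.
        """
        m = self._store.get(msg_id)
        if m is None or m.recipient != recipient.lower():
            return None
        del self._store[msg_id]
        return m.raw
```

For example, `Quarantine(retention_seconds=14 * 86400)` keeps messages for two weeks. Each release is also a false positive report, so counting releases per `reason` feeds directly into the FP rate for each rule or blacklist.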

- Non-existent recipient addresses are valuable: you can use them to
detect spammers with little false positive risk (honeypot).
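A Python sketch of using non-existent recipients as a honeypot: count RCPT TO attempts for addresses that do not exist, per client IP, and flag clients that reach a threshold. The class name and the threshold of 3 are hypothetical; the threshold is there to tolerate the occasional mistyped address from a legitimate sender.

```python
from collections import Counter
from typing import Iterable


class InvalidRecipientTrap:
    """Count RCPT TO attempts for non-existent addresses per client IP."""

    def __init__(self, valid_recipients: Iterable[str], threshold: int = 3) -> None:
        self.valid = {r.lower() for r in valid_recipients}
        self.threshold = threshold  # hypothetical; tolerates occasional typos
        self.hits: Counter = Counter()

    def check_rcpt(self, client_ip: str, rcpt: str) -> bool:
        """Return True if the recipient exists; otherwise record a hit."""
        if rcpt.lower() in self.valid:
            return True
        self.hits[client_ip] += 1
        return False

    def is_suspect(self, client_ip: str) -> bool:
        # Counter returns 0 for unseen keys without inserting them.
        return self.hits[client_ip] >= self.threshold
```

In a long-running server the per-IP counters would also need to age out, so that a source that once mistyped a few addresses is not flagged indefinitely.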

_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg