"Jon Kyme" <jrk(_at_)merseymail(_dot_)com> writes:
The former would be useful, but I'm doubtful that it would have much
of an impact on spam. The latter seems to me to rely on the sender
accurately tagging their messages according to content---possibly it
would happen often enough that it would be worthwhile, but I'm not
sure that it would.
I'm not sure about this; there seems to me (at the most general level)
to be only one class of things that need be asserted in a consent
expression: How this message is classified by some engine. Your
second class seems to me to be the sort of thing that's routinely
handled by content-filters (imperfectly, I grant you).
So rather than saying:
1. message has html => noconsent
2. message mentions 'septic tank enhancement' => consent
3. message is from grandma => consent
4. message has valid consent token => consent
5. message has blacklisted source IP => noconsent
etc ...
You might say something more like
positive_test(name_of_engine_1, engineargs, message) => noconsent
positive_test(name_of_engine_2, engineargs, message) => consent
etc...
I guess someone could standardise this (using whatever language they
wanted), and there are some kinds of content filter (probably quite
simple things---the sort of thing that SIEVE can do, say) that we
could standardise on. That might be useful.
Here's a little idea I had which was inspired by the above.
I'd like to apologise for pre-empting the forthcoming consent framework
document (was it Yakov who was working on that?) but I wanted to write all of
this down before I forget. Feel free to tear it to shreds, although
constructive suggestions for improvement would also be nice. :-)
One approach that seems to work well for packet filtering is the iptables
format of rules used by the Linux Netfilter module (http://www.netfilter.org).
Perhaps a similar structure could be applied here to each e-mail message?
An e-mail message would be compared against a list of rules in sequence
until it matches a rule which specifies some policy decision.
The Netfilter architecture allows each rule to have an associated external
module to evaluate a match (e.g. the "mac" module, selected with "--match
mac", matches packets on their source MAC address), so it is fully
extensible.
For each rule there is also a target: either a terminal decision which
specifies the fate of any packet matching that rule, or else the name of
another table of rules to be applied in the same way. Netfilter's ability to
combine tables of tables using jumps and RETURNs allows one to construct very
powerful combinations of rules.
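The jump and RETURN mechanics just described can be sketched in a few lines of Python. All the names here (Rule, evaluate, the table names) are mine and purely illustrative, not a proposed standard:

```python
# Sketch of Netfilter-style rule traversal for messages: a rule's
# target is either a terminal verdict, RETURN (resume in the calling
# table), or the name of another table to jump into.

ACCEPT, DROP, REJECT, RETURN = "ACCEPT", "DROP", "REJECT", "RETURN"

class Rule:
    def __init__(self, match, target):
        self.match = match      # predicate: message -> bool
        self.target = target    # verdict, RETURN, or another table's name

def evaluate(tables, name, message, default_policy=DROP):
    for rule in tables[name]:
        if not rule.match(message):
            continue
        if rule.target == RETURN:
            return None                       # back to the calling table
        if rule.target in tables:             # jump into a sub-table
            verdict = evaluate(tables, rule.target, message, None)
            if verdict is not None:
                return verdict                # the sub-table decided
            continue                          # it RETURNed; keep scanning
        return rule.target                    # terminal verdict
    return default_policy                     # fell off the end

tables = {
    "INPUT": [
        Rule(lambda m: m["from"].endswith("aol.com"), ACCEPT),
        Rule(lambda m: m["html"], "HTML_CHECKS"),
    ],
    "HTML_CHECKS": [
        Rule(lambda m: "JavaScript" in m["body"], DROP),
        Rule(lambda m: True, RETURN),
    ],
}
msg = {"from": "someone@example.net", "html": True, "body": "JavaScript here"}
print(evaluate(tables, "INPUT", msg))  # DROP, via the HTML_CHECKS jump
```

A message that RETURNs from HTML_CHECKS without matching simply resumes scanning in INPUT, and if nothing else matches it meets the default policy, just as in Netfilter.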
Message matching modules could be supplied by a range of different
companies/programmers and the local user (if this is done in their MUA) or else
the site admin (for a MTA) could utilise whichever modules they prefer at their
level. Thus there might be a module to implement DNS blacklisting, one for some
kind of C/R, a module for digital signature checking, another for content-based
filtering and so on.
Typical destination outcomes for an e-mail message might be:
- silently discard the message (analogous to Netfilter's DROP)
- bounce the message back with an error (like Netfilter's REJECT)
- accept the message for delivery (like Netfilter's ACCEPT)
- log part or all of the message for use in spam statistics and abuser tracing
(like Netfilter's LOG, processing need not terminate after doing this)
There might be other policy options too; this isn't intended to be an
exhaustive list. Just as with Netfilter, the table will need some kind of
default policy for messages that don't match any of the rules listed. Users
could choose a fail-open (ACCEPT) or fail-closed (DROP) approach depending on
their preferences.
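One way these outcomes and the default policy might interact is first-match-wins, except for LOG, which records the message and lets processing continue. A minimal sketch, with illustrative names throughout:

```python
import logging

# First matching rule wins, except LOG, which (like Netfilter's LOG
# target) records the message and lets evaluation continue. Messages
# matching no rule fall through to the default policy, which a user
# can set fail-open (ACCEPT) or fail-closed (DROP).
def apply_rules(rules, message, default_policy="DROP"):
    for match, verdict in rules:       # rules: (predicate, verdict) pairs
        if not match(message):
            continue
        if verdict == "LOG":
            logging.info("logged for spam statistics: %.60s", message)
            continue                   # LOG does not terminate processing
        return verdict                 # DROP / REJECT / ACCEPT do
    return default_policy

rules = [
    (lambda m: "grandma" in m, "ACCEPT"),
    (lambda m: "septic tank" in m, "LOG"),
    (lambda m: "septic tank" in m, "REJECT"),
]
apply_rules(rules, "septic tank enhancement")  # logged, then "REJECT"
apply_rules(rules, "anything else")            # no match -> "DROP"
```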
So using a pseudo-Netfilter syntax, my spam filtering INPUT table might look
something like this:
--source my_mum(_at_)aol(_dot_)com -j ACCEPT
--match content --content-type text/html --contains JavaScript -j DROP
--match content --content-type text/html --contains InvalidHTMLTags -j DROP
--source friend(_at_)somewhere(_dot_)net --match attachment \
    --file-type EXE,SCR,PIF,BAT,VBS -j REJECT
--match attachment --file-type EXE,SCR,PIF,BAT,COM -j DROP
--source trusted_colleague(_at_)work(_dot_)com --match attachment \
    --file-type JPEG,GIF -j ACCEPT
--match content --content-type text/plain -j ACCEPT
... with a default policy of DROP for anything else.
Notice that I'm willing to send a rejection message to one of my friends to
bounce a message back, as it seems only polite to warn them. But for most
senders I would silently discard suspected spam, to avoid confirming that my
address is valid and thus incurring more of it.
This is of course merely an example of how I might specify my personal
preferences. I'm not suggesting that anyone else should set theirs this way,
nor would I presume to tell other people what their default should be. I think
it should be entirely at the recipient's discretion as to what they choose to
receive. As somebody who knows more about e-mail than the average user, I'm
prepared to accept the risk of some genuine messages being dropped provided I
can trust the rules and filter modules I'm using. But this is just my own
preference.
Incidentally I know the above syntax is ugly and unfriendly to end-users. It's
just an example, of course. However it can be made much easier by providing
simple forms or graphical tools for the user in order to generate the rules on
their behalf. If this were integrated into the MUA and tied in with their
address book it would be extremely simple to use.
To simplify usability further, each site (organisation or ISP) might provide a
series of default policies, ranging from "high spam protection" to "no spam
protection". Users could choose a level based on how strongly they feel about
the issue.
One suggestion I've not seen so far is rather like the Internet Explorer
classification of web sites into "zones" depending on their level of trust.
Users might classify senders into zones in the MUA address book, or else define
some rules which can map an individual message into a "zone". A default
policy is then provided for every zone (advanced users are free to tweak
these or define their own "custom" settings). Of course it's hard to classify e-mail
messages into such zones because it's difficult to determine their true origin,
so perhaps that idea would be unworkable. Anyway this is just an aside, not
essential to the idea I'm describing.
One other thing occurred to me based on the Netfilter comparison. Netfilter has
several tables of rules based on what stage of routing the packet has arrived
at. So there is a PREROUTING table for filtering packets before they've
undergone any NAT or mangling, then either the INPUT or FORWARD table
(depending on the routing decision) gets a chance to process them again after
translation.
Under an analogous e-mail filtering system, one could define tables of rules as
follows:
- ARRIVAL
- INPUT
- FORWARD
ARRIVAL - for rules to process a message while the SMTP connection is still
open, before the message has been accepted and enqueued. Thus at some
suitable point before issuing a "250 Mail queued for delivery" reply we check
the message against the rules and decide what to do with it. The ARRIVAL rule
table will probably only apply to MTAs.
A decision to ACCEPT would map onto a "250 Mail queued" reply.
A decision to DROP might also return a "250 Mail queued" reply, but the
message would then be discarded without being placed into the mail queue.
A decision to REJECT might map onto a "5xy" permanent refusal reply code.
... and so on.
Other types of policy might provide for a DELAY, perhaps a transient "4xy"
refusal code. The delay policy might be used to limit the rate at which
suspected spam spreads around the net, by forcing it to remain in the previous
MTA's queue. Of course this would likely require some stateful tracking of
messages so it might be difficult in practice. I'll leave that one for the
experts to decide.
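A rough sketch of that verdict-to-reply mapping, DELAY included (the codes and wording are my suggestions, not part of any standard):

```python
# Possible mapping from ARRIVAL verdicts to SMTP reply codes.
# DROP deliberately answers 250, so a sender cannot tell a discarded
# message from a delivered one; DELAY answers with a transient 4xy
# code so the message stays queued on the previous MTA.
SMTP_REPLY = {
    "ACCEPT": (250, "Mail queued for delivery"),
    "DROP":   (250, "Mail queued for delivery"),  # then quietly discarded
    "REJECT": (550, "Message refused by recipient policy"),
    "DELAY":  (450, "Requested action not taken; try again later"),
}

def smtp_reply(verdict):
    code, text = SMTP_REPLY[verdict]
    return f"{code} {text}"

print(smtp_reply("DELAY"))  # 450 Requested action not taken; try again later
```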
There might be other kinds of policy too... I don't want to be too prescriptive
at this stage.
Rules in the ARRIVAL chain might also add headers to the messages. For example,
the addition of "X-SpamScore" headers by spam filter modules might be done
here.
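Using Python's standard email library, such a header-adding rule might look like the following. The scoring engine is hypothetical; only the X-SpamScore header name comes from the text above:

```python
from email.message import EmailMessage

# An ARRIVAL-stage rule that annotates rather than decides: a
# (hypothetical) scoring engine's result is written into an
# X-SpamScore header for later tables, or the user's MUA, to match on.
def add_spam_score(msg, score):
    del msg["X-SpamScore"]           # drop any spoofed copy first
    msg["X-SpamScore"] = f"{score:.1f}"
    return msg

msg = EmailMessage()
msg["From"] = "someone@example.com"
msg.set_content("septic tank enhancement")
add_spam_score(msg, 7.5)
print(msg["X-SpamScore"])  # 7.5
```

Deleting any pre-existing X-SpamScore header first matters: otherwise a spammer could pre-load a low score and hope downstream rules trust it.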
INPUT - for a message which has been accepted by the ARRIVAL table and which is
destined for local delivery. The ISP's MTA might have a special set of rules
which it only applies to mailboxes on its own servers; these rules could be
entered here.
Also, a user's MUA could have its own additional INPUT table for messages which
the ISP has not blocked. A user might thus implement their own spam policy on
their local machine in the event that their ISP's generic policy isn't good
enough for them. My personal example above would fit in here, filtering
messages as they are downloaded from the server (via POP3, IMAP or whatever
protocol they like; I think IMAP has some interesting possibilities in its
own right, but that's an aside).
FORWARD - for a message which has been accepted by the ARRIVAL table of an MTA
but which is to be passed on elsewhere to some server which is outside the
control of the organisation running this MTA.
Rules included here might be based on reciprocal spam-blocking agreements
between the organisations which operate MTAs (ISPs, companies, whoever). There
might be scope for some kind of social co-operation here by favouring messages
to/from well-behaved ISPs. Protocols for sharing/co-ordinating policy might be
used to keep FORWARD rules up-to-date.
As a simple example of an extreme, a honeypot server which traps spam but never
delivers might have a FORWARD table consisting of a single LOG rule to keep a
copy of the message for later analysis and a default DROP policy so nothing
gets externally delivered.
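That honeypot FORWARD table is small enough to sketch directly (illustrative names again, not a real interface):

```python
# A honeypot's FORWARD table: one unconditional LOG rule, and a
# default policy of DROP, so every message is kept for analysis and
# nothing is ever delivered onward.
captured = []

def forward_table(message, default_policy="DROP"):
    rules = [(lambda m: True, "LOG")]
    for match, verdict in rules:
        if match(message) and verdict == "LOG":
            captured.append(message)   # copy kept for later analysis
            # LOG is non-terminating, so processing continues
    return default_policy              # ...and everything ends in DROP

forward_table("make money fast")
print(captured)  # ['make money fast']
```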
Of course I've not said anything about how the modules would communicate with
such a system, nor about how (if at all) individual modules might communicate
with each other. Suggestions or comments about that aspect of the system would
be helpful to flesh out some detail.
It's not a solution to spam, though, because some things really are
things that can't be checked automatically, so the content filtering
will be imperfect. And (if it were to be standardised) we can expect
it to become more and more imperfect.
Maybe filtering will be good enough for long enough to allow time to deploy a
better long-term solution. At any rate it can give everyone some breathing
space.
Sorry this turned into such a huge message; I probably got carried away. Things
I like about this idea are:
- it could express consent policy at different stages in the network, as
discussed previously on the list. There is scope for some mechanism (protocol?)
to synchronise or distribute policies across the internet. I will leave the
possible design of this to other people.
- it is independent of the SMTP transport protocol and therefore does not
require changes to the SMTP standard (of course, policy decisions will need to
be mapped onto SMTP response codes somehow).
- it doesn't need to be deployed by the whole world at once in order to reap
benefits, just as Gordon has pointed out about his personal consent system.
I've tried to show how some of his suggested categories of things might be
matched and blocked in my INPUT example above.
Things that worry me about it:
- it assumes that there are suitable filtering techniques (modules) available
to fit into this system
- it needs the writers of MUAs to get on-board and for users to upgrade to
newer, more capable MUAs. I suspect that the users who hate spam enough would
be quite willing to download better software if their ISP held them by the
hand. Those who don't upgrade will still be able to communicate with the rest
of the world, subject to the spam detection policies of their recipients
- it would work best if MTAs were also redesigned to employ the scheme. Of
course the interoperability means it can be deployed on a small scale first and
gradually increased
Thanks for reading...
Andrew
_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg