
Re: [ietf-smtp] IETF Policy on dogfood consumption or avoidance - SMTP version

2019-12-23 13:50:09
On 12/23/19 1:51 PM, Russ Allbery wrote:

> Keith Moore <moore(_at_)network-heretics(_dot_)com> writes:
>
>> Yeah, I see so much evidence of poorly-chosen spam filtering criteria
>> that I'm not willing to give the spam filters a pass just because the
>> volume of spam is so great.
>> I think we need to recognize that spam filters are part of the problem.
>> Just because a problem exists does not mean that any supposed fix is
>
> I completely agree with this, and I also have seen a lot of bogus spam
> filtering criteria.  But I think the question is harder: Is there any fix
> that is good?  Or are there only least-bad fixes?

See my other message to Hector about different categories of spam filtering.

Clearly there are different degrees of "good" or "bad".   Part of what we might be able to do is define some criteria for goodness, for measurement of goodness, and for how often to re-evaluate goodness.    (For example, whenever someone quotes an FP rate without telling me how they measured it, I take it with a grain of salt.)
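
To illustrate why the measurement method matters (toy numbers of my own invention, not anyone's real data): the same filter run yields very different-sounding "FP rates" depending on which denominator you pick, and people quoting a rate rarely say which one they used.

```python
# Toy illustration (made-up numbers): two common but incompatible ways
# of quoting a false-positive rate for the same spam-filter run.

def fp_rates(true_pos, false_pos, true_neg, false_neg):
    """Return the FP rate under two different denominators."""
    legit = true_neg + false_pos          # all legitimate messages
    total = true_pos + false_pos + true_neg + false_neg
    return {
        # fraction of legitimate mail wrongly junked (what users feel)
        "fp_over_legit": false_pos / legit,
        # fraction of all traffic wrongly junked (flattered by spam volume)
        "fp_over_total": false_pos / total,
    }

# 9000 spam caught, 50 legit wrongly junked, 950 legit delivered, 100 spam missed:
rates = fp_rates(true_pos=9000, false_pos=50, true_neg=950, false_neg=100)
print(rates)
```

With these numbers the first figure is 5% and the second is under 0.5% -- an order of magnitude apart, from one run.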

> The basic problem is stark and difficult: 90-95% of email is spam,
> malware, phishing, or other types of unwanted junk, and this mail is being
> sent by intelligent (sometimes) and adaptive (occasionally) adversaries.
> Meanwhile, in my experience the tolerance of the average person for junk
> delivery into their inbox is *well* below 50%.  I haven't measured this,
> so take with substantial salt, but anecdotally I would say that the
> usability of email for a lot of people drops significantly if more than
> 10-20% of their inbox is junk, and many people want it lower than that.
>
> Even harder, letting through even one or two malware or phishing messages
> can be very dangerous, regardless of how good the percentages look.
> Gmail in my experience does way better than that, getting down to about
> 1%, at the cost of some (but not that many) false positives.  For my
> personal email, I use a Bayesian filter tuned solely for me and almost no
> other filtering criteria and get more like 5% spam in my inbox, at a cost
> of more false positives than Gmail.

It's hard to meaningfully compare filter effectiveness at different domains, because different domains (and different users) get different amounts and kinds of spam.

My business email account has no spam filter.   It gets a decent amount of spam, but I also don't get a huge volume of mail at that account so manually filtering out the spam is still fairly easy. OTOH I have other accounts that have fairly aggressive spam filters, but 99% of the messages that make it through are still spam.

I also find that the user interface makes a huge difference in how tolerable spam is.   If I'm using a webmail interface that requires several clicks and page loads and other delays to deal with spam, the spam is more annoying than with my normal UA where I can fairly easily select a set of 40-50 messages, unselect the 2-3 messages that appear legitimate, and hit delete.

> The most effective standardized spam filtering techniques to date that
> have retained effectiveness over time and not been fairly trivially
> bypassed by spammers have been authentication approaches (SPF, DKIM,
> etc.), which only handle certain classes of junk (but are particularly
> helpful against phishing, which for most people is the most dangerous form
> of junk).  Those now seem to be clearly good ideas, but by themselves seem
> unlikely to achieve the necessary outcome.

Another thing I'm wondering is whether it would help for IETF to recommend that MSAs that forward mail to arbitrary SMTP servers on the public Internet sign their outgoing traffic (say, with DKIM), and perhaps to define a profile for doing so.
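
To make the idea concrete, here is a sketch of one small piece such a profile would pin down: the body-hash ("bh=") computation from RFC 6376, using "simple" body canonicalization.  The actual signature value ("b=") requires a private key and header canonicalization, which I've left as a placeholder; the domain and selector below are made-up examples, not a recommendation.

```python
import base64
import hashlib

# Sketch of the DKIM body-hash ("bh=") computation from RFC 6376, using
# "simple" body canonicalization.  The signature itself ("b=") needs an
# RSA or Ed25519 key and is omitted; domain/selector are placeholders.

def simple_body_canon(body: bytes) -> bytes:
    """RFC 6376 'simple' body canonicalization: strip trailing empty
    lines, then make sure the body ends with a single CRLF."""
    while body.endswith(b"\r\n\r\n"):
        body = body[:-2]
    if not body.endswith(b"\r\n"):
        body += b"\r\n"
    return body

def body_hash(body: bytes) -> str:
    digest = hashlib.sha256(simple_body_canon(body)).digest()
    return base64.b64encode(digest).decode()

def unsigned_dkim_header(domain: str, selector: str, body: bytes) -> str:
    """Assemble the DKIM-Signature header fields, minus the b= value."""
    return ("v=1; a=rsa-sha256; c=simple/simple; d=%s; s=%s; "
            "h=from:to:subject:date; bh=%s; b="
            % (domain, selector, body_hash(body)))

hdr = unsigned_dkim_header("example.org", "msa2019", b"Hello, world\r\n")
print("DKIM-Signature: " + hdr)
```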

More broadly speaking, can we effectively raise the bar for "legitimate" email in a well-defined way so that everybody knows what hoops they need to jump through rather than having to guess?   And can we raise the bar in a way that favors legitimate mail over spam?

For instance, now that we have a spec for relaying using SMTP over TLS, if we specified that relay of mail from the MSA to the MX of the destination domain SHOULD use TLS with client certificates, then receiving servers would have a quicker and more reliable way to identify and classify senders.
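
A rough sketch of what that relay step might look like from the sending side, using Python's standard library (the certificate paths and MX host are placeholders, and this is one possible shape, not a defined profile):

```python
import smtplib
import ssl

# Sketch: an MSA relaying to a destination MX over TLS, presenting a
# client certificate so the receiver can identify the sending operator.
# Paths and hostnames below are illustrative placeholders only.

def make_client_context(certfile=None, keyfile=None) -> ssl.SSLContext:
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if certfile:
        # The client certificate the receiving MX can use to classify us.
        ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return ctx

def relay(msg_bytes, mail_from, rcpt_to, mx_host):
    ctx = make_client_context("/path/to/relay-cert.pem",
                              "/path/to/relay-key.pem")
    with smtplib.SMTP(mx_host, 25) as s:
        s.starttls(context=ctx)   # RFC 3207 STARTTLS
        s.sendmail(mail_from, rcpt_to, msg_bytes)
```

The receiver's side -- requesting and verifying the client certificate, and mapping it to a sender identity -- is where a profile would have to do most of its work.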

> What else is proven to a level that the IETF can standardize and recommend
> it as a replacement to the ad hoc techniques that operators are using to
> try to keep the ship from sinking?

I don't know that we need to restrict ourselves to "proven" techniques.   I do think that some techniques have been around long enough that it might make sense to evaluate their effectiveness, but I don't think new ideas should be out-of-scope.


p.s. also I don't think this is likely to be "exciting" work, or work that we should crow about.  If successful, it's more likely to promote gradual improvement over time than any earth-shattering immediate change.  I think it's work that needs to be approached with very sober, level-headed judgment and analysis.

ietf-smtp mailing list
