Re: [ietf-smtp] A Zero Spam Mail System

On Sun, Sep 29, 2019 at 4:59 PM Brandon Long <blong(_at_)google(_dot_)com> 
wrote:

If you'd like to know how Gmail works, the first step is Brad's original
paper on that:

https://ai.google/research/pubs/pub45


Yes, that article was excellent, basically proving everything I had been
advocating on this list and others, and also confirming what I was seeing
on a much smaller scale with box67.com.  I wish the article had come out
two years earlier.  That might have saved me a lot of time.

The system has evolved and become more realtime, larger, maintains
reputations on many more features, algorithms tuned, and various ML added
to the mix for some edge cases, but the system would still be recognizable
to him today.

As for size, I'm sure that there are a bunch of benefits of scale, but I'd
like to think we had better antispam even when we were much smaller than
the competition.


It looked that way to me - one reason I dropped my experiment and switched
to gmail.  (The other being that the post-doc who was working with me on
plans to process the entire mailflow for U of A (74,000 recipients) got a
job offer he couldn't refuse.)

As for size, I'm surprised that nobody gave examples of small services that
are doing well.  I'm still convinced that size is not necessary to have an
excellent whitelisting system.  box67.com was just a few friends and
family, and we were whitelisting 90% of our incoming non-spam, sending the
rest through SpamAssassin.  That's a 10X improvement over SpamAssassin
alone.
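
A minimal sketch of the routing described above.  It rests on two
assumptions of mine, not on anything measured on box67.com: that the 10X
figure refers to filter errors on legitimate mail, and that SpamAssassin's
error rate is the same for every legitimate message.  The names `route` and
`improvement_factor` are hypothetical.

```python
def route(helo_domain, whitelist):
    """Deliver mail from whitelisted HELO domains directly; filter the rest."""
    domain = helo_domain.lower().rstrip(".")
    return "deliver" if domain in whitelist else "spamassassin"


def improvement_factor(whitelisted_fraction):
    """Factor by which filter errors on legitimate mail shrink.

    Assumes the filter's error rate is uniform across legitimate mail, so
    errors scale with the fraction of mail that still reaches the filter.
    """
    if not 0.0 <= whitelisted_fraction < 1.0:
        raise ValueError("whitelisted_fraction must be in [0, 1)")
    return 1.0 / (1.0 - whitelisted_fraction)


if __name__ == "__main__":
    whitelist = {"mail.example.org"}  # hypothetical whitelist entry
    print(route("Mail.Example.org", whitelist))     # skips the filter
    print(route("unknown.example.net", whitelist))  # goes to the filter
    print(round(improvement_factor(0.90), 6))       # 90% whitelisted
```

With 90% of legitimate mail whitelisted, only 10% is still exposed to the
filter, which is where the factor of ten comes from under these assumptions.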

If anything, we're now such a target that entire so operations focus
explicitly on matching us.  If spam seems like a solved problem, that's
only due to dedicated work.  I guess the flip side of the benefits of scale
is that the number of antispam folks needed scales very sublinearly with
the number of users covered.


Yes, once you have the system adequately automated, the labor required to
maintain Registry records is dramatically less.

Comparing box67 to gmail (as of 13 years ago):

1) box67.com used the HELO name to ID the sender, not the Mail From name.
The HELO name should identify the Transmitter, i.e. the ADMD just before
the Receiver, which is the first ADMD in the Recipient's Network.
http://en.citizendium.org/wiki/Email_system
2) Default authentication records were based on SPF, A, and MX records.  We
didn't include PTR, CSV, or DomainKeys.
3) Reputation feedback was based on reports from Receivers, not button
clicks from Recipients.  We didn't have our own complete email service,
including webmail, which would have allowed us to use buttons reporting
spam and non-spam.
4) We had a webtool <http://open-mail.org/webtool.html> that Transmitter
domains could use to easily provide correct authentication information to
the Registry <http://open-mail.org/RegistryRecords>, and override any
errors in our default records.  A service as large as Gmail would not need
this, because it can more easily put the burden of proper authentication
on the sender.
5) Our goal was an open-source system that would benefit not just one big
service provider, but any organization that could be an honest and
competent participant.
6) Our statistics on Domain Ratings <http://open-mail.org/DomainRatings> were
nowhere near as complete and accurate as Google's (Taylor's Figs. 1 and 2),
but they did generally agree with Google.  There is a wide gap between
"spammy" and "non-spammy" domains, with very few in the middle.
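
Items 1 and 2 above can be sketched roughly as follows.  This is an
illustration only, not the box67.com code: the `REGISTRY` dictionary, the
function name `is_authorized`, and the example addresses are all
hypothetical.  A real Registry would derive its default entries from each
domain's SPF, A, and MX records; here they are hard-coded so the sketch
runs without DNS lookups.

```python
import ipaddress

# Hypothetical Registry: HELO domain -> networks authorized to transmit for it.
# Real defaults would come from the domain's SPF, A, and MX records; these
# hard-coded values use documentation address ranges.
REGISTRY = {
    "mail.example.com": ["192.0.2.0/28", "198.51.100.7/32"],
}


def is_authorized(helo_name, client_ip, registry=REGISTRY):
    """Return True if client_ip is an authorized Transmitter for helo_name."""
    domain = helo_name.lower().rstrip(".")
    addr = ipaddress.ip_address(client_ip)
    networks = registry.get(domain, [])  # unknown domains authorize nothing
    return any(addr in ipaddress.ip_network(net) for net in networks)


if __name__ == "__main__":
    print(is_authorized("mail.example.com", "192.0.2.5"))    # inside the /28
    print(is_authorized("mail.example.com", "203.0.113.9"))  # not listed
```

The lookup key is the HELO name rather than the Mail From domain, which is
the point of item 1.  The webtool in item 4 would correspond to letting a
Transmitter domain replace its own entry in the dictionary.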

Our stats also proved that we could build a very effective whitelisting
system based on reputation of the domain in the HELO command.  Most
legitimate domains send no spam from their authorized Transmitters.  Would
the rest of them eventually get on our whitelist?  The requirements are not
difficult.  As Bradley Taylor said, "just authenticate and behave yourself".
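
As a rough sketch of how "authenticate and behave yourself" could turn into
a whitelist decision: rate each HELO domain by the fraction of its messages
reported as spam, and whitelist only domains that are both authenticated
and rated non-spammy.  The thresholds (`low`, `high`, `min_messages`) and
the function names are assumptions chosen for illustration; they are not
the values used by box67.com or Gmail.

```python
def rate_domain(spam_reports, total_messages,
                low=0.01, high=0.50, min_messages=20):
    """Classify a HELO domain from Receiver reports.

    All thresholds are hypothetical.  Domains with too little traffic
    are left unrated rather than guessed at.
    """
    if total_messages < max(min_messages, 1):
        return "unrated"
    rate = spam_reports / total_messages
    if rate <= low:
        return "non-spammy"
    if rate >= high:
        return "spammy"
    return "middle"


def whitelisted(authenticated, rating):
    """A domain is whitelisted only if it authenticates and behaves."""
    return authenticated and rating == "non-spammy"


if __name__ == "__main__":
    rating = rate_domain(spam_reports=0, total_messages=500)
    print(rating, whitelisted(True, rating))
```

Because of the wide gap between spammy and non-spammy domains, the exact
placement of the two thresholds should matter little; few domains would
land in the "middle" band.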

_______________________________________________
ietf-smtp mailing list
ietf-smtp(_at_)ietf(_dot_)org
https://www.ietf.org/mailman/listinfo/ietf-smtp
