[ietf-clear] Getting CSV ready for prime time

John Levine <clear(_at_)johnlevine(_dot_)com> wrote:

[John Leslie wrote:]

We SHOULD be very careful to not give the impression that CSV will
prevent abusive forgery of domains in EHLO strings.


Ah, there's the misunderstanding.  I didn't mention EHLO forgery
because that's not my major concern.  Large ISPs have large ranges of
customer IP addresses, each of which has a 100% genuine name but
shouldn't be sending mail.


   (Admittedly off-topic: As an ISP, I don't see this as a reasonable
thing to ask CSV to enforce. If such customers send email using the
actual AOL IP address, AOL has full filtering control over it. If they
send email using that EHLO but a different IP address, that's impossible
for the receiving SMTP server to distinguish from forgery.)

AOL, for example, has about a million addresses with names of the form
AC958E67.ipt.aol.com.  Comcast has hundreds of thousands if not millions
with names of the form pcp152920pcs.hamntn01.nj.comcast.net.

Those are the names we need something like a wildcard to catch.


   IMHO, we shouldn't be doing gymnastics for such cases. AOL certainly
has the expertise to make wildcards work for this purpose (I won't go
into details!); and Comcast probably also does, though there's little
evidence they care one way or the other.

   (Please understand, I am _not_ objecting to defining the bit as
John Levine proposed!)
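
For illustration only, here is a minimal sketch of what "something like a
wildcard" could mean on the receiving side: a plain suffix match on the
EHLO name. The function name and the suffix list are assumptions made up
for this example; nothing here is taken from the CSV drafts, and a real
deployment would get the information from DNS rather than a hard-coded
list.

```python
def matches_suffix(helo_name, suffixes):
    """Return True if helo_name is a strict subdomain of any listed suffix."""
    name = helo_name.rstrip(".").lower()
    for suffix in suffixes:
        suffix = suffix.strip(".").lower()
        # Require a label boundary so "notipt.aol.com" does not match "ipt.aol.com".
        if name.endswith("." + suffix):
            return True
    return False


# Hypothetical list of customer-address zones, built from the names quoted above.
NO_MAIL_SUFFIXES = ["ipt.aol.com", "hamntn01.nj.comcast.net"]

if __name__ == "__main__":
    for helo in ("AC958E67.ipt.aol.com", "mail.example.org"):
        print(helo, matches_suffix(helo, NO_MAIL_SUFFIXES))
```

Note that this only matches names; as discussed above, it cannot tell a
genuine use of such a name from a forged one.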

I realize that many of them apply port 25 blocking, but we all know that
port 25 blocking is at best a band-aid.


   (As an ISP, I consider port-25 blocking to be perfectly legitimate
and effective, so long as it is understood to be part of the service
agreement. I don't use it much; but that's for other reasons.)

A less important but also useful ability would be for me to say "any
host that HELOs in abuse.net is lying," which would, at the moment,
catch many millions of spams a day, including about 200,000 that
bounce and blow back to me.  I realize that bad guys will work around
it, but any anti-spam technique merely narrows the set of ways that
they can send spam.


   It's a real pain to analyze blow-backs to forged From addresses to
find whether they also use that domain in the EHLO. My very sparse
checking indicates they generally don't; but in most cases there's
not enough trustworthy information to say one way or the other. :^(

   In any case, I wish to design for the long haul; and I believe
spammers will pretty quickly adapt to avoid any EHLO strings which CSV
might recognize as "not-authorized". (YMMV, I suppose...)

The important thing here is to define the meaning of this bit. I
don't believe we _can_ enforce a particular algorithm to test for this;


Sure we can.  I just defined one.


   Aw c'mon... you know the difference between designing something and
"enforcing" its use...

There's no reason why this particular bit should change very often:
thus it could more appropriately be queried by a proxy service, say,
once a week, and a database distributed which receiving SMTP servers
could query without creating DNS traffic on the 'net.


Yet another service that has to scale and to work all the time?  Ugh.


   I'm perfectly willing to explain how to design it to scale, and work
at least as well as the average DNS lookup, if that's the sticking point.
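
As a rough sketch of the idea (not a design from this thread), the
receiving SMTP server could consult a locally held snapshot of the bit and
treat it as unknown once the snapshot is more than a week old. The class
name, the snapshot format (a mapping from domain to a boolean), and the
injectable clock are all assumptions for the example; how the snapshot
would be built and distributed is left out entirely.

```python
import time

WEEK_SECONDS = 7 * 24 * 60 * 60


class LocalBitDatabase:
    """Hypothetical local copy of a per-domain bit, refreshed about weekly."""

    def __init__(self, max_age=WEEK_SECONDS, clock=time.time):
        self._bits = {}
        self._loaded_at = None
        self._max_age = max_age
        self._clock = clock

    def load(self, snapshot):
        """Replace the local copy with a freshly distributed snapshot."""
        self._bits = {
            name.rstrip(".").lower(): bool(bit) for name, bit in snapshot.items()
        }
        self._loaded_at = self._clock()

    def is_stale(self):
        """True if no snapshot is loaded or the loaded one is too old."""
        if self._loaded_at is None:
            return True
        return self._clock() - self._loaded_at > self._max_age

    def lookup(self, domain):
        """Return the stored bit, or None if unknown or the snapshot is stale."""
        if self.is_stale():
            return None
        return self._bits.get(domain.rstrip(".").lower())
```

A caller that gets None back would fall back to an ordinary DNS query, so
the local copy only removes traffic; it never replaces the authoritative
answer.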

Why would this be better than what DNS caches do now?


   DNS caches do pretty well right now, but we're talking about something
which would multiply the number of query strings that need to be cached
by a factor of four (or more). I'm not confident the RAM reserved for
those caches will scale up quickly enough. (The calculation is a bit
frightful!)

They do cache negative responses, you know, and a week is a quite normal
TTL.


   Indeed they do (mostly); and yes, it is a common TTL; and my first-
blush design for scalability would take advantage of that.

   But the point isn't whether what we can deploy immediately is "better"
than DNS caches. The point is whether we want to claim that the current
situation will survive unchanged five or more years into the future.
Remember, every "MUST" in the spec will still be there five years from
now -- I'm convinced there will be more effective ways by then.

--
John Leslie <john(_at_)jlc(_dot_)net>