ietf-mxcomp

Re: The Computational Load of MARID

2004-05-14 08:41:33

In <20040514131951(_dot_)GA71593(_at_)verdi> John Leslie 
<john(_at_)jlc(_dot_)net> writes:

   Andy Newton suggested I start a thread on the balance between ease
of sender advertising and the receiver computing load. Here's my
attempt...

I think this is an excellent topic.

I think that it is very important to consider *ALL COSTS*, and not
just look at parts of isolated examples.  In particular, we must
consider the effects of DNS caching, the DNS queries that would be
done anyway for other reasons, the CPU load, the distribution of
domains in email, the distribution of legitimate vs illegitimate email
sources, etc.

Unfortunately, it is very hard to calculate the total costs. :-< I've
tried a couple of times, but always got sidetracked before I could
complete a reasonable estimate.  I forget who it was, but someone
posted some stats to the SPF list that showed that the increased DNS
bandwidth used doing SPF checks was lost in the noise of other DNS
checking that was already going on.



My short response is: I think that even the most expensive proposal
(i.e. SPF) can be made cheap enough that the cost is not a problem.



My longer response:


I say that SPF is the most expensive proposal because, in part, it is
the most flexible.  It has both an include: and an mx: mechanism, like
Microsoft's C-ID proposal, and it has an exists: mechanism that can
create the same costs as DMP.  So, it is easy to construct an SPF
record that has higher costs than either C-ID or DMP.  Now, the
average cost in the real world for SPF may well be lower than either
C-ID or DMP, but I'll stick with SPF for discussion purposes.
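
To make the "easy to construct an expensive record" point concrete, here is a
minimal sketch that counts how many terms in an SPF record need at least one
DNS query to evaluate. The function name, the example records, and the
example.* domains are hypothetical. The count is a lower bound: mx: and
include: can each fan out into further lookups that are not modelled here.

```python
# Mechanisms that need at least one DNS query when evaluated.
# ip4, ip6 and all need none. Assumption: fan-out from mx/include is ignored.
DNS_MECHANISMS = {"include", "a", "mx", "ptr", "exists"}

def count_dns_terms(record: str) -> int:
    """Count the terms in one SPF record that require a DNS query."""
    count = 0
    for term in record.split()[1:]:        # skip the "v=spf1" version tag
        if term.startswith("redirect="):   # redirect fetches another record
            count += 1
            continue
        name = term.lstrip("+-~?")         # drop the qualifier, if any
        for sep in (":", "/"):             # mechanism name ends at ':' or '/'
            name = name.split(sep, 1)[0]
        if name.lower() in DNS_MECHANISMS:
            count += 1
    return count

cheap = "v=spf1 ip4:192.0.2.0/24 -all"
costly = "v=spf1 mx include:example.net exists:%{ir}.bl.example.org -all"
print(count_dns_terms(cheap), count_dns_terms(costly))
```

A record built only from ip4: terms costs nothing beyond fetching the record
itself, while the second record needs at least three more queries.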


CPU costs:

I think the CPU costs for all proposals are so small that they can be
ignored.  Sendmail has had a much more complicated macro language (rule
sets) for around two decades now, and no one considers that an issue.


DNS caching and domain name distribution:

This is going to be highly dependent on the email environment of each
MTA, but I suspect that for most places, a very large percentage
of email comes from a very small percentage of the domain names.  So
if a significant percentage of your email comes from, say, AOL, the
cost of doing the DNS lookups has to be averaged over the number of
times you get a DNS cache hit.
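
The averaging argument can be sketched as a back-of-the-envelope calculation.
This assumes a deliberately simple model that is not from any of the
proposals: the first message from a domain in each TTL window pays the full
lookup cost, and every other message in that window is a cache hit. The
function and parameter names are made up for illustration.

```python
def amortized_lookups(lookups_per_miss: float, messages_per_ttl: float) -> float:
    """Average DNS lookups per message for one sending domain.

    Assumes one cache miss per TTL window, costing lookups_per_miss,
    shared across all messages from that domain in the window.
    """
    if messages_per_ttl < 1:
        # Fewer than one message per TTL window: every message is a miss.
        return lookups_per_miss
    return lookups_per_miss / messages_per_ttl

# A busy domain: 5 lookups per miss, 1000 messages per TTL window.
print(amortized_lookups(5, 1000))
# A rare domain: same record, but only one message every other window.
print(amortized_lookups(5, 0.5))
```

Under this model, even an expensive record from a high-volume domain costs
a tiny fraction of a lookup per message, while a rarely seen domain pays the
full cost every time.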


DNS queries that will be done anyway:

If you both send and receive email from a domain, you will have had to
do DNS lookups for things like the MX records anyway.  Most of the
time, a single MX lookup will return everything you need to know
because the IP addresses will be returned in the additional section of
the DNS response.

Likewise many mail systems will do PTR checks and MX checks as a
standard part of their anti-spam systems.  Since MARID checking is
optional, folks that don't want to do any DNS checks (e.g. Hotmail)
will probably not want to do MARID checks because all of the proposals
require some DNS checks.  Folks that are willing to do some DNS checks
will likely find that the incremental cost is small, even for SPF.



Distribution of legitimate vs illegitimate email:

All of the proposals fall into one of two categories: those that try to
describe complete sets of legitimate vs illegitimate sources (RMX,
SPF, the 30% solution, etc.) and those that use rDNS tree type queries
to find out about a single IP address at a time (DMP, SPF, CSV, etc.).

If you look at just one email in isolation, and assume no DNS caching,
the rDNS tree proposals look cheaper because they require only one or
two DNS lookups.  However, since a large percentage of today's email
comes from a large number of hijacked home PCs, there will be a large
number of "cold hits" on the DNS cache.  That is, you will often get
only one email from a given (IP address, domain name) pair because
spammers randomize both of them.

As a result, I believe that in practice, proposals that describe the
complete sets in a DNS cache friendly way will have a slightly lower
real DNS cost.
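
A toy simulation of that cache argument, under loudly stated assumptions:
sending IPs are drawn at random from a large pool of hijacked hosts, claimed
domains are drawn from a much smaller pool, and cache entries never expire.
None of the numbers are measurements, and all names are hypothetical. It
counts only cold lookups, not how many queries each cold lookup costs.

```python
import random

def cache_misses(keys):
    """Count cold lookups for a stream of cache keys (no TTL expiry modelled)."""
    seen = set()
    misses = 0
    for key in keys:
        if key not in seen:
            seen.add(key)
            misses += 1
    return misses

def simulate(n_messages=10_000, n_domains=50, n_ips=5_000, seed=1):
    """Return (misses keyed by domain, misses keyed by (IP, domain) pair)."""
    rng = random.Random(seed)
    mail = [(rng.randrange(n_ips), rng.randrange(n_domains))
            for _ in range(n_messages)]
    # Complete-set proposals: one cacheable record per domain.
    per_domain = cache_misses(domain for _, domain in mail)
    # rDNS-tree proposals: one cacheable answer per (IP, domain) pair.
    per_pair = cache_misses(mail)
    return per_domain, per_pair

per_domain, per_pair = simulate()
print("cold lookups keyed by domain:", per_domain)
print("cold lookups keyed by (IP, domain):", per_pair)
```

The per-pair count can never be lower than the per-domain count, so the
question is whether the higher cost of each complete-set miss outweighs the
much larger number of per-pair misses.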



OK, this is a long post, and I've been posting more than the suggested
3-message-per-day limit, so I'll shut up now and take the rest of the
day off.


-wayne