Re: [Asrg] SICS

On Dec 22 2004, Barry Shein wrote:


The increased resource pressure from spammers isn't just "in the
noise", it's quite significant. Right now I think what we're seeing is
a mixture of upgrading anyhow and trying to adjust, and just putting
more bread crumbs in the hamburger (slower mail delivery, fooling
around with filters/blocks that are questionable, etc.)


I don't deny it's significant. I'm just trying to help clear up the issues.

I don't think the "per legitimate user" is a realistic measure any
longer since some huge percentage, I'll guess more than half, is to
unknown users and other mischief like endless probes etc.


I've got a back of the envelope scheme below to address that specific
problem. Purely theoretical, but I really do think that issue can be 
discounted.

Cheap is a relative thing. Don't sell "cheaper" (than a full delivery)
as "cheap".


I'm talking about really cheap. I'll give a rough idea below of what I
have in mind. Note also that der Mouse already implemented something
of that nature as described in his post, but probably not the way I'm
going to describe it.

PROBLEM:
Say you have one million valid email addresses. Each address is between
10 and 20 characters long. The problem is to cheaply discard inbound 
SMTP connections with invalid addresses in the RCPT command. 

WHAT TO DO:
Use a dedicated proxy in front of your SMTP servers. This proxy keeps
all the addresses in memory, either hashed or set up as a trie. 
This guarantees that, given an address string, figuring out whether
it's valid costs a few machine instructions per character of the
address, independent of the number of valid addresses. You can reload
those lists once an hour, say, and you can precompute aliases if you
like, whatever.
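
To make the lookup concrete, here is a minimal sketch of the hashed
variant, using a Python set as the in-memory table. The function names
are made up for illustration, and folding addresses to lower case is an
assumption about local policy, not something SMTP requires.

```python
def load_valid_addresses(lines):
    """Build the in-memory table from an iterable of address lines.

    Assumption: addresses are compared case-insensitively, so each one
    is lower-cased on load. Blank lines are skipped.
    """
    return {line.strip().lower() for line in lines if line.strip()}


def is_valid(address, valid):
    """Hash lookup: cost depends on the address length, not on len(valid)."""
    return address.strip().lower() in valid


# Hypothetical sample data; a real list would be reloaded periodically.
valid = load_valid_addresses(["alice@example.org\n", "Bob@Example.org\n", "\n"])
print(is_valid("ALICE@example.org", valid))
print(is_valid("mallory@example.org", valid))
```

Reloading once an hour then amounts to building a fresh set and
swapping it in for the old one.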

The proxy has these addresses in memory, but its real job is to accept
socket connections. An SMTP server, once the client has connected,
typically waits for HELO, MAIL, RCPT, DATA followed by the mail
message.

The proxy server is not going to be a full SMTP server. Conceptually,
it's going to wait for HELO, MAIL, RCPT, and at that point check if
the address is valid. If valid, the proxy opens a real SMTP connection
to your SMTP servers and replays the HELO, MAIL, RCPT commands, then
transparently forwards the SMTP conversation.  If the address was
invalid, there's no need to bother the SMTP servers, just return an
error and do whatever the protocol allows to close the connection.
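
The decision logic above can be sketched as a small function that
buffers the client's commands up to the first RCPT and then says what
to do. This is only an illustration under some assumptions: the name
screen_session is made up, network I/O is left out, and the 550 reply
text is a placeholder. A real proxy would also have to answer HELO and
MAIL itself, since the client waits for a reply to each command.

```python
import re

# Matches "RCPT TO:<address>" and captures the address between the brackets.
RCPT_RE = re.compile(r"^RCPT TO:\s*<([^>]*)>", re.IGNORECASE)


def screen_session(client_lines, valid):
    """Buffer commands until the first RCPT, then decide.

    Returns ("forward", buffered) when the recipient is known, so the
    caller can replay the buffered lines to the real SMTP server;
    ("reject", reply) when it is not; and ("incomplete", buffered) if
    the input ends before any RCPT command is seen.
    """
    buffered = []
    for line in client_lines:
        buffered.append(line)
        match = RCPT_RE.match(line.strip())
        if match:
            if match.group(1).lower() in valid:
                return ("forward", buffered)
            return ("reject", "550 5.1.1 No such user\r\n")
    return ("incomplete", buffered)


session = [
    "HELO mail.example.net\r\n",
    "MAIL FROM:<sender@example.net>\r\n",
    "RCPT TO:<alice@example.org>\r\n",
]
print(screen_session(session, {"alice@example.org"})[0])
print(screen_session(session, {"bob@example.org"})[0])
```

Only the "forward" case ever touches the real SMTP servers.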

So your proxy code really doesn't need to be very smart at all, and
the processing load for each attempted connection (up until the RCPT)
will be really cheap: buffer space for a couple of lines of input to
be replayed, and a very small number of instructions to check the
address, and after that it's mindless copying of bytes. So ten or
twenty megabytes for the full list of addresses, and a couple of K for
buffering each open socket, if you're paranoid about ultra-long SMTP
commands.
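
The memory figures can be checked with a few lines of arithmetic. The
sketch below counts raw address bytes only (a hash table or trie adds
overhead on top), takes "a couple of K" to mean 2 KiB per socket, and
uses 10,000 open sockets purely as a hypothetical example.

```python
def raw_list_bytes(n_addresses, min_len, max_len):
    """Lower and upper bound on raw address bytes, ignoring table overhead."""
    return n_addresses * min_len, n_addresses * max_len


def buffer_bytes(open_sockets, per_socket=2 * 1024):
    """Replay-buffer memory, assuming 2 KiB per open socket."""
    return open_sockets * per_socket


low, high = raw_list_bytes(1_000_000, 10, 20)
print(low / 1e6, high / 1e6)       # 10.0 20.0 (megabytes)
print(buffer_bytes(10_000) / 1e6)  # 20.48 (megabytes)
```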
  
The bottleneck will be how many simultaneous open sockets your server
can handle over time. That'll depend on your kernel/hardware limits and
the time you have to wait until a socket is reusable.

Either way, let's take Devdas Bhagat's statistics 

http://nixcartel.org/~devdas/minute.png

I don't know if these numbers include invalid users or not, but let's
say 60K invalid requests per minute, and another 10K valid requests
per minute. That's one thousand rejected requests per second, and
fewer than 200 long-lived requests per second. If you wait 30 seconds
until you can reuse a socket, then you'll need about 35,000 socket
addresses in total in this case. Or if you have fewer, you can use
several proxies in parallel.
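
The socket estimate works out as follows. Like the paragraph above,
this sketch assumes each connection ties up a socket only for the
reuse wait and ignores how long the connection itself stays open. The
per-proxy limit of 10,000 sockets is a hypothetical figure, not a
measured one.

```python
def sockets_needed(invalid_per_min, valid_per_min, reuse_wait_s):
    """Sockets tied up at once if each stays unusable for reuse_wait_s."""
    total = (invalid_per_min + valid_per_min) * reuse_wait_s
    return -(-total // 60)  # integer ceiling of total / 60


def proxies_needed(total_sockets, sockets_per_proxy):
    """Number of parallel proxies for a given per-proxy socket limit."""
    return -(-total_sockets // sockets_per_proxy)


total = sockets_needed(60_000, 10_000, 30)
print(total)                          # 35000
print(proxies_needed(total, 10_000))  # 4
```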


My experience sez that spammers seem to follow a Parkinsonian law of
resource exploitation, which means if you get more resources they'll
use more resources.


Probably. One problem at a time.

Moore's law mostly applies to processing power (specifically,
transistor density.)

Disk speed and network bandwidth costs, two very critical factors in
mail server scaling, don't go up anything like Moore's law.


You'll note that the proxy scenario I painted needs no disk space, but
does need good I/O hardware and a typical PC's worth of RAM.


ASRG is a research group.

Anxious as we may be for a solution I still think we're quite a ways
away from any meeting of the minds on what the problem is or, perhaps
put better, what a solution might look like.


Good point. However, I do think some real numbers once in a while help
focus the issues.

BTW, the proxy scenario I gave above isn't intended to be a ready
solution, rather it's an argument to prop up my claim that the
invalid requests can be handled scalably.

-- 
Laird Breyer.

