Re: [Asrg] SICS

Maybe, but HEADER analysis might be helpful.

I don't think I quite understand your point here.

A mail transaction typically goes like this:

<tcp connection> from high port to port 25 of the server
The server looks up the reverse DNS of the client.
S: 220 <server.hostname> [ESMTP]
C: HELO <client.hostname> (or EHLO for ESMTP)
S: 250 Ok
C: MAIL FROM:<sender(_at_)example(_dot_)com>
S: 250 Ok
C: RCPT TO:<recipient(_at_)example(_dot_)net>
S: 250 Ok
... (client adds more RCPT TOs here)
C: DATA
S: 354 Send CRLF.CRLF to end
Message headers

Message Body

.
S: 250 Ok
C: QUIT
S: 221 Ok
<tcp connection teardown>

Of course.

Now, you know the actual message size only after the data was sent.


Well, you know the maximum OFFERED size by then.

If, during the transfer, the recipient's inbox overflows (and they are the 
ONLY addressee for the message!), you no longer have to be concerned about 
how much more there might be in the wings. Once you've decided that the 
message is "too big" and is going to be bounced, you don't HAVE to keep 
receiving the rest of it.
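A minimal sketch of that idea, assuming the body arrives as a sequence of byte chunks and that the cap is a single number (the function name and the chunk interface are illustrative, not taken from any particular MTA):

```python
def receive_body(chunks, max_bytes):
    """Consume body chunks until the data ends or the size cap is passed.

    Returns (accepted, bytes_seen). Reading stops as soon as the running
    total exceeds max_bytes, so the rest of an oversized message is never
    pulled from the sender.
    """
    seen = 0
    for chunk in chunks:
        seen += len(chunk)
        if seen > max_bytes:
            # Decision is already made: the message will be bounced.
            return False, seen
    return True, seen
```

What the server does next (drop the connection, or keep discarding data and answer 552) is a separate policy choice that this sketch does not settle.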

If the client sends an EHLO, it may send a size value, but that cannot be
known to be correct. The message headers and body are part of the
message itself, as far as SMTP is concerned.

Sure, but that doesn't mean that the connection can't be dropped DURING the 
body transfer.
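For reference, the declared size travels as a SIZE= parameter on the MAIL FROM line (the ESMTP SIZE extension, RFC 1870). A rough sketch of reading it and rejecting early, assuming a single per-server cap; the helper names and reply text are made up for illustration, and the declared value is only a hint that the byte count during DATA still has to confirm:

```python
import re

_SIZE_RE = re.compile(r"\bSIZE=(\d+)\b", re.IGNORECASE)

def declared_size(mail_from_line):
    """Return the SIZE= value from a MAIL FROM line, or None if absent."""
    # Only look at the parameters after the closing '>' of the address.
    params = mail_from_line.rpartition(">")[2]
    match = _SIZE_RE.search(params)
    return int(match.group(1)) if match else None

def early_reply(mail_from_line, max_bytes):
    """Reject before DATA if the client itself admits the message is too big."""
    size = declared_size(mail_from_line)
    if size is not None and size > max_bytes:
        return "552 Message too large"
    # No SIZE, or a value within the cap: accept for now, count bytes later.
    return "250 Ok"
```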

After the receiving mail server has gotten the complete header, then it
knows not only who the intended recipient is but also who the stated sender
is, and that could permit it to establish what that recipient's permissions
are for that sender... and SOME of those, at least, could perhaps be
enforced during the time the incoming mail arrival is being done.

This is more complex, but doable. A restrictions lookup table keys off
the recipient address, and 

Right.
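One way such a table could look, as a sketch only: a dict keyed on recipient, then on sender, with a wildcard entry for the default 'unknown' sender. The "*" key, the function name, and the fallback argument are all assumptions for illustration.

```python
DEFAULT_SENDER = "*"  # hypothetical wildcard key for senders not listed

def max_size_for(table, recipient, sender, fallback):
    """Look up the size cap a recipient grants to a given sender.

    table maps recipient address -> {sender address or "*": max bytes}.
    Falls back to the recipient's wildcard entry, then to the server-wide
    fallback if the recipient has no entry at all.
    """
    per_recipient = table.get(recipient.lower(), {})
    key = sender.lower()
    if key in per_recipient:
        return per_recipient[key]
    return per_recipient.get(DEFAULT_SENDER, fallback)

# Example table: a trusted sender gets a bigger allowance than strangers.
RESTRICTIONS = {
    "recipient@example.net": {
        "sender@example.com": 10_000_000,
        DEFAULT_SENDER: 100_000,
    },
}
```

Note that the sender here is only the STATED sender, so a table like this limits size rather than proving identity.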

The easiest one to enforce is maximum allowed message size for that
sender (or default 'unknown' sender) since that one already has to be
done anyhow if the user's maximum inbox size is exceeded (and that isn't
known, either, until too much data comes in).

Actually, this isn't known until delivery is attempted.


Right, at the final mail server.

...No one needs to have the mail storage system on the same server as the
receiving MTA. The complexity and fragility of trying to push quota overflow 
conditions in realtime to external MTAs is bad enough that I would just allow 
for bounces to be generated later.

Certainly a reasonable enough position.

It's a little more complicated (but not HUGELY difficult) to have the
receiving server recognize message divisions and enforce (at least on a
preliminary, tentative basis) recipient permissions for attachments,
HTML-burdened attachments, and such.

This requires a large amount of code to be pushed into MTAs.


Oh, not really.  I wouldn't try to do 'full' content analysis there, but it 
MIGHT make sense to do some levels of stuff (binary attachments and their 
extensions, maybe).
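As a rough illustration of that lighter level of checking, the sketch below uses Python's standard email package to walk the MIME parts and flag attachment filenames by extension only. The blocked set reuses the extensions mentioned elsewhere in this thread (exe/cpl/pif); the function name is invented, and this is nowhere near full content analysis.

```python
import os
from email import message_from_string

BLOCKED_EXTENSIONS = {".exe", ".cpl", ".pif"}

def blocked_attachments(raw_message, blocked=BLOCKED_EXTENSIONS):
    """Return the filenames of attachments whose extension is blocked.

    Only the declared filename is examined; the attachment payload is
    never decoded or scanned.
    """
    msg = message_from_string(raw_message)
    hits = []
    for part in msg.walk():
        name = part.get_filename()
        if name and os.path.splitext(name)[1].lower() in blocked:
            hits.append(name)
    return hits
```

A declared filename is trivially forged, so a check like this catches only the lazy cases.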

Parsing text takes up CPU. Gobs of CPU.


A lot of that depends on what you're trying to do, and what kind of tools 
you're trying to do it with.

Just to show how much pre-DATA filtering can help (I won't quote names here,
since I don't have permission to do that yet):
A Canadian ISP had 4 boxes handling 200K mails/day each on average, with 
SpamAssassin filtering the content for suspicious mail. Note that this was 
post recipient validation, with a few extensions filtered out
(exe/cpl/pif/.., the common viral vectors). SA was running on 9 boxes,
and those were barely able to keep up with the load (SA was daemonised).
By adding a single DNSBL (the cbl-sbl.spamhaus.org list), they reduced
the inflow to 80K mails/day per host on average, and needed only 6 SA boxes to
filter comfortably.
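Taking the quoted figures at face value, that is 4 x 200K = 800K mails/day before and 4 x 80K = 320K mails/day after, a 60% drop in what reaches the content filters. For anyone unfamiliar with how a DNSBL check works, here is a small sketch of the usual convention: reverse the client's IPv4 octets, append the list's zone, and treat any successful A-record lookup as "listed". The zone "dnsbl.example" and the injectable resolve argument are placeholders for illustration, not a real list or a real MTA hook.

```python
import ipaddress
import socket

def dnsbl_query_name(ip, zone):
    """Build the DNS name to query: reversed IPv4 octets plus the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone

def is_listed(ip, zone, resolve=socket.gethostbyname):
    """Return True if the DNSBL zone has an entry for this client IP.

    resolve is injectable so the logic can be exercised without network
    access; by default it performs a real DNS lookup.
    """
    try:
        resolve(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # Name does not resolve: the address is not on the list.
        return False
```

The check needs only the connecting IP, so it can run before HELO, which is why it is so much cheaper than anything that has to read the message.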

Okay, but I wasn't suggesting anything nearly as complicated as SpamAssassin at 
the ISP end (I've always said that this ought to be at the recipient end, where 
processing power is cheaper and more plentiful).

SpamAssassin is probably written in Perl or something, and that language is NOT 
particularly efficient (especially compared to something that's more powerful 
and more efficient for textual analysis and pattern recognition, such as 
SPITBOL). 

Let's recall that SpamAssassin was DESIGNED to be run at the end-user machine, 
and it didn't HAVE to be fast or efficient.

I'll let you figure out the bandwidth and server savings, and the
administrative time saved on that (the admins had been trying for three
months at least to tune the SA boxes so that they would keep up).

I am not convinced that the effort is worthwhile, since it's so 
implementation-dependent in a moving-target situation.

But (although there is a potential cost savings from truncating such mails
earlier in the process, before they have been fully transferred) there is a 
downside which may override the potential savings... and that is the

When do you plan to show actual savings? There are no savings post DATA,
except the cost of putting that message in the temporary mailstore on
the server.

Depends on how far up the chain you are, and how many forwards will be done 
between the sender and the recipient, and how much data transfer can be 
avoided. But like I said, any such savings may not be worth the extra 
complexity.

recipient's desire to be able to change their mind upon considering the
"spam" filtering decision... it's harder to do that, and go ahead and
approve the message for delivery (and possibly revising the filtering

Have you considered that a message may have more than one recipient, and
that enforcing different message sizes per recipient in the MTA is simply 
not feasible (there is exactly one return value after DATA)?

I hadn't considered that, but it's certainly a valid point... and yet another 
reason why (based on complexity cost) it simply might make the most sense to do 
the filtering at the recipient end.
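To spell out why the single reply is the sticking point, here is a sketch under one possible policy (my assumption, not anything SMTP mandates): answer 552 only when EVERY recipient's cap is exceeded, otherwise accept and leave the over-limit recipients to be bounced later. The function name and reply strings are illustrative.

```python
def post_data_reply(message_size, limits):
    """Pick the one reply the server can give after DATA.

    limits maps recipient address -> max bytes for this message.
    Returns (reply, over_limit_recipients). The reply applies to the whole
    transaction, so per-recipient differences cannot be expressed in it.
    """
    over = sorted(r for r, cap in limits.items() if message_size > cap)
    if over and len(over) == len(limits):
        # Nobody can take the message, so rejecting outright is safe.
        return "552 Message too large", over
    # At least one recipient can take it: accept, bounce the rest later.
    return "250 Ok", over
```

With mixed limits the function returns "250 Ok" plus a list of recipients that still have to be handled after the fact, which is exactly the complexity being objected to.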

rules for that sender), if in fact it was blocked or the server connection
was dropped during transfer.  It may be that the bandwidth cost (and 
that's ultimately getting cheaper and cheaper) is simply less costly than

It is? Not as fast as the spam volume is going up.


Obviously we want to reduce spam volume.  My approach is all about that.

the added complexity of trying to be more clever, "too" early in the
process.



Gordon Peterson                  http://personal.terabites.com/
1977-2002  Twenty-fifth anniversary year of Local Area Networking!
Support free and fair US elections!  http://stickers.defend-democracy.org
12/19/98: Partisan Republicans scornfully ignore the voters they "represent".
12/09/00: the date the Republican Party took down democracy in America.


