
Re: [Asrg] [ASRG] SMTP pull anyone?

2009-08-18 12:50:13
Ravi shankar wrote, On 8/17/09 5:53 AM:


> DNS is used *as a medium* for various applications that are used to
> identify mail as legitimate or illegitimate by various standards of
> legitimacy, and a major reason for its use in those applications is to
> make it feasible for mail systems to do the validation synchronously
> during the SMTP session. By using a lightweight, distributed, cached
> database, mail systems are spared from deferring a message, queuing its
> validation, remembering the results, and waiting for the sender to
> offer it in an identical way again. You are suggesting that receivers
> should take on all the heavyweight management but retain using DNS for
> something unspecified. It makes no sense.

Bill,

Today's model is no different from what I have suggested: sites deploy
costly anti-spam solutions that probably use ten times the resources
this solution would. Allowing the system to cut most of the spam
through a simple pull mechanism compares very well against today's
anti-spam software model, which not everyone can afford.

I don't see how this reduces the effort required on the receiving side in comparison to currently common practices. I do see how it increases receiving system effort compared to currently common practices. I suspect that you don't understand those practices, so I'll explain at length...

It is very common for mail servers to apply multiple threshold criteria (often utilizing DNS) before the DATA command in an SMTP session to decide how to respond to the earlier commands, often making rejection decisions very early. SPF and the most common type of DNSBL can be checked that way and often are, along with rules like requiring the sender domain to have a valid MX or A record, shunning clients that use idiosyncratically invalid HELO names, etc. This does not require message data analysis, as it is done before the message data is offered.

After receiving an RCPT command, the receiver knows the IP address of the sending client, the name it used for itself in the HELO or EHLO command, the envelope sender address, one or more recipient addresses, and the reject/accept results for any previously named recipients. In some cases where extensions to SMTP are used, it may also know some message and authentication metadata. It is quite normal for a mail server to use those facts and derivative facts (like the existence and content of DNS records related to them) to decide how to respond to that RCPT command.
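As an aside, the DNSBL lookup mentioned above follows a simple convention: reverse the client's IP octets, append the list's zone, and query for an A record. A minimal Python sketch (the zone name is just the customary example; any DNSBL zone works the same way):

```python
import socket

def dnsbl_query_name(client_ip: str, zone: str = "zen.spamhaus.org") -> str:
    """Build the DNSBL query name: the client's octets reversed,
    prepended to the list's zone (the standard DNSBL convention)."""
    octets = client_ip.split(".")
    return ".".join(reversed(octets)) + "." + zone

def client_is_listed(client_ip: str, zone: str = "zen.spamhaus.org") -> bool:
    """A listed IP resolves to an A record; NXDOMAIN means 'not listed'."""
    try:
        socket.gethostbyname(dnsbl_query_name(client_ip, zone))
        return True
    except socket.gaierror:
        return False
```

Because the answer is a cached DNS lookup, the check costs the receiver almost nothing and completes well before any message data is offered.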

For many mail systems, anti-spam measures done before the DATA command using metadata safely reject a large majority of spam (often a large majority of all email) and whitelist a smaller stream of messages. This sidesteps high-cost approaches that parse message data. For example, from the last 10,000 connections to my own very small mail server, only 873 messages were passed to the part of my spam control system that examines the message data and 35 messages were cleared around that filtering. Obviously I can't get a perfect measurement for accuracy since I can't be sure that every error will be noticed and brought to my attention, but it has been many months and millions of messages since the last time I know that system to have rejected a legitimate message ahead of the data filters and it hasn't protected any spam from data filtering in the 5 years that I've been doing it. That performance is similar to what I've seen in the larger mail systems that I've managed for others.

The use of metadata rules (i.e. using envelope and session parameters and their derivatives) to reduce the flow of mail into message data filters is not a new or rare strategy, but rather is an evolutionary remnant of the earliest spam control tactics. For many years, spam exclusion was almost exclusively done before the DATA phase of SMTP because it worked well enough and because filtering based on message data was more resource-intensive than it could justify with results. To this day, well-run mail systems whose operators are concerned about the resource demands of spam control use the information available early in the SMTP transaction to decide whether to allow the sender to 'push' the message itself.
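To make one of those early metadata rules concrete, here is a sketch of the kind of HELO plausibility check described above. The specific rules are illustrative of what operators commonly do, not any standard:

```python
import re

def helo_plausible(helo: str) -> bool:
    """Reject idiosyncratically invalid HELO/EHLO names: a bare IP given
    without address-literal brackets, or a single dotless label.
    These particular rules are illustrative, not normative."""
    if re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", helo):
        return False  # bare IP; RFC form would be an address literal like [192.0.2.1]
    if "." not in helo and not helo.startswith("["):
        return False  # single-label name, not a fully qualified hostname
    # finally, require only characters plausible in a hostname or literal
    return bool(re.fullmatch(r"[A-Za-z0-9.\[\]:-]+", helo))
```

A check like this runs at the first command of the session, before any envelope or message data arrives, which is exactly why the early-rejection strategy is so cheap.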

The 'pull' model you have described does not specify any way in which it can improve on the pre-data filtering that is already being done, but it does add a burden to both sides of legitimate transactions: keeping track of message offers that are pending a decision to pull and an actual pull attempt. In order to justify that added burden (in addition to the huge development and deployment costs) you would need to explain how your pull model facilitates better filtering than what sites do now. Sparing systems from message data filtering isn't enough, unless you have some case for your model doing that consistently and sustainably better than current tactics that operate during the SMTP session.


> The *most* that SPF can provide towards showing "legitimacy" is to
> confirm that the envelope sender address of a message is not forged.
> It is very rare for large senders of any sort to deploy records that
> can do that strongly. There is nothing about SPF that directly attacks
> spamming. It could in theory be used to attack sender forgery, but the
> collateral damage has proven to be too great for either sending or
> receiving systems to actually apply it strongly to that end.
> Meanwhile, a lot of spammers are sending a lot of spam with senders
> that are validated to the degree that SPF can validate anything.

Actually, SPF only validates the relationship between the sender's IP
address and domain, and I mentioned SPF just as an example.

SPF is specified as applying to the whole envelope sender. Explicit records using the %l macro are rare, but many domains assure that the hosts they affirm in SPF are using correct local parts in sender addresses. That is what would be expected with normal MTA software and configurations that could be affirmed in SPF.
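For illustration, an explicit record of the rare kind mentioned above might look like the following (the domain and the `_spf` subdomain are hypothetical; `%{l}` is SPF's local-part macro and `exists:` directs a per-local-part DNS lookup):

```
example.com.  IN TXT  "v=spf1 exists:%{l}._spf.example.com -all"
```

A domain publishing this would maintain a DNS entry under `_spf.example.com` for each valid local part, which is exactly the kind of maintenance burden that keeps such records rare.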

And if the large senders cannot implement something as simple as a TXT
record for SPF (leaving DKIM aside), then they probably do not care
about spam.

I understand that it is easy and tempting to be dismissive about the lack of care among large senders, but it is self-defeating when trying to devise and evangelize a new spam control mechanism.

It is worth noting that Microsoft (as Hotmail) has been the most important actor in getting SPF records deployed by others, even though Hotmail systems are chronic spam sources and their inbound mail systems do not use SPF records in anything like a normal way.

> SPF or DKIM are only effective when deployed by all the domains that
> send mails.

That is a ridiculously false statement. I have to assume that we are having a problem of differing idioms of English, or else I would think you a fool.


> 4. The sending server then hands over the message.
> 5. To overcome DDoS attacks, the receiving server can be made to
>    request the next 10 or so Message IDs that it will assign to
>    messages, so that if a attacker tries to give those details, it
>    will know from the next list of message IDs that it's fake
>    connection.

>>> That sentence makes no sense. What did you mean to say?


What I mean is: in order to prevent a system from being overwhelmed by
anonymous submissions, if, say, domain1.com's server knows the next 10
message IDs that will be sent by domain2.com, then it can confidently
reject submission attempts whose message IDs fall outside that range
(of course, this logic holds only if domain2.com is going to send those
10 message IDs to domain1.com only).
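For concreteness, the window check being described seems to amount to something like the following sketch (all names are illustrative; no such protocol exists):

```python
class OfferValidator:
    """Track the window of message IDs each peer domain has announced,
    per the hypothetical 'pull' scheme discussed in this thread."""

    def __init__(self):
        # peer domain -> set of message IDs it said it will send next
        self.windows: dict[str, set[str]] = {}

    def announce(self, domain: str, next_ids: list[str]) -> None:
        """Record the IDs the peer claims it will use for upcoming offers."""
        self.windows.setdefault(domain, set()).update(next_ids)

    def accept_offer(self, domain: str, msg_id: str) -> bool:
        """Accept an offer only if its ID is in the announced window,
        consuming the ID so it cannot be replayed."""
        window = self.windows.get(domain, set())
        if msg_id in window:
            window.discard(msg_id)
            return True
        return False
```

Note that even this toy version shows the added state both sides must keep per peer, which is part of the burden discussed below.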

Okay, so you are redefining "Message ID" as a new identifier defined by each MTA for each message that it handles, rather than as something related to the Message-ID mail header.

That concept is interesting, but it is not consistent with how mail systems work today. It brings into question whether you have a useful understanding of the range of ways that people use email and the range of ways that mail servers handle mail. The practices that would have to end in order to enable this facet of your idea include those which forced SPF into its arcane complexity and those which constrain its strength and deployability today.


> Nothing you have described would add to spam control as it is
> currently being done, as far as I can see. The 'model' is too vague to
> critique in detail because you aren't really providing any meaningful
> details.

> In order to bring anything truly new and useful to controlling email
> spam, a new idea has to either attack spam in a way that existing
> tactics don't, do a demonstrably better job than existing tactics, or
> overcome the negative aspects of existing tactics. You have identified
> none of those in your new idea.

I guess we are expecting a magic solution that will stop all spam in a
single go and will not require us to change our systems continuously.

Not at all, and that is part of why I am skeptical about your suggestion. It would be a radically new way of handling email, to a degree that it would not really make sense to define it as an extension to SMTP.

But unfortunately, every system has flaws and has to be corrected one
step at a time; this, I believe, is evolution.

Gradual evolutionary steps have to provide a real hope of some incremental benefit to early adopters without doing them immediate harm. Even if you had a fully detailed model for how this would work and had a deployable way to integrate it today into existing mail systems, you would need to assure that it would be harmless to offer now (i.e. no rejection of legitimate mail from non-users of the new system) and that it could provide some benefit for both senders and receivers who adopt it before it becomes widely deployed. As described, it increases the difficulty of handling mail for both sides and offers neither side any concrete benefits.


I have done my best to detail how this system applies at various steps
of a mail communication; maybe I can work on a pictorial
representation, if someone else requires it as well.

If this is what you consider "detail" then you have a major obstacle to being taken seriously. Drawing pictures wouldn't be a step forward. Defining a transaction protocol would be, but I wouldn't suggest you do that until you identify concrete ways that your model offers benefits that existing common practices cannot offer.
_______________________________________________
Asrg mailing list
Asrg(_at_)irtf(_dot_)org
http://www.irtf.org/mailman/listinfo/asrg