[Asrg] May I summarize this conversation, please? Rev "A"

People,

I'd like to take a moment and review what I think we've learned from these discussions. I am not even going to try to figure out who gets credit for what. Please correct me if I am wrong. I'd like to thank everybody who contacted me and set me straight.

There seem to be three approaches to dealing with SPAM. One approach solves the problem by solving four subproblems: identification, authentication, authorization, and trust. Another approach solves the problem by some sort of lexical and/or semantic analysis of the message itself. Yet another approach solves the problem by raising the cost of sending a message. There seems to be no consensus on which approach is "best", and in fact a solution might be a combination of approaches.

* We can uniquely identify the sender of a mail message using their E-mail address and their MTA host name and/or MTA IP address.

* However, we cannot authenticate that a message actually came from the person the message identifies as the sender. There are several reasons for this: 1) SMTP makes no guarantee that the sender identification is either honest or accurate. 2) A corrupt or incompetent MTA sysadmin can break any system that we care to implement. 3) My intuition tells me that a corrupt or incompetent UTA sysadmin can break any system, if only by telnetting to port 25 on the MTA. I think we are in agreement that this situation will not change without changing SMTP, and we are very, very reluctant to do that, for good reasons.

* Several methods of authorizing messages have been proposed or implemented. These methods can be grouped into internal examination methods and external examination methods. Internal examination methods include keyword recognition systems, statistical analysis systems, Bayesian analysis, reputation systems, and other techniques that border on A.I. External examination methods include whitelists, blacklists, challenge/response mechanisms, and counting sent/received message ratios.

* A trusted system or component is one with the power to break one's security policy. I think we have decided that we cannot trust the sender of an E-mail message. I think we are reluctant to trust an impartial third party, e.g. Verisign, because of concerns that the third party will not be trustworthy in any of a number of ways. Also, communicating that trust from system to system is problematic. Think about something like Kerberos on a planetary scale. Ideally, the trust mechanism should be in the receiver's MTA. Since an MTA can handle a lot of messages, the computational cost should be low.

* That we are reluctant to change SMTP and that we cannot trust the senders at all strongly suggests that the solution to the SPAM problem is in the receiving MTA (or possibly the receiving UTA). This is a very unsatisfactory conclusion because the cost of dealing with SPAM is borne by the receivers; there seems to be no way to shift the cost to the senders. Proposals to raise the cost of sending E-mail using e-postage are rejected because it is hard to see how such schemes could be implemented without changing SMTP, and because they raise issues of trust: anybody could claim to be an MTA and demand a bunch of your postage.
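To make "Bayesian analysis" from the list of internal examination methods a little more concrete, here is a minimal Python sketch of a word-based naive Bayes scorer. It is a simplified illustration, not the algorithm of any particular filter. It scores only the words present in a message, uses add-one smoothing, and is trained on a few made-up messages.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> set[str]:
    """Lower-case word tokens; each word counts once per message."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


class NaiveBayesFilter:
    def __init__(self) -> None:
        self.spam_words = Counter()  # number of spam messages containing each word
        self.ham_words = Counter()   # number of non-spam messages containing each word
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text: str, is_spam: bool) -> None:
        if is_spam:
            self.spam_words.update(tokens(text))
            self.n_spam += 1
        else:
            self.ham_words.update(tokens(text))
            self.n_ham += 1

    def spam_probability(self, text: str) -> float:
        # Start from the prior odds of spam, then add each word's
        # log likelihood ratio (add-one smoothed).
        log_odds = math.log((self.n_spam + 1) / (self.n_ham + 1))
        for word in tokens(text):
            p_spam = (self.spam_words[word] + 1) / (self.n_spam + 2)
            p_ham = (self.ham_words[word] + 1) / (self.n_ham + 2)
            log_odds += math.log(p_spam / p_ham)
        # Numerically safe logistic function: log odds -> probability.
        if log_odds >= 0:
            return 1.0 / (1.0 + math.exp(-log_odds))
        z = math.exp(log_odds)
        return z / (1.0 + z)


if __name__ == "__main__":
    f = NaiveBayesFilter()
    f.train("buy cheap pills now", True)
    f.train("cheap pills online", True)
    f.train("meeting agenda for monday", False)
    f.train("notes from the monday meeting", False)
    print(round(f.spam_probability("cheap pills"), 2))           # -> 0.9
    print(round(f.spam_probability("monday meeting notes"), 2))  # -> 0.05
```

Note that every step runs on the receiving side, once per message. This is the cost that falls on the receiver.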

We've had several (at times frustrating) discussions about some good ideas that do not work. Whitelists and blacklists suffer from the problems of insufficient granularity, timeliness, and the fact that spammers frequently create new identities. If we could solve the authentication problem, then blacklists and whitelists might work, especially if there were a mechanism by which the lists could be automatically maintained between the MTA and the UTA. The problem with authentication is that it increases the costs on the sending side. This seems to me to be A Good Thing, but it was criticized.
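The granularity problem can be shown with a small Python sketch of a list that matches at three levels: a single address, a whole domain, or a block of IP addresses. The class and the sample entries are hypothetical. The domains use the reserved .example names, and the IP addresses come from the documentation ranges.

```python
import ipaddress
from typing import Iterable, Optional


class SenderList:
    """A blacklist or whitelist that can match at three granularities."""

    def __init__(self, addresses: Iterable[str] = (), domains: Iterable[str] = (),
                 networks: Iterable[str] = ()) -> None:
        self.addresses = {a.lower() for a in addresses}
        self.domains = {d.lower() for d in domains}
        self.networks = [ipaddress.ip_network(n) for n in networks]

    def match(self, mail_from: str, peer_ip: str) -> Optional[str]:
        """Return the granularity at which the sender matched, or None."""
        address = mail_from.lower()
        domain = address.rpartition("@")[2]
        ip = ipaddress.ip_address(peer_ip)
        if address in self.addresses:
            return "address"   # finest: one mailbox, trivially replaced
        if domain in self.domains:
            return "domain"    # coarser: a whole organization
        if any(ip in net for net in self.networks):
            return "network"   # coarsest: everyone sending from those addresses
        return None


if __name__ == "__main__":
    blacklist = SenderList(addresses=["promo@bulk.example"],
                           networks=["203.0.113.0/24"])
    print(blacklist.match("promo@bulk.example", "198.51.100.7"))   # address
    print(blacklist.match("promo2@bulk.example", "198.51.100.7"))  # None: a new identity
    print(blacklist.match("alice@example.org", "203.0.113.5"))     # network: collateral damage
```

A fine-grained entry is evaded by changing one character of the address. A coarse entry also catches innocent senders. Neither entry says anything about whether "promo@bulk.example" was really the sender, which is why authentication would make these lists more useful.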

Internal examination methods tend to be compute intensive and unreliable. This could be dealt with by moving the SPAM inspection function from the MTA to the UTA. Internal examination relies on the idea that certain words and phrases are diagnostic of SPAM, e.g. "Trust me" and "v.agra". So the spammer can increase the cost on the receiver's side by using "v.agra", "v,agra", "v;agra", etc. Furthermore, a spammer could run his or her message through the internal examination system and modify the text until it was accepted as not spam.

Challenge/response was rejected because it was perceived as being hard for the visually challenged, it has language issues, and it is not obvious how to implement it without changing SMTP.

Reputation systems were rejected (I think that is the consensus) because it is not clear that past behavior is a good predictor of future behavior: the Bertrand Russell's chicken scenario. Also, crooks sometimes go straight.
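The "v.agra", "v,agra", "v;agra" arms race can be illustrated with a short Python sketch. The function names and the keyword list are hypothetical. The normalization rule, which drops everything but letters and digits, is just one possible choice.

```python
import re


def normalize(word: str) -> str:
    """Keep only ASCII letters and digits: 'v,agra' and 'v;agra' both become 'vagra'."""
    return re.sub(r"[^a-z0-9]", "", word.lower())


def naive_hits(text: str, keywords: list[str]) -> list[str]:
    """Exact whole-word matching: any change in punctuation evades it."""
    words = set(text.lower().split())
    return [k for k in keywords if k.lower() in words]


def normalized_hits(text: str, keywords: list[str]) -> list[str]:
    """Compare keywords and message words after stripping punctuation."""
    words = {normalize(w) for w in text.split()}
    return [k for k in keywords if normalize(k) in words]


if __name__ == "__main__":
    keywords = ["v.agra"]
    print(naive_hits("cheap v,agra today", keywords))       # []
    print(normalized_hits("cheap v,agra today", keywords))  # ['v.agra']
    print(normalized_hits("cheap v1agra today", keywords))  # [] : the next variant gets through
```

Each countermeasure adds work on the receiving side, and the spammer only needs to try variants until one passes.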

External methods such as looking at the ratio of sent messages to received messages were rejected because they require trusting an external system. For example, a spammer could claim that although he had sent millions of messages, he had also received millions of messages, so he's not a spammer, he's just busy. Discrepancies between the host name stated in the HELO or EHLO command and the actual source IP address cannot be the sole criterion for accepting or rejecting a message, because some ISPs separate the sending and receiving functions for load-leveling purposes.
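Here is a minimal Python sketch of the HELO/EHLO check as one indicator rather than a verdict. The resolver is passed in as a function. The dictionary "zone" below is a hypothetical stand-in for DNS, so the example runs without a network. A real MTA would do an A/AAAA lookup instead and would need to map its resolver errors onto LookupError.

```python
from typing import Callable, Iterable

# A resolver maps a host name to its addresses and raises LookupError
# (KeyError is a subclass) when the name does not exist.
Resolver = Callable[[str], Iterable[str]]


def helo_mismatch(helo: str, peer_ip: str, resolve: Resolver) -> bool:
    """True if the HELO/EHLO name does not resolve to the connecting IP."""
    try:
        addresses = set(resolve(helo))
    except LookupError:
        return True  # the name does not resolve at all
    return peer_ip not in addresses


# Hypothetical zone: an ISP whose receiving and sending hosts are separate machines.
FAKE_DNS = {
    "mail.isp.example": ["192.0.2.25"],
    "smtp-out.isp.example": ["192.0.2.26"],
}


def fake_resolve(name: str) -> list[str]:
    return FAKE_DNS[name]  # raises KeyError for unknown names


if __name__ == "__main__":
    print(helo_mismatch("mail.isp.example", "192.0.2.25", fake_resolve))  # False
    # A legitimate sending host that announces the receiving host's name:
    print(helo_mismatch("mail.isp.example", "192.0.2.26", fake_resolve))  # True
```

The second case is legitimate mail that still trips the check. That is why a mismatch can add to a score but should not decide the outcome by itself.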

Multiple indicators have been proposed, but I do not understand how they would work in practice. Graylists? If the internal examination system felt there was a high probability of spam and the sender has a reputation for sending spam, then the message is probably spam. But if we cannot authenticate that the spammer is who he or she says he or she is, then is the reputation meaningful? What if the internal examination system and the external examination system disagree? And do multiple indicators have sufficient predictive accuracy to justify the extra cost of calculation?
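One possible way to combine two indicators is sketched below in Python. This is not a proposal from the list. The thresholds are arbitrary. The rule is that each indicator votes only when it is confident, a conflict between them produces a graylist verdict, and so does a case where neither is confident.

```python
from enum import Enum


class Verdict(Enum):
    ACCEPT = "accept"
    GRAYLIST = "graylist"
    REJECT = "reject"


def vote(spam_probability: float, high: float, low: float) -> int:
    """+1 = confident spam, -1 = confident not spam, 0 = no opinion."""
    if spam_probability >= high:
        return 1
    if spam_probability <= low:
        return -1
    return 0


def combine(content: float, reputation: float,
            high: float = 0.9, low: float = 0.1) -> Verdict:
    """Combine a content score and a reputation score (both in [0, 1])."""
    a, b = vote(content, high, low), vote(reputation, high, low)
    if a * b < 0:
        return Verdict.GRAYLIST  # the two indicators disagree
    total = a + b
    if total > 0:
        return Verdict.REJECT
    if total < 0:
        return Verdict.ACCEPT
    return Verdict.GRAYLIST      # neither indicator is confident


if __name__ == "__main__":
    print(combine(0.95, 0.95).value)  # reject
    print(combine(0.95, 0.05).value)  # graylist: disagreement
    print(combine(0.05, 0.50).value)  # accept
```

The reputation input is only as good as the sender identity it is keyed on. Without authentication, a spammer can borrow a good reputation simply by claiming someone else's identity.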

One of my critics asked, rhetorically I think, why so many workers are enthralled with sender authentication. I think the question is fair, and I've been thinking about it. While I cannot speak for anybody else, the reason I am interested in sender authentication is that, as a practicing sysadmin, I think in terms of identification/authentication/authorization. When a user logs in with a username, that identifies the user. Any of you are welcome to try logging in to the jeff account on www.commercialventvac.com. When the user gives the system his or her password, that authenticates the user. None of you know my password, so you cannot be authenticated as me. However, the contents of /etc/passwd or the NIS password table are what authorize the user to do whatever the user can do. So even if I am identified as jeff and authenticated as jeff, I am still forbidden to do anything to aaron's files (unless aaron authorizes me by changing the permissions on his files). E-mail is the only protocol I am aware of where the ostensibly human receiver is connected to the server. In most protocols, the ostensibly human receiver is connected to the client, and there is a mechanism for authenticating clients. The mechanism might not be very good (e.g. FTP and telnet send passwords over the wire "in the clear"), but SMTP is noteworthy for lacking any authentication at all.
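The three steps in the login analogy can be sketched in a few lines of Python. Everything in the sketch is hypothetical. The account table, the passwords, and the write-permission table stand in for /etc/passwd, the password hashes, and file permissions, and the standard library's PBKDF2 is used to hash passwords. The point is that identify(), authenticate() and authorize() are separate checks. In the SMTP exchange described above, the receiving MTA has no step that corresponds to authenticate() for the original sender.

```python
import hashlib
import hmac
import os
from typing import Optional


def hash_password(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


# Hypothetical account table: username -> (salt, password hash).
_SALT_JEFF, _SALT_AARON = os.urandom(16), os.urandom(16)
ACCOUNTS = {
    "jeff": (_SALT_JEFF, hash_password("jeff-secret", _SALT_JEFF)),
    "aaron": (_SALT_AARON, hash_password("aaron-secret", _SALT_AARON)),
}
# Hypothetical authorization table: user -> owners whose files they may change.
WRITE_GRANTS = {"jeff": {"jeff"}, "aaron": {"aaron"}}


def identify(username: str) -> Optional[str]:
    """Identification: the user names an account; nothing is proven yet."""
    return username if username in ACCOUNTS else None


def authenticate(username: str, password: str) -> bool:
    """Authentication: prove the claim with a secret only the user knows."""
    salt, stored = ACCOUNTS[username]
    return hmac.compare_digest(stored, hash_password(password, salt))


def authorize(username: str, owner: str) -> bool:
    """Authorization: even an authenticated user may only touch granted files."""
    return owner in WRITE_GRANTS.get(username, set())


if __name__ == "__main__":
    user = identify("jeff")
    if user and authenticate(user, "jeff-secret"):
        print(authorize(user, "jeff"))   # True
        print(authorize(user, "aaron"))  # False until aaron grants access
```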

Finally, because there are trust issues, we do not want to give any particular entity a monopoly on the system. Furthermore, if there are "secrets" inside the system, eventually they will be discovered and published. So the system must be open.

To summarize, we want to invent a system that runs on the receiving MTA, or possibly the UTA, and positively identifies incoming E-mail messages as either SPAM or not, using some (set of) criteria to be developed. This system must interoperate with existing MTAs. The system must be "open" in the sense that it is unencumbered by patents or copyrights (the GPL or the BSD license is acceptable).


Sincerely yours,



Jeff



_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg