ietf-asrg

[Asrg] May I summarize this conversation, please? Rev "A"

2004-05-13 00:34:47
People,

I'd like to take a moment and review what I think we've learned from these discussions. I am not even going to try to figure out who gets credit for what. Please correct me if I am wrong. I'd like to thank everybody who contacted me and set me straight.

There seem to be three approaches to dealing with SPAM. One approach solves the problem by solving four subproblems: identification, authentication, authorization, and trust. Another approach solves the problem by some sort of lexical and/or semantic analysis of the message itself. Yet another approach solves the problem by raising the cost of sending a message. There seems to be no consensus on which approach is "best", and in fact a solution might be a combination of approaches.

* We can uniquely identify the sender of a mail message using their E-mail address and their MTA host name and/or MTA IP address.

* However, we cannot authenticate that a message actually came from the person the message identifies as the sender. This is so for three reasons:
  1) SMTP makes no guarantee that the sender identification is either honest or accurate.
  2) A corrupt or incompetent MTA sysadmin can break any system that we care to implement.
  3) My intuition tells me that a corrupt or incompetent UTA sysadmin can break any system, if through no other method than telnetting to port 25 on the MTA.
  I think we are in agreement that this situation will not change without changing SMTP, and we are very, very reluctant to do that, for good reasons.

* Several methods of authorizing messages have been proposed or implemented. These methods can be grouped into internal examination methods and external examination methods. Internal examination methods include keyword recognition systems, statistical analysis systems, Bayesian analysis, reputation systems, and other techniques that border on A.I. External examination methods include whitelists, blacklists, challenge/response mechanisms, and counting sent/received message ratios.

* A trusted system or component is one with the power to break one's security policy. I think we have decided that we cannot trust the sender of an E-mail message. I think we are reluctant to trust an impartial third party, e.g. Verisign, because of concerns that the third party will not be trustworthy in any of a number of ways. Also, communicating that trust from system to system is problematic: think about something like Kerberos on a planetary scale. Ideally, the trust mechanism should be in the receiver's MTA. Since an MTA handles a lot of messages, the computational cost per message must be low.

* That we are reluctant to change SMTP and that we cannot trust the senders at all strongly suggests that the solution to the SPAM problem lies in the receiving MTA (or possibly the receiving UTA). This is a very unsatisfactory conclusion because the cost of dealing with SPAM is borne by the receivers; there seems to be no way to shift the cost to the senders. Proposals to raise the cost of sending E-mail using e-postage were rejected because it is hard to see how such schemes could be implemented without changing SMTP, and because they raise issues of trust: anybody could claim to be an MTA and demand a bunch of your postage.
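To make the point about unauthenticated sender identification concrete, here is a minimal sketch in Python of the lines somebody could type after telnetting to port 25. The function name, host name, and addresses are mine and purely hypothetical (example.com and example.org are reserved for documentation). It only builds the text of the dialog and does not open a connection, and it ignores details such as dot-stuffing of the body.

```python
def smtp_commands(helo_name, mail_from, rcpt_to, body):
    """Return the lines a client would send to hand one message to an MTA.

    Every value is supplied by the sender. Nothing in this basic dialog
    proves that the sender really owns helo_name or mail_from.
    """
    return [
        f"HELO {helo_name}",           # claimed host name, not verified
        f"MAIL FROM:<{mail_from}>",    # claimed envelope sender, not verified
        f"RCPT TO:<{rcpt_to}>",
        "DATA",
        f"From: {mail_from}",          # header the recipient sees, also just a claim
        f"To: {rcpt_to}",
        "",                            # blank line ends the headers
        body,
        ".",                           # a lone dot ends the message
        "QUIT",
    ]


if __name__ == "__main__":
    # Hypothetical values: the sender may claim to be anybody.
    for line in smtp_commands("mail.example.com", "president@example.com",
                              "jeff@example.org", "Trust me."):
        print(line)
```

The receiving MTA does learn the real source IP address from the TCP connection, which is why that is the one piece of identification the sender cannot simply make up.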

We've had several (frustrating at times) discussions about some good ideas that do not work. Whitelists and blacklists suffer from the problems of insufficient granularity, timeliness, and the fact that spammers frequently create new identities. If we could solve the authentication problem, then blacklists and whitelists might work, especially if there were a mechanism by which the lists could be automatically maintained between the MTA and the UTA. The problem with authentication is that it increases the costs on the sending side. This seems to me to be A Good Thing, but it was criticized.

Internal examination methods tend to be compute-intensive, and their verdicts are unreliable. The load could be dealt with by moving the SPAM inspection function from the MTA to the UTA. Internal examination relies on the idea that certain words and phrases are diagnostic of SPAM, e.g. "Trust me" and "v.agra". So the spammer can increase the cost on the receiver's side by using "v.agra", "v,agra", "v;agra", etc. Furthermore, a spammer could run his/her message through the internal examination system, and modify the text until it was accepted as not spam. Challenge/Response was rejected because it was perceived as being hard for the visually challenged, it has language issues, and it is not obvious how to implement it without changing SMTP. Reputation systems were rejected (I think that is the consensus) because it is not clear that past behavior is a good predictor of future behavior: the Russell's chicken scenario. Also, crooks sometimes go straight.
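The "v.agra" / "v,agra" / "v;agra" problem can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the two-entry list comes from the examples above, the function names are hypothetical, and no real filter is this simple. An exact-match list misses each new variant, while a more tolerant pattern catches them at the price of more work per message.

```python
import re

# The two diagnostic phrases mentioned above, lower-cased.
BLOCKLIST = ["v.agra", "trust me"]


def exact_match(text):
    """Flag the text only if a listed phrase appears character for character."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKLIST)


def tolerant_pattern(term):
    """Build a regex in which letters and digits match themselves and every
    other character in the term matches any single non-alphanumeric character."""
    parts = []
    for ch in term:
        if ch.isalnum():
            parts.append(re.escape(ch))
        else:
            parts.append(r"[^a-z0-9]")
    return re.compile("".join(parts))


TOLERANT = [tolerant_pattern(term) for term in BLOCKLIST]


def tolerant_match(text):
    """Flag the text if any tolerant pattern matches somewhere in it."""
    lowered = text.lower()
    return any(pattern.search(lowered) for pattern in TOLERANT)
```

With this sketch, exact_match("Buy v,agra now") is False while tolerant_match("Buy v,agra now") is True. The spammer's next move, say "v1agra", slips past both, which is exactly the arms race described above.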

External methods such as looking at the ratio of sent messages to received messages were rejected because they required trusting an external system. For example, a spammer could claim that although he had sent millions of messages, he had also received millions of messages, so he's not a spammer, he's just busy. Discrepancies between the host name stated in the HELO or EHLO command and the actual source IP address cannot be the sole criterion for accepting or rejecting a message because some ISPs separate the sending and receiving functions for load-leveling purposes.
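Here is a small Python sketch of why the sent/received ratio fails when the counts are self-reported. The threshold of 100 and the function names are arbitrary assumptions of mine, not anything that was proposed on the list.

```python
def sent_received_ratio(sent, received):
    """Ratio of messages sent to messages received (avoids dividing by zero)."""
    return sent / max(received, 1)


def looks_like_spammer(sent, received, max_ratio=100.0):
    """Hypothetical rule: flag a sender whose ratio exceeds max_ratio."""
    return sent_received_ratio(sent, received) > max_ratio


# An honest report from a bulk sender is flagged...
honest = looks_like_spammer(sent=5_000_000, received=12)
# ...but the same sender simply claiming millions of received messages is not.
lying = looks_like_spammer(sent=5_000_000, received=5_000_000)
```

The rule is only as good as the party doing the counting, which is the trust problem again.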

Multiple indicators have been proposed, but I do not understand how they would work in practice. Graylists? If the internal examination system felt there was a high probability of spam and the sender has a reputation for sending spam, then probably the message is spam. But if we cannot authenticate that the spammer is who he or she says he or she is, then is the reputation meaningful? What if the internal examination system and the external examination system disagree? And do multiple indicators have sufficient accuracy in prediction to justify the extra cost of calculation?
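One possible way to combine two indicators, offered only as a sketch and not as anything the list agreed on, is to treat each as an independent probability that the message is spam and multiply the odds. The independence assumption, the function name, and the example numbers are all mine.

```python
def combine(p_content, p_reputation):
    """Combine two spam probabilities, assuming they are independent.

    p_content:    probability of spam from examining the message itself
    p_reputation: probability of spam from what is known about the sender
    """
    spam = p_content * p_reputation
    ham = (1.0 - p_content) * (1.0 - p_reputation)
    if spam + ham == 0.0:
        # One indicator says "certainly spam", the other "certainly not".
        return 0.5
    return spam / (spam + ham)


both_agree = combine(0.9, 0.9)   # roughly 0.99: more confident than either alone
disagree = combine(0.9, 0.1)     # roughly 0.5: the two indicators cancel out
```

Under this rule, disagreement leaves us no wiser than before, and the result is no better than the reputation input, which still depends on knowing who the sender really is.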

One of my critics asked, rhetorically I think, why so many workers are enthralled with sender authentication. I think the question is fair, and I've been thinking about it. While I cannot speak for anybody else, the reason why I am interested in sender authentication is that, as a practicing sysadmin, I think in terms of identification/authentication/authorization. When a user logs in with a username, that identifies the user. Any of you are welcome to try logging in to the jeff account on www.commercialventvac.com. When the user feeds the system his or her password, that authenticates the user. None of you know my password, so you cannot be authenticated as me. However, the contents of /etc/passwd or the NIS password table are what authorize the user to do whatever the user can do. So if I am identified as jeff and authenticated as jeff, I am still forbidden to do anything to aaron (unless aaron authorizes me by changing permissions on his files).

E-mail is the only protocol I am aware of where the ostensibly human receiver is connected to the server. In most protocols, the ostensibly human receiver is connected to the client, and there is a mechanism for authenticating clients. The mechanism might not be very good (e.g. FTP and telnet require sending passwords over the wire "in the clear") but SMTP is noteworthy for lacking any authentication at all.
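The three steps in the login example can be sketched in Python as follows. This is a toy model under my own assumptions, not how any real login system or /etc/passwd works: the class, its method names, the passwords, and the file path are hypothetical, and only the user names jeff and aaron come from the example above.

```python
import hashlib
import hmac
import os


def hash_password(password, salt):
    """Derive a password hash with PBKDF2 (standard library)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


class Accounts:
    """Toy model of identification, authentication, and authorization."""

    def __init__(self):
        self._users = {}    # username -> (salt, password hash)
        self._owners = {}   # path -> owning username
        self._shared = {}   # path -> set of usernames granted access

    def add_user(self, username, password):
        salt = os.urandom(16)
        self._users[username] = (salt, hash_password(password, salt))

    def authenticate(self, username, password):
        """The username identifies; the password authenticates."""
        record = self._users.get(username)
        if record is None:
            return False
        salt, expected = record
        return hmac.compare_digest(expected, hash_password(password, salt))

    def add_file(self, path, owner):
        self._owners[path] = owner

    def grant(self, path, username):
        """The owner opens a file to another user (like changing permissions)."""
        self._shared.setdefault(path, set()).add(username)

    def may_write(self, username, path):
        """Authorization: what an already authenticated user is allowed to do."""
        return (self._owners.get(path) == username
                or username in self._shared.get(path, set()))


accounts = Accounts()
accounts.add_user("jeff", "hypothetical-password-1")
accounts.add_user("aaron", "hypothetical-password-2")
accounts.add_file("/home/aaron/notes.txt", "aaron")
```

In this sketch, authenticating as jeff succeeds only with jeff's password, and even then may_write("jeff", "/home/aaron/notes.txt") is False until aaron calls grant. Plain SMTP, as described above, has nothing that plays the role of authenticate.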

Finally, because there are trust issues, we do not want to give any particular entity a monopoly on the system. Furthermore, if there are "secrets" inside the system, eventually they will be discovered and published. So the system must be open.

To summarize, we want to invent a system which runs on the receiving MTA or possibly UTA, which will positively identify incoming E-mail messages as either SPAM or not using some (set of) criteria to be developed. This system must interoperate with existing MTAs. The system must be "open" in the sense that it is unencumbered by patents or copyrights (the GPL or the BSD license is acceptable).

Sincerely yours,



Jeff



_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg


