[Asrg] May I summarize this conversation, please? Rev "A"
2004-05-13 00:34:47
People,
I'd like to take a moment and review what I think we've learned from
these discussions. I am not even going to try to figure out who gets
credit for what. Please correct me if I am wrong. I'd like to thank
everybody who contacted me and set me straight.
There seem to be three approaches to dealing with SPAM. One approach
solves the problem by solving four subproblems: identification,
authentication, authorization, and trust. Another approach solves the
problem by some sort of lexical and/or semantic analysis of the
message itself. Yet another approach solves the problem by raising the
cost of sending a message. There seems to be no consensus on which
approach is "best", and in fact a solution might be a combination of
approaches.
* We can uniquely identify the sender of a mail message using their
E-mail address and their MTA host name and/or MTA IP address.
* However, we cannot authenticate that a message actually came from the
person the message identifies as the sender. This is so for several
reasons: 1) SMTP makes no guarantee that the sender identification is
either honest or accurate. 2) A corrupt or incompetent MTA sysadmin can
break any system that we care to implement. 3) My intuition tells me
that a corrupt or incompetent UTA sysadmin can break any system, if
through no other method than telnetting to port 25 on the MTA. I think
we are in agreement that this situation will not change without
changing SMTP, and we are very, very reluctant to do that, for good
reasons.
* Several methods of authorizing messages have been proposed or
implemented. These methods can be grouped into internal examination
methods and external examination methods. Internal examination methods
include keyword recognition systems, statistical analysis systems,
Bayesian analysis, reputation systems, and other techniques that border
on A.I. External examination methods include whitelists, blacklists,
challenge/response mechanisms, and counting sent/received message ratios.
* A trusted system or component is one with the power to break one's
security policy. I think we have decided that we cannot trust the
sender of an E-mail message. I think we are reluctant to trust an
impartial third party (e.g. Verisign) because of concerns that the third
party will not be trustworthy in any of a number of ways. Also,
communicating that trust from system to system is problematic. Think
about something like Kerberos, on a planetary scale. Ideally, the trust
mechanism should be in the receiver's MTA. Since an MTA handles a lot
of messages, the per-message computational cost must be low.
* That we are reluctant to change SMTP and that we cannot trust the
senders at all strongly suggests that the solution to the SPAM problem
is in the receiving MTA (or possibly the receiving UTA). This is a very
unsatisfactory conclusion because the cost of dealing with SPAM is borne
by the receivers; there seems to be no way to shift the cost to the
senders. Proposals to raise the cost of sending E-mail using e-postage
are rejected because it is hard to see how such schemes could be
implemented without changing SMTP, and because they raise issues of
trust: anybody could claim to be an MTA and demand a bunch of your
postage.
We've had several (frustrating at times) discussions about some good
ideas that do not work. Whitelists and blacklists suffer from the
problems of insufficient granularity, timeliness, and the fact that
spammers frequently create new identities. If we could solve the
authentication problem, then blacklists and whitelists might work,
especially if there was a mechanism by which the lists could be
automatically maintained between the MTA and the UTA. The problem with
authentication is that it increases the costs on the sending side. This
seems to me to be A Good Thing, but it was criticized.
Internal examination methods are computationally expensive and
unreliable. The load on the MTA could be reduced by moving the SPAM
inspection function from the MTA to the UTA. Internal examination relies
on the idea that certain words and phrases are diagnostic of SPAM, e.g.
"Trust me" and "v.agra". So the spammer can increase the cost on the
receiver's side by using "v.agra", "v,agra", "v;agra", etc. Furthermore,
a spammer could run his/her message through the internal examination
system, and modify the text until it was accepted as not spam.
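To make the obfuscation point concrete, here is a minimal sketch. The
keyword list and the function names are my own illustrative assumptions,
not any real filter's implementation. It shows a verbatim keyword match
missing the punctuated variants, and a normalization pass catching them
at the price of extra work on every message.

```python
import re

# Hypothetical keyword list; a real filter would use a far larger one.
SPAM_KEYWORDS = {"v.agra", "trust me"}

def naive_is_spam(text: str) -> bool:
    """Flag the message if any keyword appears verbatim (case-insensitive)."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in SPAM_KEYWORDS)

def normalize(text: str) -> str:
    """Lowercase and drop punctuation embedded inside words.

    'v.agra', 'v,agra' and 'v;agra' all collapse to 'vagra'.
    """
    return re.sub(r"(?<=\w)[^\w\s]+(?=\w)", "", text.lower())

def normalized_is_spam(text: str) -> bool:
    """Compare normalized keywords against the normalized message."""
    normalized_text = normalize(text)
    return any(normalize(keyword) in normalized_text for keyword in SPAM_KEYWORDS)

variants = ["Buy v.agra now", "Buy v,agra now", "Buy v;agra now"]
naive_hits = [naive_is_spam(v) for v in variants]
normalized_hits = [normalized_is_spam(v) for v in variants]
```

The naive check only catches the first variant; the normalized check
catches all three. A spammer can of course move on to other tricks
(spacing, digit substitution), which is the arms race described above.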
Challenge/Response was rejected because it was perceived as being hard
for the visually challenged, it has language issues, and it is not
obvious how to implement it without changing the SMTP.
Reputation systems were rejected (I think that is the consensus) because
it is not clear that past behavior is a good predictor of future
behavior: the Russell Chicken scenario. Also, crooks sometimes go
straight.
External methods such as looking at the ratio of sent messages to
received messages were rejected because they required trusting an
external system. For example, a spammer could claim that although he
had sent millions of messages, he had also received millions of
messages, so he's not a spammer, he's just busy. Discrepancies between
the host name given in the HELO or EHLO command and the actual
source IP address cannot be the sole criterion for accepting/rejecting a
message because some ISPs separate the sending and receiving functions
for load leveling purposes.
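A minimal sketch of treating the HELO/EHLO mismatch as one signal rather
than a verdict follows. Everything named here is an assumption for
illustration: the FAKE_DNS table stands in for real DNS lookups so the
sketch runs offline, the host names are made up, and the 0.3 weight is
arbitrary.

```python
from dataclasses import dataclass

# Hypothetical stand-in for forward DNS lookups, so the sketch runs offline.
# The addresses come from the documentation range reserved by RFC 5737.
FAKE_DNS = {
    "mail-out.example.net": {"192.0.2.10"},
    "mail-in.example.net": {"192.0.2.20"},
}

@dataclass
class Connection:
    helo_name: str  # asserted by the client in HELO/EHLO, not verified
    peer_ip: str    # observed by the receiving MTA on the socket

def helo_matches_peer(conn: Connection) -> bool:
    """True if the claimed HELO name resolves to the connecting address."""
    return conn.peer_ip in FAKE_DNS.get(conn.helo_name, set())

def helo_suspicion(conn: Connection) -> float:
    """Return a score contribution instead of an accept/reject decision."""
    # 0.3 is an arbitrary illustrative weight, not a recommended value.
    return 0.0 if helo_matches_peer(conn) else 0.3

# An ISP that announces its inbound host name while sending from its
# outbound machine: a mismatch, but not proof of spam.
split_isp = Connection(helo_name="mail-in.example.net", peer_ip="192.0.2.10")
split_isp_score = helo_suspicion(split_isp)
```

The split-function ISP gets a nonzero score but is not rejected
outright; some other indicator would have to push it over a threshold.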
Multiple indicators have been proposed, but I do not understand how they
would work in practice. Graylists? If the internal examination system
felt there was a high probability of spam and the sender has a
reputation for sending spam, then probably the message is spam. But if
we cannot authenticate that the spammer is who he or she says he or she
is, then is the reputation meaningful? What if the internal examination
system and the external examination system disagree? And do multiple
indicators have sufficient accuracy in prediction to justify the extra
cost of calculation?
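One possible way to combine indicators, offered only as a sketch, is the
naive Bayes rule for probabilities that are assumed to be independent.
That independence assumption, the function name and the clamping floor
are mine and were not proposed on the list. The sketch also shows what
happens when an internal and an external indicator disagree.

```python
import math

def combine_indicators(probabilities, floor=0.01):
    """Combine per-indicator spam probabilities, assuming independence.

    Each value is clamped to [floor, 1 - floor] so that a single
    indicator claiming certainty cannot force a division by zero.
    """
    clamped = [min(max(p, floor), 1.0 - floor) for p in probabilities]
    spam = math.prod(clamped)
    ham = math.prod(1.0 - p for p in clamped)
    return spam / (spam + ham)

# Content analysis and sender reputation both point to spam.
agree = combine_indicators([0.9, 0.8])

# Content analysis says spam, reputation says legitimate.
disagree = combine_indicators([0.9, 0.1])
```

When the two indicators agree the combined estimate is about 0.97; when
they contradict each other with equal strength it falls back to about
0.5, i.e. the combination tells us nothing. Whether the reputation input
means anything without authentication is a separate question that this
arithmetic does not answer.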
One of my critics asked, rhetorically I think, why so many workers are
enthralled with sender authentication. I think the question is fair,
and I've been thinking about it. While I cannot speak for anybody else,
the reason I am interested in sender authentication is that, as a
practicing sysadmin, I think in terms of
identification/authentication/authorization. When a user logs in with a
username, that identifies the user. Any of you are welcome to try
logging in to the jeff account on www.commercialventvac.com. When the
user feeds the system his or her password, that authenticates the user.
None of you know my password, so you cannot be authenticated as me.
However, the contents of /etc/passwd or the NIS password table are what
authorizes the user to do whatever the user can do. So if I am
identified as jeff and authenticated as jeff, I am still forbidden to do
anything to aaron (unless aaron authorizes me by changing permissions on
his files). E-mail is the only protocol I am aware of where the
ostensibly human receiver is connected to the server. In most
protocols, the ostensibly human receiver is connected to the client; and
there is a mechanism for authenticating clients. The mechanism might
not be very good (e.g. FTP and telnet require sending passwords over the
wire "in the clear") but SMTP is noteworthy for lacking any authentication
at all.
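The three steps in the login example can be sketched as separate
functions. The account table, the passwords, the fixed salt and the
WRITE_ACCESS table are all hypothetical stand-ins for /etc/passwd, NIS
and file permissions; this is an illustration of the distinction, not
how any real login system is implemented.

```python
import hashlib
import hmac

SALT = b"demo-salt"  # fixed salt for illustration only; real systems use per-user salts

def hash_password(password: str) -> bytes:
    """Derive a password hash with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), SALT, 100_000)

# Hypothetical account table standing in for /etc/passwd or NIS.
ACCOUNTS = {
    "jeff": hash_password("jeff-secret"),
    "aaron": hash_password("aaron-secret"),
}

# Hypothetical permissions: for each file owner, who may modify the files.
WRITE_ACCESS = {"jeff": {"jeff"}, "aaron": {"aaron"}}

def identify(username: str) -> bool:
    """Identification: the claimed name corresponds to a known account."""
    return username in ACCOUNTS

def authenticate(username: str, password: str) -> bool:
    """Authentication: the claimant proves knowledge of the account's secret."""
    stored = ACCOUNTS.get(username)
    return stored is not None and hmac.compare_digest(stored, hash_password(password))

def authorize(username: str, owner: str) -> bool:
    """Authorization: what an authenticated user is allowed to touch."""
    return username in WRITE_ACCESS.get(owner, set())
```

Anybody can pass identify("jeff"); only someone who knows the password
passes authenticate; and even then authorize("jeff", "aaron") is False
until aaron adds jeff to his own entry in WRITE_ACCESS.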
Finally, because there are trust issues, we do not want to give any
particular entity a monopoly on the system. Furthermore, if there are
"secrets" inside the system, eventually they will be discovered and
published. So the system must be open.
To summarize, we want to invent a system which runs on the receiving MTA
or possibly UTA, which will positively identify incoming E-mail messages
as either SPAM or not using some (set of) criteria to be developed.
This system must interoperate with existing MTAs. The system must be
"open" in the sense that it is unencumbered by patents or copyrights
(the GPL or the BSD license is acceptable).
Sincerely yours,
Jeff
_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg
- [Asrg] May I summarize this conversation, please? Rev "A", Jeff Silverman