ietf-asrg

Re: [Asrg] Lets Fix Mailing Lists

2003-03-09 18:10:23
Vernon, you have a number of good points here.  I'm CC'ing Marco Paganini,
ASK creator, who is currently running a user study for his upcoming FREENIX
paper on ASK.  Hopefully, he'll be able to respond much better than I could,
and perhaps quantify some of his experiences for this list.

Disclaimer: I'm shepherding Marco's paper for FREENIX, so I may be just a
tad biased. :-)

In message 
<200303100033(_dot_)h2A0XmGP003843(_at_)calcite(_dot_)rhyolite(_dot_)com>, 
Vernon Schryver writes:
From: Erez Zadok <ezk(_at_)cs(_dot_)sunysb(_dot_)edu>

...
My own experience is also that the vast majority of spam comes from forged
addresses that don't exist.  I personally find ASK, a challenge-response
personal spam filter, to be very effective at reducing the rate at which
spam sneaks in.  ASK works well for me, but it does have scalability
problems if deployed widely.

How do you know that the source addresses don't exist?  Because your
challenge-response system gets a bounce for its challenges?  That
would at most show that they don't exist when you challenge them, and
might in some cases only show that your challenges are unacceptable
(e.g. detected by a spam filter) and so rejected from perfectly valid
and legitimate addresses.  (For example, if your challenges are
substantially identical or bulk, then they are likely to be rejected
by spam filters like the DCC and Pyzor/Razor.)

The challenges are not too similar.  They use a checksum to correlate the
original message with the response to the challenge.  A bounce is not used
to assume spam; instead, a _lack_ of response to a (signed) challenge after
a period of time is what counts.  The assumption here is that there are very
few cases where there are "real users" behind spam sources who'd sift
through challenges and bother responding to them.  (Obviously there are more
details here, but I don't want to turn this email into a detailed
description of ASK.)
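
The hold-and-challenge flow described above can be sketched roughly as
follows.  This is an illustration only, not ASK's actual code: the names
(SECRET_KEY, TIMEOUT_SECONDS, make_token, PendingQueue), the use of
HMAC-SHA256 as the "checksum", and the one-week waiting period are all
assumptions made for the example.

```python
import hashlib
import hmac
import time

# Hypothetical values chosen for illustration; not taken from ASK.
SECRET_KEY = b"local-secret"
TIMEOUT_SECONDS = 7 * 24 * 3600


def make_token(sender: str, message_id: str) -> str:
    """Keyed checksum that ties a challenge to one held message."""
    data = f"{sender}\n{message_id}".encode("utf-8")
    return hmac.new(SECRET_KEY, data, hashlib.sha256).hexdigest()


class PendingQueue:
    """Holds messages from unknown senders until a challenge is answered."""

    def __init__(self):
        self._held = {}  # token -> (sender, message_id, time held)

    def hold(self, sender, message_id, now=None):
        """Queue a message and return the token to embed in the challenge."""
        now = time.time() if now is None else now
        token = make_token(sender, message_id)
        self._held[token] = (sender, message_id, now)
        return token

    def confirm(self, sender, token):
        """Release the held message if the reply carries a valid token."""
        entry = self._held.get(token)
        if entry is None:
            return None
        held_sender, message_id, _ = entry
        expected = make_token(held_sender, message_id)
        if sender != held_sender or not hmac.compare_digest(token, expected):
            return None
        del self._held[token]
        return message_id

    def expire(self, now=None):
        """Drop messages whose challenge went unanswered past the timeout."""
        now = time.time() if now is None else now
        stale = [t for t, (_, _, held_at) in self._held.items()
                 if now - held_at > TIMEOUT_SECONDS]
        return [self._held.pop(t)[1] for t in stale]
```

Note that nothing here inspects bounces: a message is treated as spam only
because expire() finds that no valid reply arrived in time.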

Assuming that your vast majority of addresses in fact do not exist,
how do you know that none of the spammers owned the addresses they
are using in the reasonably recent past, perhaps even when they queued
the spam on the open relay or other SMTP client?  For example,
how do you distinguish "never existed" from "terminated for spamming"?
Do you consider both cases to be "forged?"  I hope not.

From my personal perspective as a user, why should I care?  If the spammer's
account never existed or was turned off due to spamming, in both cases a
valid response to my challenge isn't likely to be returned, and so I won't
get to see that original spam message.

However, I do think that these challenge-response systems can overburden an
internet-wide SMTP system with all of these challenges.  That's why I said
in my original email that I think these systems suffer from scalability
problems.  I hope this list can come up with good and scalable solutions.

That distinction is significant, because contributors to this mailing
list have claimed that some forgeable spam defenses are impossible,
because they believe forgery is extremely common.  Others have claimed
that spam defenses based on validating sources would be sufficient
using similar reasoning.  Both reasons are bogus if forged spam
is in fact relatively rare, although the claims might still be valid
for other reasons.

Right.  My personal experience is that most (>99%) spam is from forged
addresses.  Marco is conducting a wider study (than just ol' lonely me :-)
and hopefully he can provide more accurate results.

It's also worth quantifying the total from which that or any "vast
majority" is drawn.  For example, conclusions based on 5000 messages
sent toward a handful of addresses should convince no one of much of
anything.

As others have said, numbers with complete disclosure of what
they mean would be good.

As I keep saying, I agree that plenty of spam has forged sources.
The question is whether that "plenty" is 1%, 10%, 50%, or 99%.  As I
said, my guess is ~10%.  Note that I used the word "guess."  I wish
that everyone else who is guessing would admit as much.  As far as I
can tell, everyone is guessing.

Agreed.

I've read some articles in popular media that claim that over 50% of all
email is spam.  While I believe, from personal experience, that >99% of spam
is forged, I don't believe (again from personal experience), that 50% of all
email is spam.  All these anecdotal or personal results are not enough.

Perhaps what this list should do is prepare a Web based survey that
interested parties can complete.  That is exactly what I suggested that
Marco do before his paper is published, and he took on that daunting task of
a user survey.  I think a carefully designed study, with as many users as
possible filling it out, would go a long way toward defining the scope of
the spam problem nowadays.

Cheers,
Erez.
_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg