Excellent. I agree that the "let's not do this because of the noise"
argument may render a lot of other similar efforts null and void as well,
so let's keep this going. I think your plan below fits KISS and should be
easy to implement.
We need about 4-5 people to step up and "own" a given email address per
Tom's plan below. We'll provide the infrastructure.
Who will participate? Stand up now.
-----Original Message-----
From: Tom Thomson [mailto:tthomson(_at_)neosinteractive(_dot_)com]
Sent: Thursday, August 14, 2003 4:12 AM
To: Peter Kay; asrg(_at_)ietf(_dot_)org
Subject: RE: [Asrg] RE: 2.a.1 Analysis of Actual Spam Data -
Titan Key reduces spam attacks
I'm not convinced that the noise factor is so great that we
can't get any useful information out of analysis. What I am
sure of is that if that argument wins this time and this line
of enquiry is dropped as a result, then most (maybe all)
other lines of enquiry will go the same way. Until we have
some measurements (quite a lot of measurements) we don't
really know how big the noise factor is. Your original less
than 100 days of data looked pretty noisy but all sorts of
techniques are available for smoothing away noise; until
there's enough data, we don't know whether those techniques
will prevail in the current case.
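As one illustration of the kind of smoothing mentioned above, here is a minimal sketch of a trailing moving average over daily spam counts. The function name, the 7-day window, and the sample counts are all assumptions made up for illustration; they are not data or methods from this thread.

```python
from collections import deque


def moving_average(counts, window=7):
    """Trailing moving average; yields one value per full window.

    `counts` is a sequence of daily spam counts (hypothetical input).
    `window` is the number of days averaged together (assumed default: 7).
    """
    if window <= 0:
        raise ValueError("window must be positive")
    smoothed = []
    buf = deque(maxlen=window)  # holds the most recent `window` counts
    total = 0.0
    for c in counts:
        if len(buf) == window:
            total -= buf[0]  # this value is about to fall out of the window
        buf.append(c)
        total += c
        if len(buf) == window:
            smoothed.append(total / window)
    return smoothed


if __name__ == "__main__":
    # Made-up counts, purely to show the call shape.
    daily = [12, 30, 7, 22, 41, 9, 15, 28, 11, 35]
    print(moving_average(daily, window=7))
```

A moving average is only one option; whether it (or any other smoother) recovers a usable signal is exactly what cannot be known until more data exists.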
An experimental approach would be to set up a number of email
accounts and give them different degrees of exposure (some
with repeated on-going exposure, some with initial exposure
but no repeated exposure, some with exposure through
"respectable" mailing lists only and some with exposure
through "non-respectable" lists, some with newsgroup exposure
(again "respectable" and "non-respectable"), and so on),
covering quite a lot of variations in exposure and a good mix
of combinations of exposure. Then there's a need for a large
number of email accounts with identical exposure; and each
variant needs to occur (many times) in many different email
domains (different ISPs, different countries, ...). Then
some proportion of each similar block of accounts needs to
start using 550 rejections straight away, some proportion
needs to start using them after a few weeks, some proportion
needs to never use them. After enough time (probably a year
or so) there will be enough data to draw conclusions about
the noise level from, and perhaps even to draw conclusions
about the effectiveness of the 550 technique.
Tom
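To make the shape of the design Tom describes concrete, here is a rough sketch that enumerates test accounts as a full cross of exposure type, 550 policy, and mail domain, with several identical accounts per combination. The category labels, the example domains, and the address naming scheme are all hypothetical; the plan above does not fix any of them, and it does not require a full cross.

```python
from itertools import product

# Exposure variants, paraphrased from the plan above (labels are assumptions).
EXPOSURES = [
    "repeated",
    "initial_only",
    "respectable_list",
    "non_respectable_list",
    "respectable_newsgroup",
    "non_respectable_newsgroup",
]

# When each account starts answering with 550 (labels are assumptions).
POLICIES = ["550_immediately", "550_after_delay", "never_550"]

# Placeholder domains standing in for different ISPs and countries.
DOMAINS = ["isp-a.example", "isp-b.example"]


def build_design(exposures, policies, domains, replicates):
    """Return one record per test account.

    Every (exposure, policy, domain) combination gets `replicates`
    accounts with identical treatment, so variation between them
    gives an estimate of the noise level.
    """
    design = []
    serial = 0
    for exposure, policy, domain in product(exposures, policies, domains):
        for _ in range(replicates):
            serial += 1
            design.append(
                {
                    # Neutral local part so the address itself leaks nothing.
                    "address": f"probe{serial:04d}@{domain}",
                    "exposure": exposure,
                    "policy": policy,
                    "domain": domain,
                }
            )
    return design


if __name__ == "__main__":
    accounts = build_design(EXPOSURES, POLICIES, DOMAINS, replicates=3)
    print(len(accounts), "accounts")
    print(accounts[0])
```

Even this toy version needs 6 x 3 x 2 x 3 = 108 accounts, which shows how quickly the number of addresses grows as variants and replicates are added.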
-----Original Message-----
From: asrg-admin(_at_)ietf(_dot_)org
[mailto:asrg-admin(_at_)ietf(_dot_)org]On
Behalf Of Peter Kay
Sent: 13 August 2003 17:07
To: asrg(_at_)ietf(_dot_)org
Subject: RE: [Asrg] RE: 2.a.1 Analysis of Actual Spam Data -
Titan Key re duces spam attacks
OK everyone, it's decision time.
================ Summary of this thread ===============
What we're trying to determine is: will "hard bounce"
handling of spam (such as 550 no such user or 451 greylisting
response) reduce the amount of "spam attacks" to a given
email address over time versus an email address that does not
employ such tactics?
The question is: what research process can we execute such
that the results can be deemed reasonably reliable to
prove/disprove what we're trying to determine?
Our resident statistician, Terry Sullivan, basically says
that the amount of noise in the data and the variance in
spam make it difficult if not impossible to statistically
prove the above unless the effect of the hard bounces is
huge. And if the effect is so huge, why hasn't it already
been figured out?
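A back-of-the-envelope way to see why noise matters: under a simple two-group comparison with a normal approximation, the number of accounts needed per group grows with the square of (noise / effect). The sketch below is not Terry's analysis; the function name, the normality assumption, and the sample numbers are all illustrative assumptions, and real spam counts are unlikely to be this well behaved.

```python
import math
from statistics import NormalDist


def accounts_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided two-sample test.

    sigma: standard deviation of daily spam per account (the "noise")
    delta: difference in mean daily spam we want to detect (the "effect")
    Uses the normal-approximation formula
        n = 2 * ((z_{1-alpha/2} + z_{power}) * sigma / delta) ** 2
    """
    if sigma <= 0 or delta <= 0:
        raise ValueError("sigma and delta must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    n = 2 * ((z_alpha + z_power) * sigma / delta) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical noise level of 10 messages/day, with shrinking effects.
    for effect in (10, 5, 1):
        print(effect, accounts_per_group(sigma=10, delta=effect))
```

Halving the effect roughly quadruples the accounts needed, which is one way to read the "unless the effect is huge" caveat above.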
=============== The issues at hand =========================
1) Do we continue to pursue this research topic?
2) If so, what process can we agree on to follow?
============== next steps ===========================
This is a call to action to everyone on this list to address
the above issues. Let's see what responses we get over a
week's time. If we get either a bunch of "no" or nothing at
all, we'll close this issue and move on. If, on the other
hand, we get some positive responses that INCLUDE IDEAS ON A
RESEARCH PROCESS, we can then gather those ideas into a
research plan and then move to execute.
======= misc, but important additional info ================
We've had at least 2 people lend some degree of assistance:
Scott Nelson has offered to include researching the 550 in
his experiments, and Damon Sauer has offered millions of
emails' worth of data. It's not entirely clear how either of
these 2 would support the research process, but their
offers are certainly appreciated. Yakov has also
mentioned that we have access to data from Brightmail and Postini.
Paul Judge pointed to a few links out there related to what
we're trying to research:
http://www.simplyquick.com/privacy.html#3
This is a report of what happened after subscribing to several
newsletters, performing normal actions on those newsletters (opening,
clicking, etc) and then unsubscribing. Result: no spam, but several
newsletters did not respect the unsubscribe request.
http://www.infoworld.com/article/03/03/21/12ebsecret_1.html
That talks about the results of another firm researching what HTML codes
do in spam mail AND WHAT EFFECT USING A 550 NO SUCH USER RESPONSE
(emphasis added) has on subsequent spam.
http://www.out-law.com/php/page.php?page_id=pressrele3360&area=about
The actual Web page of the firm conducting the research mentioned by the
infoworld article.
=========== YOUR ACTION ITEM ====================
(Yes, you. The subscriber to this.)
Due date: on/before 8/21/03 12:01am GMT +0
Decide if issue #1 is worth pursuing and if so, think about a research
process (please make sure you KISS) and email that process in simple
bullet form to this list. Positive responses must include bullet
points on a research process in order to be considered positive.
_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg