RE: [Asrg] RE: 2.a.1 Analysis of Actual Spam Data

Excellent. I agree that the "let's not do this because of the noise" argument
may render a lot of other similar efforts null and void as well, so let's
keep this going. I think your proposal below fits KISS and should be easy to
implement.

We need about 4-5 people to step up and "own" a given email address per
Tom's plan below. We'll provide the infrastructure. 

Who will participate?  Stand up now.

-----Original Message-----
From: Tom Thomson [mailto:tthomson(_at_)neosinteractive(_dot_)com] 
Sent: Thursday, August 14, 2003 4:12 AM
To: Peter Kay; asrg(_at_)ietf(_dot_)org
Subject: RE: [Asrg] RE: 2.a.1 Analysis of Actual Spam Data - Titan Key reduces spam attacks

I'm not convinced that the noise factor is so great that we 
can't get any useful information out of analysis.  What I am 
sure of is that if that argument wins this time and this line 
of enquiry is dropped as a result, then most (maybe all) 
other lines of enquiry will go the same way.  Until we have 
some measurements (quite a lot of measurements) we don't 
really know how big the noise factor is.  Your original data, 
covering less than 100 days, looked pretty noisy, but all sorts 
of techniques are available for smoothing away noise; until 
there's enough data, we don't know whether those techniques 
will prevail in the current case.
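As an illustration of the kind of smoothing technique referred to above, here is a minimal sketch of one of the simplest: a trailing moving average over daily spam counts for a single address. The function name, the window size, and the sample counts are hypothetical, chosen only for illustration; they are not real data or an agreed analysis method.

```python
from typing import List

def moving_average(daily_counts: List[float], window: int = 7) -> List[float]:
    """Trailing moving average: each output value is the mean of the
    `window` most recent daily counts (fewer at the start of the series)."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running_total = 0.0
    for i, count in enumerate(daily_counts):
        running_total += count
        if i >= window:
            # Drop the count that has just fallen out of the window.
            running_total -= daily_counts[i - window]
        span = min(i + 1, window)
        smoothed.append(running_total / span)
    return smoothed

if __name__ == "__main__":
    # Hypothetical daily spam counts for one address (illustrative, not real data).
    counts = [12, 30, 8, 15, 40, 9, 11, 14, 35, 10]
    print([round(x, 1) for x in moving_average(counts, window=3)])
    # prints [12.0, 21.0, 16.7, 17.7, 21.0, 21.3, 20.0, 11.3, 20.0, 19.7]
```

A wider window removes more day-to-day noise but also blurs any change that follows switching on 550 rejections, so the window length is itself a choice the research process would need to fix in advance.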

An experimental approach would be to set up a number of email 
accounts and give them different degrees of exposure: some 
with repeated ongoing exposure, some with initial exposure 
but no repeated exposure, some with exposure through 
"respectable" mailing lists only and some with exposure 
through "non-respectable" lists, some with newsgroup exposure 
(again "respectable" and "non-respectable"), and so on, 
covering many variations in exposure and a good mix 
of combinations of exposure.  We would also need a large 
number of email accounts with identical exposure, and each 
variant needs to occur (many times) in many different email 
domains (different ISPs, different countries, ...).  Then 
some proportion of each similar block of accounts needs to 
start using 550 rejections straight away, some proportion 
needs to start using them after a few weeks, and some 
proportion needs to never use them.  After enough time 
(probably a year or so) there will be enough data to draw 
conclusions about the noise level, and perhaps even about 
the effectiveness of the 550 technique.
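The design described above can be sketched as a randomised block assignment: accounts sharing an exposure profile and a domain form a block, and each block is split evenly, at random, across the three rejection policies. This is a minimal sketch under stated assumptions; the exposure labels, the `.example` domains, the policy names, and the `assign_accounts` function are all hypothetical placeholders, not an agreed list.

```python
import itertools
import random
from dataclasses import dataclass
from typing import List

# Hypothetical labels: placeholders for illustration, not an agreed list.
EXPOSURES = ["repeated", "initial_only", "respectable_lists",
             "non_respectable_lists", "respectable_newsgroups",
             "non_respectable_newsgroups"]
DOMAINS = ["isp-a.example", "isp-b.example", "isp-c.example"]
POLICIES = ["550_immediately", "550_after_weeks", "never_550"]

@dataclass
class Account:
    exposure: str
    domain: str
    replicate: int
    policy: str

def assign_accounts(replicates_per_policy: int, seed: int = 0) -> List[Account]:
    """Build one block of identically exposed accounts per (exposure, domain)
    cell and randomly split each block evenly across the rejection policies."""
    rng = random.Random(seed)  # fixed seed so the assignment can be reproduced
    accounts = []
    for exposure, domain in itertools.product(EXPOSURES, DOMAINS):
        # Every policy appears equally often within the block...
        policies = POLICIES * replicates_per_policy
        # ...but which account gets which policy is randomised.
        rng.shuffle(policies)
        for replicate, policy in enumerate(policies):
            accounts.append(Account(exposure, domain, replicate, policy))
    return accounts

if __name__ == "__main__":
    plan = assign_accounts(replicates_per_policy=2)
    print(len(plan), "accounts")
    for account in plan[:3]:
        print(account)
```

Because exposure and domain are constant within a block, comparing policy groups inside a block is not confounded with those two factors; the "after a few weeks" group would simply switch its 550 rejections on at a scheduled date.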

Tom

-----Original Message-----
From: asrg-admin(_at_)ietf(_dot_)org [mailto:asrg-admin(_at_)ietf(_dot_)org] On Behalf Of Peter Kay
Sent: 13 August 2003 17:07
To: asrg(_at_)ietf(_dot_)org
Subject: RE: [Asrg] RE: 2.a.1 Analysis of Actual Spam Data - Titan Key reduces spam attacks

OK everyone, it's decision time.

================ Summary of this thread ===============

What we're trying to determine is: will rejecting spam at SMTP 
time, whether with a "hard bounce" (such as 550 no such user) or 
with a temporary 451 greylisting response, reduce the number of 
"spam attacks" on a given email address over time, compared with 
an email address that does not employ such tactics?

The question is: what research process can we execute such 
that the results can be deemed reasonably reliable to 
prove/disprove what we're trying to determine?

Our resident statistician, Terry Sullivan, basically says 
that the amount of noise in the data and the variability of 
spam volume make it difficult, if not impossible, to prove 
the above statistically unless the effect of the hard bounces 
is huge. And if the effect were that huge, why hasn't it 
already been figured out?

=============== The issues at hand =========================

1)    Do we continue to pursue this research topic?
2)    If so, what process can we agree on to follow?

==============  next steps ===========================

This is a call to action to everyone on this list to address 
the above issues. Let's see what responses we get over a 
week's time. If we get either a bunch of "no"s or nothing at 
all, we'll close this issue and move on. If, on the other 
hand, we get some positive responses that INCLUDE IDEAS ON A 
RESEARCH PROCESS, we can then gather those ideas into a 
research plan and then move to execute.

======= misc, but important additional info ================

We've had at least two people lend some degree of assistance: 
Scott Nelson has offered to include research on the 550 response 
in his experiments, and Damon Sauer has offered millions of 
emails' worth of data. It's not entirely clear how either of 
these offers would support the research process, but they are 
certainly appreciated.  Yakov has also mentioned that we have 
access to data from Brightmail and Postini.

Paul Judge pointed to a few links out there related to what 
we're trying to research:

http://www.simplyquick.com/privacy.html#3
This is a report of what happened after subscribing to several
newsletters, performing normal actions on those newsletters (opening,
clicking, etc.) and then unsubscribing.  Result: no spam, but several
newsletters did not respect the unsubscribe request.

http://www.infoworld.com/article/03/03/21/12ebsecret_1.html
This discusses the results of another firm's research into what HTML code
does in spam mail AND WHAT EFFECT USING A 550 NO SUCH USER RESPONSE
(emphasis added) has on subsequent spam.

http://www.out-law.com/php/page.php?page_id=pressrele3360&area=about
The actual Web page of the firm conducting the research mentioned in the
InfoWorld article.


===========  YOUR  ACTION ITEM  ====================
(Yes, you. The subscriber to this.)
Due date: on/before 8/21/03 12:01am GMT +0

Decide if issue #1 is worth pursuing and, if so, think about a research
process (please make sure you KISS) and email that process in simple
bullet form to this list.   Positive responses must include bullet
points on a research process in order to be considered positive.


_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg