RE: [Asrg] RE: 2.a.1 Analysis of Actual Spam Data

I'm not convinced that the noise factor is so great that we can't get any
useful information out of analysis.  What I am sure of is that if that
argument wins this time and this line of enquiry is dropped as a result,
then most (maybe all) other lines of enquiry will go the same way.  Until we
have some measurements (quite a lot of measurements) we don't really know
how big the noise factor is.  Your original data, covering fewer than 100
days, looked pretty noisy, but all sorts of techniques are available for
smoothing away noise; until there's enough data, we don't know whether those
techniques will prevail in this case.
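As one illustration of the kind of smoothing meant here (an assumed choice; the message does not name a technique), a trailing moving average over daily spam counts for a single address damps day-to-day swings so that a slower trend is easier to see. The function name `moving_average` and the 7-day default window are hypothetical.

```python
from collections import deque


def moving_average(daily_counts, window=7):
    """Trailing moving average of daily spam counts (hypothetical helper).

    Each output value is the mean of the most recent `window` days; the
    first window-1 values average over only the days seen so far.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    buf = deque()          # the counts currently inside the window
    total = 0              # running sum of the counts in buf
    smoothed = []
    for count in daily_counts:
        buf.append(count)
        total += count
        if len(buf) > window:
            total -= buf.popleft()   # drop the day that fell out of the window
        smoothed.append(total / len(buf))
    return smoothed


if __name__ == "__main__":
    # Illustrative counts only, not real measurements.
    print(moving_average([10, 0, 20, 5], window=2))  # [10.0, 5.0, 10.0, 12.5]
```

A wider window gives a smoother curve but reacts more slowly to a genuine change, such as the point at which an address starts issuing 550 rejections.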

An experimental approach would be to set up a number of email accounts and
give them different degrees of exposure: some with repeated, ongoing
exposure; some with initial exposure but no repeated exposure; some with
exposure through "respectable" mailing lists only and some with exposure
through "non-respectable" lists; some with newsgroup exposure (again
"respectable" and "non-respectable"); and so on, covering many variations
in exposure and a good mix of combinations.  We would also need a large
number of email accounts with identical exposure, and each variant needs
to occur (many times) in many different email domains (different ISPs,
different countries, ...).  Then some proportion of each block of similar
accounts should start using 550 rejections straight away, some proportion
should start using them after a few weeks, and some proportion should never
use them.  After enough time (probably a year or so) there will be enough
data to draw conclusions about the noise level, and perhaps even about the
effectiveness of the 550 technique.
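The design above is essentially a blocked, randomized experiment: accounts are grouped by exposure and domain, and within each block they are randomly split among the three 550 policies.  A minimal sketch of that assignment step follows.  The exposure labels, the `.example` domains, the arm names and the equal three-way split are all illustrative assumptions, not part of the proposal itself.

```python
import itertools
import random

# Hypothetical exposure categories and domains; a real study would use many more.
EXPOSURES = ["repeated", "initial_only", "respectable_list", "non_respectable_list"]
DOMAINS = ["isp-a.example", "isp-b.example"]
# The three policies described above; an equal split is an assumption.
ARMS = ["550_immediately", "550_after_weeks", "never_550"]


def assign_arms(accounts_per_block, seed=0):
    """Return one record per account, randomizing the 550 policy within each block."""
    rng = random.Random(seed)  # fixed seed so the plan is reproducible
    plan = []
    for exposure, domain in itertools.product(EXPOSURES, DOMAINS):
        # Balance the arms inside the block, then shuffle which account gets which.
        arms = [ARMS[i % len(ARMS)] for i in range(accounts_per_block)]
        rng.shuffle(arms)
        for i, arm in enumerate(arms):
            plan.append({
                "account": f"{exposure}-{i}@{domain}",
                "exposure": exposure,
                "domain": domain,
                "arm": arm,
            })
    return plan
```

Randomizing within blocks keeps exposure and ISP from being confounded with the 550 policy, so any difference between arms is easier to attribute to the rejections themselves.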


Tom

-----Original Message-----
From: asrg-admin(_at_)ietf(_dot_)org [mailto:asrg-admin(_at_)ietf(_dot_)org] On Behalf Of Peter Kay
Sent: 13 August 2003 17:07
To: asrg(_at_)ietf(_dot_)org
Subject: RE: [Asrg] RE: 2.a.1 Analysis of Actual Spam Data - Titan Key reduces spam attacks


OK everyone, it's decision time.

================ Summary of this thread ===============

What we're trying to determine is: will "hard bounce" handling of spam
(such as a 550 "no such user" or a 451 greylisting response) reduce the
number of "spam attacks" on a given email address over time, compared with
an email address that does not use such tactics?

The question is: what research process can we execute such that the
results can be deemed reasonably reliable to prove/disprove what we're
trying to determine?

Our resident statistician, Terry Sullivan, basically says that the
amount of noise in the data and the variance in spam volume make it
difficult, if not impossible, to prove the above statistically unless the
effect of the hard bounces is huge.  And if the effect were that huge, why
hasn't it already been figured out?
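To make the noise argument concrete: one common way to ask whether 550-rejecting addresses receive less spam than accepting ones is a permutation test on per-account spam counts.  The sketch below is an illustration under assumptions (the choice of test, the one-sided alternative and the function name are ours, not anything proposed on the list).  When the spread within each group is large relative to the gap between groups, the returned p-value stays large, which is exactly the difficulty described above.

```python
import random
from statistics import mean


def permutation_test(rejecting, accepting, n_iter=10_000, seed=0):
    """One-sided permutation test (hypothetical helper).

    `rejecting` and `accepting` are per-account spam counts over the same
    period.  Returns an estimated p-value for the hypothesis that
    550-rejecting accounts receive less spam on average.
    """
    rng = random.Random(seed)
    observed = mean(accepting) - mean(rejecting)
    pooled = list(rejecting) + list(accepting)
    n = len(rejecting)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)  # relabel accounts at random
        diff = mean(pooled[n:]) - mean(pooled[:n])
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_iter + 1)
```

With clearly separated groups the p-value is tiny; with groups that look alike it approaches 1, so no conclusion can be drawn.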


=============== The issues at hand =========================

1)      Do we continue to pursue this research topic?
2)      If so, what process can we agree on to follow?


==============  next steps ===========================

This is a call to action to everyone on this list to address the above
issues.  Let's see what responses we get over a week's time.  If we get
either a bunch of "no"s or nothing at all, we'll close this issue and
move on.  If, on the other hand, we get some positive responses that
INCLUDE IDEAS ON A RESEARCH PROCESS, we can then gather those ideas into
a research plan and then move to execute.

======= misc, but important additional info ================

We've had at least 2 people offer some degree of assistance: Scott Nelson
has offered to include the 550 response in his experiments, and Damon
Sauer has offered millions of emails' worth of data.  It's not entirely
clear how either of these offers would support the research process, but
they are certainly appreciated.  Yakov has also mentioned that we have
access to data from Brightmail and Postini.

Paul Judge pointed to a few links out there related to what we're trying
to research:

http://www.simplyquick.com/privacy.html#3
This is a report of what happened after subscribing to several
newsletters, performing normal actions on those newsletters (opening,
clicking, etc) and then unsubscribing.  Result: no spam, but several
newsletters did not respect the unsubscribe request.

http://www.infoworld.com/article/03/03/21/12ebsecret_1.html
This describes the results of another firm's research into what HTML codes
do in spam mail AND WHAT EFFECT USING A 550 NO SUCH USER RESPONSE
(emphasis added) has on subsequent spam.

http://www.out-law.com/php/page.php?page_id=pressrele3360&area=about
The actual Web page of the firm conducting the research mentioned by the
infoworld article.


===========  YOUR  ACTION ITEM  ====================
(Yes, you. The subscriber to this.)
Due date: on or before 8/21/03 12:01am GMT +0

Decide whether issue #1 is worth pursuing and, if so, think about a research
process (please make sure you KISS) and email that process in simple
bullet form to this list.  Positive responses must include bullet
points on a research process in order to be considered positive.







_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg
