ietf-asrg

Varieties of spam (was RE: [Asrg] ASRG work items)

2003-03-09 17:44:31
This is a good point. Quantitative rather than anecdotal data would be
useful. We started spamarchive.org a few months ago to provide such a
standard, open spam corpus. (The latest archives are not online right now
because we are changing hosting facilities, but they are available to those
who would like to use them.) Another missing piece is a set of tools for
anonymizing, measuring, and analyzing spam data. I mention this and give
some examples in my talk at the spam conference
(http://www.spamconference.org/proceedings2003.html).
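
As an illustration of what an anonymizing tool might do, here is a minimal
Python sketch that replaces address-like strings with stable keyed
pseudonyms. It is not the tooling from the talk or from spamarchive.org; the
names SECRET_KEY, ADDRESS_RE, pseudonymize, and anonymize_text are all
hypothetical, and the address pattern is deliberately simplistic.

```python
import hashlib
import hmac
import re

# Hypothetical secret; a corpus maintainer would keep the real key private.
SECRET_KEY = b"example-key-not-for-real-use"

# Deliberately simple pattern; it does not cover every legal address form.
ADDRESS_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def pseudonymize(address: str, key: bytes = SECRET_KEY) -> str:
    """Map an address to a stable keyed pseudonym in a reserved domain."""
    digest = hmac.new(key, address.lower().encode("utf-8"), hashlib.sha256)
    return f"user-{digest.hexdigest()[:12]}@example.invalid"


def anonymize_text(text: str, key: bytes = SECRET_KEY) -> str:
    """Rewrite every address-like string found in a header block or body."""
    return ADDRESS_RE.sub(lambda m: pseudonymize(m.group(0), key), text)
```

Because the pseudonym is keyed and deterministic, the same recipient maps to
the same token across messages, so per-recipient counts remain possible
without exposing the address. Note that this sketch rewrites every address it
finds, including the sender; a tool meant for studying forged senders would
probably restrict itself to recipient fields.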

Another relevant thread is one started by Kee Hinckley called "Re: [Asrg]
Back to the charter". It aims to categorize spam by the technique used to
send it.

-----Original Message-----
From: Fred Bacon [mailto:bacon(_at_)aerodyne(_dot_)com] 
Sent: Sunday, March 09, 2003 6:59 PM
To: Paul Judge
Cc: 'Asrg (asrg(_at_)ietf(_dot_)org)'
Subject: Re: [Asrg] ASRG work items


Please forgive me if I restate what has already been 
discussed.  I have spent the afternoon going through the mail 
archive, but I could not possibly read every message.

On Sun, 2003-03-09 at 14:38, Paul Judge wrote:

Milestones/Deliverables:
1. problem statement/ requirements document

Keith Moore and Balachander Krishnamurthy have started a good thread on
"requirements for a proposed solution + notion of consent" (also called
"evaluating proposals against requirements").


I would like to comment on the first milestone.  It seems to 
me that there is considerable disagreement on even so simple 
a matter as the amount of spam with forged addresses.  I 
believe that one of the first items which should be addressed 
is a quantitative assessment of the methods and varieties of 
spam.  Part of this would be a standard spam corpus against 
which filters could be tested.  But there should be other 
quantitative activities as well.  The spam messages in the 
corpus are only a part of the data.  Log file entries related 
to those messages should also be recorded and maintained for 
analysis.  For instance, what percentage of spam messages 
really do come from open relays in this day and age?  Can 
anyone say for certain?  Where is the data to determine this?
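
To make the kind of measurement asked for above concrete, here is a minimal
Python sketch. It assumes each corpus entry has already been labeled, by some
separate analysis of headers and logs, with whether it arrived via an open
relay and whether its sender address was forged. The SpamRecord type, its
field names, and the percentage helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class SpamRecord:
    """One labeled corpus entry; the labels come from a separate analysis."""
    message_id: str
    via_open_relay: bool
    forged_sender: bool


def percentage(records: Iterable[SpamRecord],
               predicate: Callable[[SpamRecord], bool]) -> float:
    """Return the percentage of records for which predicate is true."""
    items = list(records)
    if not items:
        raise ValueError("no records to measure")
    hits = sum(1 for record in items if predicate(record))
    return 100.0 * hits / len(items)
```

Given a labeled corpus, percentage(corpus, lambda r: r.via_open_relay) would
answer the open-relay question for that corpus, and the same helper applied
to forged_sender would address the forged-address disagreement. How
representative the answer is depends entirely on how the corpus was
collected.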

I suggest that an early goal for ASRG should be to develop 
and distribute a standard set of spam collection and analysis 
tools.  These tools should be suitable for instrumenting 
servers (either production mail servers or honeypots) and 
building an extensive database of spam for analysis.  No 
source of information should be ignored.  Everything should 
be recorded: the message, the server logs, and all related TCP packets.
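
One way such a collection tool might keep the three sources together is a
single record per captured message. The Python sketch below is only an
assumption about a possible layout, not an agreed ASRG format; the
CapturedSpam name and its fields are hypothetical. Binary data is
base64-encoded so that each record fits on one line of JSON.

```python
import base64
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class CapturedSpam:
    """A message together with its related log lines and raw packets."""
    raw_message: bytes
    log_lines: List[str] = field(default_factory=list)
    packets: List[bytes] = field(default_factory=list)
    collector: str = "unknown"  # hypothetical label for the capturing host

    def to_json(self) -> str:
        """Serialize to a single JSON line; binary fields are base64."""
        return json.dumps({
            "collector": self.collector,
            "raw_message": base64.b64encode(self.raw_message).decode("ascii"),
            "log_lines": self.log_lines,
            "packets": [base64.b64encode(p).decode("ascii")
                        for p in self.packets],
        })

    @classmethod
    def from_json(cls, line: str) -> "CapturedSpam":
        """Rebuild a record from a line produced by to_json."""
        data = json.loads(line)
        return cls(
            raw_message=base64.b64decode(data["raw_message"]),
            log_lines=list(data["log_lines"]),
            packets=[base64.b64decode(p) for p in data["packets"]],
            collector=data["collector"],
        )
```

Keeping the message, logs, and packets in one record means later analysis
never has to re-join them by timestamp or queue ID.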

Of course, great care would need to be taken to protect the 
privacy of the spam recipients.  Honeypots may be the only 
viable method for this level of data collection.  In fact, I 
would recommend a network of honeypots in different TLDs and 
geographic locations.

I hope this suggestion is useful.


Fred Bacon
Senior Scientist
Aerodyne Research, Inc.
Billerica, MA 01821


_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg


