ietf-asrg

RE: [ASRG] 2b. Public Trace Data

2003-10-06 12:17:03

There were some discussions about creating a reviewer process for cleaning
the data further.

I do question the 15% value, mainly because I question the definition of
spam that you used. For example, joke forwards and virus mail seem to qualify
as unwanted email. Depending on which mailing lists you are speaking of,
those could qualify as unwanted email as well.

I do not believe that the data is 100% spam, because the volunteers who
submit it judge spam subjectively. I would just like to see more details
about how the share of non-spam was measured.
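
To make the point about definitions concrete, here is a minimal sketch of how
the non-spam share moves depending on which categories count as spam. The
category names and the counts are invented for illustration; they are not
measurements from spamarchive.org, and the function name is hypothetical.

```python
from collections import Counter

# Hypothetical hand-labelled sample. These counts are made up for
# illustration and are NOT taken from the archive.
sample = Counter({
    "bulk_commercial": 850,
    "mailing_list": 60,
    "joke_forward": 50,
    "virus": 40,
})


def non_spam_share(counts, spam_categories):
    """Return the fraction of messages whose category is not treated as spam."""
    total = sum(counts.values())
    not_spam = sum(n for cat, n in counts.items() if cat not in spam_categories)
    return not_spam / total


# Narrow definition: only bulk commercial mail is spam.
narrow = {"bulk_commercial"}
# Broader definition: joke forwards and virus mail are unwanted email too.
broad = {"bulk_commercial", "joke_forward", "virus"}

narrow_share = non_spam_share(sample, narrow)
broad_share = non_spam_share(sample, broad)
print(f"narrow definition: {narrow_share:.1%} not spam")
print(f"broad definition:  {broad_share:.1%} not spam")
```

The same sample gives a different answer under each definition, which is why
the definition needs to be stated alongside any percentage.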

Do you have any ideas on how to further sanitize the data? Right now, data
is sent to either submit(_at_)spamarchive or 
submitautomated(_at_)spamarchive(_dot_) Submit@
is supposed to receive human-reviewed messages, while submitautomated
receives messages forwarded from anti-spam tools.

The two approaches that we discussed were a) re-run the messages through
anti-spam tools to confirm they are spam and b) set up an interface so that
volunteers can review and vote on messages. With a decent number of
volunteers, one could get through a fair number of messages daily. We did an
initial poll for volunteers early on, and some individuals expressed
interest. We still need someone to create the review interface.
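
As a rough sketch of how votes from approach b) might be turned into a
verdict, the snippet below tallies votes per message and only accepts a label
when enough reviewers agree. The function names, the minimum vote count, the
75% agreement threshold, and the message IDs are all assumptions for
illustration, not something the group has agreed on.

```python
from collections import Counter


def tally(votes, min_votes=3, threshold=0.75):
    """Classify one message from a list of "spam" / "not_spam" votes.

    Returns the winning label if at least `min_votes` votes were cast and
    the winning label holds at least `threshold` of them; otherwise
    returns "undecided".
    """
    if len(votes) < min_votes:
        return "undecided"
    label, count = Counter(votes).most_common(1)[0]
    if count / len(votes) >= threshold:
        return label
    return "undecided"


def review(ballots, **kwargs):
    """Apply tally() to every message in a {message_id: votes} mapping."""
    return {msg_id: tally(votes, **kwargs) for msg_id, votes in ballots.items()}


# Hypothetical ballots keyed by made-up message IDs.
ballots = {
    "msg-001": ["spam", "spam", "spam", "spam"],
    "msg-002": ["spam", "not_spam", "not_spam", "spam"],
    "msg-003": ["not_spam", "not_spam", "not_spam"],
    "msg-004": ["spam"],
}
results = review(ballots)
for msg_id, verdict in results.items():
    print(msg_id, verdict)
```

Approach a) could feed the same tally: the verdict of each anti-spam tool
would simply be recorded as one more vote, and messages that end up
"undecided" would go to human reviewers.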

-----Original Message-----
From: Andreas Saurwein [mailto:saurwein(_at_)uniwares(_dot_)com] 
Sent: Monday, October 06, 2003 12:04 PM
To: asrg(_at_)ietf(_dot_)org
Subject: [ASRG] 2b. Public Trace Data


Running some analysis on the available data on www.spamarchive.org showed
that about 15% of the data there is actually not spam but the usual mailing
lists, joke forwards, virus mail, etc.

Although the remaining 85% are somewhat valuable, it is difficult to use
automated tools on this data.

Are there any efforts (or intentions) to eventually clean this out?

cheers
Andreas


_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg

