A purely speculative thought about Liam's (admittedly preliminary)
results regarding the dramatic differences between newsgroups/Web
pages, honeypot addresses and spam traffic...
If I wanted to collect as many email addresses as I possibly could,
as quickly as I could, I think Usenet is exactly where I'd look
first, because "every" message has a (not necessarily legitimate)
email address associated with it. So Usenet postings constitute a
1:1 source of "potential" addresses. In contrast, a quick (and
totally unverified) Google search leads to a (perhaps totally
inaccurate) guess that maybe 1 in 220 Web pages contains a mailto:
link. (Again, this is a total SWAG, offered only as an
order-of-magnitude guess.)
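To put a number on that gap, here is a back-of-the-envelope sketch in Python. It assumes nothing beyond the two guesses above (one address per Usenet posting, roughly one mailto: link per 220 Web pages); the constant and function names are made up for illustration, and the result is only as good as the SWAG behind it.

```python
import math

# Both rates are the rough guesses from the text, not measured data.
USENET_ADDRESSES_PER_POST = 1.0        # "every" posting carries an address
WEB_ADDRESSES_PER_PAGE = 1.0 / 220.0   # ~1 page in 220 has a mailto: link


def yield_ratio(usenet_rate, web_rate):
    """How many times more addresses one posting yields than one Web page."""
    return usenet_rate / web_rate


ratio = yield_ratio(USENET_ADDRESSES_PER_POST, WEB_ADDRESSES_PER_PAGE)
orders = math.log10(ratio)  # the same gap expressed in orders of magnitude

print(f"Usenet yields about {ratio:.0f}x more addresses per item scanned")
print(f"That is about {orders:.1f} orders of magnitude")
```

If the 1:220 figure is off by a factor of two in either direction, the gap still lands at roughly two orders of magnitude.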
Now, I know of no data on what percentage of newsgroup postings (or
Web pages, for that matter) do something to obfuscate the email
address being presented. But the share of addresses left readable
would have to be ~2 orders of magnitude lower on Usenet than on the
Web to make Usenet the lower-payback venue for "scraping." Then too,
many of the "techniques" used on
Usenet are the obfuscation-equivalent of a simple keyword filter--a
quick global-replace of " AT " with "@" and " DOT " with "." would
probably pay back in a big way.
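To show how little protection that kind of obfuscation offers, here is a minimal Python sketch of the global-replace described above. The function name and the sample address are hypothetical (example.com is a reserved domain), and it handles only the literal " AT " and " DOT " spellings; I have no data on which variants are actually common.

```python
def undo_simple_obfuscation(text):
    """Reverse the literal ' AT ' / ' DOT ' substitutions and nothing else."""
    return text.replace(" AT ", "@").replace(" DOT ", ".")


# Hypothetical obfuscated address of the kind seen in Usenet headers.
sample = "terry AT example DOT com"
print(undo_simple_obfuscation(sample))
```

Note that a blind replace like this also mangles ordinary text that happens to contain an upper-case " AT " or " DOT ", so it would only make sense on fields already expected to hold an address.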
(I'm posting this speculation to the list because I want to believe
that any ASRG project involving honeypot addresses would want the
quickest payback in terms of "seeding" activity.)
- Terry
_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg