ietf-asrg

Re: [Asrg] Collecting statistics

2003-03-07 14:28:33
From: "David F. Skoll" <dfs(_at_)roaringpenguin(_dot_)com>

...
Consider the dangers of being able to
ask whether the system has seen a message with a sender of the hash of
"Bill Gates" and a recipient of the hash of "Steve Jobs" today.

You probably only want hashes of IP addresses, not e-mail addresses.
If the DCC collected not only message hashes, but also the number of
different IPs from which those messages originated, I bet we'd see
some interesting data.

Please think how you would utilize such a system if you were a bad
guy or just trolling for interesting information for buying and selling
stock.  Traffic analysis can be applied to more than just names.

The DCC can also collect hashes of IP addresses.  At Paul Vixie's
suggestion, early versions could answer questions like "how many messages
with this body checksum had this source IP address?" To mitigate the
obvious privacy worries even in the original business model where all
DCC servers would be run by a reasonably trustworthy outfit, as well as
to deal with performance issues, the results were imprecise.  I removed
those mechanisms as the model changed and to gain performance.
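
The kind of query described above can be sketched roughly as follows.
This is only an illustration, not the DCC's code or data structures: the
names digest and SightingCounter are hypothetical, truncated SHA-256
stands in for whatever checksums a real server would use, and counts
here are exact rather than deliberately imprecise.

```python
import hashlib
from collections import Counter


def digest(data: bytes) -> bytes:
    """Return a 16-byte checksum (assumption: truncated SHA-256)."""
    return hashlib.sha256(data).digest()[:16]


class SightingCounter:
    """Counts reports per (body checksum, hashed source IP) pair."""

    def __init__(self) -> None:
        # body checksum -> Counter of hashed source IPs
        self._by_body: dict[bytes, Counter] = {}

    def report(self, body: bytes, source_ip: str) -> None:
        """Record one sighting of this body from this source IP."""
        per_ip = self._by_body.setdefault(digest(body), Counter())
        per_ip[digest(source_ip.encode())] += 1

    def count_from(self, body: bytes, source_ip: str) -> int:
        """How many messages with this body checksum had this source IP?"""
        per_ip = self._by_body.get(digest(body), Counter())
        return per_ip[digest(source_ip.encode())]

    def distinct_sources(self, body: bytes) -> int:
        """From how many different source IPs was this body seen?"""
        return len(self._by_body.get(digest(body), Counter()))


counter = SightingCounter()
counter.report(b"buy now", "192.0.2.1")
counter.report(b"buy now", "192.0.2.1")
counter.report(b"buy now", "198.51.100.7")
```

Note that hashing the address does not hide anything from someone who
already has a candidate value in mind: anyone who can call count_from
with a guessed IP gets an answer, which is the traffic analysis worry
raised above.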

Please also consider the amount of data you are talking about.  You
will probably collect 500 or 1000 bytes per mail message.  (A dozen
16-byte checksums, pointers, counts, padding, etc.)  If you sample
1% of mail, and accept my guess that's 100 M msgs/day, you're talking
about collecting and reducing up to 100 GBytes of data/day.  You'd need to
repeat your measurements every day for a week, because spam varies
during the day and during the week.  You'd also want to repeat it
every few weeks to catch long term changes.  Arithmetic gives numbers
that would need a serious source of funds.
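
The arithmetic can be checked with a few lines.  The inputs are the
guesses given above (100 M sampled msgs/day, 500 to 1000 bytes per
message); the function name and the use of decimal gigabytes are
assumptions made for this sketch.

```python
def daily_volume_gbytes(msgs_per_day: int, bytes_per_msg: int) -> float:
    """Raw data collected per day, in decimal gigabytes (10**9 bytes)."""
    return msgs_per_day * bytes_per_msg / 1e9


SAMPLED_MSGS_PER_DAY = 100_000_000  # the guessed 1% sample of all mail

low_gb = daily_volume_gbytes(SAMPLED_MSGS_PER_DAY, 500)
high_gb = daily_volume_gbytes(SAMPLED_MSGS_PER_DAY, 1000)

# One measurement run covers every day of a week.
week_low_gb = 7 * low_gb
week_high_gb = 7 * high_gb

print(f"per day:  {low_gb:.0f} to {high_gb:.0f} GBytes")
print(f"per week: {week_low_gb:.0f} to {week_high_gb:.0f} GBytes")
```

That is 50 to 100 GBytes per day and 350 to 700 GBytes for each
week-long run, before repeating the run every few weeks.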


Vernon Schryver    vjs(_at_)rhyolite(_dot_)com
_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg


