ietf-asrg

Re: [Asrg] Re: On the need for reliable, objective data

2003-03-07 13:10:59
> From: "David F. Skoll" <dfs(_at_)roaringpenguin(_dot_)com>
>
> ...
> How should such data be managed, stored, reviewed,
> made available to the public, etc?
>
> I think a mechanism similar to DCC could be good.  We need a way for
> lots of sensors to dump information into the collection network.  We
> need a way to decide what summaries of the data are useful.  And we need
> a way to extract the data without compromising privacy.  (e.g., report
> SHA1 hashes of addresses rather than addresses themselves.)
> ...
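The hashing step suggested in the quoted message can be sketched as follows. This is a minimal Python sketch, not DCC code. The helper names `hash_address` and `make_report`, the trim-and-lowercase normalization, and the report layout are all hypothetical. It shows only that a sensor can replace each address with its SHA-1 digest before anything leaves the site.

```python
import hashlib


def hash_address(address: str) -> str:
    """Return the hex SHA-1 digest of a normalized address.

    Hypothetical helper: trimming and lowercasing are assumptions, made so
    that trivially different spellings of one address hash alike.
    """
    normalized = address.strip().lower()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def make_report(sender: str, recipient: str) -> dict:
    """Build a hypothetical sensor report that carries digests, not addresses."""
    return {
        "sender_hash": hash_address(sender),
        "recipient_hash": hash_address(recipient),
    }


report = make_report("Alice@Example.com", "bob@example.net")
print(report["sender_hash"] == hash_address("alice@example.com"))  # True
```

The raw addresses never appear in the report, but each digest is deterministic. The same input always produces the same value.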

I think I'm qualified to comment on that idea.  The short version of
my take is "great in theory but nearly hopeless in practice."

The hopeless part is that in practice it is extremely difficult to
build a network large enough to collect enough data to be other than
a muddy pile of anecdotes.  The DCC is more than 2 years old, but it
still sees at most single-digit percentages of all mail on the Internet,
and perhaps less.  (I suspect there are more than 1 billion but probably
fewer than 10 billion mail messages/day today.  The DCC sees perhaps
20 million/day.)  The DCC has been deployed by only some classes of
outfits.  For example, it has not been and probably will not be used by
the largest organizations that prefer to roll their own solutions.
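The estimates in the parenthesis bound the DCC's share of total mail. Here is a small Python check of that arithmetic. It uses only the figures given: about 20 million messages/day seen, out of 1 to 10 billion sent. The function name `coverage` is illustrative.

```python
def coverage(seen_per_day: float, total_per_day: float) -> float:
    """Fraction of all daily mail that a checksum network observes."""
    return seen_per_day / total_per_day


DCC_SEEN = 20e6                  # estimate: about 20 million messages/day
for total in (1e9, 10e9):        # estimate: 1 to 10 billion messages/day
    print(f"{total:,.0f} messages/day: DCC sees {coverage(DCC_SEEN, total):.1%}")
# 1,000,000,000 messages/day: DCC sees 2.0%
# 10,000,000,000 messages/day: DCC sees 0.2%
```

So the DCC would see roughly 0.2% to 2% of all mail. That range is consistent with "at most single-digit percentages ... and perhaps less."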

Even in the skewed population that does use the DCC, there is evidence
that making generalizations is very hard.  There is a 3X difference in
spam load per user, depending on organization type, judging from
http://www.dcc-servers.net/dcc/graphs/comp-rates

There are reasons of self-interest for outfits to install the DCC,
but few or at best weak reasons for installing monitoring software
that would report to outsiders.  There are strong reasons for many
outfits to flatly refuse to install such software; that's why the DCC
is designed to not collect a lot of the information that one would want.

DCC clients can report and DCC servers can collect hashes of addresses,
but that's dangerous unless (or even if) you think carefully about what
the servers will do with the data.  Consider the danger of being able to
ask whether the system has seen a message today whose sender hash is the
hash of "Bill Gates" and whose recipient hash is the hash of "Steve Jobs".
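The following short Python sketch shows why hashing does not make such queries safe. Everything in it is hypothetical: the `HashedPairLog` class, the `sha1_hex` and `was_seen` helpers, and the idea of storing sender/recipient digest pairs. It is not how DCC servers work, since by default they keep only body hashes. A hash is deterministic, so anyone who can guess a plaintext can hash the guess and ask.

```python
import hashlib


def sha1_hex(text: str) -> str:
    """Deterministic SHA-1 digest of a normalized string (hypothetical helper)."""
    return hashlib.sha1(text.strip().lower().encode("utf-8")).hexdigest()


class HashedPairLog:
    """Hypothetical server-side record of (sender hash, recipient hash) pairs."""

    def __init__(self) -> None:
        self._pairs = set()  # holds (sender_hash, recipient_hash) tuples

    def record(self, sender_hash: str, recipient_hash: str) -> None:
        self._pairs.add((sender_hash, recipient_hash))

    def seen(self, sender_hash: str, recipient_hash: str) -> bool:
        return (sender_hash, recipient_hash) in self._pairs


def was_seen(log: HashedPairLog, sender_guess: str, recipient_guess: str) -> bool:
    """An outsider never sees plaintext, but can hash guesses and query."""
    return log.seen(sha1_hex(sender_guess), sha1_hex(recipient_guess))


# Sensors report only digests...
log = HashedPairLog()
log.record(sha1_hex("Bill Gates"), sha1_hex("Steve Jobs"))

# ...yet a guesser confirms the exact pair with one query.
print(was_seen(log, "Bill Gates", "Steve Jobs"))          # True
print(was_seen(log, "Bill Gates", "nobody@example.com"))  # False
```

Each guess costs one SHA-1 computation, so long lists of likely names or addresses can be tested cheaply.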

(By default, DCC servers do not collect anything but body hashes for
several reasons, starting with the limited usefulness of those hashes.)


Vernon Schryver    vjs(_at_)rhyolite(_dot_)com
_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg


