Re: [Asrg] 3. Proof-of-work analysis
2004-05-18 17:44:19
We then
carefully worked through all the calculations, using the best data
that we could obtain -- and we did indeed come to the conclusion that
proof-of-work is not a viable proposal :(
That's a very interesting paper, thank you. I wonder, however, what
the distribution curves are like when "regular correspondents" are
exempted from proof-of-work, not just mailing lists. Would it be
possible to re-examine the MTA logs for this type of pattern?
in principle yes ... however I doubt that the systems at the top of the
curve (sending lots of email per day) would have regular
correspondents.
Besides the people running mailing lists, they will be e-commerce
systems sending acknowledgements, hospitals confirming appointments,
fax delivery systems relaying incoming messages, etc.
Let's think about the statistical significance a bit. The Net-wide
average is about 3 people per host, but most hosts have only one or two
people behind them. The discrepancy may well be made up for by a small
number of hosts handling very large numbers of people.
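To make the mean-versus-typical point concrete, here is a minimal sketch with an entirely made-up population of 1,000 hosts. The counts are assumptions chosen so the average comes out at 3; they are not taken from the paper or from any MTA logs.

```python
from statistics import mean, median

# Hypothetical host population (illustrative figures only, not measured data):
# 600 hosts with one user, 390 with two, and 10 large shared hosts.
hosts = [1] * 600 + [2] * 390 + [162] * 10

average = mean(hosts)      # people per host, averaged over all hosts
typical = median(hosts)    # what a "typical" host looks like
# Fraction of all people who sit behind one of the large shared hosts.
big_share = sum(n for n in hosts if n > 2) / sum(hosts)

print(f"mean={average} median={typical} share_on_big_hosts={big_share:.0%}")
```

With these assumed numbers, ten hosts out of a thousand carry more than half of the people, which is the kind of skew that a single average hides.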
As an example, Lancaster University provides a central UNIX-shell
cluster for all students, one of whose services was the official
e-mail system. Because it's a UNIX shell system, the mail is sent from
the members of the cluster, not from the terminal used to log on. So,
in theory, the Internet sees 10,000 students sharing three 4-way Sun
workstations (this, at least, was what the configuration used to be).
In practice, about 60% of those students are off-campus and typically
use third-party ISPs for personal correspondence anyway, and an
increasingly large proportion of the remainder use personal computers
from their rooms - but you can see the principle.
And yes, I agree that this particular use-case is fairly pessimal in
terms of proof-of-work scenarios. However, intra-campus communications
are typically quite well-ordered, so (with careful management around
the beginning of the academic year) it could still be possible to use
the same three workstations in a proof-of-work world.
By "regular correspondents" I mean people who know each other well
enough to send mail regularly, not necessarily frequently - even once
a week over a period of months. I ask this because I expect that users
with slow machines - who would otherwise be the group most
inconvenienced by proof-of-work schemes - send mail that mostly falls
into this category. I don't know, however, how much of the overall
picture is accounted for by these.
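One possible way to turn that definition into a test over mail logs is sketched below. The function name and both thresholds (`min_messages`, `min_span`) are hypothetical choices for illustration; the thread does not fix any particular values.

```python
from datetime import date, timedelta

def is_regular_correspondent(contact_dates,
                             min_messages=8,
                             min_span=timedelta(days=60)):
    """Return True if contact is sustained over time, not merely frequent.

    Both thresholds are assumed values, not figures from the discussion.
    """
    if len(contact_dates) < min_messages:
        return False
    # Regularity is judged by how long the exchange has lasted,
    # so a one-day burst of many messages does not qualify.
    return max(contact_dates) - min(contact_dates) >= min_span

# Once a week for twelve weeks: few messages, but spread over months.
weekly = [date(2004, 1, 5) + timedelta(weeks=i) for i in range(12)]
# Fifty messages on a single day: frequent, but not regular.
burst = [date(2004, 5, 17)] * 50

print(is_regular_correspondent(weekly), is_regular_correspondent(burst))
```

Running a test like this over sender/recipient pairs in the MTA logs would show what fraction of each host's traffic would be exempt under such a rule.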
I don't see why one should expect any correlation between machine speed
and regularity of sending email. Many businesses will not splash out
for admin staff machines, so it is they as well as aged parents who
might be expected to have old kit :)
I'm afraid I don't have much insight into how business e-mail patterns
go. That's why I'm asking you for the statistics. :)
Point taken, anyway - chalk this one up as another use-case to be
considered. It could be that the business might set up its own
proof-of-work server cluster for internal use, rather than upgrading
individual workstations, or else rent time as needed on a third-party's
cluster.
FWIW, I treat mailing lists as a special case of "regular
correspondent", and as such I don't think it's necessary to distinguish
them per se. You might like to consider this when compiling your
statistics.
For future work, it might be instructive to identify various non-spam
use-cases which appear to have a high proof-of-work load - i.e. on the
"long tail" of the distribution curves presented - and consider
practical ways of relieving or accommodating it.
indeed so ... though you should note that there is not much difference
between spam viability thresholds and the average case, let alone
power-users.
For the brute-force proof-of-work scheme you assume in the paper, this
is undoubtedly true. I'm asking for more statistics to try and reveal
whether the ways we've thought of, for making it less brute-force, are
viable.
For proof-of-work to look plausible (and not a high-risk strategy) I'd
like to see factors of a thousand or more between plausible workloads
for legitimate senders and any economically viable spamming activity :-(
For my own usage pattern, assuming proof-of-work is exempted for
regular correspondents, this is approximately true. I talk almost
exclusively to mailing lists and people I know pretty well.
Occasionally I get a question from someone I don't know, but this is
rare enough that I could, if necessary, give up 60 seconds of my
PowerBook's CPU time to send a reply, without too much fuss - after
all, it would have taken me at least that long to write it. I'd still
be concerned about the time taken on a slower machine, though.
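For readers who have not seen one, a brute-force token of the kind under discussion can be sketched as a hashcash-style partial hash search. This is only an illustration: the use of SHA-256, the `header:nonce` string layout, and the function names are all assumptions, not the scheme analysed in the paper.

```python
import hashlib
from itertools import count

def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the front of a hash digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()

def token_hash(header: str, nonce: int) -> bytes:
    # Hypothetical token layout: recipient header, a colon, then the nonce.
    return hashlib.sha256(f"{header}:{nonce}".encode()).digest()

def mint(header: str, bits: int) -> int:
    """Sender side: search for a nonce; expected cost is about 2**bits hashes."""
    for nonce in count():
        if leading_zero_bits(token_hash(header, nonce)) >= bits:
            return nonce

def verify(header: str, nonce: int, bits: int) -> bool:
    """Receiver side: a single hash, however long minting took."""
    return leading_zero_bits(token_hash(header, nonce)) >= bits

# A deliberately small difficulty so the example finishes quickly.
nonce = mint("someone@example.org", bits=12)
print(nonce, verify("someone@example.org", nonce, bits=12))
```

Each extra bit of difficulty doubles the expected minting time, so a setting that costs 60 seconds on a fast machine costs proportionally more on a slow one, which is the concern above.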
However, looking instead at the mail I *receive*, I can see a number of
remote systems which could, potentially, be heavily burdened by
proof-of-work. These aren't as common as you might think, though.
Forum update notifications? These come in frequently and from
predictable sources, so I might as well whitelist them as a regular
correspondent. The same goes for news and status mailings from my ISP
and various other organisations.
E-commerce transaction confirmations? If it only costs a fractional
cent per PoW token (because you're managing the hardware in bulk), it
disappears next to the cost of the currency handling. Remember, I'm
imagining that you can centralise the effort and effectively rent space
on someone else's cluster for this, so even small shops see the same
kinds of low cost. In practice, most e-commerce systems I've seen use
a double-opt-in e-mail registration process, very similar to mailing
lists, so similar mechanisms could apply.
As for tech support and sales enquiries, you're paying the staff
sitting at the workstation at least national minimum wage (several
dollars an hour), and they have a physical limit to how fast they can
type and send mails. The cost of attaching tokens to those mails is
minuscule in comparison. Whether it happens on the workstation or
centrally is a matter of logistics - but if the workstation is already
fast enough, the costs become essentially nil.
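The comparison can be put as a back-of-envelope calculation. The figures below are placeholders only: a wage of "several dollars an hour" and a "fractional cent" token are taken loosely from the text above, and the mails-per-hour rate is a pure assumption.

```python
def token_share_of_labour(wage_per_hour, mails_per_hour, token_cost):
    """Token cost as a fraction of the staff cost of writing one mail."""
    labour_per_mail = wage_per_hour / mails_per_hour
    return token_cost / labour_per_mail

# Assumed inputs: $5.00/hour, 20 replies per hour, half a cent per token.
share = token_share_of_labour(wage_per_hour=5.00,
                              mails_per_hour=20,
                              token_cost=0.005)
print(f"token cost is {share:.1%} of the labour cost per mail")
```

Plugging in real wage and throughput figures for a given support desk would show whether the token cost stays this small in practice.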
Registration confirmations? Seriously, these are *supposed* to be
rare. I'd like to know what kind of system processes more
registrations than actual service, before I consider this to be a
problem.
I used to get e-cards from a few people. These are typically sent from
the e-card vendor's system at present - a bad practice, but oh well.
If the cost of generating proof-of-work is too high for the e-card
vendor, they can get the sender to download the e-card and send it
themselves. I'm not too worried about that.
That leaves one big category: Web Mail. The likes of Hotmail and
Yahoo don't charge for sending e-mail from their systems, except
perhaps in terms of banner ads. They also handle ginormous amounts of
said mail, which could make a proof-of-work switch-on relatively
difficult for them. However, most of their clients are low-end home
users, who, on average, may have relatively favourable contact
patterns. For this, we could do with more statistics.
Any more?
--------------------------------------------------------------
from: Jonathan "Chromatix" Morton
mail: chromi(_at_)chromatix(_dot_)demon(_dot_)co(_dot_)uk
website: http://www.chromatix.uklinux.net/
tagline: The key to knowledge is not to rely on people to teach you it.
_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg