Re: [Asrg] 3. Proof-of-work analysis

2004-05-18 17:44:19
We then
carefully worked through all the calculations, using the best data
that we could obtain -- and we did indeed come to the conclusion that
proof-of-work is not a viable proposal :(

That's a very interesting paper, thank you.  I wonder, however, what
the distribution curves are like when "regular correspondents" are
exempted from proof-of-work, not just mailing lists.  Would it be
possible to re-examine the MTA logs for this type of pattern?

in principle yes ... however I doubt that the systems at the top of the
curve (sending lots of email per day) would have regular correspondents.
Besides the people running mailing lists, they will be e-commerce
systems sending acknowledgements, hospitals confirming appointments, fax
delivery systems relaying incoming messages etc.

Let's think about the statistical significance a bit. The Net-wide average is about 3 people per host, but most hosts have only one or two people behind them. The discrepancy may well be made up for by a small number of hosts handling very large numbers of people.
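To illustrate how a mean of 3 people per host can coexist with most hosts having one or two, here is a minimal sketch. Every figure in it (1,000 hosts, the 600/390/10 split, 162 users on each large host) is invented for illustration and is not taken from the paper or from any MTA log.

```python
from statistics import mean, median

def host_population(one_user=600, two_users=390, big_hosts=10, big_size=162):
    """Build a hypothetical list of users-per-host (all figures are assumptions)."""
    return [1] * one_user + [2] * two_users + [big_size] * big_hosts

hosts = host_population()
avg = mean(hosts)    # Net-wide average of people per host
mid = median(hosts)  # what a "typical" host looks like
# Fraction of all people who sit behind the few large hosts
share_big = sum(n for n in hosts if n > 2) / sum(hosts)

print(f"mean={avg}, median={mid}, share behind large hosts={share_big:.0%}")
```

With these made-up numbers, 1% of the hosts carry more than half the people, which is why the per-host average says little about the typical host.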

As an example, Lancaster University provides a central UNIX shell cluster for all students, whose services included the official e-mail system. Because it's a UNIX shell system, mail is sent from the machines in the cluster, not from the terminal used to log on. So, in theory, the Internet sees 10,000 students sharing three 4-way Sun workstations (this, at least, is what the configuration used to be).

In practice, about 60% of those students are off-campus and typically use third-party ISPs for personal correspondence anyway, and an increasingly large proportion of the remainder use personal computers from their rooms - but you can see the principle.

And yes, I agree that this particular use-case is fairly pessimal in terms of proof-of-work scenarios. However, intra-campus communications are typically quite well-ordered, so (with careful management around the beginning of the academic year) it could still be possible to use the same three workstations in a proof-of-work world.

By "regular correspondents" I mean people who know each other well
enough to send mail regularly, not necessarily frequently - even once a
week over a period of months.  I ask this because I expect that users
with slow machines - who would otherwise be the group most
inconvenienced by proof-of-work schemes - send mail that mostly falls
into this category.  I don't know, however, how much of the overall
picture is accounted for by these.

I don't see why one should expect any correlation between machine speed
and regularity of sending email. Many businesses will not splash out for
admin staff machines, so it is they as well as aged parents who might be
expected to have old kit :)

I'm afraid I don't have much insight into how business e-mail patterns go. That's why I'm asking you for the statistics. :)

Point taken, anyway - chalk this one up as another use-case to be considered. It could be that the business might set up its own proof-of-work server cluster for internal use, rather than upgrading individual workstations, or else rent time as needed on a third party's cluster.

FWIW, I treat mailing lists as a special case of "regular correspondent", and as such I don't think it's necessary to distinguish them per se. You might like to consider this when compiling your statistics.

For future work, it might be instructive to identify various non-spam
use-cases which appear to have a high proof-of-work load - i.e. on the
"long tail" of the distribution curves presented - and consider
practical ways of relieving or accommodating it.

indeed so ... though you should note that there is not much difference
between spam viability thresholds and the average case, let alone power-
users.

For the brute-force proof-of-work scheme you assume in the paper, this is undoubtedly true. I'm asking for more statistics to try and reveal whether the ways we've thought of, for making it less brute-force, are viable.

For proof-of-work to look plausible (and not a high-risk strategy) I'd
like to see factors of a thousand or more between plausible workloads
for legitimate senders and any economically viable spamming activity :-(

For my own usage pattern, assuming proof-of-work is exempted for regular correspondents, this is approximately true. I talk almost exclusively to mailing lists and people I know pretty well.

Occasionally I get a question from someone I don't know, but this is rare enough that I could, if necessary, give up 60 seconds of my PowerBook's CPU time to send a reply, without too much fuss - after all, it would have taken me at least that long to write it. I'd still be concerned about the time taken on a slower machine, though.
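As a rough sketch of what "proof-of-work, exempted for regular correspondents" could look like from the sender's side, the following assumes a hashcash-style partial hash collision. The thread does not commit to a particular scheme, so the token format, the use of SHA-1, the 12-bit difficulty and the function names are all hypothetical. It also assumes the sender already knows which recipients treat it as a regular correspondent.

```python
import hashlib
from itertools import count

def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits

def mint_token(recipient: str, bits: int) -> str:
    """Brute-force a nonce until the token's hash has `bits` leading zero bits."""
    for nonce in count():
        token = f"{recipient}:{nonce}"
        if leading_zero_bits(hashlib.sha1(token.encode()).digest()) >= bits:
            return token

def verify_token(token: str, recipient: str, bits: int) -> bool:
    """Cheap check on the receiving side: one hash, however long minting took."""
    digest = hashlib.sha1(token.encode()).digest()
    return token.startswith(recipient + ":") and leading_zero_bits(digest) >= bits

def stamp_for(recipient: str, regular_correspondents: set, bits: int = 12):
    """Return None (no work needed) for regular correspondents, else a token."""
    if recipient in regular_correspondents:
        return None
    return mint_token(recipient, bits)

regulars = {"friend@example.org", "asrg-list@example.org"}
print(stamp_for("friend@example.org", regulars))    # exempt, no token minted
print(stamp_for("stranger@example.org", regulars))  # pays the CPU cost
```

The difficulty here is deliberately tiny so the example runs instantly; a real deployment would pick it so that minting takes seconds on typical hardware, which is exactly where the slow-machine concern comes in.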

Looking instead at the mail I *receive*, however, I can see a number of remote systems which could, potentially, be heavily burdened by proof-of-work. These aren't as common as you might think, though.

Forum update notifications? These come in frequently and from predictable sources, so I might as well whitelist them as a regular correspondent. The same goes for news and status mailings from my ISP and various other organisations.

E-commerce transaction confirmations? If it only costs a fractional cent per PoW token (because you're managing the hardware in bulk), it disappears next to the cost of the currency handling. Remember, I'm imagining that you can centralise the effort and effectively rent space on someone else's cluster for this, so even small shops see the same kinds of low cost. In practice, most e-commerce systems I've seen use a double-opt-in e-mail registration process, very similar to mailing lists, so similar mechanisms could apply.

As for tech support and sales enquiries, you're paying the staff sitting at the workstation at least national minimum wage (several dollars an hour), and they have a physical limit to how fast they can type and send mails. The cost of attaching tokens to those mails is minuscule in comparison. Whether it happens on the workstation or centrally is a matter of logistics - but if the workstation is already fast enough, the costs become essentially nil.
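The comparison can be put as back-of-envelope arithmetic. The figures below (a wage of $6 an hour, 20 mails an hour, a tenth of a cent per token) are placeholders chosen only to match the "several dollars an hour" and "fractional cent" orders of magnitude mentioned here; they are not measurements.

```python
def token_cost_vs_labour(hourly_wage, mails_per_hour, cost_per_token):
    """Compare the labour cost of one hand-written mail with its PoW token cost.

    All inputs are in the same currency unit; the values used below are
    illustrative assumptions, not data.
    """
    labour_per_mail = hourly_wage / mails_per_hour
    return {
        "labour_per_mail": labour_per_mail,
        "token_per_mail": cost_per_token,
        "ratio": labour_per_mail / cost_per_token,
    }

result = token_cost_vs_labour(hourly_wage=6.00, mails_per_hour=20, cost_per_token=0.001)
print(f"labour per mail: ${result['labour_per_mail']:.2f}")
print(f"token per mail:  ${result['token_per_mail']:.3f}")
print(f"labour is about {result['ratio']:.0f}x the token cost")
```

Even if the token price were ten times higher than assumed, the staff time would still dominate under these figures.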

Registration confirmations? Seriously, these are *supposed* to be rare. I'd like to know what kind of system processes more registrations than actual service, before I consider this to be a problem.

I used to get e-cards from a few people. These are typically sent from the e-card vendor's system at present - a bad practice, but oh well. If the cost of generating proof-of-work is too high for the e-card vendor, they can get the sender to download the e-card and send it themselves. I'm not too worried about that.

That leaves one big category: Web Mail. The likes of Hotmail and Yahoo don't charge for sending e-mail from their systems, except perhaps in terms of banner ads. They also handle ginormous amounts of said mail, which could make a proof-of-work switch-on relatively difficult for them. However, most of their clients are low-end home users, who, on average, may have relatively favourable contact patterns. For this, we could do with more statistics.

Any more?

--------------------------------------------------------------
from:     Jonathan "Chromatix" Morton
mail:     chromi(_at_)chromatix(_dot_)demon(_dot_)co(_dot_)uk
website:  http://www.chromatix.uklinux.net/
tagline:  The key to knowledge is not to rely on people to teach you it.

