Re: [Asrg] 3. Proof-of-work analysis

We then
carefully worked through all the calculations, using the best data
that we could obtain -- and we did indeed come to the conclusion that
proof-of-work is not a viable proposal :(


That's a very interesting paper, thank you.  I wonder, however, what
the distribution curves are like when "regular correspondents" are
exempted from proof-of-work, not just mailing lists.  Would it be
possible to re-examine the MTA logs for this type of pattern?


in principle yes ... however I doubt that the systems at the top of the
curve (sending lots of email per day) would have regular correspondents.
Besides the people running mailing lists, they will be e-commerce
systems sending acknowledgements, hospitals confirming appointments, fax
delivery systems relaying incoming messages etc.

Let's think about the statistical significance a bit. The Net-wide
average is about 3 people per host, but most hosts have only one or two
people behind them. The discrepancy may well be made up for by a small
number of hosts handling very large numbers of people.

As an example, Lancaster University provides a central UNIX-shell
cluster for all students, among the services of which was the official
e-mail system. Because it's a UNIX shell system, the mail is sent from
the members of the cluster, not from the terminal used to log on. So,
in theory, the Internet sees 10,000 students sharing three 4-way Sun
workstations (this, at least, was what the configuration used to be).

In practice, about 60% of those students are off-campus and typically
use third-party ISPs for personal correspondence anyway, and an
increasingly large proportion of the remainder use personal computers
from their rooms - but you can see the principle.

And yes, I agree that this particular use-case is fairly pessimal in
terms of proof-of-work scenarios. However, intra-campus communications
are typically quite well-ordered, so (with careful management around
the beginning of the academic year) it could still be possible to use
the same three workstations in a proof-of-work world.

By "regular correspondents" I mean people who know each other well
enough to send mail regularly, not necessarily frequently - even once a
week over a period of months.  I ask this because I expect that users
with slow machines - who would otherwise be the group most
inconvenienced by proof-of-work schemes - send mail that mostly falls
into this category.  I don't know, however, how much of the overall
picture is accounted for by these.

I don't see why one should expect any correlation between machine speed
and regularity of sending email. Many businesses will not splash out for
admin staff machines, so it is they as well as aged parents who might be
expected to have old kit :)

I'm afraid I don't have much insight into how business e-mail patterns
go. That's why I'm asking you for the statistics. :)

Point taken, anyway - chalk this one up as another use-case to be
considered. It could be that the business might set up its own
proof-of-work server cluster for internal use, rather than upgrading
individual workstations, or else rent time as needed on a third-party's
cluster.

FWIW, I treat mailing lists as a special case of "regular
correspondent", and as such I don't think it's necessary to distinguish
them per se. You might like to consider this when compiling your
statistics.
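
To make the kind of log re-analysis I'm asking for concrete, here is a
rough sketch in Python. It is only an illustration under assumptions of
my own: the record layout (timestamp, sender, recipient), the function
name and both thresholds are hypothetical, and a real MTA log would need
parsing first. A mailing list is simply another address here, so it gets
no special treatment.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def pow_load_per_sender(records, window=timedelta(days=90), min_prior=2):
    """Count, per sender, the messages that would still need proof-of-work.

    records: iterable of (timestamp, sender, recipient), sorted by time.
    A pair counts as "regular" once the sender has already mailed that
    recipient at least `min_prior` times within `window`. Both thresholds
    are illustrative guesses, not measured values.
    """
    history = defaultdict(list)  # (sender, recipient) -> recent timestamps
    load = defaultdict(int)      # sender -> messages needing proof-of-work
    for ts, sender, rcpt in records:
        pair = (sender, rcpt)
        # Keep only the earlier messages that fall inside the window.
        recent = [t for t in history[pair] if ts - t <= window]
        if len(recent) < min_prior:
            load[sender] += 1    # not yet a regular correspondent
        recent.append(ts)
        history[pair] = recent
    return dict(load)
```

Run over real logs, the interesting output would be how the per-sender
counts are distributed once the exempt messages have dropped out,
compared with the curves that only exempt mailing lists.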

For future work, it might be instructive to identify various non-spam
use-cases which appear to have a high proof-of-work load - i.e. on the
"long tail" of the distribution curves presented - and consider
practical ways of relieving or accommodating it.


indeed so ... though you should note that there is not much difference
between spam viability thresholds and the average case, let alone
power-users.

For the brute-force proof-of-work scheme you assume in the paper, this
is undoubtedly true. I'm asking for more statistics to try and reveal
whether the ways we've thought of, for making it less brute-force, are
viable.

For proof-of-work to look plausible (and not a high-risk strategy) I'd
like to see factors of a thousand or more between plausible workloads
for legitimate senders and any economically viable spamming activity :-(

For my own usage pattern, assuming proof-of-work is exempted for
regular correspondents, this is approximately true. I talk almost
exclusively to mailing lists and people I know pretty well.

Occasionally I get a question from someone I don't know, but this is
rare enough that I could, if necessary, give up 60 seconds of my
PowerBook's CPU time to send a reply, without too much fuss - after
all, it would have taken me at least that long to write it. I'd still
be concerned about the time taken on a slower machine, though.
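
For anyone who wants to see what "60 seconds of CPU time" means in
practice, here is a minimal hashcash-style sketch in Python. It is an
assumption on my part that the scheme looks like this: the token format,
the choice of SHA-256 and every function name are made up for
illustration, not taken from the paper or from any deployed system.

```python
import hashlib
import itertools
import math

def leading_zero_bits(digest: bytes) -> int:
    """Number of leading zero bits in a digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()

def mint(resource: str, bits: int) -> str:
    """Brute-force a counter so SHA-256(token) starts with `bits` zero bits."""
    for counter in itertools.count():
        token = f"{resource}:{counter}"
        if leading_zero_bits(hashlib.sha256(token.encode()).digest()) >= bits:
            return token

def verify(token: str, resource: str, bits: int) -> bool:
    """A single hash: cheap for the receiver whatever the sender spent."""
    if not token.startswith(resource + ":"):
        return False
    return leading_zero_bits(hashlib.sha256(token.encode()).digest()) >= bits

def bits_for_budget(seconds: float, hashes_per_second: float) -> int:
    """Largest difficulty whose expected work (2**bits hashes) fits the budget."""
    return math.floor(math.log2(seconds * hashes_per_second))
```

Minting takes about 2**bits hashes on average, so a machine that hashes
half as fast takes about twice as long for the same token, which is
exactly the worry about slower machines. Something like
mint("bob@example.org", 12) finishes almost instantly; a difficulty
chosen with bits_for_budget(60, rate) would need a measured hash rate
for the machine in question.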

However, looking instead at the mail I *receive*, I can see a number of
remote systems which could, potentially, be heavily burdened by
proof-of-work. These aren't as common as you might think, though.

Forum update notifications? These come in frequently and from
predictable sources, so I might as well whitelist them as a regular
correspondent. The same goes for news and status mailings from my ISP
and various other organisations.

E-commerce transaction confirmations? If it only costs a fractional
cent per PoW token (because you're managing the hardware in bulk), it
disappears next to the cost of the currency handling. Remember, I'm
imagining that you can centralise the effort and effectively rent space
on someone else's cluster for this, so even small shops see the same
kinds of low cost. In practice, most e-commerce systems I've seen use
a double-opt-in e-mail registration process, very similar to mailing
lists, so similar mechanisms could apply.

As for tech support and sales enquiries, you're paying the staff
sitting at the workstation at least national minimum wage (several
dollars an hour), and they have a physical limit to how fast they can
type and send mails. The cost of attaching tokens to those mails is
minuscule in comparison. Whether it happens on the workstation or
centrally is a matter of logistics - but if the workstation is already
fast enough, the costs become essentially nil.
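
As a back-of-envelope check on that comparison, here is the arithmetic
as a few lines of Python. The function name is hypothetical and the
figures in the usage note are placeholders I picked to show the shape of
the sum, not data.

```python
def token_share_of_labour(wage_per_hour, mails_per_hour, token_cost):
    """Token cost as a fraction of the staff cost of writing one mail.

    All three inputs are in the same currency; none of them comes from
    measurements, so plug in your own figures.
    """
    labour_per_mail = wage_per_hour / mails_per_hour
    return token_cost / labour_per_mail
```

With, say, 6 dollars an hour, 20 mails an hour and 0.3 cents per token,
token_share_of_labour(6.0, 20, 0.003) comes out at about 0.01, i.e. the
token adds roughly 1% to the labour cost of each mail.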

Registration confirmations? Seriously, these are *supposed* to be
rare. I'd like to know what kind of system processes more
registrations than actual service, before I consider this to be a
problem.

I used to get e-cards from a few people. These are typically sent from
the e-card vendor's system at present - a bad practice, but oh well.
If the cost of generating proof-of-work is too high for the e-card
vendor, they can get the sender to download the e-card and send it
themselves. I'm not too worried about that.

That leaves one big category: Web Mail. The likes of Hotmail and
Yahoo don't charge for sending e-mail from their systems, except
perhaps in terms of banner ads. They also handle ginormous amounts of
said mail, which could make a proof-of-work switch-on relatively
difficult for them. However, most of their clients are low-end home
users, who, on average, may have relatively favourable contact
patterns. For this, we could do with more statistics.


Any more?

--------------------------------------------------------------
from:     Jonathan "Chromatix" Morton
mail:     chromi(_at_)chromatix(_dot_)demon(_dot_)co(_dot_)uk
website:  http://www.chromatix.uklinux.net/
tagline:  The key to knowledge is not to rely on people to teach you it.

