
Re: [Asrg] About that e-postage draft [POSTAGE]

2009-02-21 19:07:32
On 2/20/09 7:26 PM, Steve Atkins wrote:

On Feb 20, 2009, at 3:25 PM, Bill Cole wrote:
[Quoting John Leslie]
  The bottom line is, redeeming a million tokens per second is practical
with processing delay not much greater than network latency. (This was
not true ten years ago...)

I think I'd quibble with details on that, but they really are not all that
important.

I guess maybe one detail is...

If the processing delay for every redemption attempt is of the same order of magnitude as irreducible network latencies, i.e. tens to hundreds of milliseconds, handling a million one-use token redemption attempts per second is absolutely hopeless. I'm pretty sure that a large fraction of the attempts would be for already-redeemed tokens, and that a validation server could handle those in sub-millisecond time. I would hope that the requests which succeed could be done in sub-millisecond time as well, but that's less critical.

This is a problem that should be subject to a simplified thought experiment. Even if you make extremely unrealistic positive assumptions about processing, storage, and bandwidth, but recognize that many RTTs between redemption clients and servers will be above 100ms, most above 10ms, and essentially all above 1ms, it is very hard to design a logical system (never mind a collection of hardware) that will handle a million redemption requests per second in a worthwhile manner (i.e. one that does not by design repudiate valid tokens or validate bogus or already-redeemed ones) when the request stream is being engineered to break the system by parties with tens of thousands of hijacked machines at their disposal.

There are many problems with e-postage - fundamental, social, and implementation problems - but this particular aspect isn't one of them, I don't think. It's a trivially parallelizable problem.

If it's trivial, I'd welcome the public humiliation of being shown how, with specific reference to the problem that (for example) 50k clients on the other end of 100ms RTT connections may present the same token almost simultaneously, and exactly one can be allowed to redeem it.
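
To make the race concrete, here is a toy, single-process illustration (purely my sketch, scaled way down) of the property being demanded: the check and the mark-as-redeemed must be one atomic step, or more than one of those clients wins:

    # Toy model: many clients present the same token nearly
    # simultaneously; exactly one may be allowed to redeem it.
    import threading

    redeemed_tokens = set()
    lock = threading.Lock()
    wins = []

    def try_redeem(token):
        with lock:                      # atomic test-and-set
            if token in redeemed_tokens:
                return False            # already redeemed: reject
            redeemed_tokens.add(token)
            return True

    def client():
        if try_redeem("stamp-0001"):
            wins.append(1)

    threads = [threading.Thread(target=client) for _ in range(50)]
    for t in threads: t.start()
    for t in threads: t.join()
    print(len(wins))                    # always exactly 1

Within one process that is trivial; the hard version is making the same guarantee hold across redundant, geographically separated servers and a full client RTT.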

The reliability (SPoF), economics, business and usability issues are likely to be much more of a problem.

Those are all problems. It seems to me that any attempt to seriously address the SPoF problem makes the race resolution problem harder.

I'm pretty sure that I'm not the best systems analyst/designer on this list. I certainly hope I'm not the best one to have thought about e-postage. I'd be happy to learn from a master how it is in fact possible to make an ideally simplified minimal system like this work, as a starting point for assembling a more complex system that has more elements of reality in it. I think (but may be wrong!) that it isn't possible to design a system that will be theoretically capable of correctly handling a million redemption requests per second of which ~90% are the result of someone working to break the system.


It's fun to consider, though.

Using your numbers - one million redemption requests a second, of which at least 90% are invalid - leaves 100,000 valid redemptions per second, which would give around 250 billion outstanding stamps at any one time if we expire them after a month (expiring would likely involve voiding the unused stamps and issuing new ones in the same amount, but that's a business issue, and doesn't affect the redemption requirements).
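
Spelled out, that arithmetic is:

    # Back-of-envelope check of the figures above.
    requests_per_sec = 1_000_000
    valid_fraction   = 0.10                # at least 90% invalid
    month_in_sec     = 30 * 24 * 3600      # 2,592,000 seconds

    valid_per_sec = requests_per_sec * valid_fraction   # 100,000
    outstanding   = valid_per_sec * month_in_sec
    print(f"{outstanding:,.0f}")           # 259,200,000,000 - roughly 250 billion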

Use private-key cryptography to ensure that you reject any stamp you didn't create. This is easy to do in hardware, or to parallelize should you need to. Pick enough machines or ASICs to meet your throughput goal. (That might well be one.)
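
As a sketch of that step in software (the key handling and stamp layout here are illustrative assumptions, not a spec):

    # Sketch: each stamp is a serial number plus a MAC under a key only
    # the issuer holds, so forgeries are rejected with no database
    # lookup at all. Key and field layout are made-up assumptions.
    import hmac, hashlib

    ISSUER_KEY = b"issuer-secret-key"      # hypothetical

    def issue_stamp(serial: int) -> bytes:
        body = serial.to_bytes(8, "big")
        return body + hmac.new(ISSUER_KEY, body, hashlib.sha256).digest()

    def looks_genuine(stamp: bytes) -> bool:
        body, mac = stamp[:8], stamp[8:]
        expected = hmac.new(ISSUER_KEY, body, hashlib.sha256).digest()
        return hmac.compare_digest(mac, expected)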

Hash the stamp to one of a number of redemption machines. This is "just" a network problem.
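
For instance (machine names and count are placeholders):

    # Sketch: hash each stamp to one of N redemption machines, so every
    # attempt to redeem a given stamp lands on the same box and the
    # redeemed/unredeemed race stays local to that one machine.
    import hashlib

    MACHINES = [f"redeem-{i:02d}.example.net" for i in range(16)]

    def machine_for(stamp: bytes) -> str:
        digest = hashlib.sha256(stamp).digest()
        return MACHINES[int.from_bytes(digest[:8], "big") % len(MACHINES)]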

I love the quotes...

That network problem risks adding latency to every transaction, particularly because at first glance it looks like an obvious place to address the large-scale SPoF problem by giving physical location diversity to your subunits. Unfortunately, if you make the back end really robust by putting redundant parts in widely dispersed places, every transaction gets a multi-millisecond minimum lifetime, which is a problem.

At each redemption machine, look up the stamp in an "I've seen this" associative array. If you've seen it, reject the stamp, otherwise accept it. This is arbitrarily scalable: just add enough redemption machines that the memory access time to look up an entry in the associative array is short enough to meet your throughput goal, and the number of outstanding stamps fits in the storage space of the machines. Assuming there's a serial number in each stamp, your associative array could simply be 250 gigabits of RAM, so again it's not going to be many machines, maybe one, to do it in software.
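
As a scaled-down sketch of that structure (one bit per serial number; 250 gigabits is roughly 31 GB of RAM):

    # The "I've seen this" array as a bitmap indexed by serial number,
    # scaled down so it actually runs on ordinary hardware.
    N_SERIALS = 10_000_000                 # stand-in for 250 billion
    seen = bytearray(N_SERIALS // 8 + 1)

    def redeem(serial: int) -> bool:
        byte, bit = divmod(serial, 8)
        if seen[byte] & (1 << bit):
            return False                   # already redeemed: reject
        seen[byte] |= 1 << bit             # mark as redeemed
        return True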

I think it is a bit of a hand-wave to call this arbitrarily scalable, but for the sake of testing my hypothesis I'm happy to stipulate that the entire server-side decision process for any one redemption transaction can be reliably done correctly in uniform sub-millisecond time if the stamp is either available for redemption or already redeemed. The problem is that logically you need a third intermediate state that will last for the RTT of the network connection to the redeeming client, and that state will defer the decision for other attempts to redeem the stamp.

To me it feels like the hard bit of this is handling a million packets in and out per second reliably, along with the overhead of providing robustness and redundancy, rather than the redemption itself.

That was my point, because it seems to me that a redemption cannot be done with just one packet in and one out, but really needs two in and one out. A legitimate stamp needs to have 3 possible states in the server's map: redeemed, unredeemed, and pending acknowledgment of redemption (sketched below, after the two flaws). If the server only has two states for a stamp, then it would end up with one of two flaws by design:

1. If the stamp is marked as redeemed when a successful redemption attempt completes on the server, it is possible that the success will never be communicated to the client. If the client then retries the redemption, it will fail.

2. If the stamp is left as unredeemed while waiting for the client ack of success, stamp "reuse" becomes a question of how many redemption decisions can be made per client RTT.
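
A minimal sketch of the three-state bookkeeping described above, with the caveat that the timeout policy for a pending stamp is a guess rather than anything specified here:

    # UNREDEEMED -> PENDING on a redemption request; PENDING ->
    # REDEEMED only on the client's ack. What to do when the ack never
    # arrives is the open question; expiring the pending state back to
    # UNREDEEMED after a TTL is one guess among several.
    import enum

    class State(enum.Enum):
        UNREDEEMED = 1
        PENDING = 2
        REDEEMED = 3

    PENDING_TTL = 0.5                      # seconds; a guess
    stamps = {}                            # serial -> (state, deadline)

    def request_redeem(serial, now):
        state, deadline = stamps.get(serial, (State.UNREDEEMED, 0.0))
        if state is State.PENDING and now > deadline:
            state = State.UNREDEEMED       # assumed policy: grant or ack lost
        if state is State.REDEEMED:
            return "reject"
        if state is State.PENDING:
            return "defer"                 # another attempt is in flight
        stamps[serial] = (State.PENDING, now + PENDING_TTL)
        return "granted, awaiting ack"     # not REDEEMED until the ack

    def ack_redeem(serial):
        stamps[serial] = (State.REDEEMED, 0.0)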

The server may have to defer many thousands of clients for scores of milliseconds while waiting for one to send an ack. Handling a few dozen such events per second seems to me to be a really hard problem to address, but maybe I'm missing something. If the average transaction lifetime is 100ms, then a million-TPS system needs to be able to handle an average of 100k concurrent pending transactions, with spikes of probably twice that. I think the ways to handle that all involve dividing the front end between multiple machines, but that creates a tougher problem: keeping the back-end recordkeeping fast and coherent from the viewpoints of all of the front ends.
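
That figure is just Little's law (concurrent transactions = arrival rate x time in system):

    arrival_rate = 1_000_000           # redemption requests per second
    lifetime     = 0.100               # average transaction lifetime, seconds
    print(arrival_rate * lifetime)     # 100,000 concurrent pending transactions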

I suspect that the "solution" that will be chosen, if anyone tries to create a real e-postage system instead of hand-waving about it, will be to accept lost-packet damage as the cost of scalability. Network latency to clients becomes irrelevant if the server assumes that its redemption messages are always delivered. That allows for a lot of optimization. Once in a while a stamp that should work will fail to do so, and if such a system ever gets into the real world I'm sure its users will be shocked at how much higher the real-world failure rates are than in their tests...

