
Re: [Asrg] About that e-postage draft [POSTAGE]

2009-02-21 19:07:32
On 2/20/09 7:26 PM, Steve Atkins wrote:

On Feb 20, 2009, at 3:25 PM, Bill Cole wrote:
[Quoting John Leslie]
  The bottom line is, redeeming a million tokens per second is practical
with processing delay not much greater than network latency. (This was
not true ten years ago...)

I think I'd quibble with details on that, but they really are not all that
important.

I guess maybe one detail is...

If the processing delay for every redemption attempt is of the same order of magnitude as irreducible network latencies, i.e. tens to hundreds of milliseconds, handling a million one-use token redemption attempts per second is absolutely hopeless. I'm pretty sure that a large fraction of the attempts would be for already-redeemed tokens, and that a validation server could handle those in sub-millisecond time. I would hope that the requests which succeed could be done in sub-millisecond time as well, but that's less critical.

This is a problem that should be subject to a simplified thought experiment. Even if you make extremely unrealistic positive assumptions about processing, storage, and bandwidth, but recognize that many RTTs between redemption clients and servers will be above 100ms, most above 10ms, and essentially all above 1ms, it is very hard to design a logical system (never mind a collection of hardware) that will handle a million redemption requests per second in a worthwhile manner (i.e. one that does not by design repudiate valid tokens or validate bogus or already-redeemed ones) when the request stream is being engineered to break the system by parties with tens of thousands of hijacked machines at their disposal.

There are many problems with e-postage - fundamental, social, and implementation problems - but this particular aspect isn't one of them, I don't think. It's a trivially parallelizable problem.

If it's trivial, I'd welcome the public humiliation of being shown how, with specific reference to the problem that (for example) 50k clients on the other end of 100ms RTT connections may present the same token almost simultaneously, and exactly one can be allowed to redeem it.
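
To make the race concrete, here is a toy, single-process illustration (purely my sketch, scaled way down) of the property being demanded: the check and the mark-as-redeemed must be one atomic step, or more than one of those clients wins:

    # Toy model: many clients present the same token nearly
    # simultaneously; exactly one may be allowed to redeem it.
    import threading

    redeemed_tokens = set()
    lock = threading.Lock()
    wins = []

    def try_redeem(token):
        with lock:                      # atomic test-and-set
            if token in redeemed_tokens:
                return False            # already redeemed: reject
            redeemed_tokens.add(token)
            return True

    def client():
        if try_redeem("stamp-0001"):
            wins.append(1)

    threads = [threading.Thread(target=client) for _ in range(50)]
    for t in threads: t.start()
    for t in threads: t.join()
    print(len(wins))                    # always exactly 1

Within one process that is trivial; the hard version is making the same guarantee hold across redundant, geographically separated servers and a full client RTT.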

The reliability (SPoF), economics, business and usability issues are likely to be much more of a problem.

Those are all problems. It seems to me that any attempt to seriously address the SPoF problem makes the race resolution problem harder.

I'm pretty sure that I'm not the best systems analyst/designer on this list. I certainly hope I'm not the best one to have thought about e-postage. I'd be happy to learn from a master how it is in fact possible to make an ideally simplified minimal system like this work, as a starting point for assembling a more complex system that has more elements of reality in it. I think (but may be wrong!) that it isn't possible to design a system that will be theoretically capable of correctly handling a million redemption requests per second of which ~90% are the result of someone working to break the system.


It's fun to consider, though.

Using your numbers - one million redemption requests a second, of which at least 90% are invalid - leaves 100,000 valid redemptions per second, which would give around 250 billion outstanding stamps at any one time if we expire them after a month (expiring would likely involve voiding the unused stamps and issuing new ones in the same amount, but that's a business issue, and doesn't affect the redemption requirements).
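
Spelled out, that arithmetic is:

    # Back-of-envelope check of the figures above.
    requests_per_sec = 1_000_000
    valid_fraction   = 0.10                # at least 90% invalid
    month_in_sec     = 30 * 24 * 3600      # 2,592,000 seconds

    valid_per_sec = requests_per_sec * valid_fraction   # 100,000
    outstanding   = valid_per_sec * month_in_sec
    print(f"{outstanding:,.0f}")           # 259,200,000,000 - roughly 250 billion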

Use private-key cryptography to ensure that you reject any stamp you didn't create. This is easy to do in hardware, or to parallelize should you need to. Pick enough machines or ASICs to meet your throughput goal. (That might well be one.)
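
As a sketch of that step in software (the key handling and stamp layout here are illustrative assumptions, not a spec):

    # Sketch: each stamp is a serial number plus a MAC under a key only
    # the issuer holds, so forgeries are rejected with no database
    # lookup at all. Key and field layout are made-up assumptions.
    import hmac, hashlib

    ISSUER_KEY = b"issuer-secret-key"      # hypothetical

    def issue_stamp(serial: int) -> bytes:
        body = serial.to_bytes(8, "big")
        return body + hmac.new(ISSUER_KEY, body, hashlib.sha256).digest()

    def looks_genuine(stamp: bytes) -> bool:
        body, mac = stamp[:8], stamp[8:]
        expected = hmac.new(ISSUER_KEY, body, hashlib.sha256).digest()
        return hmac.compare_digest(mac, expected)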

Hash the stamp to one of a number of redemption machines. This is "just" a network problem.
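
For instance (machine names and count are placeholders):

    # Sketch: hash each stamp to one of N redemption machines, so every
    # attempt to redeem a given stamp lands on the same box and the
    # redeemed/unredeemed race stays local to that one machine.
    import hashlib

    MACHINES = [f"redeem-{i:02d}.example.net" for i in range(16)]

    def machine_for(stamp: bytes) -> str:
        digest = hashlib.sha256(stamp).digest()
        return MACHINES[int.from_bytes(digest[:8], "big") % len(MACHINES)]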

I love the quotes...

That network problem risks adding latency to every transaction, particularly because at first glance it looks like an obvious place to address the large-scale SPoF problem by giving physical location diversity to your subunits. Unfortunately, if you make the back end really robust by putting redundant parts in widely dispersed places, every transaction gets a multi-millisecond minimum lifetime, which is a problem.

At each redemption machine, look up the stamp in an "I've seen this" associative array. If you've seen it, reject the stamp, otherwise accept it. This is arbitrarily scalable: just add enough redemption machines that the memory access time to look up an entry in the associative array is short enough to meet your throughput goal, and the number of outstanding stamps fits in the storage space of the machines. Assuming there's a serial number in each stamp, your associative array could simply be 250 gigabits of RAM, so again it's not going to be many machines, maybe one, to do it in software.
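
As a scaled-down sketch of that structure (one bit per serial number; 250 gigabits is roughly 31 GB of RAM):

    # The "I've seen this" array as a bitmap indexed by serial number,
    # scaled down so it actually runs on ordinary hardware.
    N_SERIALS = 10_000_000                 # stand-in for 250 billion
    seen = bytearray(N_SERIALS // 8 + 1)

    def redeem(serial: int) -> bool:
        byte, bit = divmod(serial, 8)
        if seen[byte] & (1 << bit):
            return False                   # already redeemed: reject
        seen[byte] |= 1 << bit             # mark as redeemed
        return True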

I think it is a bit of a hand-wave to call this arbitrarily scalable, but for the sake of testing my hypothesis I'm happy to stipulate that the entire server-side decision process for any one redemption transaction can be reliably done correctly in uniform sub-millisecond time if the stamp is either available for redemption or already redeemed. The problem is that logically you need a third intermediate state that will last for the RTT of the network connection to the redeeming client, and that state will defer the decision for other attempts to redeem the stamp.

To me it feels like the hard bit of this is handling a million packets in and out per second reliably, along with the overhead of providing robustness and redundancy, rather than the redemption itself.

That was my point, because it seems to me that a redemption cannot be done with just one packet in and one out, but really needs two in and one out. A legitimate stamp needs to have 3 possible states in the server's map: redeemed, unredeemed, and pending acknowledgment of redemption (sketched below, after the two flaws). If the server only has two states for a stamp, then it would end up with one of two flaws by design:

1. If the stamp is marked as redeemed when a successful redemption attempt completes on the server, it is possible that the success will never be communicated to the client. If the client then retries the redemption, it will fail.

2. If the stamp is left as unredeemed while waiting for the client ack of success, stamp "reuse" becomes a question of how many redemption decisions can be made per client RTT.
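
A minimal sketch of the three-state bookkeeping described above, with the caveat that the timeout policy for a pending stamp is a guess rather than anything specified here:

    # UNREDEEMED -> PENDING on a redemption request; PENDING ->
    # REDEEMED only on the client's ack. What to do when the ack never
    # arrives is the open question; expiring the pending state back to
    # UNREDEEMED after a TTL is one guess among several.
    import enum

    class State(enum.Enum):
        UNREDEEMED = 1
        PENDING = 2
        REDEEMED = 3

    PENDING_TTL = 0.5                      # seconds; a guess
    stamps = {}                            # serial -> (state, deadline)

    def request_redeem(serial, now):
        state, deadline = stamps.get(serial, (State.UNREDEEMED, 0.0))
        if state is State.PENDING and now > deadline:
            state = State.UNREDEEMED       # assumed policy: grant or ack lost
        if state is State.REDEEMED:
            return "reject"
        if state is State.PENDING:
            return "defer"                 # another attempt is in flight
        stamps[serial] = (State.PENDING, now + PENDING_TTL)
        return "granted, awaiting ack"     # not REDEEMED until the ack

    def ack_redeem(serial):
        stamps[serial] = (State.REDEEMED, 0.0)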

The server may have to defer many thousands of clients for scores of milliseconds while waiting for one to send an ack. Handling a few dozen such events per second seems to me to be a really hard problem to address, but maybe I'm missing something. If the average transaction lifetime is 100ms, then a million-TPS system needs to be able to handle an average of 100k concurrent pending transactions, with spikes of probably twice that. I think the ways to handle that all involve dividing the front end between multiple machines, but that creates a tougher problem: keeping the back-end recordkeeping fast and coherent from the viewpoints of all of the front ends.
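
That figure is just Little's law (concurrent transactions = arrival rate x time in system):

    arrival_rate = 1_000_000           # redemption requests per second
    lifetime     = 0.100               # average transaction lifetime, seconds
    print(arrival_rate * lifetime)     # 100,000 concurrent pending transactions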

I suspect that the "solution" that will be chosen, if anyone tries to create a real e-postage system instead of hand-waving about it, will be to accept lost-packet damage as the cost of scalability. Network latency to clients becomes irrelevant if the server assumes that its redemption messages are always delivered. That allows for a lot of optimization. Once in a while a stamp that should work will fail to do so, and if such a system ever gets into the real world I'm sure its users will be shocked at how much higher the real-world failure rates are than in their tests...

