Steve,
I believe you're getting the point I was trying to make.
The problem with 1-loop operation is that you can't start encrypting the
next 64-bit word until you know how the previous one encrypts, because the
previous ciphertext is fed back into the next input. Therefore, you can't
run any operation in parallel with the ones before it.
In the 3-loop case, you don't need to know the output of
the 3rd encryption before feeding a new input to the 1st -- so you can
be running 3 at a time, as your diagram showed. That's what gives you
close to 3x the speed once the pipeline is full.
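
To put rough numbers on it, here is a toy timing model in Python. It is only a sketch, and the assumptions are mine rather than anything taken from your diagram: three DES units, each taking one time unit per 64-bit word, with the names `finish_time` and `inner_feedback` made up for illustration. It counts clock ticks only and does no actual encryption.

```python
def finish_time(num_blocks, stages=3, inner_feedback=True):
    """Return the tick at which the last block leaves the last stage.

    Assumed model: each stage has its own hardware unit and takes one
    tick per 64-bit block.
    inner_feedback=True  -> 3-loop case (each stage has its own feedback)
    inner_feedback=False -> 1-loop case (one feedback loop around all stages)
    """
    # done[s] = tick at which stage s finished its most recent block
    done = [0] * stages
    prev_block_end = 0
    for _ in range(num_blocks):
        # 1-loop: the first stage must wait for the previous block's final
        # ciphertext. 3-loop: the plaintext is available immediately.
        t = 0 if inner_feedback else prev_block_end
        for s in range(stages):
            # A stage starts once its input is ready (t) and it has
            # finished its own previous block (done[s]).
            start = max(t, done[s])
            t = start + 1
            done[s] = t
        prev_block_end = t
    return prev_block_end


if __name__ == "__main__":
    n = 300
    one_loop = finish_time(n, inner_feedback=False)
    three_loop = finish_time(n, inner_feedback=True)
    print(one_loop, three_loop, one_loop / three_loop)
```

In this model, N words take 3N ticks with one loop and N + 2 ticks with three, so 300 words come out at 900 versus 302 ticks, and the ratio approaches 3x as the message gets longer.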
- Carl