Re: An opes usage question.



I am afraid the latest discussion/diagrams unnecessarily complicated
things. We suddenly started talking about IP addresses of individual
proxies, persistent connections between load balanced proxies, and
other complicated low-level details that should be kept outside of
most OPES protocols. Let's step back a little:

On Wed, 3 Mar 2004, John G. Waclawsky wrote:

1)  The opes framework allows services to be distributed (or pipelined),
with incremental services being added to each traffic flow at each
stage. This is an opes proxy to proxy communication model.
2) A pool of multiple opes proxies can be provisioned at each stage to
support a large number of flows
3) Installing load balancers between stages to distribute the flows is
ok in an opes framework (and this is a typical business scenario). If
all the flow processing can be achieved in-line then there is no need to
identify any specific proxy in any pool. In this case we probably don't
care which previous stage opes proxy did the prior adaptation step.


Agreed.

4) The crux of the problem is how to share information between two
stages of the flow. This sending of metadata from one stage to a
previous stage will require knowledge of specific server addresses (a
more general case might be to send the metadata in either direction).


I hope the last statement is false. IMO, identifying or communicating
directly with individual proxies behind a load balancer makes sane
load balancing impossible. Sane load balancing, by definition,
includes load balancing of identical proxies (identical from external
protocol agents' point of view). If all proxies are truly identical
from external agents' point of view, there should be no reason to
identify them individually.

If proxies are not identical for any reason, then we are not load
balancing them; we are managing a pool of different proxies, with some
complex per-protocol selection criteria. The latter model is what
origin server load balancing evolved into and is exactly why HTTP load
balancing requires AI techniques and ugly hacks to work, despite the
fact that pure HTTP is stateless. We should avoid this model (on a
protocol level) if at all possible.

In your specific case, this implies that external proxies must not try
to identify individual proxies behind the load balancer. While we can
build such identification mechanisms, the long-term effect would be
the same as for HTTP: expensive and relatively rigid load balancing
schemes causing headaches for all the parties involved.

The task of such identification should be assigned to load balancers.
If the protocol is designed correctly, a load balancer should be able
to reliably identify the proxy/server it should talk to when the
external proxy sends a follow-up message to the load balancer. We
tried hard to make this possible with the OPES tracing approach. It
should be possible in the reverse direction as well.
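
To make the division of labor concrete, here is a rough Python sketch of
a balancer that resolves follow-up messages by itself. Everything in it
is hypothetical (the FlowBalancer class, the route method, the flow IDs,
and the addresses); it is not taken from any OPES draft. It only shows
that the flow-to-proxy mapping can live entirely inside the balancer:

```python
import itertools


class FlowBalancer:
    """Hypothetical balancer: outside agents address only self.address."""

    def __init__(self, address, backends):
        self.address = address
        self._backends = list(backends)
        # Round-robin choice for flows the balancer has not seen before.
        self._next = itertools.cycle(self._backends)
        # Private flow-to-proxy table; never exposed to outside agents.
        self._flow_to_backend = {}

    def route(self, flow_id):
        """Return the proxy for this flow, choosing one on first sight."""
        if flow_id not in self._flow_to_backend:
            self._flow_to_backend[flow_id] = next(self._next)
        return self._flow_to_backend[flow_id]


balancer = FlowBalancer("lb.billing.example",
                        ["10.0.0.1", "10.0.0.2", "10.0.0.3"])
first = balancer.route("flow-42")      # initial message of the flow
follow_up = balancer.route("flow-42")  # follow-up reaches the same proxy
other = balancer.route("flow-43")      # a new flow gets the next proxy
```

The external proxy only ever needs the balancer's address and a flow ID;
which proxy actually handled the flow stays the balancer's business.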

Instead of using this diagram:

  ContentServer     ContentServer     ContentServer
        |               |                |
        \               |               /
         \              |              /
          ----------------------------
          |      Load Balancer       |
          ----------------------------
            |           |         |
            |           |         |
            |           |         |
  BillingServer  BillingServer   BillingServer
        |               |                |
        \               |               /
         \              |              /
          ----------------------------
          |      Load Balancer       |
          ----------------------------
            |           |         |
            |           |         |
            |           |         |
       AdaptServer  AdaptServer  AdaptServer


let's use the following diagram when designing your billing adjustment
algorithm/protocol:

                  ContentServer
                        |
                 BillingServer
                        |
                   AdaptServer

and require that if any load balancing is introduced, it does not
change the algorithm/protocol in any way. This implies that if IP
addresses are used to identify proxies, then load balancers should put
their own IP addresses instead of the addresses of the proxies being
balanced (and embed known-to-balancer-only meta information to map
flow ID to individual proxy address). OPES tracing allows for such
substitutions and meta information, for example. The notification
algorithms working in the opposite direction should allow them as well
and can reuse the techniques discussed when OPES tracing was
developed.
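
A minimal Python sketch of that substitution follows. The trace entry
here is just a dictionary with made-up field names ("flow", "address",
"lb-meta"); it is not the actual OPES trace syntax, and the
TracingBalancer class is an assumption for illustration only. The point
is that the proxy address is replaced by the balancer's address plus a
token that only the balancer can interpret:

```python
import secrets


class TracingBalancer:
    """Hypothetical balancer that hides proxy addresses in trace entries."""

    def __init__(self, address):
        self.address = address
        # Opaque token -> real proxy address; known to the balancer only.
        self._token_to_backend = {}

    def rewrite_trace(self, entry):
        """Swap the proxy address for our own and attach an opaque token."""
        token = secrets.token_hex(8)
        self._token_to_backend[token] = entry["address"]
        return {**entry, "address": self.address, "lb-meta": token}

    def resolve(self, entry):
        """Map an entry echoed back in a notification to the real proxy."""
        if entry.get("address") != self.address:
            raise ValueError("trace entry is not addressed to this balancer")
        return self._token_to_backend[entry["lb-meta"]]


lb = TracingBalancer("192.0.2.10")
outbound = lb.rewrite_trace({"flow": "flow-42", "address": "10.0.0.2"})
backend = lb.resolve(outbound)  # reverse-direction notification arrives
```

An agent on the other side sees only 192.0.2.10 and an opaque string, so
the same notification logic works with or without load balancing.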

In other words, instead of allowing a load balancer as a separate protocol
entity that everybody has to worry about, you require that protocols are
designed so that a group of load balanced agents is visible as a single agent,
and nobody has to worry about the presence or specifics of load balancing
(except for the load balancer itself).

Can you think of any real-world problem that cannot be solved using the above
simplified framework? Can you think of a reason why load balancers will not be
able to hide the presence of multiple proxies from the outside agents? In
other words, is there a point in drawing load balancers and multiple proxies
when discussing the protocols you need?

Thanks,

Alex.