RE: An opes usage question.


Alex,
        Perhaps the recommendations that you make about using the
current definitions within OPES are applicable. I understand that you
are recommending the use of the "tracing" facility and the "meta-data
transfer" facility.

OPES "tracing" is defined in the "An architecture for OPES" document.
In this context the document states that: "Providing the ability for
in-band annotation MAY require header extensions on the application
protocol that is used (e.g., HTTP)".

The document "HTTP adaptation for OPES" proceeds to define the use of
tracing in the context of HTTP. Similar structures do not exist for
supporting other applications at the moment. Utilizing the "tracing"
mechanism for flow discovery can work for HTTP. In fact, HTTP/1.1
already defines the "Via" header, and proxies widely use the de facto
"X-Forwarded-For" header, to help with such discovery.
The problem is with other applications.

Meta-data transfer is established as a requirement in the document
"Requirements for OPES callout protocols (3.13)". Are you also implying
that we embed meta-data in the traffic flow, just like trace
information? Processing of meta-data is often done in a background
thread rather than in the main thread, to gain processing efficiency. I
had, therefore, understood that OCP should provide the
meta-data-transfer support. Is this correct? In-flow transfer of
meta-data requires support from the application protocol and may
currently be possible only with HTTP. Transferring meta-data using OCP
also becomes tricky when the callout and proxy roles are mixed within
one physical entity, especially with load balancers separating the proxy
stages. I'd like to get your ideas on meta-data transfer. I may have
missed some earlier discussions on this, so any pointers will be greatly
appreciated.

Thanks
Krishna

BTW, the billing servers may never be in the traffic path as
shown in the figures that we have been discussing. Instead, two stages
of adaptation servers may be separated by load balancers.


-----Original Message-----
From: Alex Rousskov [mailto:rousskov(_at_)measurement-factory(_dot_)com] 
Sent: Monday, March 08, 2004 10:08 PM
To: John G. Waclawsky
Cc: OPES Group; Krishna Ramadas
Subject: Re: An opes usage question.


I am afraid the latest discussion/diagrams unnecessarily complicated
things. We suddenly started talking about IP addresses of individual
proxies, persistent connections between load balanced proxies, and
other complicated low-level details that should be kept outside of
most OPES protocols. Let's step back a little:

On Wed, 3 Mar 2004, John G. Waclawsky wrote:

1) The opes framework allows services to be distributed (or pipelined),
with incremental services being added to each traffic flow at each
stage. This is an opes proxy to proxy communication model.
2) A pool of multiple opes proxies can be provisioned at each stage to
support a large number of flows
3) Installing load balancers between stages to distribute the flows is
ok in an opes framework (and this is a typical business scenario). If
all the flow processing can be achieved in-line then there is no need to
identify any specific proxy in any pool. In this case we probably don't
care which previous stage opes proxy did the prior adaptation step.


Agreed.

4) The crux of the problem is how to share information between two
stages of the flow. This sending of metadata from one stage to a
previous stage will require knowledge of specific server addresses (a
more general case might be to send the metadata in either direction).

I hope the last statement is false. IMO, identifying or communicating
directly with individual proxies behind a load balancer makes sane
load balancing impossible. Sane load balancing, by definition,
includes load balancing of identical proxies (identical from the
external protocol agents' point of view). If all proxies are truly
identical from the external agents' point of view, there should be no
reason to identify them individually.

If proxies are not identical for any reason, then we are not load
balancing them; we are managing a pool of different proxies, with some
complex per-protocol selection criteria. The latter model is what
origin server load balancing evolved into and is exactly why HTTP load
balancing requires AI techniques and ugly hacks to work, despite the
fact that pure HTTP is stateless. We should avoid this model (on a
protocol level) if at all possible.

In your specific case, this implies that external proxies must not try
to identify individual proxies behind the load balancer. While we can
build such identification mechanisms, the long-term effect would be
the same as for HTTP: expensive and relatively rigid load balancing
schemes causing headaches for all the parties involved.

The task of such identification should be assigned to load balancers.
If the protocol is designed correctly, a load balancer should be able
to reliably identify the proxy/server it should talk to when the
external proxy sends a follow-up message to the load balancer. We
tried hard to make this possible with the OPES tracing approach. It
should be possible in the reverse direction as well.

Instead of using this diagram:

  ContentServer     ContentServer     ContentServer
        |               |                |
        \               |               /
         \              |              /
          ----------------------------
          |      Load Balancer       |
          ----------------------------
            |           |         |
            |           |         |
            |           |         |
  BillingServer  BillingServer   BillingServer
        |               |                |
        \               |               /
         \              |              /
          ----------------------------
          |      Load Balancer       |
          ----------------------------
            |           |         |
            |           |         |
            |           |         |
       AdaptServer  AdaptServer  AdaptServer


let's use the following diagram when designing your billing adjustment
algorithm/protocol:

                  ContentServer
                        |
                 BillingServer
                        |
                   AdaptServer

and require that if any load balancing is introduced, it does not
change the algorithm/protocol in any way. This implies that if IP
addresses are used to identify proxies, then load balancers should put
their own IP addresses instead of the addresses of the proxies being
balanced (and embed known-to-balancer-only meta information to map
flow ID to individual proxy address). OPES tracing allows for such
substitutions and meta information, for example. The notification
algorithms working in the opposite direction should allow them as well
and can reuse the techniques discussed when OPES tracing was
developed.
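
To make the substitution idea above concrete, here is a minimal Python
sketch. It is only an illustration under stated assumptions, not
something taken from the OPES drafts: the LoadBalancer class, the
"address" and "token" fields, and the HMAC-signed pool index are all
hypothetical names and choices. The balancer advertises its own address
in the trace entry and embeds an opaque token that only it can map back
to the individual proxy.

```python
import hashlib
import hmac
import itertools


class LoadBalancer:
    """Hypothetical balancer that makes a proxy pool look like one agent."""

    def __init__(self, public_address, proxies, secret):
        self.public_address = public_address      # the only address outsiders see
        self._proxies = list(proxies)             # private pool addresses
        self._secret = secret                     # known to the balancer only
        self._next = itertools.cycle(range(len(self._proxies)))

    def _sign(self, index_text):
        # Short keyed digest so outsiders cannot forge or alter a token.
        mac = hmac.new(self._secret, index_text.encode(), hashlib.sha256)
        return mac.hexdigest()[:16]

    def new_flow(self):
        """Pick a proxy (round robin) and build the trace entry for the flow.

        Returns (trace_entry, proxy). The trace entry carries the balancer's
        address plus an opaque token; it never carries the proxy address.
        """
        index_text = str(next(self._next))
        token = f"{index_text}.{self._sign(index_text)}"
        entry = {"address": self.public_address, "token": token}
        return entry, self._proxies[int(index_text)]

    def route_follow_up(self, trace_entry):
        """Map a follow-up message back to the proxy that handled the flow."""
        index_text, _, signature = trace_entry["token"].partition(".")
        if not hmac.compare_digest(signature, self._sign(index_text)):
            raise ValueError("token was not issued by this balancer")
        return self._proxies[int(index_text)]
```

An external agent that later sends a notification simply addresses it
to entry["address"] and echoes entry["token"]; route_follow_up() then
selects the right proxy without the external agent ever learning that a
pool exists. Keeping the mapping inside a signed token (rather than in a
per-flow table) is just one design choice; a balancer-side table keyed
by flow ID would serve the same purpose.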

In other words, instead of allowing a load balancer as a separate
protocol entity that everybody has to worry about, you require that
protocols are designed so that a group of load balanced agents is
visible as a single agent, and nobody has to worry about the presence
or specifics of load balancing (except for the load balancer itself).

Can you think of any real world problem that cannot be solved using the
above simplified framework? Can you think of a reason why load balancers
will not be able to hide the presence of multiple proxies from the
outside agents? In other words, is there a point in drawing load
balancers and multiple proxies when discussing the protocols you need?

Thanks,

Alex.