RE: OPES protocol, pre-draft


1. Terminology: I'd suggest using the names "OPES processor" and "callout server",
as in the architecture document. "OPES server" is a bit confusing: too similar
to "OPES processor" (which in fact is also a server).

I'd also suggest using transaction-id (tid) instead of buffer-id.

2. Transaction boundaries: in your example the OPES processor does not send
xaction-end. This may leave the protocol stack on the callout server keeping
some tid-related data indefinitely, in order to correctly match
possible incoming messages to the protocol state.

I'd suggest keeping a clear transaction structure: each side MUST open and close
the transaction with predefined start and end messages. This will result in a more
stable and reliable protocol state machine.
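
To illustrate why explicit boundaries help, here is a minimal Python sketch of
the bookkeeping a protocol stack could do. The class, method, and enum names
are hypothetical and come from no draft. The only rule assumed is the one
suggested above: per-tid state may be released once both sides have sent
their end message.

```python
from enum import Enum


class Side(Enum):
    PROCESSOR = "OPES processor"
    CALLOUT = "callout server"


class TransactionTable:
    """Tracks, per tid, which sides have opened and closed the transaction."""

    def __init__(self):
        self._started = {}  # tid -> set of sides that sent a start message
        self._ended = {}    # tid -> set of sides that sent an end message

    def on_start(self, tid, side):
        self._started.setdefault(tid, set()).add(side)
        self._ended.setdefault(tid, set())

    def on_end(self, tid, side):
        if side not in self._started.get(tid, set()):
            raise ValueError(f"{side.value} ended tid {tid!r} without starting it")
        self._ended[tid].add(side)
        if self._ended[tid] == set(Side):
            # Both sides closed the transaction: no message for this tid can
            # legally arrive any more, so the state can be dropped.
            del self._started[tid]
            del self._ended[tid]

    def is_tracked(self, tid):
        return tid in self._started
```

Without the second end message the table above would have to keep the entry
until some timeout fires, which is the problem described in point 2.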

3. Client-server protocol versus independent message flow. I think that a
client-server architecture (as in ICAP and HTTP) has certain advantages. It
fits the OPES dataflow model well: the OPES processor always initiates a
transaction as a result of some message received from a customer. The callout
server provides services that are initiated by a request made by the OPES
processor. This structure is simpler, and if we need transactions initiated by
the callout server independently of the OPES data flow (for example, a request
for policy information, address book, preferences, etc.) we may use a
different unidirectional channel.

In our case a bidirectional channel may be represented as two unidirectional
ones, and as far as I can see the OPES protocol fits the client-server model
very well. All exchanges take place in the context of some transaction, and
each such transaction consists of request-response sequences. In each
transaction we have a clear division of roles: one side is a service requestor
(client), the other a service producer (server), and the protocol messages
that each side may produce are subordinate to its role in the current
transaction. Moreover, there are distinct sets of requests that may be sent by
the OPES processor and by the callout server.



- Oskar

-----Original Message-----
From: owner-ietf-openproxy(_at_)mail(_dot_)imc(_dot_)org
[mailto:owner-ietf-openproxy(_at_)mail(_dot_)imc(_dot_)org]On Behalf Of Alex 
Rousskov
Sent: Tuesday, February 18, 2003 6:25 PM
To: ietf-openproxy(_at_)imc(_dot_)org
Subject: OPES protocol, pre-draft

To bootstrap creation of OPES protocol specs and to show one possible
way to answer the 5 design questions raised on this list, I would like
to propose the following protocol outline. If you think this is too
premature to talk about protocol specifics, please suggest what must
be done first.

I believe this protocol core can be implemented on top of HTTP or
BEEP.  I am less sure about SOAP; I am not a SOAP expert.

I think that the relaxed flow of the proposed protocol is a good idea,
given the combination of robustness requirements and requirements to
support a diverse set of application protocols. Mandatory ACKs and
explicit client-server flow do little good when you are managing
several semi-independent I/O buffers and have to handle, for example,
OPES server disappearance/overload and multiple recipients of a single
message. This relaxed flow and focus on data manipulation are the key
design elements that make the protocol simple yet efficient and robust
(I hope).

The protocol is optimized for the common case when most data is
forwarded "as is", unaffected by the OPES server. Removing this
optimization can simplify the protocol further.

--------------------

Contents:

      1. Terminology and environment
      2. OPES processor-server communication
      2.1 Transaction-independent messages
      2.2 Transaction-dependent messages
      2.2.1 Common transaction-dependent properties
      2.2.2 Major message types
      3. Examples
      4. Transport connections
      5. Synchronization and error handling

1. Terminology and environment

draft-ietf-opes-protocol-reqs-03.txt defines the following information
flow:

  data provider --(original application message)-->
  -- [ OPES magic ] -->
  --(produced application messages)--> data consumers

The original and "produced" (forwarded) messages together form an
application protocol transaction. Note that there may be more
than one produced application message resulting from a single
original message.

When an application protocol transaction involves a request-response
sequence (e.g., HTTP), the above scheme remains the same. There are
just two related transactions now:

  provider1 --(client req)--> [ OPES ] --(proxy req)--> consumer1
  provider2 --(server resp)-> [ OPES ] --(proxy resp)-> consumer2

Usually, provider1 is consumer2, and consumer1 is provider2.

Multiple related transactions do not change the nature of the
OPES protocol. The only difference is that more information may
need to be passed from OPES processor to OPES server. For
example, processing of the response flow may need some knowledge
about the request flow.  Individual transactions may become
related.

There may be no server response. Depending on the application
protocol, there may be multiple server and/or proxy responses.

2. OPES processor-server communication

OPES processor and OPES server exchange messages. The exchange is
bidirectional. There is no clear client or server role.  There is
no clear request/response message category either.

There are two major kinds of messages: transaction-independent
and transaction-dependent.

2.1 Transaction-independent messages

Transaction-independent messages exchange information unrelated
to any specific application protocol transaction. They are used
to enquire about supported OPES protocol options, control remote
logging, exchange user preferences, check connectivity status,
etc. Transaction-independent messages are not documented here.

2.2 Transaction-dependent messages

Transaction-dependent messages are the core of the OPES protocol.
These messages alter, block, redirect, clone application messages
to be forwarded to data consumers.  Each transaction-dependent
message is associated with a single application transaction by
means of a unique transaction identifier (xid).

It may be important to keep in mind that transaction-dependent
messages manipulate the state of these four buffers/connections and
associated meta-information:
    - data producer (incoming) buffer at the OPES processor
    - data producer (incoming) buffer at the OPES server
    - data consumer (outgoing) buffer at the OPES server
    - data consumer (outgoing) buffer at the OPES processor

The design allows implementations to prevent buffer overflows
and to discard buffered content when possible. Note that we rely on
the transport protocol both to be reliable and to stop sending us
more data (eventually) if we stop reading it. TCP has both
properties.

2.2.1 Common transaction-dependent properties

Some transaction-dependent OPES messages share the following common
properties.

    xid -- Application transaction identifier (Xaction ID)

    source -- Information about the data provider (i.e., the
      source of the application message). For messages
      originated from the OPES processor, the source describes
      the original data provider.  For messages originated from
      the OPES server, the source describes what provider
      information should be presented to the data consumer;
      OPES server may need to change how the original
      information looks to the other application side.

    destinations -- One or more destinations. A single
      destination is information about the data consumer (i.e.,
      the destination of the application message). For messages
      originated from the OPES processor, destination describes
      the consumer as intended by the producer. For messages
      originated from the OPES server, the destination is the
      data consumer that should be used by the OPES processor;
      OPES server may need to change the intended recipient.
      Depending on the application, OPES processor may need
      to check that all destinations have been covered by
      OPES server.

    data size -- Specific data size in octets OR a special token
      meaning "all" or "maximum". The all-token may only be
      used when requesting data, never when sending it.

    reason -- This should probably be a numeric status code with
      an optional information string. We will use just strings
      for now.
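
As an illustration only, the "data size" rule above could be checked as in
the following Python sketch. The names DataSize, ALL, and check_usable_for
are hypothetical, as is the spelling of the special token; the only rule
taken from the text is that the all-token may be used when requesting data
but never when sending it.

```python
from dataclasses import dataclass
from typing import Union

ALL = "all"  # hypothetical spelling of the special "all"/"maximum" token


@dataclass(frozen=True)
class DataSize:
    value: Union[int, str]  # octet count, or the ALL token

    def __post_init__(self):
        is_count = isinstance(self.value, int) and self.value >= 0
        if self.value != ALL and not is_count:
            raise ValueError("data size must be an octet count or the all-token")

    def check_usable_for(self, requesting: bool) -> None:
        """Reject the all-token in messages that carry data."""
        if self.value == ALL and not requesting:
            raise ValueError("the all-token may only be used when requesting data")


# A request for "everything" is fine; sending "everything" is not.
DataSize(ALL).check_usable_for(requesting=True)
DataSize(512).check_usable_for(requesting=False)
```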

2.2.2 Major message types

    OPES processor may send the following transaction-dependent
messages to the OPES server. [ Note: The XML-like syntax does NOT
imply that the messages should be implemented using XML. Text or
binary encodings can be used; the encoding decision is out of
scope of this document. ]

    <xaction-start xid services ...>
      Informs OPES server about a new application
      transaction. This message should probably identify OPES
      service(s) requested for this transaction and other
      transaction-global info unrelated to data buffering,
      sources, or destinations.

    <producer-start xid bid source destinations>
      Informs OPES server about a new request from the data
      producer. Buffer IDentifier (bid) uniquely identifies the data
      buffer used for this request. Bid can probably be set to
      xid unless we expect to handle protocols that may merge
      requests before forwarding them.

    <data-have bid offset size [copied] >
      Sends [a portion of] application message from the data
      producer buffer to the OPES server. If "copied" flag
      is set, the OPES server may assume that the corresponding
      data is buffered at the processor and may refer to it
      using <data-asis> messages described below.

    <data-pause bid>
      Notifies OPES server that there will be no more data
      for this transaction (coming from the OPES processor)
      UNLESS OPES server explicitly asks for it using
      the <data-need> message described below.

    <data-end bid reason>
      Notifies OPES server that there will be no more data
      for this transaction (coming from the OPES processor)

    <producer-end bid reason>
      Notifies OPES server that there will be no more messages
      for this bid (coming from the OPES processor)

    <xaction-end xid reason>
      Notifies OPES server that there will be no more messages
      for this transaction (coming from the OPES processor)

    OPES server may send the following transaction-dependent
messages to the OPES processor.

    <consumer-start xid bid source destinations />
      Informs OPES processor that OPES server may want to send
      data from source to destination(s). The Buffer IDentifier
      (bid) is unique for the (xid, source, destinations)
      triplet and identifies consumer buffer at the OPES
      server. There may be other buffers/messages (bid
      triplets) associated with the same transaction (xid). Xid
      comes from the corresponding xaction-start message sent
      by the OPES processor.

    <data-have bid offset size>
      Tells OPES processor to send the attached data to the
      data consumer

    <data-asis bid offset size>
      Tells OPES processor to use processor's own copy of the
      specified data to send to the data consumer. This message
      can only specify data fragments previously marked with
      "copied" flag in a <data-have> message from OPES processor.

    <data-wont-need bid offset size>
      Tells OPES processor that the server will never send
      data-asis message for the specified data range. This
      message can only specify data fragments previously marked
      with the "copied" flag in a <data-have> message from OPES
      processor. This optional message may help OPES processor
      to free its resources.

    <data-need bid offset size>
      Tells OPES processor to send the specified data segment
      to the OPES server (probably in response to a data-pause
      message from the OPES processor).

    <data-pause bid>
      Notifies OPES processor that it should not send more data
      for this transaction until OPES server explicitly asks
      for it using data-need message described above

    <data-end bid reason>
      Tells OPES processor that there will be no more data
      for this bid (coming from the OPES server)

    <consumer-end bid reason>
      Notifies OPES processor that there will be no more messages
      for this bid (coming from the OPES server)

    <xaction-end xid reason>
      Notifies OPES processor that there will be no more messages
      for this transaction (coming from the OPES server)
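
The two message sets above can be written down as data. The following Python
sketch is illustrative only: the message names are taken from the lists
above, while the sender labels and the may_send helper are hypothetical.

```python
# Messages the OPES processor may send to the OPES server.
PROCESSOR_MESSAGES = frozenset({
    "xaction-start", "producer-start", "data-have", "data-pause",
    "data-end", "producer-end", "xaction-end",
})

# Messages the OPES server may send to the OPES processor.
SERVER_MESSAGES = frozenset({
    "consumer-start", "data-have", "data-asis", "data-wont-need",
    "data-need", "data-pause", "data-end", "consumer-end", "xaction-end",
})

_ALLOWED = {"processor": PROCESSOR_MESSAGES, "server": SERVER_MESSAGES}


def may_send(sender, message_type):
    """Return True if the given side is allowed to send this message type."""
    if sender not in _ALLOWED:
        raise ValueError(f"unknown sender: {sender!r}")
    return message_type in _ALLOWED[sender]


# Four message types are valid in both directions.
SHARED = PROCESSOR_MESSAGES & SERVER_MESSAGES
```

A receiver could use such a check to reject, for example, a data-asis
message arriving from the processor.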

Note: There needs to be a way for OPES server to tell OPES
processor to terminate (or short-circuit) the forwarding of a
message. This feature needs to be added to the protocol, but it
should not change the overall design.

3. Examples

Here is an example of (not) filtering an HTTP message based
on HTTP headers:

      processor: <xaction-start xid1 services ...>
      processor: <producer-start xid1 bid11 source destination>
      processor: <data-have bid11 offset=0 size=headers copied>
      processor: <data-pause bid11>

      server: <consumer-start xid1 bid12 source destination >
      server: <data-asis bid12 offset=0 size=all>
      server: <xaction-end xid1 "end-of-HTTP-message">

Note that xaction-end implies consumer-end, which implies data-end, and
there is no reason for the OPES processor to send an xaction-end
message to the server if the server has already sent one.
The lines above are grouped by possible network I/O
boundaries; thus, only two network data packets may be required
to process a message if the OPES server decides, based on the
headers, that it does not care about the message.

Here is an example of redirecting an HTTP request by changing its
destination info and corresponding HTTP headers:

      processor: <xaction-start xid2 services ...>
      processor: <producer-start xid2 bid21 source destination>
      processor: <data-have bid21 offset=0 size=headers copied>
      processor: <data-pause bid21>

      server: <consumer-start xid2 bid22 source other-destination >
      server: <data-have bid22 offset=0 size=new-headers>
      server: <xaction-end xid2 "end-of-HTTP-message">

Finally, here is an example of modifying the "middle" part of
HTTP message body. The OPES server switches the message encoding
to chunked, to avoid buffering data to figure out new Content-Length.

      processor: <xaction-start xid3 services ...>
      processor: <producer-start xid3 bid31 source destination>
      processor: <data-have bid31 offset=0 size=headers copied>
      processor: <data-pause bid31>

      server: <consumer-start xid3 bid32 source destination >
      server: <data-have bid32 offset=0 size=new-headers>
      server: <data-wont-need bid31 offset=0 size=headers>
      server: <data-need bid31 offset=headers size=all>

      processor: <data-have bid31 offset=headers size=chunk1 copied>

      server: <data-asis bid32 offset=headers size=chunk1>

      processor: <data-have bid31 offset=chunkOff1 size=chunk2 copied>

      /* send modified chunk, tell processor to ignore the original */
      server: <data-have bid32 offset=newheaders+chunk1 size=chunk2mod>
      server: <data-wont-need bid31 offset=chunkOff1 size=chunk2>

      processor: <data-have bid31 offset=chunkOff2 size=chunk3 copied>
      processor: <data-end bid31 "end-of-HTTP-message">

      server: <data-asis bid32 offset=chunkOff2 size=chunk3>
      server: <xaction-end xid3 "end-of-HTTP-message">

Note that once the flow starts, there are no explicit synchronization
points or waiting. The above message order is not the only one
possible: most messages from the processor are not synchronized with
most messages from the server.
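
The last example can be replayed in a few lines of Python to show what the
OPES processor ends up forwarding. This is only a sketch under several
assumptions that the outline does not settle: data-asis and data-wont-need
offsets refer to the producer buffer, a range must match one whole copied
fragment, and the processor releases its copy once data-asis has been
served. ProcessorCopy, replay_example, and the sample octets are
hypothetical, and chunked encoding itself is not modeled.

```python
class ProcessorCopy:
    """Producer data the processor retained after sending it as "copied"."""

    def __init__(self):
        self._fragments = {}  # producer-buffer offset -> retained bytes

    def remember(self, offset, data):
        self._fragments[offset] = data

    def _release(self, offset, size):
        data = self._fragments.get(offset)
        if data is None or len(data) != size:
            # Sub-fragment references are left open by the outline.
            raise ValueError("range does not match a retained fragment")
        del self._fragments[offset]
        return data

    def asis(self, offset, size):
        """data-asis: forward the processor's own copy, then release it."""
        return self._release(offset, size)

    def wont_need(self, offset, size):
        """data-wont-need: release the copy without forwarding it."""
        self._release(offset, size)

    def retained_octets(self):
        return sum(len(d) for d in self._fragments.values())


def replay_example():
    headers, chunk1, chunk2, chunk3 = b"H: old\r\n\r\n", b"aaa", b"bbb", b"ccc"
    off1 = len(headers)
    off2 = off1 + len(chunk1)
    off3 = off2 + len(chunk2)
    copy, out = ProcessorCopy(), bytearray()

    copy.remember(0, headers)             # processor: data-have headers, copied
    out += b"H: new\r\n\r\n"              # server: data-have new-headers
    copy.wont_need(0, len(headers))       # server: data-wont-need headers
    copy.remember(off1, chunk1)           # processor: data-have chunk1, copied
    out += copy.asis(off1, len(chunk1))   # server: data-asis chunk1
    copy.remember(off2, chunk2)           # processor: data-have chunk2, copied
    out += b"BBBB"                        # server: data-have chunk2mod
    copy.wont_need(off2, len(chunk2))     # server: data-wont-need chunk2
    copy.remember(off3, chunk3)           # processor: data-have chunk3, copied
    out += copy.asis(off3, len(chunk3))   # server: data-asis chunk3
    return bytes(out), copy.retained_octets()
```

Under these assumptions the consumer receives the new headers, chunk1 and
chunk3 unchanged, and the modified chunk2, while the processor holds no
copied data once the transaction ends.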

4. Transport connections

Transport connections would depend on the transport protocol
(HTTP, BEEP, etc.). It is important to note that regardless of
the transport protocol chosen, it is possible to multiplex
messages from the OPES processor (or from the server) over
several persistent connections -- transaction-dependent messages
do not depend on "connection" properties except for the basic
requirement that ordered messages use the same connection, in the
right order.
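
One simple way to satisfy the ordering requirement is to pin all messages of
a transaction to one connection. The Python sketch below assumes that
ordering only matters among messages that share an xid, which is an
assumption and not something the outline states; connection_for is a
hypothetical helper. It uses zlib.crc32 because Python's built-in hash() for
strings varies between processes.

```python
import zlib


def connection_for(xid: str, connection_count: int) -> int:
    """Pick a stable connection index for all messages of one transaction."""
    if connection_count <= 0:
        raise ValueError("at least one connection is required")
    return zlib.crc32(xid.encode("utf-8")) % connection_count


# Every message carrying "xid3" maps to the same one of four connections.
index = connection_for("xid3", 4)
```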

Simple connection-specific messages can be introduced if we want
to support keep-alive checks or if we want to retry aborted
connections.

5. Synchronization and error handling

The protocol has very few explicit dependencies between messages.
It is easy to imagine a case where an incorrect processor or
server implementation would result in deadlocks or other bad
states.  All sorts of deadlocks are resolved using timeouts. If
there is no progress with the transaction for an
admin-configurable time, the transaction is aborted. Aborting at
OPES server side is easy:

      server: <xaction-end xid3 "deadlock">

On the processor side, specific actions would depend on the
protocol and state. For example, if no response bytes have been
sent to an HTTP client yet, then an error response can be sent.
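
The timeout rule can be sketched as a small watchdog on the OPES server
side. The Python below is an illustration only: ProgressWatchdog and its
methods are hypothetical, the clock is injectable so the logic can be
exercised without waiting, and the abort is rendered in the same XML-like
notation used in the examples above.

```python
import time


class ProgressWatchdog:
    """Aborts transactions that made no progress for a configured time."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self._timeout = timeout_seconds  # the admin-configurable limit
        self._clock = clock
        self._last_progress = {}         # xid -> time of last progress

    def progress(self, xid):
        """Record that a message for this transaction was sent or received."""
        self._last_progress[xid] = self._clock()

    def finished(self, xid):
        """Forget a transaction that ended normally."""
        self._last_progress.pop(xid, None)

    def expire(self):
        """Return abort messages for all stalled transactions."""
        now = self._clock()
        stale = [xid for xid, seen in self._last_progress.items()
                 if now - seen >= self._timeout]
        for xid in stale:
            del self._last_progress[xid]
        return [f'<xaction-end {xid} "deadlock">' for xid in stale]
```

A server would call progress() for each message handled and run expire()
periodically, sending whatever abort messages it returns.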

It would also be possible, in some states, to eliminate OPES
server from processing if it fails. Supporting this behavior
would require having a copy of entire application messages even
if OPES server tells us it does not need a copy. The exact
behavior must be admin-configurable.

---------------------------

This is just a start, of course. Many details are not specified
yet. For example, can OPES server request that a sub-fragment of
a copied fragment be forwarded "as is"? What is the best way to
handle modifications of a non-chunked HTTP transfer that change
total message-length?

Also note that many command/option names should probably be
changed/polished. It is quite possible, for example, to make the
protocol look like the next generation of ICAP if we want to.

I believe the proposed protocol covers current major ICAP capabilities
and supports a few desired optimizations mentioned on the mailing
list.  Is there anything major that the protocol core does not support
and that needs to be supported? (besides security and authentication
features that are irrelevant at this point).

For example, should we add commands to control persistency of
application-protocol connections from the OPES server (I do not think
so)? Should we make it possible for OPES server to change the
application protocol (I do not think so)?

Thank you,

Alex.

--
                            | HTTP performance - Web Polygraph benchmark
www.measurement-factory.com | HTTP compliance+ - Co-Advisor test suite
                            | all of the above - PolyBox appliance