ietf-openproxy

Re: copying commitment and deadlock

2003-03-24 13:13:02

On Mon, 24 Mar 2003, Markus Hofmann wrote:

It is not safe in general because there are examples where the callout
server might want to reorder message content (e.g., change the order
of the parts of a multipart e-mail message or move an ad banner to
the end of an HTML page). To make it safe, we would have to
explicitly document this assumption (and, hence, prevent the copying
optimization in those rare(?) cases where the callout server needs to
change the order of message parts).

Yup, see your point, agreed.

Question is whether added complexity to deal with such rare(?) cases
is justified, or whether we believe that it might be OK to keep it
less complex. Anyone any thoughts on that?

I suggest that we keep it simple unless somebody argues that
reordering message parts is common/important enough to be optimized
for.

The OPES processor will stop reading incoming data when its
buffers are full. That's how most proxies (and servers) I know
work today. A proxy can always control its own input, relying on
the transport protocol to slow down the producer (TCP) or to drop
packets (UDP).

When deadlock occurs, the OPES processor is not reading incoming
data because its buffers are full, and the callout server is not
sending any data back because it waits for more incoming data to
make a decision.
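As a rough illustration of that circular wait (class and method names here are hypothetical, not OCP syntax), each side is blocked on a condition only the other side can change:

```python
# Hypothetical sketch of the deadlock described above (not real OCP code).

class OpesProcessor:
    def __init__(self, buffer_limit):
        self.buffer = []
        self.buffer_limit = buffer_limit

    def can_read_more(self):
        # The processor stops reading incoming data once its buffer is full.
        return len(self.buffer) < self.buffer_limit


class CalloutServer:
    def __init__(self):
        # The server is still waiting for more data before making a decision.
        self.needs_more_data = True

    def will_send_response(self):
        # The server sends nothing back until it has seen enough data.
        return not self.needs_more_data


def deadlocked(processor, server):
    # Deadlock: the processor cannot read (buffer full) AND the server
    # will not respond until it receives more data from the processor.
    return not processor.can_read_more() and not server.will_send_response()


proc = OpesProcessor(buffer_limit=2)
proc.buffer = ["chunk-1", "chunk-2"]   # buffer is now full
srv = CalloutServer()                  # still waiting for more data
print(deadlocked(proc, srv))           # True: neither side can make progress
```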

There are buffers at different levels, and I was *not* referring to
TCP buffers, but rather to the buffers that temporarily store the
application message. Assume the OPES processor would still be able
to receive TCP/UDP packets, but would no longer be able to temporarily
store the application message (or parts of it). In that case, wouldn't
the current protocol design already include everything that's needed?

I was referring to the same case. And yes, the deadlock problem does
exist if we want to support "copying commitment" and, hence, support
the callout server's ability to get out of the loop on its own.

Example: The OPES processor starts receiving an application message
(e.g. from a Web server). Since it still has buffer available, it
copies the application message and indicates this to the callout
server by setting the [copied] flag.

If a per-OPES-message [copied] flag is used, then there is no problem
(and no possibility for the "getting out of the loop" optimization).
The [copied] flag creates no problems because it applies only to the
current OPES data message. That data is either copied or not. There
is no commitment that applies to future data chunks.

A copying _commitment_ (i.e., an "I will copy from now on" flag)
allows for the "getting out of the loop" optimization but is also
subject to deadlock.

If the buffer at the OPES processor now fills to, let's say, 90%,
the OPES processor keeps forwarding the received application
messages to the callout server, but no longer stores them in its
local buffer. This is indicated to the callout server by *not*
setting the copied flag in subsequent callout messages. As such, no
deadlock occurs. What's missing?

Nothing. You give an example of how the [copied] flag works. Your
example is correct, and there is no deadlock problem. Note that the
callout server in your example cannot get out of the loop because at
no point in time does it have a guarantee that no data would be lost
if it did.
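To make the per-message semantics concrete (message fields below are invented for illustration, not actual OCP syntax): the flag lets the server forget chunks already copied, but never gives it a safe point to leave, because it says nothing about chunks not yet sent:

```python
# Illustrative sketch of per-message [copied] semantics (hypothetical
# message representation, not OCP wire format).

def chunks_server_may_drop(messages):
    # The server can forget any chunk the processor has already copied.
    return [m["data"] for m in messages if m["copied"]]

def may_leave_loop(messages):
    # Leaving the loop requires a guarantee that ALL future data will be
    # copied. A per-message flag covers only chunks already received,
    # so no sequence of past flags ever makes leaving safe.
    return False

msgs = [
    {"data": "part-1", "copied": True},   # buffer had room: processor copied it
    {"data": "part-2", "copied": True},
    {"data": "part-3", "copied": False},  # buffer filling up: no longer copied
]
print(chunks_server_may_drop(msgs))  # ['part-1', 'part-2']
print(may_leave_loop(msgs))          # False
```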

Here is a summary:

        - [copied] flag is a simple per-message optimization.
          We need to decide when the processor can get rid
          of the copied data. Other than that, there are no
          significant problems or complications I am aware of.
          The [copied] flag alone does not, however, let callout
          servers get out of the processing loop.

        - [will-always-copy] commitment allows the callout
          server to get out of the loop. This optimization
          is needed for efficient processing of large
          messages (or streams) that turn out to be not
          interesting to the callout server. This optimization
          may cause a deadlock if used "as is".

One way to avoid the deadlock is to support the following
confirmation/dialog:

        server: I want to get out of the loop!
        processor: OK, you can get out of the loop after
                processing the first N bytes of this application
                message (N could be zero). I will stop
                sending you more data as of now.
        server: here is the data you asked for (could be
                several data messages)
        server: I am out of the loop
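The dialog above could be sketched as follows (function and field names are hypothetical, invented for illustration; this is not a proposed OCP encoding):

```python
# Hypothetical sketch of the get-out-of-the-loop handshake above.

def negotiate_exit(server_request, bytes_already_sent):
    # Processor side: grant the exit and name how many bytes the server
    # must still process (N may be zero); stop sending further data now.
    assert server_request["want_out"]
    n = bytes_already_sent          # server must finish what is in flight
    return {"ok": True, "process_first_n_bytes": n}

def server_exit(grant, pending_data):
    # Server side: process/return the requested prefix (possibly spread
    # over several data messages), then declare itself out of the loop.
    n = grant["process_first_n_bytes"]
    processed = pending_data[:n]
    return processed, "out-of-the-loop"

grant = negotiate_exit({"want_out": True}, bytes_already_sent=4)
processed, state = server_exit(grant, pending_data=b"abcdef")
print(processed)  # b'abcd'
print(state)      # out-of-the-loop
```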

Unfortunately, supporting this kind of dialog efficiently requires a
dedicated/priority connection: the primary data connection may be
clogged by (yet unprocessed) data messages to the callout server,
preventing the "OK" response from reaching the server quickly. If the
"OK" response is slow, there may be little gain from the optimization
because a lot of data messages would have been processed by then.

Alex.
