Re: OPES protocol, pre-draft


On Tue, 18 Feb 2003, The Purple Streak, Hilarie Orman wrote:

The OPES processor should make the decisions about what to send to
the consumer.  This might just be a matter of terminology, but the
processor is in control of the source and destination and should not
send messages to new destinations based on OPES server demands.


It looks like a design decision to me (i.e., it can be done both
ways). I think that you may want destination modification by OPES
servers if you want to satisfy Martin's requirement to be able to
produce multiple SMTP messages (to several small groups of recipients)
from one original SMTP message (to one large group of recipients). My
understanding is that it MAY be the OPES server's responsibility to
"split" the original destination address(es) into groups. That is why
I let the OPES server modify the original destination info.
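
To make the splitting case concrete, here is a minimal Python sketch.
The function name and the fixed group size are my assumptions for
illustration only; nothing in the pre-draft defines them.

```python
def split_recipients(recipients, max_group_size):
    """Split one recipient list into consecutive groups of bounded size.

    Hypothetical helper: each returned group would become the
    destination of one produced SMTP message.
    """
    if max_group_size < 1:
        raise ValueError("max_group_size must be positive")
    return [recipients[i:i + max_group_size]
            for i in range(0, len(recipients), max_group_size)]
```

The point is only that the party doing the split must be able to hand
back a new destination for every produced message.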

Other destination-modification examples include request redirection
(within a CDN or at the surrogate). Source-modification examples
include anonymizing proxies.

If there is a consensus that the OPES server cannot modify source and
destination info, then we can simplify the protocol a little bit.
Is there a consensus regarding this design decision? Perhaps Abbie's
poll will show...

The start message should identify the total length of the data, if it
is available.  This might stretch over several "bids", see below.


Good point. This extra info about anticipated message length should
not hurt and may be used for resource pre-allocation purposes. It MUST
NOT be treated as normative/final, of course. I will modify the
messages accordingly.

The first response from the server should identify the new total length,
if it is available.  If the length will change, but the new size is
unknown, the server should indicate this.


Agreed, except we also need to support the case where the OPES server
does not know whether the length will change. For example, if the
server replaces "foo" with "blah" and "bleh" with "bar", it can tell
the final length (and whether it will change) only after seeing all
message content.

To make things more general and symmetric, I would make it possible
(but optional) to supply this non-normative length estimate with every
relevant message, in both directions.
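
A small Python sketch of the substitution example, to show why the
final length may be unknown until all content is seen. The
`DataMessage` class and its `length_estimate` field are hypothetical
names of mine, not part of the pre-draft.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataMessage:
    payload: bytes
    # Non-normative hint; None means "unknown or may still change".
    length_estimate: Optional[int] = None


def rewrite(text: str) -> str:
    """Replace "foo" with "blah" and "bleh" with "bar"."""
    return text.replace("foo", "blah").replace("bleh", "bar")


# Same input length, different outcomes:
unchanged = rewrite("foo bleh")  # one growth and one shrink cancel out
grown = rewrite("foo foo")       # two growths, the message gets longer
```

Here `unchanged` has the same length as its input while `grown` does
not, so a server doing this rewrite would have to leave
`length_estimate` unset until it has seen the whole message.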

There is some confusion about "destination" - the OPES server should
never change the destination (i.e., the endpoint), so I don't see why
it is needed.  In the redirection example, it would be sufficient to
change the headers, and the purpose of "destination" is a mystery to
me.


See above for motivation. The destination is needed because the OPES
processor needs to know where to connect to forward the request. It is
Bad Design to have the OPES processor guess that information from
[possibly modified] message headers. This, again, assumes that we want
the OPES server to be able to modify destination addresses. If we do
not want that, there is no need for the OPES server to pass that info
back to the processor, of course.

Note that source and destination information is meta-level information
that is often not completely available from HTTP headers. Take
interception and WCCP-controlled proxies, for example. These
intermediaries often have to derive the destination address from
IP-level details, not HTTP headers. Similarly, the source information is
usually not available in the request headers but may be required to
route and modify the message.

Moreover, the protocol should make it possible to exchange other
meta-level information. For example, the time of the request may be
important ("no porn surfing before 6pm!").
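
A rough Python sketch of what I mean by meta-level information carried
next to the message rather than inside its headers. All names here
(`Meta`, `allowed_after_hours`, the field names) are made up for
illustration.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass
class Meta:
    source: str            # e.g. client address seen at the IP level
    destination: str       # where the processor would connect
    received_at: datetime  # time of the request


def allowed_after_hours(meta: Meta, not_before: time = time(18, 0)) -> bool:
    """Time-based policy: permit the request only at or after not_before."""
    return meta.received_at.time() >= not_before
```

A server applying such a policy never has to look at HTTP headers, which
is the reason the information should travel at the protocol level.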

The relationship between the application-layer framing, the bid and
offset, and the OPES framing is not clear from your examples.  The
application data may be transmitted to the OPES server a packet at a
time - this will mean a different bid on every data message, if you
literally mean that a bid is a buffer id.  Otherwise, it should have
some other name.


Yes, a terminology/naming problem: By "buffer" in Buffer ID you
probably mean "piece of memory that holds a data packet". I meant
"logical structure that holds all data associated with the application
connection/message". That is, my-buffer may consist of many
your-buffers. Perhaps "buffer" should be replaced with "connection"?
But "connection" is bad because the OPES server does not really manage
application connections. "Message" seems too overloaded.
"Application message" (amid)? I will change bid to amid unless there
are better ideas.

This is just terminology though. The bid/amid stays the same for the
lifetime of a single application message (original or produced). The
processor and the server should use this ID to manage the data
structures associated with the corresponding application connection.
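
A minimal Python sketch of per-message state keyed by that ID. The
class and method names are hypothetical, and for simplicity I assume
data arrives in order, which the protocol itself may not require.

```python
class MessageTable:
    """Tracks one logical buffer per application message ID (amid)."""

    def __init__(self):
        self._buffers = {}

    def start(self, amid):
        if amid in self._buffers:
            raise KeyError(f"amid {amid} already in use")
        self._buffers[amid] = bytearray()

    def data(self, amid, offset, chunk):
        buf = self._buffers[amid]
        # Simplifying assumption: chunks arrive in order, without gaps.
        if offset != len(buf):
            raise ValueError("unexpected offset")
        buf.extend(chunk)

    def end(self, amid):
        """Forget the message and return everything collected for it."""
        return bytes(self._buffers.pop(amid))
```

Many data messages (your-buffers) can feed the same entry
(my-buffer), and entries for different IDs can be interleaved freely.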

Also, the information about a bid should include its total length.


Not sure why that would be needed. Moreover, the "total length" of the
connection buffers (which is what bid identifies) may change at any
time. The OPES server should not care how buffering is done on the
OPES processor side and vice versa. Perhaps your suggestion is a
result of my poor choice of the word "buffer"? See above.

It should be possible to send the start and end messages on a
separate transport connection for handling errors or congestion.


Yes, and it is possible. The start message is the first message for an
"amid", so it can go on any connection (brand new or idle persistent).
The end messages, if they indicate an immediate abort, do not have to
be in order with data messages and, hence, can be sent on any OPES
connection as well. Recall that there is no protocol-mandated relation
between OPES connections and application transactions/connections. If
one wants to send something "out of order", they can (and face the
consequences).

The case discussed for multiple services, multiple responses, should
be included.  To support it, one needs multiple service lists and an
id for each response.


I believe this is already supported, kind of. The "services" attribute
of the transaction start message can have a list of services. The OPES
server can initiate multiple consumer-start messages based on that
list. Each consumer-start message from the OPES server has a unique
bid (amid).
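
In Python terms, and assuming the simplest possible reading (one
produced message per listed service), that could look like the sketch
below. The dictionary layout and the `amid_source` parameter are my
inventions, not message syntax from the pre-draft.

```python
from itertools import count


def consumer_starts(services, amid_source):
    """Build one consumer-start message per service, each with a fresh amid."""
    return [
        {"type": "consumer-start", "amid": next(amid_source), "service": name}
        for name in services
    ]


# Hypothetical service names, numbered from an arbitrary starting ID.
messages = consumer_starts(["translate", "filter"], count(100))
```

This deliberately ignores how the listed services relate to each
other.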

What is not clear to me is how the OPES server would know whether the
services list is an OR, AND, or XOR, or something else. I suspect we
need to support If-header logic from ICAP if we want to go down this
route. I will polish the protocol once I understand the exact
requirement here. Do we need to support some kind of
service-to-response matching language here? I do not recall a clear
answer in the available OPES Internet-Drafts. Help?

It should be possible to indicate that the transmitted data comes
from several places in the bid.  This allows the OPES processor to
omit huge cookies and other junk; the response, by including this
information, helps the process limit the state and parsing.


Interesting. If I interpret your requirement correctly, we need an
indication that some data was skipped by the OPES processor when
forwarding the original application message to the OPES server. We
also need an ability to reinject the skipped data into the produced
application message. Not sure this can be supported in a general way:
the OPES server may modify application headers, but it is not clear
how it can tell the OPES processor to correctly inject skipped stuff
into modified headers if the server does not know exactly what was
skipped.

We may be able to support the above for, say, header values but not
for header names (but this becomes too application-specific!).
Alternatively, we can document that the OPES processor is responsible
for injecting skipped stuff the way it deems necessary. In the latter
case, the OPES server should be informed that something was skipped,
but it would not care much, except in the code that interprets message
lengths.
This still makes digital signatures and related concepts hard to
implement or verify. Am I making it more complicated than it is?
Comments?
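
To show what "informed that something was skipped" could mean for
length interpretation, here is a small Python sketch. Describing skips
as (offset, length) pairs is my assumption, as are the function names.

```python
def forward_with_skips(original: bytes, skips):
    """Return the bytes the processor would forward to the server.

    skips: sorted, non-overlapping (offset, length) pairs that refer to
    positions in the original message.
    """
    kept = bytearray()
    pos = 0
    for offset, length in skips:
        kept += original[pos:offset]
        pos = offset + length
    kept += original[pos:]
    return bytes(kept)


def original_length(forwarded_length: int, skips) -> int:
    """Recover the original length from the forwarded length and skips."""
    return forwarded_length + sum(length for _, length in skips)
```

With this information the server can interpret lengths correctly, but
it still cannot tell the processor where the skipped bytes belong once
the surrounding headers have been modified.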


Thank you,

Alex.

-- 
                            | HTTP performance - Web Polygraph benchmark
www.measurement-factory.com | HTTP compliance+ - Co-Advisor test suite
                            | all of the above - PolyBox appliance