RE: OPES protocol, pre-draft

Before I make any comments, it seems to me this work has a lot of overlap
with things we already did. 

1. The examples are largely covered in the use cases and deployment
scenarios.
2. Some other things seem to me to belong in the requirements draft. It
doesn't make any sense to have two drafts placing requirements on the
protocol, especially when the sole purpose of one of them was to hold all
the requirements.
3. The idea was to try to reuse an existing protocol instead of designing a
new one.
4. There are two protocols for OPES, in-path and callout. Are they going
to be the same or different?

I guess what we need is to incorporate any new requirements into a bis (or
whatever) version of the requirements draft, extend the use cases and
scenarios, and, for every protocol we think has a chance to be the OPES
protocol, match its semantics against the requirements draft. That is what a
requirements draft is for.

If Alex's document is the bootstrap for the following deliverable

"MAY 03 Initial protocol document for OPES services including their
authorization, invocation, tracking, and enforcement of authorization."

I guess it should be much more to the point and focused. Should we hold a
conf call to iron these things out? 

Chairs?

I guess we should also be working on the rules specification?


Regards,

Reinaldo

-----Original Message-----
From: Alex Rousskov [mailto:rousskov(_at_)measurement-factory(_dot_)com] 
Sent: Wednesday, February 19, 2003 12:57 AM
To: ietf-openproxy(_at_)imc(_dot_)org
Subject: Re: OPES protocol, pre-draft



On Tue, 18 Feb 2003, The Purple Streak, Hilarie Orman wrote:

The OPES processor should make the decisions about what to send to the 
consumer.  This might just be a matter of terminology, but the 
processor is in control of the source and destination and should not 
send messages to new destinations based on OPES server demands.


It looks like a design decision to me (i.e., it can be done both ways). I
think that you may want destination modification by OPES servers if you want
to satisfy Martin's requirement to be able to produce multiple SMTP messages
(to several small groups of recipients) from one original SMTP message (to
one large group of recipients). My understanding is that it MAY be the OPES
server's responsibility to "split" the original destination address(es) into
two groups. That is why I let the OPES server modify the original destination
info.

Other destination-modification examples include request redirection (within
a CDN or at the surrogate). Source-modification examples include anonymizing
proxies.
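
To make the SMTP splitting case above concrete, here is a minimal sketch of
what destination modification by an OPES server could look like. Everything
in it is hypothetical and only for illustration: the AppMessage structure,
its field names, and the split_by_destination helper are not taken from any
OPES draft.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AppMessage:
    """Hypothetical application message plus its meta-level source/destination info."""
    source: str
    destinations: List[str]
    body: bytes


def split_by_destination(original: AppMessage, group_size: int) -> List[AppMessage]:
    """Produce one message per small recipient group (illustrative only)."""
    if group_size < 1:
        raise ValueError("group_size must be positive")
    # Cut the original recipient list into consecutive groups of group_size.
    groups = [
        original.destinations[i:i + group_size]
        for i in range(0, len(original.destinations), group_size)
    ]
    # Each produced message keeps the source and body but gets its own destinations.
    return [AppMessage(original.source, group, original.body) for group in groups]
```

The point of the sketch is that each produced message carries its own
destination list, so the OPES processor does not have to guess it from
message headers.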

If there is a consensus that OPES server cannot modify source and
destination info, then we can simplify the protocol a little bit. Is there a
consensus regarding this design decision? Perhaps Abbie's poll will show...

The start message should identify the total length of the data, if it 
is available.  This might stretch over several "bids", see below.


Good point. This extra info about anticipated message length should not hurt
and may be used for resource pre-allocation purposes. It MUST NOT be treated
as normative/final, of course. I will modify the messages accordingly.

The first response from the server should identify the new total 
length, if it is available.  If the length will change, but the new 
size is unknown, the server should indicate this.


Agreed, except we also need to support the case where the OPES server does
not know whether the length will change. For example, if the server replaces
"foo" with "blah" and "bleh" with "bar", it can tell the final length (and
whether it will change) only after seeing all message content.

To make things more general and symmetric, I would make it possible (but
optional) to supply this non-normative length estimate with every relevant
message, in both directions.
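
As a rough sketch of the optional, non-normative length estimate discussed
above, one could model it as a hint that distinguishes "length unchanged",
"length will change", and "cannot tell yet". The names LengthChange,
LengthHint, and initial_buffer_size are assumptions made for this example,
as are the default and cap values.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LengthChange(Enum):
    UNCHANGED = "unchanged"  # sender expects the length to stay the same
    CHANGED = "changed"      # sender knows the length will change
    UNDECIDED = "undecided"  # sender cannot tell until it has seen more data


@dataclass
class LengthHint:
    """Hypothetical hint that any relevant message may carry, in either direction."""
    estimate: Optional[int] = None  # anticipated total length, if known
    change: LengthChange = LengthChange.UNDECIDED


def initial_buffer_size(hint: Optional[LengthHint],
                        default: int = 4096,
                        cap: int = 1 << 20) -> int:
    """Pick a starting allocation; the hint is advisory and is never trusted as final."""
    if hint is None or hint.estimate is None or hint.estimate <= 0:
        return default
    # Cap the allocation so a wrong or hostile estimate cannot exhaust memory.
    return min(hint.estimate, cap)
```

The receiver uses the estimate only for pre-allocation and must still be
prepared for more or less data than announced.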

There is some confusion about "destination" - the OPES server should 
never change the destination (i.e., the endpoint), so I don't see why 
it is needed.  In the redirection example, it would be sufficient to 
change the headers, and the purpose of "destination" is a mystery to 
me.


See above for motivation. The destination is needed because the OPES
processor needs to know where to connect to forward the request. It is Bad
Design to have OPES processor guess that information from [possibly
modified] message headers. This, again, assumes that we want OPES server to
be able to modify destination addresses. If we do not want that, there is no
need for OPES server to pass that info back to processor, of course.

Note that source and destination information is meta-level information that
is often not completely available from HTTP headers. Take interception and
WCCP-controlled proxies for example. These intermediaries often have to get
destination address based on IP-level details, not from HTTP headers.
Similarly, the source information is usually not available in the request
headers but may be required to route and modify the message.

Moreover, the protocol should make it possible to exchange other meta-level
information. For example, the time of the request may be important ("no porn
surfing before 6pm!").

The relationship between the application-layer framing, the bid and 
offset, and the OPES framing is not clear from your examples.  The 
application data may be transmitted to the OPES server a packet at a 
time - this will mean a different bid on every data message, if you 
literally mean that a bid is a buffer id.  Otherwise, it should have 
some other name.


Yes, a terminology/naming problem: By "buffer" in Buffer ID you probably
mean "piece of memory that holds a data packet". I meant "logical structure
that holds all data associated with the application connection/message".
That is, my-buffer may consist of many your-buffers. Perhaps "buffer"
should be replaced with "connection"? But "connection" is bad because OPES
server does not really manage application connections. "Message" seems too
overloaded? "Application message" (amid)? Will change bid to amid unless
there are better ideas.

This is just terminology though. "Bid/amid" is permanent for the single
application message (original or produced). This ID should be used by
processor and server to manage appropriate data structures associated with
the corresponding application connection.
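
To illustrate what I mean by the ID being permanent for one application
message, here is a small sketch of a table keyed by amid. The class and
method names are made up for the example, and it assumes that data messages
carry an offset and that the fragments do not overlap and leave no gaps.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class AppMessageState:
    """All data received so far for one application message."""
    chunks: Dict[int, bytes] = field(default_factory=dict)  # offset -> payload

    def assembled(self) -> bytes:
        # Join fragments in offset order, whatever order they arrived in.
        return b"".join(self.chunks[offset] for offset in sorted(self.chunks))


class AmidTable:
    """Hypothetical per-amid state kept by an OPES processor or server."""

    def __init__(self) -> None:
        self._states: Dict[int, AppMessageState] = {}

    def start(self, amid: int) -> None:
        if amid in self._states:
            raise ValueError(f"amid {amid} is already in use")
        self._states[amid] = AppMessageState()

    def data(self, amid: int, offset: int, payload: bytes) -> None:
        # Many data messages (many "packets") may refer to the same amid.
        self._states[amid].chunks[offset] = payload

    def end(self, amid: int) -> bytes:
        # Release the state and return the complete application message.
        return self._states.pop(amid).assembled()
```

Nothing in the table depends on which OPES connection delivered a given
data message, or on how the other side buffers its data.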

Also, the information about a bid should include its total length.


Not sure why that would be needed. Moreover, the "total length" of the
connection buffers (which is what bid identifies) may change at any time.
OPES server should not care how buffering is done at the OPES processor side
and vice versa. Perhaps your suggestion is a result of my poor choice of the
word "buffer"? See above.

It should be possible to send the start and end messages on a separate 
transport connection for handling errors or congestion.


Yes, and it is possible. The start message is the first message for an
"amid"  so it can go on any connection (brand new or idle persistent). The
end messages, if they indicate an immediate abort, do not have to be in
order with data messages and, hence, can be sent on any OPES connection as
well. Recall that there is no protocol-mandated relation between OPES
connections and application transactions/connections. If one wants to send
something "out of order", they can (and face the consequences).

The case discussed for multiple services, multiple responses, should 
be included.  To support it, one needs multiple service lists and an 
id for each response.


I believe this is already supported, kind of. The "services" attribute of
the transaction start message can have a list of services. The OPES server
can initiate multiple consumer-start messages based on that list. Each
consumer-start message from OPES server has a unique bid (amid).

What is not clear to me is how the OPES server would know whether the
services list is an OR, AND, or XOR, or something else. I suspect we need to
support If-header logic from ICAP if we want to go down this route. I will
polish the protocol once I understand the exact requirement here. Do we need
to support some kind of service-to-response matching language here? I do not
recall a clear answer in available OPES IDs. Help?
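
For discussion purposes, the multiple-response case described above might
look like the sketch below. It assumes the simplest possible reading, that
every listed service yields exactly one response, which is precisely the
part that remains undecided. The message layout (a dict with "type", "amid",
and "service" keys) is invented for the example.

```python
import itertools
from typing import Dict, Iterator, List


def consumer_starts(services: List[str], ids: Iterator[int]) -> List[Dict[str, object]]:
    """Build one consumer-start message per service, each with a fresh amid.

    Assumes one response per listed service; OR/AND/XOR semantics are not modeled.
    """
    return [
        {"type": "consumer-start", "amid": next(ids), "service": service}
        for service in services
    ]


# Usage: the ID source is any iterator of unique integers.
messages = consumer_starts(["virus-scan", "translate"], itertools.count(1))
```

Any real matching of services to responses would need the rules language
asked about above.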

It should be possible to indicate that the transmitted data comes from 
several places in the bid.  This allows the OPES processor to omit 
huge cookies and other junk; the response, by including this 
information, helps the process limit the state and parsing.


Interesting. If I interpret your requirement correctly, we need an
indication that some data was skipped by the OPES processor when forwarding
original application message to OPES server. We also need an ability to
reinject the skipped data into produced application message. Not sure this
can be supported in a general way: OPES server may modify application
headers but it is not clear how it can tell OPES processor to correctly
inject skipped stuff into modified headers if the server does not know
exactly what was skipped.

We may be able to support the above for, say, header values but not for
header names (but this becomes too application-specific!). Alternatively, we
can document that OPES processor is responsible for injecting skipped stuff
the way it deems necessary. In the latter case, OPES server should be
informed that something was skipped, but it would not care much except for
message length interpretation code. This still makes digital signatures and
related concepts hard to implement or verify. Am I making it more
complicated than it is? Comments?
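
Here is a small sketch of the "informed that something was skipped" option.
It assumes the OPES processor forwards fragments tagged with their offset and
size in the original message, along with the original total length, so the
server can work out which byte ranges it never saw. The function name and
the (offset, size) representation are assumptions for illustration only.

```python
from typing import List, Tuple


def skipped_ranges(fragments: List[Tuple[int, int]],
                   total_length: int) -> List[Tuple[int, int]]:
    """Return half-open (start, end) ranges of the original message that were not forwarded.

    fragments holds (offset, size) pairs describing the forwarded data.
    """
    gaps: List[Tuple[int, int]] = []
    position = 0  # first byte not yet covered by a forwarded fragment
    for offset, size in sorted(fragments):
        if offset > position:
            gaps.append((position, offset))
        position = max(position, offset + size)
    if position < total_length:
        gaps.append((position, total_length))
    return gaps
```

This only tells the server where data is missing, not what it was, so it
does not solve reinjection into modified headers or signature verification.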


Thank you,

Alex.

-- 
                            | HTTP performance - Web Polygraph benchmark
www.measurement-factory.com | HTTP compliance+ - Co-Advisor test suite
                            | all of the above - PolyBox appliance