ietf-openproxy

Re: comments on draft-dracinschi-opes-callout-requirements-00.txt

2002-03-05 20:06:25

Hilarie Orman, Purple Streak Development wrote:

> This document makes many implicit assumptions about the callout
> protocol that need explicit mention.

We not only have to mention them, but also have to consider whether they are valid assumptions. A problem might be that we all have certain application scenarios in mind, but that these have not been agreed upon or written down - which brings us back to the need for a good scenario and architecture document.

> A remote callout server is a cooperating server that runs
> OPES service modules on behalf of an OPES intermediary.
> I'd not phrase it that way - "on behalf of" has too many vague
> interpretations.  It runs OPES services "when requested to by an
> OPES intermediary" sounds less loaded.

Hm, I have to agree that "...on behalf of an OPES intermediary..."
is not that good a phrase. At a higher level, it's an
application-layer endpoint requesting service execution, rather than
the OPES intermediary itself. A callout server can be commissioned by
an OPES intermediary to execute specific services as requested by an
application endpoint, although the application endpoint would specify
only WHAT service to execute, and not WHERE the service would be executed.

> Instead of "messages exchanged on the content path" could it read
> "on the transport path"?

This uses the terminology as defined in the "Model" document. If I
recall correctly, one motivation for defining "content path" was that
the content path between a server and a client can be different from
the "direct network path" between the same two endpoints, because
there might be intermediaries involved in the content exchange. For
example, messages exchanged by a telnet application might have a path
through the network different from the path "Web" messages would take
between the same client and server. If we believe this distinction is
helpful, we should use "content path" here.

> For 3.1.1 Service identification, I'm not sure what it means for a
> protocol to "uniquely identify a callout service" outside of any
> protocol message context.  I mean, if I don't know where I could use
> the service identification yet, how can I tell if it is important
> that it be unique?  For all I know, local translation tables are
> sufficient.

If I want a callout server to execute a specific service on a certain
message, I need a way to tell the callout server which service to
execute (i.e. I need to indicate the specific service to be executed
in an unequivocal way). That's what the paragraph was meant to be about.

> I think the document gets into URL's and header fields before
> stating basic things, like the callout protocol MUST have header
> fields that can contain arbitrary binary data.

Yup, we probably can do a better job in starting with more fundamental
requirements.

Can you give an example motivating why the callout protocol must have
HEADER fields that can contain arbitrary binary data? I guess you
might have gotten this impression from Section 3.1.3, as it talks
about transmitting the message context (which can be in any format) in
header fields. Hm, not sure whether this really would be such a good
idea... This should be reconsidered.

> The stuff at the end of 3.1.3 isn't clear.  If we have several
> requests issued in parallel, why don't we have clear id's for each
> request session, like in a normal protocol?

If we have requests issued in parallel, we have to make sure that these
requests are serviced on a consistent message context. It must not
happen that one service modifies the message context while another
service executes based on the same message context. This is achieved
most simply by not allowing services to modify the message context at
all - or by not allowing parallel service execution...
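
To make the first option concrete, here is a minimal sketch (not taken from the draft) in which every service invoked in parallel sees the same read-only snapshot of the message context. The function name run_services_in_parallel, the two example services, and the context keys are all hypothetical and only serve as illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from types import MappingProxyType


def run_services_in_parallel(context, services):
    """Run every service against one shared, read-only snapshot of the context."""
    # Copy the context, then expose the copy through a read-only view,
    # so no service can change what the other services see.
    snapshot = MappingProxyType(dict(context))
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(service, snapshot) for service in services]
        # result() re-raises any exception a service raised.
        return [future.result() for future in futures]


# Hypothetical services: they read the context and return a result.
def detect_language(ctx):
    return ("language", ctx["accept-language"])


def classify_client(ctx):
    return ("client", "mobile" if "Mobile" in ctx["user-agent"] else "desktop")


context = {"accept-language": "de", "user-agent": "ExampleBrowser Mobile"}
results = run_services_in_parallel(context, [detect_language, classify_client])
```

A service that tries to assign to the snapshot fails with a TypeError instead of silently changing the context under another service.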

> I can't make sense of the second paragraph of 3.1.4.

This was meant to say that a callout protocol may allow specification of
payload-specific profiles, i.e. the callout protocol itself defines a
common framework independent of the actual payload message
format, but allows for additional payload-specific profiles to be included.

> In 3.1.6, I can see that an intermediary cannot receive an entire
> message if the message is potentially unbounded, but I think it is
> short-sighted to mandate SHOULD NOT for all messages.  I very much
> favor an OPES design that allows lightweight relevance filtering on
> the intermediary before invoking the callout server, and in that
> case, some messages will be entirely received before a decision
> about callouts is made.

I somewhat disagree in that I'm not convinced it would be a good idea
for OPES intermediaries to base decisions about callouts on the body
of application messages (complexity, performance - OPES intermediaries
are in the content path). This topic was discussed for quite a while
on the list, although that was before the WG was officially chartered,
so we might raise this issue again.

Despite that, I agree that mandating "The intermediary SHOULD NOT try
to receive the entire message before it is sent to the callout server"
is too strong. It might be ok if the callout protocol is able to do
so. The point rather is that a callout protocol MUST NOT be designed
assuming it will always be able to receive the message in its entirety
before starting to forward.
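
As a rough illustration of forwarding without first receiving the whole message, the sketch below passes each chunk on as soon as it arrives. The names forward_streaming and send_to_callout, and the sample chunks, are hypothetical and not part of any callout protocol.

```python
def forward_streaming(incoming_chunks, send_to_callout):
    """Forward each chunk as soon as it arrives; never hold the whole message."""
    forwarded = 0
    for chunk in incoming_chunks:
        send_to_callout(chunk)
        forwarded += len(chunk)
    return forwarded


# Record the order of events to show that sending starts before receipt ends.
events = []


def incoming():
    for chunk in (b"GET /index", b".html HTTP", b"/1.1\r\n\r\n"):
        events.append(("received", chunk))
        yield chunk


total = forward_streaming(incoming(), lambda c: events.append(("sent", c)))
```

The events list alternates "received" and "sent", which is the property the requirement is after: the first part is already on its way before the last part has arrived.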

> For caching, it is important to keep in mind that some things may be
> inherently uncacheable, no matter what the callout server may say.
> If, for example, the object in question is a request message in the
> content transport protocol semantics, the intermediary might have
> good reasons to consider the callout server response to be
> uncacheable.  In this case, and in what follows, the intermediary
> can always reduce the cache validity period.

Agreed.

> I agree that the validity period for cached responses must be
> considered in terms of both the callout server's view, the origin
> server's view, and even local policy on the intermediary.  Thus,
> 3.2.1 needs expansion. I'm not sure what to do if the callout server
> feels it needs to lengthen the validity period offered by the origin
> server.  Seems like a policy matter - the publisher's policy may
> allow such changes, may limit them, etc.

Hm, if the publisher's policy allows such changes, for example
lengthening the validity period, why didn't the publisher indicate the
longer validity in the original response in the first place? If we
allow overriding the cacheability depending on publisher policies, it
gets more and more complex for the publisher to actually define the
cacheability of its content - did I set the header fields correctly, did
I set my policies correctly, etc.? Basically, it means that cacheability
is given by even more parameters than we already have today... I like
it simpler....
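
One simple rule in that spirit, sketched here purely as an assumption and not as anything the draft specifies: the validity period may only ever be shortened, so the effective value is the minimum of whatever the origin server, the callout server, and local policy say. The function and parameter names are hypothetical.

```python
def effective_validity(origin_seconds, callout_seconds=None,
                       local_cap_seconds=None, cacheable=True):
    """Return the validity period in seconds; it can only shrink, never grow."""
    if not cacheable:
        # The intermediary considers the object inherently uncacheable.
        return 0
    candidates = [origin_seconds]
    if callout_seconds is not None:
        candidates.append(callout_seconds)
    if local_cap_seconds is not None:
        candidates.append(local_cap_seconds)
    return max(0, min(candidates))


# The callout server asks for longer than the origin allows: origin wins.
longer_request = effective_validity(3600, callout_seconds=7200)
# The callout server shortens the period: its value wins.
shorter_request = effective_validity(3600, callout_seconds=600)
```

With this rule the publisher's header fields remain the upper bound, and no additional policy is needed to decide who may lengthen what.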

> For 3.2.2, Channels, why not allow any number of channels, each with
> its own service parameters?

This is what we had in mind, but somehow we didn't really write it down...

> From the description of buffering, it seems clear that the content
> transport protocol must be one that sends data in order, in some
> sense.  That's because there's verbiage about "the initial part",
> etc.

Hm, yes,... this should be stated.

> Why can't the "I'm not buffering any more stuff, send me a response
> ASAP" info be part of a message header for the protocol?  Seems a
> lot cleaner than setting buffering limits.

If I recall correctly, that's also where we ended up in our discussions, but for some reason it didn't get reflected in the document. I need to dig a little bit and see why... One issue to consider is that once this info has been sent and the OPES intermediary has stopped buffering, it is no longer possible to get the callout server out of the path - even if no modification is required at all. This is because there is data in transit between the OPES intermediary and the callout server that isn't buffered anywhere any more (path capacity...)

> It must be possible to send isolated parts of the content data to
> the callout server.  Bytes 500-778, bytes 10023-20000, etc.  The
> responses must indicate the range and whether the content has been
> altered and its new length.  This is partially addressed by 3.2.5,
> but that seems to assume that the intermediary sends the whole
> message and the server makes any partiality decisions.

We thought about that and had long discussions, but in the end we weren't sure and didn't put it in. What's the general feeling about this: would it be useful/needed, and is it worth the added complexity? What are the application scenarios and the expected benefits?
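
To help judge the added complexity, here is a rough sketch of what range-based responses might involve on the intermediary side. Everything in it is an assumption for discussion: the RangeResponse type, its fields, the inclusive byte offsets (chosen to match "bytes 500-778"), and the apply_responses helper do not come from the draft.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeResponse:
    start: int          # first byte of the original range (inclusive)
    end: int            # last byte of the original range (inclusive)
    altered: bool       # whether the callout server changed the content
    data: bytes = b""   # replacement bytes; ignored when not altered

    @property
    def new_length(self):
        return len(self.data) if self.altered else self.end - self.start + 1


def apply_responses(original, responses):
    """Splice the responses for non-overlapping ranges back into the message."""
    out = bytearray()
    cursor = 0
    for r in sorted(responses, key=lambda r: r.start):
        if r.start < cursor or r.end < r.start or r.end >= len(original):
            raise ValueError("overlapping or out-of-bounds range")
        out += original[cursor:r.start]              # bytes never sent out
        out += r.data if r.altered else original[r.start:r.end + 1]
        cursor = r.end + 1
    out += original[cursor:]
    return bytes(out)


original = b"Hello, world! Goodbye."
responses = [
    RangeResponse(14, 20, altered=False),            # "Goodbye", unchanged
    RangeResponse(7, 11, altered=True, data=b"OPES"),  # "world" replaced
]
rebuilt = apply_responses(original, responses)
```

Even in this small form, the intermediary has to track ranges, detect overlaps, and cope with length changes, which is the kind of complexity in question.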

-Markus