ietf-openproxy

RE: transfer- and content-encoding

2003-10-16 00:31:00

On Tue, 2003-10-14 at 14:59, Alex Rousskov wrote:
On Mon, 13 Oct 2003, Robert Collins wrote:

Well the HTTP errata removed the 'identity' Transfer coding. I don't
see any reason to reinstate it.

The reason for documenting "identity" encoding tag is to express the
lack of support for identity encoding. For example, a particular
service may want to receive all data using some custom transfer
encoding. I doubt such design is worth supporting though. That is,
this reason is probably not good enough. A service that requires
custom encoding can probably use content-encoding instead or rely on
manual configuration rather than run-time negotiation.

I think the solution implied by your later message (I've already
replied to that) is much better. That is: everything is identity
between agents unless explicitly negotiated into a different transfer
coding. Even so, sending via chunked may be a better lowest common
denominator due to the built-in end-of-data indicator. I'm not fully up
to speed on the current draft - I've been skim-reading due to general
busyness, unfortunately.

As TE is hop-by-hop in HTTP, we need to ensure that any OPES
processor's TE field passed to upstreams, and the TE field passed in
responses to clients (for them to decide on upload
Transfer-Encodings), reflects the processor's capabilities, not the
actual client's / origin's respectively.

The proposed TEs header is exchanged among OPES agents only and is
not passed to HTTP agents. It is an OCP header, not an HTTP header. We
cannot assume that OPES agents will control HTTP headers when adapting
HTTP payloads. This caveat is at the core of our problems here.
Consider a virus-scanning service -- it should not affect TE headers
on the "outside" wire, but it should be able to handle any common
content that the corresponding HTTP intermediary proxies. That is, OCP
agents should be able to handle a variety of common transfer encodings
without being able to affect "outside" encoding negotiations.

Which directly conflicts with HTTP's hop by hop requirements - anything
that is hop by hop will break, and break badly, unless we explicitly
handle the hop by hop semantics. (And any other message based protocol
other than HTTP that splits hops and messages will have similar issues).
I hope I'm not missing an existing solution when I point out that the
OCP device performing the interception and upstream forwarding has the
responsibility of negotiating within the intercepted protocol's
boundaries, and gatewaying into the OCP protocol. So yes: the internal
OCP encodings should stay in the OCP area, but that wasn't what I
referred to. What I was saying is that:

Origin 
   |
AV interceptor (OPES processor)  ---- AV engine
   |
Client

in this diagram, the AV interceptor is responsible for Connection:
semantics, for TE and Transfer-Encoding semantics, and for Trailer:
semantics, to pick a few common headers. Thus: the client will see the
AV interceptor's TE headers, not the origin's TE headers.
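That responsibility can be sketched roughly as follows. This is an
illustrative Python sketch, not anyone's actual implementation: headers
are a plain dict, and the fixed hop-by-hop set is the one RFC 2616
names, plus anything listed in the Connection: header itself.

```python
# Sketch: an interceptor stripping hop-by-hop headers before
# forwarding. A real proxy needs case-insensitive, multi-valued
# header handling; this dict-based version is only illustrative.

HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate",
    "proxy-authorization", "te", "trailer",
    "transfer-encoding", "upgrade",
}

def strip_hop_by_hop(headers):
    """Return a copy of `headers` (name -> value) with hop-by-hop
    headers removed, including any extra header names carried in
    the Connection: header itself."""
    extra = {
        name.strip().lower()
        for name in headers.get("Connection", "").split(",")
        if name.strip()
    }
    drop = HOP_BY_HOP | extra
    return {k: v for k, v in headers.items() if k.lower() not in drop}

incoming = {
    "Host": "origin.example",
    "Connection": "close, X-Private",
    "TE": "trailers, deflate",
    "X-Private": "secret",
    "Content-Type": "text/html",
}
forwarded = strip_hop_by_hop(incoming)
# forwarded keeps only Host and Content-Type; the interceptor then
# emits its *own* TE / Transfer-Encoding on the next hop.
```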

Once we do that, we know that an OPES processor will only receive
codings it can handle, so we can say MUST reject with a 5xx error
(sorry, too lazy to dig up the best match) on an unhandlable
transfer-coding.

HTTP proxy capabilities may be different from an attached OPES
processor's capabilities (which, in turn, may be different from an
attached callout service's capabilities). This is true for
Transfer-Codings and for some other features. We cannot simply assume
that an OPES processor is an HTTP proxy, even if it adapts HTTP
messages.

But we can assume that the OPES processor will only negotiate what it
can handle, in the protocol(s) it exposes to the clients/origins. So
for unhandlable transfer-codings to arrive over the OCP protocol is in
fact a failure of the OPES processor's responsibilities to the
protocol it's exposing, and IMO an OPES error of some sort is
appropriate.

With that in place, the interaction from processor to processor can
'trivially' follow the HTTP Transfer-coding negotiation rules. That
is, from a protocol viewpoint, all transfer-codings must be removed
and applied anew across hops. By definition - implementations can
shortcut this when a compatible transfer-coding sequence exists
across the relevant hop.

I agree with the above. However, we still need to provide a
negotiation mechanism for OPES agents to agree on the actual transfer
encoding to be used. We cannot rely exclusively on HTTP specs because
our agents, especially callout service, may not be HTTP agents. For
example, many callout services will work with message payload and
disregard any HTTP headers; those services will be very sensitive to
encoding issues; they may not, for example, support chunked encoding.

Ah. Very good point. Any reason not to leverage the connection
termination and the Connection:, TE:, and Transfer-Encoding: headers
wholesale? (Yes, showing my skim-reading again, I know.)

There are quite a few sane options available to us, depending on what
encodings have to be supported and on what to do with custom
encodings that an OCP agent does not support. Given your feedback and
the multitude of options we face, I would propose the following:

      0) Do not negotiate Transfer-Encodings at all.

      1) An OCP agent sending data MUST remove all
         transfer encodings it supports. If any encodings remain, an
         OCP agent sending data MUST specify remaining encodings
         using the Transfer-Encoding parameter of a DUM
         OCP message.

      2) If an OCP agent receives Transfer-Encoding parameter
         indicating unsupported encoding, it MAY terminate
         the corresponding OCP transaction.

for 2) I think MUST terminate is a requirement. 0) seems short-sighted
to me - I'd rather see an environment analogous to HTTP:
1) messages pushing data to the agent require foreknowledge of supported
encodings - i.e. during initial handshaking. 
2) messages from the agent back to the processor use metadata supplied
by the processor to determine acceptable codings. 
3) all agents and processors must support chunked, which is trivial to
implement efficiently.

Do you think the above rules create any interoperability problems that
more complex rules can eliminate?

Other than what I mention above, no.

Can we think of a realistic-enough example where removing supported
encodings is bad for performance reasons? Note that an agent may be
_configured_ to leave certain encodings -- that qualifies as lack of
support for their removal. Perhaps the above "MUST remove" can be
rephrased to better reflect this caveat?

Yes - rproxy, that is, rsync over HTTP. Mind you, to do *anything*,
nearly all agents will need the data in identity format eventually. So
perhaps stripping everything to identity is the sanest option.
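The strip-to-identity option could look roughly like this. The codec
table, function names, and use of zlib deflate (as a stand-in for a
real gzip transfer coding) are all illustrative assumptions, not part
of any draft.

```python
# Sketch: decode everything to identity at each hop, then apply
# whatever codings the next hop negotiated. zlib deflate stands in
# for a real gzip coder; chunked etc. would be added the same way.
import zlib

CODECS = {
    # coding name -> (encode, decode)
    "gzip": (zlib.compress, zlib.decompress),
}

def to_identity(payload, codings):
    """Undo transfer codings. Codings are applied left-to-right on
    the wire, so they are removed right-to-left here."""
    for coding in reversed(codings):
        payload = CODECS[coding][1](payload)
    return payload

def apply_codings(payload, codings):
    """Apply the codings negotiated for the outgoing hop, in order."""
    for coding in codings:
        payload = CODECS[coding][0](payload)
    return payload

raw = b"some adaptable content " * 100
wire_in = apply_codings(raw, ["gzip"])            # as received
wire_out = apply_codings(to_identity(wire_in, ["gzip"]), [])
assert wire_out == raw                            # identity on next hop
```

An implementation could still shortcut the round trip when the
incoming and outgoing coding lists happen to match, as noted earlier
in the thread.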

      1a) OPES processors MUST support chunked transfer coding
          when handling data sent by an OCP server.

IMO Yes. chunked is a good thing.
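To back up the "trivial to implement" claim: a minimal chunked codec
really is just <hex length>CRLF<data>CRLF repeated, ended by a
zero-length chunk. A rough Python sketch (trailers ignored for
simplicity):

```python
# Minimal chunked transfer-coding sketch. Real agents must also
# parse chunk extensions and trailer headers; this skips both.

def chunk_encode(payload, chunk_size=4096):
    out = []
    for i in range(0, len(payload), chunk_size):
        piece = payload[i:i + chunk_size]
        out.append(b"%x\r\n" % len(piece) + piece + b"\r\n")
    out.append(b"0\r\n\r\n")  # last-chunk + empty trailer
    return b"".join(out)

def chunk_decode(data):
    body, pos = b"", 0
    while True:
        eol = data.index(b"\r\n", pos)
        size = int(data[pos:eol], 16)
        if size == 0:          # zero-size chunk marks end of data --
            return body        # the built-in end-of-data indicator
        body += data[eol + 2:eol + 2 + size]
        pos = eol + 2 + size + 2  # skip chunk data and trailing CRLF

assert chunk_decode(chunk_encode(b"hello " * 2000)) == b"hello " * 2000
```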

Rob
-- 
GPG key available at: <http://members.aardvark.net.au/lifeless/keys.txt>.