There were many great suggestions on how to improve the protocol
pre-draft. Please find an updated document below. Here is the change
log:
- replaced "bid" with "amid" attribute and fixed amid
definition due to many complaints and much confusion
- replaced OPES server with callout server (Oskar Batuner)
- added "sizep", an optional message size prediction attribute
(Hilarie Orman)
- added "modp", an optional modification prediction attribute
(Martin Stecher)
- added <i-am-here> messages (Martin Stecher)
- simplified document structure;
removed many general remarks and moved essential ones
into a short Introduction section;
polished text
- added TODO section
Comments and help with the TODO list are requested.
Thank you,
Alex.
------------------------
Table of Contents:
0. Introduction
1. Message properties
2. Message types
3. Examples
4. Transport connections
5. Synchronization and error handling
6. About this document
7. TODO
0. Introduction
draft-ietf-opes-protocol-reqs-03.txt defines the following
information flow:
data provider --(original application message)-->
-- [ OPES magic ] -->
--(produced application messages)--> data consumers
The original and "produced" (forwarded) messages together
form an application protocol transaction. Note that there
may be more than one produced application message resulting
from a single original message.
When an application protocol involves a request-response
sequence (e.g., HTTP), we treat it as two related OPES
transactions: request transaction and response transaction.
OPES processor and callout server exchange messages. The
exchange is bidirectional. There is no clear client or
server role. There is no clear request/response message
category either.
OPES messages manipulate the state of these four
buffers/connections and associated meta-information:
- data producer (incoming) buffer at the OPES processor
- data producer (incoming) buffer at the callout server
- data consumer (outgoing) buffer at the callout server
- data consumer (outgoing) buffer at the OPES processor
The design prevents buffer overflows and allows buffered
content to be discarded as soon as possible. Note that we rely
on the OPES transport protocol both to be reliable and to stop
sending us more data (eventually) if we stop reading it. TCP
has both properties.
[ Note: The XML-like syntax for describing protocol parts
does NOT imply that OPES messages should be implemented using
XML. Text or binary encodings can be used; the encoding
decision is out of scope of this document. ]
1. Message properties
Many OPES messages share the following properties.
xid -- Application transaction identifier (Xaction ID)
Uniquely identifies an application transaction
among all OPES agents that may see this ID.
amid -- Application Message IDentifier
Uniquely identifies an application message within an
application transaction. Amid can be interpreted in
an application transaction context only. Thus,
either xid must be present whenever amid is used or
amid must uniquely identify application transaction
as well (e.g., by containing xid). [ @@@ we should
decide one way or the other ]
source -- Information about the data provider (i.e., the
source of the application message). For messages
originated from the OPES processor, the source
describes the original data provider. For messages
originated from the callout server, the source
describes what provider information should be
presented to the data consumer; callout server may
need to change how the original information looks to
the other application side.
destinations -- One or more destinations.
Depending on the application, OPES processor may
need to check that all original destinations have
been covered by callout server.
destination -- Information about the data consumer (i.e.,
the destination of the application message). For
messages originated from the OPES processor,
destination describes the consumer as intended by
the producer. For messages originated from the
callout server, the destination is the data consumer
that should be used by the OPES processor; callout
server may need to change the intended recipient.
services -- One or more services.
There will be a way to indicate desired order
of service application, possibly including
concurrent applications at the callout server.
data size -- Specific data size in octets OR a special
token meaning "all" or "maximum". The all-token may
only be used when requesting data, never when
sending it.
data offset -- non-negative number of octets
relative to the beginning of the application message
sizep -- size prediction
An integer value of at least zero. Size-prediction
property carries remaining application message size
prediction, in octets. The value includes data in
the current message, if any. This property can be
used in any message with amid property. This is a
prediction, not a fact.
modp -- modification probability prediction
An integer value from 0 to 100, indicating the
probability (0 = will probably never happen, 100 =
probably imminent) that some produced data following
the prediction (including data in the current
message, if any) will differ from the original data.
A reading of 100 does not imply that the current (or
any!) message data has been modified. This is a
prediction, not a fact.
This property can be present in any callout server
message with amid property. Absence of the property
means absence of a [new] prediction, not that there
will be no modifications. Note that prediction is
persistent for the given amid unless overwritten
by a different value of modp in a later message.
[ @@@ if OPES can change meta-info like destination
address, should that be included in modification
semantics? ]
reason -- This should probably be a numeric status code with
an optional information string. In examples, we will
use just strings for now.
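To illustrate the modp persistence rule above, here is a minimal Python sketch of how a recipient might remember the latest prediction for each amid. The class and method names are hypothetical and are not part of the protocol; the sketch only assumes that modp arrives as an optional integer alongside an amid.

```python
from typing import Dict, Optional


class ModpTracker:
    """Remembers the latest modp prediction per amid (hypothetical helper)."""

    def __init__(self) -> None:
        self._latest: Dict[str, int] = {}

    def on_message(self, amid: str, modp: Optional[int] = None) -> None:
        # A message without modp carries no new prediction; keep the old one.
        if modp is None:
            return
        if not 0 <= modp <= 100:
            raise ValueError("modp must be an integer from 0 to 100")
        self._latest[amid] = modp

    def current(self, amid: str) -> Optional[int]:
        # None means "no prediction yet", not "there will be no modifications".
        return self._latest.get(amid)


modp_tracker = ModpTracker()
modp_tracker.on_message("amid12", modp=80)
modp_tracker.on_message("amid12")          # no modp: the prediction persists
persisted = modp_tracker.current("amid12")
modp_tracker.on_message("amid12", modp=5)  # a later value overwrites it
```

Returning None for an unknown amid keeps "no prediction" distinct from a prediction of 0.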
2. Message types
An OPES processor may send the following messages to the
callout server.
<xaction-start xid services ...>
Informs callout server about a new application
transaction. This message should probably identify OPES
service(s) requested for this transaction and other
transaction-global info unrelated to data buffering,
sources, or destinations.
<producer-start xid amid source destinations >
Informs callout server about a new message from the
data producer. Amid can probably be set to xid
unless we expect to handle protocols that may merge
messages before forwarding them.
<data-have amid offset size [copied] >
Sends [a portion of] application message from the
data producer buffer to the callout server. If
"copied" flag is set, the callout server may assume
that the corresponding data is buffered at the
processor and may refer to it using <data-as-is>
messages described below. Copying commitment must
last until the corresponding <data-as-is> message or
<consumer-end> event.
<data-pause amid>
Notifies callout server that there will be no more
data for this transaction (coming from the OPES
processor) UNLESS callout server explicitly asks for
it using <data-need> message described below. This
message may be used if OPES processor suspects that
callout server is not interested in the data and, hence,
there is no reason to send it by default (e.g., a
response content type indicates that it is unlikely
to have a virus but only callout server can know for sure).
<data-end amid reason>
Notifies callout server that there will be no more data
for this transaction (coming from the OPES processor)
<producer-end amid reason>
Notifies callout server that there will be no more messages
for this amid (coming from the OPES processor)
<xaction-end xid reason>
Notifies callout server that there will be no more messages
for this transaction (coming from the OPES processor)
A callout server may send the following messages to the
OPES processor.
<consumer-start xid amid source destinations />
Informs OPES processor that callout server may want
to send data from source to destination(s). There
may be other messages (amids) associated with the
same transaction (xid). Xid comes from the
corresponding xaction-start message sent by the OPES
processor.
<data-have amid offset size>
Tells OPES processor to send the attached data to the
data consumer.
<data-as-is amid offset size>
Tells OPES processor to use processor's own copy of the
specified data to send to the data consumer. This message
can only specify data fragments previously marked with
"copied" flag in a <data-have> message from OPES processor.
<data-wont-need amid offset size>
Tells OPES processor that the callout server will
never send data-as-is message for the specified data
range. This message can only specify data fragments
previously marked with the "copied" flag in a
<data-have> message from OPES processor. This
message amid must match the <data-have> (producer)
message amid, not the consumer amid. This optional
message may help OPES processor to free its
resources.
<data-need amid offset size>
Tells OPES processor to send the specified data
segment to the callout server (probably in response
to data-pause message from the OPES processor). This
message amid must match the corresponding producer
amid, not the consumer amid.
<data-pause amid>
Notifies OPES processor that it should not send more
data for this transaction until callout server
explicitly asks for it using data-need message
described above. This message amid must match the
corresponding producer amid, not the consumer amid.
<data-end amid reason>
Tells OPES processor that there will be no more data
for this amid (coming from the callout server)
<consumer-end amid reason>
Notifies OPES processor that there will be no more messages
for this amid (coming from the callout server)
<xaction-end xid reason>
Notifies OPES processor that there will be no more messages
for this transaction (coming from the callout server)
Note: There needs to be a way for callout server to tell
OPES processor to terminate (or short-circuit) the
forwarding of a message. This feature needs to be added to
the protocol, but it should not change the overall design.
One way to support this feature is for callout server to
change the destination of the application message from
consumer to producer (and change source to itself?).
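The "copied" flag, <data-as-is>, and <data-wont-need> descriptions above can be sketched from the OPES processor's point of view. This is an illustration only: the class name is hypothetical, it tracks a single producer message, and it assumes that every referenced range matches a previously sent fragment exactly (the text above does not require that).

```python
from typing import Dict


class CopiedBuffer:
    """Processor-side copies of one producer message (hypothetical helper)."""

    def __init__(self) -> None:
        self._fragments: Dict[int, bytes] = {}  # offset -> copied data

    def data_have(self, offset: int, data: bytes, copied: bool) -> None:
        # Only fragments sent with the "copied" flag are kept for later reuse.
        if copied:
            self._fragments[offset] = data

    def data_as_is(self, offset: int, size: int) -> bytes:
        # Hand back the processor's own copy; the copying commitment ends here.
        return self._take(offset, size)

    def data_wont_need(self, offset: int, size: int) -> None:
        # The callout server will never refer to this range; free it early.
        self._take(offset, size)

    def buffered_octets(self) -> int:
        return sum(len(data) for data in self._fragments.values())

    def _take(self, offset: int, size: int) -> bytes:
        data = self._fragments.get(offset)
        if data is None or len(data) != size:
            # Assumption of this sketch: ranges must match fragments exactly.
            raise KeyError(f"no copied fragment at offset={offset} size={size}")
        del self._fragments[offset]
        return data


copies = CopiedBuffer()
copies.data_have(0, b"HEAD", copied=True)
copies.data_have(4, b"body", copied=True)
copies.data_wont_need(0, 4)           # headers were replaced by the server
forwarded = copies.data_as_is(4, 4)   # body is forwarded unchanged
```

A real implementation would also need to release all remaining fragments on <consumer-end>, as the <data-have> description requires.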
OPES processor or callout server may send the following
messages.
<i-am-here>
<i-am-here xid>
<i-am-here xid amid>
The messages tell recipient that the sender is
working, working on xid, or working on amid,
respectively. The sender may not be able to send any
other message (yet), but wants to inform the
recipient that it knows of recipient's (or xid, or
amid) existence. The sender MAY send more specific
messages later.
3. Examples
Here is an example of (not) filtering an HTTP message based
on HTTP headers:
processor: <xaction-start xid1 services ...>
processor: <producer-start xid1 amid11 source destination>
processor: <data-have amid11 offset=0 size=headers copied>
processor: <data-pause amid11>
server: <consumer-start xid1 amid12 source destination >
server: <data-as-is amid12 offset=0 size=all>
server: <xaction-end xid1 "end-of-HTTP-message">
Note that xaction-end implies consumer-end implies data-end, and
there is no reason for OPES processor to send a xaction-end
message to server if the server already sent xaction-end message.
The lines above are grouped around possible network I/O
boundaries; thus, only two network data packets may be required
to process a message if the callout server decides it does not care
based on the headers.
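The implication chain noted above (xaction-end implies consumer-end implies data-end) can be sketched as follows. This is a hypothetical helper that an OPES processor might use to track end messages received from the callout server; none of these names come from the protocol.

```python
from typing import Dict, Set


class EndTracker:
    """Tracks end-of-... state announced by the callout server (hypothetical)."""

    def __init__(self) -> None:
        self._amids: Dict[str, Set[str]] = {}   # xid -> consumer amids
        self.data_ended: Set[str] = set()
        self.messages_ended: Set[str] = set()
        self.xactions_ended: Set[str] = set()

    def consumer_start(self, xid: str, amid: str) -> None:
        self._amids.setdefault(xid, set()).add(amid)

    def data_end(self, amid: str) -> None:
        self.data_ended.add(amid)

    def consumer_end(self, amid: str) -> None:
        # consumer-end implies data-end for the same amid.
        self.data_end(amid)
        self.messages_ended.add(amid)

    def xaction_end(self, xid: str) -> None:
        # xaction-end implies consumer-end for every amid of the transaction.
        for amid in self._amids.get(xid, set()):
            self.consumer_end(amid)
        self.xactions_ended.add(xid)

    def needs_own_xaction_end(self, xid: str) -> bool:
        # No reason to send xaction-end if the server already sent one.
        return xid not in self.xactions_ended


ends = EndTracker()
ends.consumer_start("xid1", "amid12")
ends.xaction_end("xid1")
```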
Here is an example of redirecting an HTTP request by changing its
destination info and corresponding HTTP headers:
processor: <xaction-start xid2 services ...>
processor: <producer-start xid2 amid21 source destination>
processor: <data-have amid21 offset=0 size=headers copied>
processor: <data-pause amid21>
server: <consumer-start xid2 amid22 source other-destination >
server: <data-have amid22 offset=0 size=new-headers>
server: <xaction-end xid2 "end-of-HTTP-message">
Finally, here is an example of modifying the "middle" part of
HTTP message body. The callout server switches the message encoding
to chunked, to avoid buffering data to figure out new Content-Length.
processor: <xaction-start xid3 services ...>
processor: <producer-start xid3 amid31 source destination>
processor: <data-have amid31 offset=0 size=headers copied>
processor: <data-pause amid31>
server: <consumer-start xid3 amid32 source destination >
server: <data-have amid32 offset=0 size=new-headers>
server: <data-wont-need amid31 offset=0 size=headers>
server: <data-need amid31 offset=headers size=all>
processor: <data-have amid31 offset=headers size=chunk1 copied>
server: <data-as-is amid32 offset=headers size=chunk1>
processor: <data-have amid31 offset=chunkOff1 size=chunk2 copied>
/* send modified chunk, tell processor to ignore the original */
server: <data-have amid32 offset=newheaders+chunk1 size=chunk2mod>
server: <data-wont-need amid31 offset=chunkOff1 size=chunk2>
processor: <data-have amid31 offset=chunkOff2 size=chunk3 copied>
processor: <data-end amid31 "end-of-HTTP-message">
server: <data-as-is amid32 offset=chunkOff2 size=chunk3>
server: <xaction-end xid3 "end-of-HTTP-message">
Note that once the flow starts, there are no explicit synchronization
points or waiting. The above message order is not the only one
possible: most messages from the processor are not synchronized with
most messages from the server.
4. Transport connections
OPES transport connections would depend on the transport
protocol (HTTP, BEEP, etc.). It is important to note that
regardless of the transport protocol chosen, it is possible
to multiplex messages from the OPES processor (or from the
server) over several persistent connections. OPES messages
do not depend on "connection" properties except for the
basic requirement that order-dependent messages use the same
transport connection, in the right order.
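One possible way to satisfy the ordering requirement above while still multiplexing is to pin each transaction to one connection. The Python sketch below is only an assumption about how a sender might do this: it treats all messages of one xid as order-dependent, assumes the sender can map every amid back to its xid, and uses a hypothetical function name.

```python
import zlib


def pick_connection(xid: str, connection_count: int) -> int:
    """Return the index of the persistent connection to use for this xid."""
    if connection_count < 1:
        raise ValueError("at least one connection is required")
    # CRC32 is stable across runs, so an xid always maps to the same index.
    return zlib.crc32(xid.encode("utf-8")) % connection_count


first = pick_connection("xid3", 4)
again = pick_connection("xid3", 4)
```

Messages of different transactions may land on different connections, while messages of one transaction keep their relative order on a single connection.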
5. Synchronization and error handling
The protocol has very few explicit dependencies between messages.
It is trivial to imagine a case where an incorrect processor or
server implementation would result in deadlocks or other bad
states. All sorts of deadlocks are resolved using timeouts. If
there is no progress with the transaction for an
admin-configurable time, the transaction is aborted. Aborting at
callout server side is easy:
server: <xaction-end xid3 "deadlock">
On the processor side, specific actions would depend on the
protocol and state. For example, if no response bytes have been
sent to an HTTP client yet, then an error response can be sent.
It would also be possible, in some states, to eliminate the callout
server from processing if it fails. Supporting this behavior
would require having a copy of entire application messages even
if the callout server tells us it does not need a copy. The exact
behavior must be admin-configurable.
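The timeout rule above can be sketched for the callout server side as follows. The class name and the "now" parameter are hypothetical; the caller passes the current time explicitly (for example, from time.monotonic()) so that the logic stays easy to test. What counts as "progress" is left to the caller.

```python
from typing import Dict, List


class ProgressWatchdog:
    """Aborts transactions that make no progress (hypothetical helper)."""

    def __init__(self, timeout_seconds: float) -> None:
        self.timeout = timeout_seconds          # admin-configurable
        self._last_progress: Dict[str, float] = {}

    def progress(self, xid: str, now: float) -> None:
        # Call whenever the transaction moves forward.
        self._last_progress[xid] = now

    def finish(self, xid: str) -> None:
        self._last_progress.pop(xid, None)

    def expire(self, now: float) -> List[str]:
        # Return abort messages for every transaction idle for too long.
        stalled = sorted(
            xid for xid, seen in self._last_progress.items()
            if now - seen >= self.timeout
        )
        for xid in stalled:
            del self._last_progress[xid]
        return [f'<xaction-end {xid} "deadlock">' for xid in stalled]


watchdog = ProgressWatchdog(timeout_seconds=30.0)
watchdog.progress("xid3", now=0.0)
watchdog.progress("xid4", now=20.0)
aborted = watchdog.expire(now=35.0)
```

In this run only xid3 has been idle for 30 seconds or more, so only xid3 is aborted.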
6. About this document
The goal of this document is to become a section in the future OPES
protocol specs, after a lot of editing. The OPES milestone
reads: "MAY 03 Initial protocol document for OPES services
including their authorization, invocation, tracking, and
enforcement of authorization".
7. TODO
1. Decide whether callout server can change
application message source and destinations
2. Understand and support the following: "It should
be possible to indicate that the transmitted data
comes from several places in the amid. This allows
the OPES processor to omit huge cookies and other
junk; the response, by including this information,
helps the process limit the state and parsing."
3. Document how one can write OPES extensions. Use
"progress meter" as an example/motivation.
$Id: protocol.txt,v 1.2 2003/02/21 21:25:14 rousskov Exp rousskov $