OPES protocol, pre-draft 01



There were many great suggestions on how to improve the protocol
pre-draft. Please find an updated document below. Here is the change
log:

        - replaced "bid" with "amid" attribute and fixed amid
          definition due to many complaints and much confusion

        - replaced OPES server with callout server (Oskar Batuner)

        - added "sizep", an optional message size prediction attribute
          (Hilarie Orman)

        - added "modp", an optional modification prediction attribute
          (Martin Stecher)

        - added <i-am-here> messages (Martin Stecher)

        - simplified document structure;
          removed many general remarks and moved essential ones
          into a short Introduction section;
          polished text

        - added TODO section

Comments and help with the TODO list are requested.

Thank you,

Alex.


------------------------

Table of Contents:

        0. Introduction
        1. Message properties
        2. Message types
        3. Examples
        4. Transport connections
        5. Synchronization and error handling
        6. About this document
        7. TODO


0. Introduction

draft-ietf-opes-protocol-reqs-03.txt defines the following
information flow:

  data provider --(original application message)-->
  -- [ OPES magic ] -->
  --(produced application messages)--> data consumers

The original and "produced" (forwarded) messages together
form an application protocol transaction. Note that there
may be more than one produced application message resulting
from a single original message.

When an application protocol involves a request-response
sequence (e.g., HTTP), we treat it as two related OPES
transactions: a request transaction and a response transaction.

OPES processor and callout server exchange messages. The
exchange is bidirectional. There is no clear client or
server role.  There is no clear request/response message
category either.

OPES messages manipulate the state of these four
buffers/connections and associated meta-information:
    - data producer (incoming) buffer at the OPES processor
    - data producer (incoming) buffer at the callout server
    - data consumer (outgoing) buffer at the callout server
    - data consumer (outgoing) buffer at the OPES processor

The design prevents buffer overflows and allows buffered
content to be discarded as soon as possible. Note that we rely
on the OPES transport protocol both to be reliable and to stop
sending us more data (eventually) if we stop reading it. TCP
has both properties.

[ Note: The XML-like syntax for describing protocol parts
does NOT imply that OPES messages should be implemented using
XML. Text or binary encodings can be used; the encoding
decision is out of scope of this document. ]


1. Message properties

Many OPES messages share the following properties.

    xid -- Application transaction identifier (Xaction ID)
        Uniquely identifies an application transaction
        among all OPES agents that may see this ID.

    amid -- Application Message IDentifier
        Uniquely identifies an application message within an
        application transaction. Amid can be interpreted in
        an application transaction context only.  Thus,
        either xid must be present whenever amid is used or
        amid must uniquely identify application transaction
        as well (e.g., by containing xid). [ @@@ we should
        decide one way or the other ]

    source -- Information about the data provider (i.e., the
        source of the application message). For messages
        originated from the OPES processor, the source
        describes the original data provider.  For messages
        originated from the callout server, the source
        describes what provider information should be
        presented to the data consumer; callout server may
        need to change how the original information looks to
        the other application side.

    destinations -- One or more destinations.
        Depending on the application, OPES processor may
        need to check that all original destinations have
        been covered by callout server.

    destination -- Information about the data consumer (i.e.,
        the destination of the application message). For
        messages originated from the OPES processor,
        destination describes the consumer as intended by
        the producer. For messages originated from the
        callout server, the destination is the data consumer
        that should be used by the OPES processor; callout
        server may need to change the intended recipient.

    services -- One or more services.
        There will be a way to indicate the desired order
        of service application, possibly including
        concurrent applications at the callout server.

    data size -- Specific data size in octets OR a special
        token meaning "all" or "maximum". The all-token may
        only be used when requesting data, never when
        sending it.

    data offset -- non-negative number of octets
        relative to the beginning of the application message

    sizep -- size prediction
        An integer value of at least zero.  Size-prediction
        property carries remaining application message size
        prediction, in octets.  The value includes data in
        the current message, if any. This property can be
        used in any message with amid property.  This is a
        prediction, not a fact.

    modp -- modification probability prediction
        An integer value from 0 to 100, indicating the
        probability (0 = will probably never happen, 100 =
        probably imminent) that some produced data following
        the prediction (including data in the current
        message, if any) will differ from the original data.
        A reading of 100 does not imply that the current (or
        any!) message data has been modified. This is a
        prediction, not a fact.

        This property can be present in any callout server
        message with amid property.  Absence of the property
        means absence of a [new] prediction, not that there
        will be no modifications.  Note that prediction is
        persistent for the given amid unless overwritten
        by a different value of modp in a later message.

        [ @@@ if OPES can change meta-info like destination
        address, should that be included in modification
        semantics? ]

    reason -- This should probably be a numeric status code with
        an optional information string. In examples, we will
        use just strings for now.
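
The value constraints above can be sketched in code. The
following is a minimal illustration, not a proposed encoding.
All names in it (ALL, check_sizep, check_modp, check_size,
ModpTracker) are hypothetical, and the rule that a specific
data size is non-negative is an assumption; the text above
only says "size in octets".

```python
from typing import Dict, Optional, Union

ALL = "all"  # hypothetical spelling of the "all"/"maximum" token


def check_sizep(value: int) -> int:
    """sizep: predicted remaining size in octets, at least zero."""
    if value < 0:
        raise ValueError("sizep must be at least zero")
    return value


def check_modp(value: int) -> int:
    """modp: modification probability, an integer from 0 to 100."""
    if not 0 <= value <= 100:
        raise ValueError("modp must be within 0..100")
    return value


def check_size(value: Union[int, str], requesting: bool) -> Union[int, str]:
    """data size: octets, or the all-token when requesting data only."""
    if value == ALL:
        if not requesting:
            raise ValueError("all-token is not allowed when sending data")
        return value
    # Assumption: a specific size is a non-negative integer.
    if not isinstance(value, int) or value < 0:
        raise ValueError("size must be a non-negative octet count")
    return value


class ModpTracker:
    """Keeps the latest modp per amid; a message without modp
    carries no new prediction and leaves the old one in place."""

    def __init__(self) -> None:
        self._latest: Dict[str, int] = {}

    def update(self, amid: str, modp: Optional[int] = None) -> None:
        if modp is not None:
            self._latest[amid] = check_modp(modp)

    def current(self, amid: str) -> Optional[int]:
        # None means no prediction has been received for this amid.
        return self._latest.get(amid)
```

Note that current() returning None is not the same as a modp of
0: it only says that no prediction was made.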


2. Message types

    An OPES processor may send the following messages to the
callout server.

    <xaction-start xid services ...>
        Informs callout server about a new application
        transaction. This message should probably identify OPES
        service(s) requested for this transaction and other
        transaction-global info unrelated to data buffering,
        sources, or destinations.

    <producer-start xid amid source destinations >
        Informs callout server about a new message from the
        data producer. Amid can probably be set to xid
        unless we expect to handle protocols that may merge
        messages before forwarding them.

    <data-have amid offset size [copied] >
        Sends [a portion of] application message from the
        data producer buffer to the callout server. If
        "copied" flag is set, the callout server may assume
        that the corresponding data is buffered at the
        processor and may refer to it using <data-as-is>
        messages described below. Copying commitment must
        last until the corresponding <data-as-is> message or
        <consumer-end> event.

    <data-pause amid>
        Notifies callout server that there will be no more
        data for this transaction (coming from the OPES
        processor) UNLESS callout server explicitly asks for
        it using <data-need> message described below. This
        message may be used if OPES processor suspects that
        callout server is not interested in the data and, hence,
        there is no reason to send it by default (e.g., a
        response content type indicates that it is unlikely
        to have a virus but only callout server can know for sure).

    <data-end amid reason>
        Notifies callout server that there will be no more data
        for this amid (coming from the OPES processor)

    <producer-end amid reason>
        Notifies callout server that there will be no more messages
        for this amid (coming from the OPES processor)

    <xaction-end xid reason>
        Notifies callout server that there will be no more messages
        for this transaction (coming from the OPES processor)
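
The "there will be no more ..." promises above can be checked
mechanically on the sending side. Here is a rough sketch under
stated assumptions: the class and method names are hypothetical,
data-end is treated as scoped to one amid, and the amid-to-xid
mapping is learned from producer-start.

```python
from typing import Dict, Optional, Set


class ProcessorSendLog:
    """Rejects processor messages that would break an earlier
    data-end, producer-end, or xaction-end promise."""

    def __init__(self) -> None:
        self.ended_xids: Set[str] = set()    # xaction-end was sent
        self.ended_amids: Set[str] = set()   # producer-end was sent
        self.data_ended: Set[str] = set()    # data-end was sent
        self.xid_of: Dict[str, str] = {}     # amid -> xid

    def send(self, name: str, xid: Optional[str] = None,
             amid: Optional[str] = None) -> None:
        # Messages that carry only amid inherit the xid that was
        # announced in producer-start.
        if amid is not None and xid is None:
            xid = self.xid_of.get(amid)
        if xid in self.ended_xids:
            raise RuntimeError("message after xaction-end")
        if amid in self.ended_amids:
            raise RuntimeError("message after producer-end")
        if name == "data-have" and amid in self.data_ended:
            raise RuntimeError("data after data-end")
        if name == "producer-start":
            self.xid_of[amid] = xid
        elif name == "data-end":
            self.data_ended.add(amid)
        elif name == "producer-end":
            self.ended_amids.add(amid)
        elif name == "xaction-end":
            self.ended_xids.add(xid)
```

A similar check could be kept by the callout server for its own
consumer-side messages.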


    A callout server may send the following messages to the
OPES processor.

    <consumer-start xid amid source destinations />
        Informs OPES processor that callout server may want
        to send data from source to destination(s). There
        may be other messages (amids) associated with the
        same transaction (xid).  Xid comes from the
        corresponding xaction-start message sent by the OPES
        processor.

    <data-have amid offset size>
        Tells OPES processor to send the attached data to the
        data consumer.

    <data-as-is amid offset size>
        Tells OPES processor to use processor's own copy of the
        specified data to send to the data consumer. This message
        can only specify data fragments previously marked with
        "copied" flag in a <data-have> message from OPES processor.

    <data-wont-need amid offset size>
        Tells OPES processor that the callout server will
        never send data-as-is message for the specified data
        range. This message can only specify data fragments
        previously marked with the "copied" flag in a
        <data-have> message from OPES processor. This
        message amid must match the <data-have> (producer)
        message amid, not the consumer amid. This optional
        message may help OPES processor to free its
        resources.

    <data-need amid offset size>
        Tells OPES processor to send the specified data
        segment to the callout server (probably in response
        to data-pause message from the OPES processor). This
        message amid must match the corresponding producer
        amid, not the consumer amid.

    <data-pause amid>
        Notifies OPES processor that it should not send more
        data for this transaction until callout server
        explicitly asks for it using data-need message
        described above. This message amid must match the
        corresponding producer amid, not the consumer amid.

    <data-end amid reason>
        Tells OPES processor that there will be no more data
        for this amid (coming from the callout server)

    <consumer-end amid reason>
        Notifies OPES processor that there will be no more messages
        for this amid (coming from the callout server)

    <xaction-end xid reason>
        Notifies OPES processor that there will be no more messages
        for this transaction (coming from the callout server)
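
To illustrate the "copied" commitment, here is a minimal sketch
of the bookkeeping an OPES processor might keep for one producer
message. The names are hypothetical. As a simplification that the
text above does not require, a data-as-is or data-wont-need range
must match a previously sent fragment exactly.

```python
from typing import Dict


class CopiedFragments:
    """Copies kept by the processor for one producer amid."""

    def __init__(self) -> None:
        self._held: Dict[int, bytes] = {}   # offset -> copied octets

    def sent(self, offset: int, data: bytes, copied: bool) -> None:
        """Record an outgoing <data-have>; keep octets only if copied."""
        if copied:
            self._held[offset] = bytes(data)

    def _take(self, offset: int, size: int) -> bytes:
        data = self._held.get(offset)
        if data is None or len(data) != size:
            raise KeyError("range was not sent with the copied flag")
        del self._held[offset]
        return data

    def as_is(self, offset: int, size: int) -> bytes:
        """<data-as-is>: return our own copy for the data consumer."""
        return self._take(offset, size)

    def wont_need(self, offset: int, size: int) -> None:
        """<data-wont-need>: the copy can be freed early."""
        self._take(offset, size)

    def consumer_end(self) -> None:
        """<consumer-end>: the copying commitment is over."""
        self._held.clear()

    def held_octets(self) -> int:
        return sum(len(d) for d in self._held.values())
```

The sketch frees a copy as soon as data-as-is refers to it, which
matches the commitment lasting "until the corresponding
<data-as-is> message or <consumer-end> event".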


Note: There needs to be a way for callout server to tell
OPES processor to terminate (or short-circuit) the
forwarding of a message. This feature needs to be added to
the protocol, but it should not change the overall design.
One way to support this feature is for callout server to
change the destination of the application message from
consumer to producer (and change source to itself?).


    OPES processor or callout server may send the following
messages.

    <i-am-here>
    <i-am-here xid>
    <i-am-here xid amid>
        The messages tell recipient that the sender is
        working, working on xid, or working on amid,
        respectively. The sender may not be able to send any
        other message (yet), but wants to inform the
        recipient that it knows of recipient's (or xid, or
        amid) existence. The sender MAY send more specific
        messages later.



3. Examples

Here is an example of (not) filtering an HTTP message based
on HTTP headers:

        processor: <xaction-start xid1 services ...>
        processor: <producer-start xid1 amid11 source destination>
        processor: <data-have amid11 offset=0 size=headers copied>
        processor: <data-pause amid11>

        server: <consumer-start xid1 amid12 source destination >
        server: <data-as-is amid12 offset=0 size=all>
        server: <xaction-end xid1 "end-of-HTTP-message">

Note that xaction-end implies consumer-end, which implies data-end,
and there is no reason for OPES processor to send an xaction-end
message to server if the server already sent an xaction-end message.
The lines above are grouped by possible network I/O boundaries;
thus, only two network data packets may be required to process a
message if the callout server decides, based on the headers, that
it does not care.


Here is an example of redirecting an HTTP request by changing its
destination info and corresponding HTTP headers:

        processor: <xaction-start xid2 services ...>
        processor: <producer-start xid2 amid21 source destination>
        processor: <data-have amid21 offset=0 size=headers copied>
        processor: <data-pause amid21>

        server: <consumer-start xid2 amid22 source other-destination >
        server: <data-have amid22 offset=0 size=new-headers>
        server: <xaction-end xid2 "end-of-HTTP-message">


Finally, here is an example of modifying the "middle" part of
HTTP message body. The callout server switches the message encoding
to chunked, to avoid buffering data to figure out new Content-Length.

        processor: <xaction-start xid3 services ...>
        processor: <producer-start xid3 amid31 source destination>
        processor: <data-have amid31 offset=0 size=headers copied>
        processor: <data-pause amid31>

        server: <consumer-start xid3 amid32 source destination >
        server: <data-have amid32 offset=0 size=new-headers>
        server: <data-wont-need amid31 offset=0 size=headers>
        server: <data-need amid31 offset=headers size=all>

        processor: <data-have amid31 offset=headers size=chunk1 copied>

        server: <data-as-is amid32 offset=headers size=chunk1>

        processor: <data-have amid31 offset=chunkOff1 size=chunk2 copied>

        /* send modified chunk, tell processor to ignore the original */
        server: <data-have amid32 offset=newheaders+chunk1 size=chunk2mod>
        server: <data-wont-need amid31 offset=chunkOff1 size=chunk2>

        processor: <data-have amid31 offset=chunkOff2 size=chunk3 copied>
        processor: <data-end amid31 "end-of-HTTP-message">

        server: <data-as-is amid32 offset=chunkOff2 size=chunk3>
        server: <xaction-end xid3 "end-of-HTTP-message">

Note that once the flow starts, there are no explicit synchronization
points or waiting. The above message order is not the only one
possible: most messages from the processor are not synchronized with
most messages from the server.
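
The last example can be replayed with concrete octets. The
sketch below shows how a processor might build the outgoing
(consumer) stream from the server messages. It makes two
assumptions the draft does not settle: data-as-is offsets refer
to the original (producer) message, and consumer-side order is
simply the order in which server messages arrive. The function
name and the tuple layout are hypothetical.

```python
from typing import Iterable, Tuple


def assemble(original: bytes, server_msgs: Iterable[Tuple]) -> bytes:
    """Build the consumer octet stream from server messages."""
    out = bytearray()
    for msg in server_msgs:
        if msg[0] == "data-have":        # ("data-have", new_octets)
            out += msg[1]
        elif msg[0] == "data-as-is":     # ("data-as-is", offset, size)
            _, offset, size = msg
            out += original[offset:offset + size]
        else:
            raise ValueError("unexpected message: " + msg[0])
    return bytes(out)


# Original message: 3 octets of headers and three 4-octet chunks.
original = b"HDR" + b"aaaa" + b"bbbb" + b"cccc"
replies = [
    ("data-have", b"NEWHDR"),    # new headers
    ("data-as-is", 3, 4),        # chunk1 is forwarded unchanged
    ("data-have", b"BB"),        # modified chunk2
    ("data-as-is", 11, 4),       # chunk3 is forwarded unchanged
]
result = assemble(original, replies)   # b"NEWHDRaaaaBBcccc"
```

Only the new headers and the modified chunk travel from the
callout server back to the processor; the rest is served from
the processor's own copy.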


4. Transport connections

OPES transport connections would depend on the transport
protocol (HTTP, BEEP, etc.). It is important to note that
regardless of the transport protocol chosen, it is possible
to multiplex messages from the OPES processor (or from the
server) over several persistent connections. OPES messages
do not depend on "connection" properties except for the
basic requirement that order-dependent messages use the same
transport connection, in the right order.
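
One simple way to meet the ordering requirement is to pin each
transaction to one connection. The sketch below assumes that all
order-dependent messages belong to the same xid, which this
document does not state, and uses a hypothetical helper name. A
stable checksum is used so that every message of a transaction
maps to the same connection index.

```python
import zlib


def pick_connection(xid: str, connection_count: int) -> int:
    """Map a transaction to one of the persistent connections."""
    if connection_count < 1:
        raise ValueError("at least one connection is required")
    # crc32 is stable across runs, unlike the built-in hash().
    return zlib.crc32(xid.encode("utf-8")) % connection_count
```

Messages that carry only amid would first need to be mapped to
their xid, e.g., using the producer-start or consumer-start
message that introduced the amid.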


5. Synchronization and error handling

The protocol has very few explicit dependencies between messages.
It is easy to imagine a case where an incorrect processor or
server implementation would result in deadlocks or other bad
states.  All sorts of deadlocks are resolved using timeouts. If
there is no progress with the transaction for an
admin-configurable time, the transaction is aborted. Aborting on
the callout server side is easy:

        server: <xaction-end xid3 "deadlock">
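
The timeout rule can be sketched as a small watchdog. This is an
illustration only: the class name is hypothetical, what counts as
"progress" is left to the caller, and time is passed in
explicitly so that the logic does not depend on a particular
clock.

```python
from typing import Dict, List


class ProgressWatchdog:
    """Aborts transactions that made no progress for too long."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout              # admin-configurable
        self._last: Dict[str, float] = {}   # xid -> last progress time

    def progress(self, xid: str, now: float) -> None:
        """Call whenever the transaction moves forward."""
        self._last[xid] = now

    def finished(self, xid: str) -> None:
        """Call when the transaction ends normally."""
        self._last.pop(xid, None)

    def expire(self, now: float) -> List[str]:
        """Forget stalled transactions; return messages to send."""
        stalled = [xid for xid, seen in self._last.items()
                   if now - seen >= self.timeout]
        for xid in stalled:
            del self._last[xid]
        return ['<xaction-end %s "deadlock">' % xid for xid in stalled]
```

For example, with a 30-second timeout, a transaction that last
progressed at time 0 is aborted by an expire() call at time 35,
while one that progressed at time 20 is kept.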

On the processor side, specific actions would depend on the
protocol and state. For example, if no response bytes have been
sent to an HTTP client yet, then an error response can be sent.

It would also be possible, in some states, to eliminate the callout
server from processing if it fails. Supporting this behavior
would require having a copy of entire application messages even
if callout server tells us it does not need a copy. The exact
behavior must be admin-configurable.


6. About this document

The goal of this document is to become a section in the future OPES
protocol specs, after a lot of editing. The OPES milestone
reads: "MAY 03 Initial protocol document for OPES services
including their authorization, invocation, tracking, and
enforcement of authorization".


7. TODO

        1. Decide whether callout server can change
        application message source and destinations

        2. Understand and support the following: "It should
        be possible to indicate that the transmitted data
        comes from several places in the amid.  This allows
        the OPES processor to omit huge cookies and other
        junk; the response, by including this information,
        helps the process limit the state and parsing."

        3. Document how one can write OPES extensions. Use
        "progress meter" as an example/motivation.

$Id: protocol.txt,v 1.2 2003/02/21 21:25:14 rousskov Exp rousskov $