Draft on Callout Protocol Requirements

Hi,

We just submitted the attached draft on "Requirements for OPES Callout Protocols" in an attempt to get a discussion on callout protocol requirements going. Please post any comments and feedback to this list.


-Markus



        Internet Draft                                  A. Dracinschi Sailer  
        Expires: May, 2002                               Lucent Technologies 
                                                                     V. Hilt 
        Document:                                          Univ. of Mannheim 
        draft-dracinschi-opes-callout-requirements.txt            M. Hofmann 
                                                         Lucent Technologies 
                                                                 R. R. Menon  
                                                                       Intel 
                                                                            
        Category: Informational                            November 14, 2001 
         
         
         
         
                       Requirements for OPES Callout Protocols 
         
         
     Status of this Memo 
         
        This document is an Internet-Draft and is in full conformance with 
        all provisions of Section 10 of RFC2026. 
         
        Internet-Drafts are working documents of the Internet Engineering 
        Task Force (IETF), its areas, and its working groups. Note that 
        other groups may also distribute working documents as 
        Internet-Drafts. 
         
        Internet-Drafts are draft documents valid for a maximum of six 
        months and may be updated, replaced, or obsoleted by other documents 
        at any time. It is inappropriate to use Internet-Drafts as reference 
        material or to cite them other than as "work in progress." 
         
        The list of current Internet-Drafts can be accessed at 
             http://www.ietf.org/ietf/1id-abstracts.txt 
        The list of Internet-Draft Shadow Directories can be accessed at 
             http://www.ietf.org/shadow.html. 
         
         
     Abstract 
         
        In the context of Content Networks, Open Pluggable Edge Services 
        (OPES) represent an infrastructure that enables quick and easy 
        creation of value-added networking services. This document attempts 
        to present requirements for callout protocols that provide 
        communication between an in-path OPES intermediary (e.g. a cache) 
        and remote callout servers. 
         
         
     Table of Contents 
         
        1  Terminology....................................................2 
        2  Introduction...................................................2 
        3  Design Considerations..........................................3 
        3.1  Basic Requirements...........................................3 
        Dracinschi              Expires May 2002                   [Page 1] 
        Internet Draft   Callout Protocol Requirements       November 2001 
        3.1.1 Service identification......................................3 
        3.1.2 Message exchange style......................................3 
        3.1.3 Message context.............................................3 
        3.1.4 Payload transparency........................................4 
        3.1.5 Pipelining requests.........................................4 
        3.1.6 Message segmentation........................................5 
        3.2  Increasing Efficiency........................................5 
        3.2.1 Caching responses...........................................5 
        3.2.2 Channels....................................................5 
        3.2.3 Buffering messages..........................................6 
        3.2.4 Preview.....................................................6 
        3.2.5 Partial content.............................................7 
        3.2.6 Multiple services on the same message.......................8 
        4  Security Considerations........................................9 
        5  Acknowledgments................................................9 
        6  References.....................................................9 
        7  Authors' Addresses.............................................9 
        Full Copyright Statement..........................................10 
      
         
     1  Terminology  
         
        The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 
        "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 
        document are to be interpreted as described in RFC 2119 [1].  
         
        OPES related terms are to be interpreted as defined and used in [2]. 
      
      
     2  Introduction 
         
        Content Networks, also known as Content Distribution Networks or 
        Content Delivery Networks (CDNs), are of increasing importance to 
        the overall architecture of the web. CDNs improve the delivery of 
        content from an origin server to content consumers. Content 
        networks can be seen as an overlay network on top of the 
        traditional packet network infrastructure. As in the CDN space, 
        there is a need to deliver a variety of services to 
        corporate/enterprise Intranets. [2] introduces Open Pluggable Edge 
        Services (OPES), an infrastructure for adding valuable content 
        services to a CDN or an Intranet. Examples of such services include 
        dynamic content assembling at the network edge, URL filtering, 
        language translation, location-based services, content adaptation 
        for different devices based on device characteristics, privacy 
        services, etc. 
         
        This document presents requirements for callout protocols in the 
        context of the OPES architecture. A callout protocol supports 
        message exchanges between an in-path OPES intermediary and a remote 
        callout server. Intermediaries are application gateway devices 
        located in the path between a client and an origin server. Caching 
        proxies are probably the most commonly known and used intermediaries 
        today. A remote callout server is a cooperating server that runs 
        OPES service modules on behalf of an OPES intermediary. Remote 
        callout servers are usually employed in an OPES framework to either 
        offload the OPES intermediary for better scalability or to provide 
        value-added services not available on either the origin server or 
        the OPES intermediary.  
      
        Section 3 attempts to summarize the requirements for such a 
        callout protocol. 
         
         
     3  Design Considerations 
      
     3.1 Basic Requirements  
         
        A callout protocol's primary purpose is to efficiently forward, from 
        the intermediary to the remote callout server, request/response 
        messages exchanged on the content path (e.g. HTTP, RTSP, or RTP 
        messages) and information about the service to be executed on those 
        messages at the remote server. In order to fulfill this task, a 
        callout protocol SHOULD consider the following design issues: 
        service identification, message exchange style, message context, 
        payload transparency, pipelining and message segmentation. 
         
     3.1.1   Service identification 
         
        A callout protocol MUST be able to uniquely identify a remote 
        callout service that is required to be executed on a message. An 
        adequate way to provide such identification MAY be a URI. Such a URI 
        MUST contain the complete hostname and the path identifying the 
        service requested. The method of determining the name of an 
        appropriate service is outside of the scope of a callout protocol. 
        An example of such a URI is ucp://my.callout-server.com/service1. 
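         
        As an illustration only, the check that a service URI carries 
        both a complete hostname and a service path could be sketched as 
        follows. The function name is hypothetical, and "ucp" is simply 
        the scheme used in the example above, not a registered scheme. 

```python
from urllib.parse import urlsplit


def parse_service_uri(uri: str) -> tuple[str, str]:
    """Split a callout service URI into (hostname, service path)."""
    parts = urlsplit(uri)
    if not parts.hostname or not parts.path.strip("/"):
        # The requirement asks for both the complete hostname and the
        # path identifying the requested service.
        raise ValueError(f"incomplete service URI: {uri!r}")
    return parts.hostname, parts.path


print(parse_service_uri("ucp://my.callout-server.com/service1"))
```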
      
      
      
     3.1.2   Message exchange style 
         
        A callout protocol MUST implement a request/reply communication 
        style. Initiating a callout always requires a request containing the 
        encapsulated message (or parts of it) to be transferred to a callout 
        server. In turn, this server MUST always send back a response either 
        containing the unmodified message, a modified version of the 
        message, a status code (that triggers a certain reaction from the 
        intermediary) or an error code.  
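         
        A minimal sketch of the four response kinds listed above, as an 
        intermediary might model them. All names are hypothetical, and 
        the reaction to a status or error code is left to local policy. 

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class ResponseKind(Enum):
    UNMODIFIED = auto()   # the message is returned as it was sent
    MODIFIED = auto()     # a modified version of the message is returned
    STATUS = auto()       # a status code that triggers a reaction
    ERROR = auto()        # the service could not be executed


@dataclass
class CalloutResponse:
    kind: ResponseKind
    body: Optional[bytes] = None   # set for UNMODIFIED and MODIFIED
    code: Optional[int] = None     # set for STATUS and ERROR


def message_to_forward(response: CalloutResponse) -> bytes:
    """Return the message the intermediary forwards on the content path."""
    if response.kind in (ResponseKind.UNMODIFIED, ResponseKind.MODIFIED):
        return response.body
    # What to do with a status or error code is a local policy decision;
    # this sketch simply refuses to forward anything.
    raise RuntimeError(f"callout returned {response.kind.name} {response.code}")
```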
         
     3.1.3   Message context 
         
        Some remote callout services require additional information to 
        perform their service. One example of such information is the HTTP 
        request for a service that is operating on an HTTP response. Another 
        example is a command line parameter (e.g. the destination language 
        for a translation service). In general, a message context could be 
        any information available in the local execution environment that is 
        needed by a remote callout service.  
         
        Basically, there are two methods of transferring the message context 
        to the remote server: first, it can be part of the URL (e.g. as user 
        id, additional path elements or a query parameter) with which a 
        service is invoked. An example of such a URL is 
        ucp://volker@my.callout-server.com:8080/translation-service/fast_translation?lang=german. 
        The second possibility to 
        transfer the message context is within a separate field of the 
        request header. As with the payload, no assumptions SHOULD be made 
        on the type or structure of the message context field. Instead, the 
        message context SHOULD be taken as binary data that is encapsulated 
        in the request. An example of information in such a header field is 
        the HTTP request that is shipped along with an HTTP response.  
         
        Both methods of transferring the message context have their 
        advantages and disadvantages. Transferring the message context 
        within the URL is simple and produces very low overhead. However, 
        the size and complexity of information contained in a URL is limited 
        (e.g. encoding an HTTP request within a URL might not be a good 
        alternative). Using a separate header-field introduces some overhead 
        but is much more flexible than using a URL.  
         
        Although it would be possible to let the callout server modify parts 
        of the message context and return it along with the response, this 
        SHOULD NOT be allowed. It would substantially increase the 
        complexity of an intermediary since the intermediary would need to 
        assure the consistency of the message context especially if multiple 
        requests are issued in parallel.  
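         
        The two ways of carrying message context described in this 
        section can be contrasted in a short sketch. The length-prefixed 
        framing is an assumption made for illustration; this document 
        does not define a wire format. 

```python
from urllib.parse import parse_qs, urlsplit


def context_from_url(url: str) -> dict:
    """Method 1: read the message context out of the service URL."""
    parts = urlsplit(url)
    context = {key: values[0] for key, values in parse_qs(parts.query).items()}
    if parts.username:
        context["user"] = parts.username
    return context


def encapsulate(service_url: str, context: bytes, payload: bytes) -> bytes:
    """Method 2: ship the context as an opaque field next to the payload.

    Hypothetical framing: every field is preceded by its length as a
    4-byte big-endian integer. Context and payload are never inspected.
    """
    fields = (service_url.encode("ascii"), context, payload)
    return b"".join(len(f).to_bytes(4, "big") + f for f in fields)


url = ("ucp://volker@my.callout-server.com:8080"
       "/translation-service/fast_translation?lang=german")
print(context_from_url(url))
```

        Keeping the context field opaque means the same framing works 
        whether the context is an HTTP request or any other byte string. 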
         
     3.1.4   Payload transparency 
         
        A callout protocol SHOULD make no assumptions about the protocol 
        used on the content path (in particular, it SHOULD NOT assume that 
        this protocol is HTTP). Instead, a callout protocol SHOULD take the 
        content path protocol messages as binary data and encapsulate these 
        messages during the transfer to and from a remote callout server.  
         
        This requirement does not prevent a design where a basic callout 
        protocol captures common aspects of the callout process and an 
        additional payload specification tailors this basic protocol to the 
        needs of a certain content path protocol (similar to the model used 
        by RTP). Nevertheless, the basic callout protocol SHOULD be 
        independent of the protocol used on the content path.  
         
        If possible, a callout protocol SHOULD also not assume a certain 
        communication pattern (e.g. request/reply) to be used on the content 
        path. The rationale behind payload transparency is that a 
        callout protocol SHOULD be capable of handling different content 
        path protocols to avoid the re-implementation of similar 
        functionality for each of these protocols. Examples of common 
        content path protocols are HTTP, RTSP, SMTP, NNTP, and RTP. 
      
     3.1.5   Pipelining requests 
         
        It is very likely that a remote callout service is called many times 
        in sequence with a very short time in between two single requests. 
        For example, an ad insertion service might be called for every HTTP 
        message passing through an intermediary. For this reason, a callout 
        protocol MUST be capable of issuing a request without having 
        received the response for a previous request. In other words, the 
        protocol MUST be capable of pipelining multiple requests.  
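         
        One way to picture pipelining is an intermediary that tags each 
        request and keeps a table of outstanding ones. The request 
        identifier is an assumption of this sketch; the text above does 
        not say how responses are matched to requests. 

```python
from collections import OrderedDict
from itertools import count


class PipelinedChannel:
    """Tracks requests that were sent but have not been answered yet."""

    def __init__(self):
        self._ids = count(1)
        self.pending = OrderedDict()  # request id -> original message

    def send(self, message: bytes) -> int:
        """Issue a request without waiting for earlier responses."""
        request_id = next(self._ids)
        self.pending[request_id] = message
        return request_id

    def receive(self, request_id: int, response: bytes) -> tuple[bytes, bytes]:
        """Match a response to its request and return (message, response)."""
        message = self.pending.pop(request_id)
        return message, response


channel = PipelinedChannel()
first = channel.send(b"GET /a")
second = channel.send(b"GET /b")   # issued before `first` is answered
print(len(channel.pending), channel.receive(first, b"ok-a"))
```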
         
     3.1.6   Message segmentation 
         
        The messages exchanged on the content path can be of very large 
        sizes. Examples are huge web pages, PostScript or PDF documents, 
        audio and video clips and streamed audio and video. Usually, these 
        messages are segmented and transferred in a stream of small packets. 
        For example, HTTP supports this type of transmission with its 
        chunked transfer encoding. A callout protocol SHOULD be able to 
        redirect the segments of a message to the callout server as soon as 
        the intermediary receives them. The intermediary SHOULD NOT try to 
        receive the entire message before it is sent to the callout server. 
        This would substantially increase the processing time of one message 
        and it would not be possible at all for media streams. An 
        implication for the protocol design is that the size of messages is 
        not known at the time the first packets are sent to the callout 
        server. 
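         
        Because the total size is unknown when the first segment leaves 
        the intermediary, the end of a message has to be signalled 
        in-band. The sketch below assumes a hypothetical framing (a 
        4-byte length before each segment and a zero-length frame as 
        terminator), loosely modelled on HTTP chunked transfer encoding. 

```python
from typing import Iterable, Iterator


def frame_segments(segments: Iterable[bytes]) -> Iterator[bytes]:
    """Forward each segment as soon as it arrives, then an end marker."""
    for segment in segments:
        if segment:  # an empty segment would look like the end marker
            yield len(segment).to_bytes(4, "big") + segment
    yield (0).to_bytes(4, "big")


def read_frames(stream: bytes) -> bytes:
    """Reassemble the message on the callout server side."""
    message, pos = bytearray(), 0
    while True:
        size = int.from_bytes(stream[pos:pos + 4], "big")
        pos += 4
        if size == 0:
            return bytes(message)
        message += stream[pos:pos + size]
        pos += size


wire = b"".join(frame_segments([b"<html>", b"<body>", b"</html>"]))
print(read_frames(wire))
```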
         
     3.2 Increasing Efficiency 
         
        Typically, an intermediary has to handle large amounts of network 
        traffic. Depending on the rule configuration and the services 
        provided, a significant part of this traffic may be sent to a remote 
        callout server. For this reason, efficiency SHOULD be one of the 
        major design goals for a callout protocol. Performance measurements 
        on the ICAP protocol indicate that the vast majority of processing 
        time is spent copying messages from the content path to the callout 
        server and back. Thus, the efficiency of a callout protocol can be 
        increased if the amount of data that has to be transmitted is 
        minimized. The following concepts may help to achieve this goal. 
         
     3.2.1   Caching responses 
         
        A callout protocol SHOULD support the caching of responses. To do 
        so, a remote callout server MUST be able to indicate if and how long 
        a response MAY be cached by an intermediary. If a response is 
        cacheable and still valid, an intermediary MAY satisfy identical 
        requests by using the cached response. Determining which requests 
        are identical is outside of the scope of a callout protocol. If a 
        server has allowed the caching of a response for a certain period of 
        time, there is no means for it to revise this decision. 
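         
        A response cache honouring a server-supplied lifetime might look 
        like the sketch below. The cache key is supplied by the caller, 
        since deciding which requests are identical is out of scope; the 
        class and parameter names are hypothetical. 

```python
import time


class ResponseCache:
    """Caches callout responses for as long as the server allowed."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # request key -> (expiry time, response)

    def store(self, key, response: bytes, max_age: float) -> None:
        """Remember a response the server marked cacheable for max_age seconds."""
        if max_age > 0:
            self._entries[key] = (self._clock() + max_age, response)

    def lookup(self, key):
        """Return a still-valid cached response, or None."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, response = entry
        if self._clock() >= expires_at:
            del self._entries[key]
            return None
        return response
```

        Note that the sketch offers no invalidation call: once a lifetime 
        has been granted, the entry simply runs until it expires. 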
         
     3.2.2   Channels 
         
        Since it can be assumed that an intermediary sends a large number of 
        requests to a remote callout server, it is reasonable to open a 
        persistent channel to a remote callout server over which all 
        messages are transferred. This will substantially reduce the network 
        overhead for the transmission of one message. An intermediary might 
        decide at which time it opens or closes a channel. A reasonable 
        policy might be to establish a channel at the time the first request 
        for a service is received and to close the channel after a timeout. 
        The policy of opening and closing a channel SHOULD NOT be part of 
        the protocol. 
         
        During the creation of a channel, an intermediary has the chance to 
        negotiate service parameters, associated with that channel, with the 
        remote callout service. These parameters apply to all messages 
        exchanged over that channel. Examples of such parameters are the 
        service URI, the payload type, or the service context. Exchanging 
        this information once at the channel setup reduces some of the 
        protocol overhead. Although these savings are modest, they 
        come at almost no cost. Furthermore, negotiation of parameters can 
        be accomplished during channel creation while this might become 
        time-critical if attempted for each message. 
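         
        The separation between channel parameters (fixed at setup) and 
        channel policy (local to the intermediary) can be sketched as 
        follows. The classes and the idle-timeout policy are 
        illustrative assumptions; time is passed in explicitly to keep 
        the example deterministic. 

```python
from dataclasses import dataclass, field


@dataclass
class Channel:
    """A persistent channel with parameters negotiated once at setup."""
    service_uri: str
    payload_type: str
    last_used: float = 0.0
    sent: list = field(default_factory=list)

    def send(self, message: bytes, now: float) -> None:
        # Only the payload travels per message; the service URI and
        # payload type were fixed when the channel was created.
        self.sent.append(message)
        self.last_used = now


class ChannelPool:
    """Local policy: open on first request, close after an idle timeout."""

    def __init__(self, idle_timeout: float):
        self.idle_timeout = idle_timeout
        self.channels = {}

    def get(self, service_uri: str, payload_type: str, now: float) -> Channel:
        key = (service_uri, payload_type)
        if key not in self.channels:
            self.channels[key] = Channel(service_uri, payload_type, last_used=now)
        return self.channels[key]

    def close_idle(self, now: float) -> None:
        for key, channel in list(self.channels.items()):
            if now - channel.last_used > self.idle_timeout:
                del self.channels[key]
```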
         
     3.2.3   Buffering messages 
         
        An intermediary MAY keep a local copy of the message it has sent to 
        a remote callout server. This spares the callout server from always 
        returning the entire message. The server could, for example, 
        return a status code indicating that it does not want to alter the 
        original message. Keeping a copy of the message at the intermediary 
        can significantly decrease the amount of data that has to be 
        transferred between intermediary and callout server. However, it 
        requires the intermediary to store and manage all messages it has 
        sent to the callout server. Thus, it introduces complexity in the 
        intermediary and increases its memory requirements.  
         
        To alleviate this problem, the intermediary could specify the amount 
        of data it is willing to buffer for one request. If this limit is 
        reached, the intermediary will stop the transmission of the request 
        and will wait for a response. Up to that point, the server is 
        allowed to respond at any time and assume that the intermediary has 
        kept the entire message. If the server is not able to determine a 
        response from the initial part of the request, then it MUST 
        explicitly request the transmission of the remaining part of the 
        request. The next response MUST assume that the intermediary does 
        not have a copy of the message. 
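         
        The buffer-limit handshake can be simulated in a few lines. This 
        is a sketch under stated assumptions: the service is modelled as 
        a plain callable, MORE is a hypothetical marker for "send the 
        remaining part", and None stands for "no change". 

```python
MORE = object()  # hypothetical marker: "please send the remaining part"


def run_callout(message: bytes, buffer_limit: int, service) -> bytes:
    """Simulate one callout with a bounded buffer at the intermediary.

    service(prefix, rest) is called with rest=None while only the
    buffered prefix has been sent. It returns None for "no change",
    MORE, or bytes holding a complete replacement message.
    """
    prefix, rest = message[:buffer_limit], message[buffer_limit:]
    answer = service(prefix, None)
    if answer is MORE:
        # The remainder is sent and the local copy is dropped, so the
        # next response has to carry the entire message.
        answer = service(prefix, rest)
        if not isinstance(answer, bytes):
            raise ValueError("server must return the entire message")
        return answer
    # Up to the limit the intermediary still holds its own copy.
    return message if answer is None else answer


def uppercase_html(prefix, rest):
    """Example service: leaves non-HTML alone, upper-cases HTML."""
    if not prefix.startswith(b"<html"):
        return None            # decided from the prefix alone
    if rest is None:
        return MORE
    return (prefix + rest).upper()


print(run_callout(b"plain text", 5, uppercase_html))
print(run_callout(b"<html>hi</html>", 5, uppercase_html))
```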
         
     3.2.4   Preview 
         
        In some cases, the remote callout service can complete its operation 
        before it has received the entire message. For example, a virus 
        checking service can certify a large fraction of all files as 
        "clean" just by looking at the file type and the first 2K bytes. 
        Another example is a content filtering system that marks a web page 
        as containing "illegal content" as soon as certain words appear in 
        that page. In these cases, the remote callout server does not need 
        to receive the remaining part of the message and can instantly 
        respond with a certain status code. A callout protocol SHOULD 
        provide the possibility for a server to opt out of a transmission 
        early. 
         
        There are two design alternatives for the preview functionality. 
        In the first approach, the intermediary sends a predefined 
        portion of the request to the callout server, then stops and 
        waits for a response from the callout server. If the server returns 
        a positive response, the intermediary sends the remaining part of 
        the message. Otherwise it interrupts the transmission. This approach 
        is used by the ICAP protocol. In the second approach, the callout 
        server is allowed to respond to a request at any time. It MUST 
        indicate in this response whether the current transmission should 
        be completed or interrupted.  
         
        A prerequisite for the first approach is that the intermediary knows 
        the amount of data required by the server to decide on continuing or 
        interrupting a request. In these cases the intermediary can send 
        exactly this portion of a request and thus minimize the amount of 
        data that is exchanged. A drawback of this approach is that the 
        handshake between intermediary and callout server introduces an 
        additional delay into the processing of one request. The major 
        advantage of the second approach is that it lets the server decide 
        at which point the transmission is interrupted. This can be 
        exploited, for example, by services that make their decision on 
        continuing or interrupting dynamically during the processing of one 
        request. In these cases, the second approach is more efficient, 
        since it allows the server to opt out of the transmission as soon as 
        possible. Summing up, in the ideal case the first approach is used 
        if the size of the preview is known in advance and the second 
        approach is used otherwise.  
         
        If a callout protocol is to support only one approach, the 
        penalty for not using the optimal approach must be considered. If 
        the second approach is used in any case, the intermediary continues 
        sending data after the decision point until it receives a response 
        from the server. If the response is to continue the transmission, no 
        bandwidth has been wasted and, in addition, no delay for the 
        handshake has been introduced. If the response is negative, the 
        intermediary has sent redundant data for the time of one message 
        round trip. If the first approach is used in any case, the 
        intermediary must guess the size of the preview. If the chosen size 
        is too large and the server decides to bail out of a transmission, 
        the penalty is the data that is transmitted until the full preview 
        size is reached. If the guess of the preview size was too small, the 
        intermediary must continue and send the entire message. Thus, the 
        penalty is the part of the message after the actual decision point. 
        In conclusion, the penalty of using the first approach in any case is 
        typically higher than the penalty of always using the second 
        approach. 
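         
        The penalty comparison above can be made concrete with a little 
        arithmetic. The sketch counts only bytes sent needlessly when the 
        server opts out; the extra handshake delay of the first approach 
        is not modelled. Function and parameter names are hypothetical, 
        and the decision point is assumed to lie within the message. 

```python
def penalty_fixed_preview(message_size: int, preview_size: int,
                          decision_point: int) -> int:
    """Bytes sent needlessly by the first approach when the server opts out.

    decision_point is the number of bytes the server actually needs.
    """
    if preview_size >= decision_point:
        # Preview too large: everything between the decision point and
        # the end of the preview was unnecessary.
        return min(preview_size, message_size) - decision_point
    # Preview too small: the entire message has to be sent.
    return message_size - decision_point


def penalty_respond_anytime(message_size: int, decision_point: int,
                            bytes_per_round_trip: int) -> int:
    """Bytes sent needlessly by the second approach when the server opts out."""
    # Data keeps flowing for one round trip after the decision point.
    return min(bytes_per_round_trip, message_size - decision_point)


size, needed = 1_000_000, 2048
print(penalty_fixed_preview(size, 1024, needed))
print(penalty_fixed_preview(size, 4096, needed))
print(penalty_respond_anytime(size, needed, 10_000))
```

        For a 1,000,000-byte message with a decision point at 2,048 
        bytes, a preview guess of 1,024 bytes wastes 997,952 bytes and a 
        guess of 4,096 bytes wastes 2,048, whereas the second approach 
        wastes at most the 10,000 bytes assumed to be in flight during 
        one round trip. 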
         
     3.2.5   Partial content 
         
        Some remote callout services only modify small parts of the original 
        message. For example, a translation service typically inserts a 
        small icon into the original page, from which the translated page 
        can be reached. Another example is a service that forces all 
        cacheable data to expire at a certain time by modifying the HTTP 
        header fields. In these cases, returning the entire message from the 
        callout server back to the intermediary would not be very efficient. 
        Instead, a remote callout server could just return the modified 
        parts of a message and indicate the position at which this part MUST 
        be inserted into the original message.  
         
        This is much like a partial content response of HTTP. It is 
        important to keep the burden on the intermediary as low as possible. 
        For this reason, the response SHOULD always indicate the offset of 
        the partial response in absolute byte numbers. Basically this 
        approach trades an increase of complexity in the callout protocol 
        and the intermediary against a decrease in the amount of data that 
        has to be transmitted. Although the additional complexity seems to 
        be relatively low, the benefits heavily depend on the remote callout 
        services that are able to utilize this feature. 
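         
        Merging a partial response into the buffered original is a 
        simple splice when the offset is an absolute byte position. The 
        sketch below adds one assumption not found in the text: an 
        optional count of original bytes to overwrite, so that a changed 
        header field can be expressed as well as a pure insertion. 

```python
def apply_partial(original: bytes, offset: int, part: bytes,
                  replaced: int = 0) -> bytes:
    """Merge a partial response into the intermediary's buffered copy.

    offset is an absolute byte position in the original message.
    replaced (hypothetical) is the number of original bytes the part
    overwrites; 0 means pure insertion.
    """
    if replaced < 0 or not 0 <= offset <= len(original) - replaced:
        raise ValueError("partial response does not fit the original")
    return original[:offset] + part + original[offset + replaced:]


page = b"<body><p>Hello</p></body>"
icon = b'<img src="translate.png">'
print(apply_partial(page, 6, icon))
```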
         
     3.2.6   Multiple services on the same message 
         
        A remote callout service provider might offer several callout 
        services. In this case, it might not be reasonable to make a 
        separate call for each remote service to be executed on the same 
        content-path message. Instead, it would be more efficient to 
        transfer the content-path message to the remote callout server once, 
        execute all services and return the entire response. The callout 
        server is responsible for dispatching the message in the correct 
        order to the different services and for aggregating the responses 
        into a single response message. 
         
        To invoke multiple services, an intermediary MUST be able to specify 
        more than one URL. The design alternatives are to set up one channel 
        for each combination of remote services or to use one channel to a 
        callout server and specify the desired URLs in each message.  
         
        The most challenging task is to dispatch the requests to multiple 
        services and to aggregate the responses of individual services. This 
        SHOULD be done by a dispatcher on the remote server. The 
        following rules can be considered: 
         
        Caching: the response MUST contain the earliest expiration date. 
         
        Keeping copy: the remote callout server SHOULD propose the maximum 
        of the prefix sizes of individual services as the prefix size of the 
        compound service.  
         
        If a service requests the transmission of the entire message, the 
        server MUST return this request to the intermediary and forward the 
        remaining message to the service. This request frees the 
        intermediary from the burden of keeping a copy of the message. If 
        the server itself is not willing to buffer the message, it MUST call 
        all subsequent services with preview size zero. In any case, the 
        server MUST return an entire message to the intermediary. 
         
        Preview: if the response of a service indicates that no changes are 
        required, the service dispatcher SHOULD NOT opt out of the current 
        transmission of the request. Instead, it SHOULD forward the current 
        message to the next service. Only if all services indicate that no 
        changes are required and the message still has not been transmitted 
        completely, the service dispatcher MAY interrupt this transmission 
        and return a "no changes required" response.  
         
        Partial content: the message dispatcher of the callout server MUST 
        insert the partial response it receives from each service into the 
        full message before sending it to the next service. If all services 
        have returned partial responses, it MAY decide to aggregate all 
        parts and return them as a partial response to the intermediary. 
        Otherwise it returns the response it got from the last service 
        called as an entire message.  
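         
        The caching and preview rules above can be sketched as a small 
        dispatcher. Services are modelled as plain callables and the 
        result type is hypothetical; buffering and partial content are 
        left out to keep the example short. 

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ServiceResult:
    body: Optional[bytes]   # None means "no changes required"
    expires: float          # time until which the result may be cached


def dispatch(message: bytes,
             services: list[Callable[[bytes], ServiceResult]]) -> ServiceResult:
    """Run the services in order and aggregate into a single response."""
    current, changed = message, False
    earliest = float("inf")
    for service in services:
        result = service(current)
        earliest = min(earliest, result.expires)   # caching rule
        if result.body is not None:
            current, changed = result.body, True   # feed the next service
    # "No changes required" is reported only if every service agreed.
    return ServiceResult(current if changed else None, earliest)
```

        For the buffering rule, the compound prefix size would be the 
        maximum of the individual services' prefix sizes. 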
         
         
     4  Security Considerations 
      
        This document does not explicitly require a callout protocol to 
        encrypt the encapsulated content-path messages for transit by 
        default. In the absence of some other form of encryption at the link 
        or network layers, eavesdroppers may be able to record the 
        unencrypted transactions between the intermediary and the callout 
        server. 
         
         
     5  Acknowledgments  
                
        The authors would like to thank all active participants in the OPES 
        mailing list for their thought-provoking discussion. In particular, 
        we want to acknowledge major contributions from Andre Beck, who was 
        heavily involved in shaping this document.  
                     
                     
     6  References 
         
        [1]  S. Bradner. RFC 2119. "Key words for use in RFCs to Indicate 
             Requirement Levels", March 1997 
         
        [2]  Tomlinson, G., et al. "A Model for Open Pluggable Edge     
             Services", Work in Progress, Internet Draft draft-tomlinson-
             opes-model-00.txt, July 2001.  
         
         
     7  Authors' Addresses  
      
        Anca Dracinschi Sailer 
        Room 4F-531 
        Lucent Technologies  
        101 Crawfords Corner Rd.  
        Holmdel, NJ 07733  
        Phone: (732) 494-2259  
        Email: anca@bell-labs.com  
         
         
        Volker Hilt  
        Praktische Informatik IV                                       
        University of Mannheim                                        
        Phone: +49 621 181 2606 
        Email: hilt@informatik.uni-mannheim.de 
         
        Markus Hofmann  
        Room 4F-513 
        Lucent Technologies  
        101 Crawfords Corner Rd.  
        Holmdel, NJ 07733  
        Phone: (732) 332-5983  
        Email: hofmann@bell-labs.com  
         
        Rama R. Menon  
        Intel Corporation  
        M/S JF3-206  
        2111 NE 25th Ave.  
        Hillsboro, OR 97124  
        Phone: +1-503-712-1438  
        Email: rama.r.menon@intel.com  
      
         
     Full Copyright Statement  
         
        Copyright (C) The Internet Society (2000). All Rights Reserved.  
          
        This document and translations of it may be copied and furnished to 
        others, and derivative works that comment on or otherwise explain it 
        or assist in its implementation may be prepared, copied, published 
        and distributed, in whole or in part, without restriction of any 
        kind, provided that the above copyright notice and this paragraph 
        are included on all such copies and derivative works. However, this 
        document itself may not be modified in any way, such as by removing 
        the copyright notice or references to the Internet Society or other 
        Internet organizations, except as needed for the purpose of 
        developing Internet standards in which case the procedures for 
        copyrights defined in the Internet Standards process must be 
        followed, or as required to translate it into languages other than 
        English.  
          
        The limited permissions granted above are perpetual and will not be 
        revoked by the Internet Society or its successors or assigns.  
         
        This document and the information contained herein is provided on an 
        "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING 
        TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING 
        BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION 
        HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF 
        MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 
         
      
      
      
      