Network Working Group                                        M. Hamilton
Internet-Draft                                   JANET Web Cache Service
Expires: December 30, 2001                                     I. Cooper
                                                            Equinix Inc.
                                                                   D. Li
                                                     Cisco Systems, Inc.
                                                               July 2001


              Requirements for a Resource Update Protocol
                    draft-ietf-webi-rup-reqs-01.txt

Status of this Memo

   This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on December 30, 2001.

Copyright Notice

   Copyright (C) The Internet Society (2001). All Rights Reserved.

Abstract

   This document seeks to establish the requirements for a Resource Update Protocol (RUP) which may be used in conjunction with World-Wide Web intermediary systems such as caching proxies and surrogate servers (proxy accelerators) to facilitate cache coherence and interoperability. It is envisaged that RUP will include invalidation of previously cached objects as a key feature, while providing hooks for future extensions to richer functionality. The main goal is to enable proxy caching and content distribution of large amounts of frequently changing Web objects, where periodically revalidating objects one by one is unacceptable in terms of performance and/or cache consistency.

Hamilton, et. al.         Expires December 30, 2001                [Page 1]

Internet-Draft              RUP Requirements                   July 2001

Revision Log:

   1.  state that RUP requirements focus on cache invalidation, and not on content retrieval or content push.
   2.  state that RUP should strive to provide hooks so that future inclusion of richer functionality is possible.
   3.  merge "scoping requirement" into "functional requirement".
   4.  classify the various functional requirements into five areas.
   5.  describe the requirements for strong and weak cache consistency.
   6.  state that both server and client may initiate message exchange.
   7.  state that resource grouping is decided outside of RUP and is not dynamically negotiable.
   8.  state that the protocol semantics and message formats should be self-contained and portable.
   9.  state that use cases include Internet proxies, surrogates, CDNs, intranet proxies/surrogates, and non-intermediary uses.
   10. remove "content update" from the use cases.
   11. swap the order of the use cases and functional requirements sections.
   12. acknowledge people who have given comments on the mailing list or at the IETF meetings.

Table of Contents

   1.   Introduction
   2.   Terminology
   3.   Design Goals
   4.   Use Cases
   5.   Functional Requirements
   5.1  Network and Host Environment
   5.2  Inter-box Communication
   5.3  Client-Server Interaction
   5.4  Naming and Framing
   5.5  Coherence Model
   6.   Security Considerations
   7.   Acknowledgements
        References
        Authors' Addresses
        Full Copyright Statement

1. Introduction

   A number of cache coherence or cache invalidation protocols have been proposed by the research community and by the caching and content distribution industry. Approaches vary: some proponents seek to enhance existing protocols, while others develop new protocols, either specifically for this purpose or with this functionality as one feature among others. Examples include WCIP [1], PSI [2] and DOCP [3].

   A carefully developed mechanism for communicating information about changes to Internet resources offers the potential for other functions above and beyond invalidation of cached objects. More general applications for this mechanism might include automated tracking of changes to related groups of resources through 'channel' subscriptions for real-time 'mirroring' of collections of resources, and the sharing of information about cached objects between intermediaries from different vendors. Resource updates may also be an appropriate way of informing systems which generate content dynamically (e.g. to produce HTML pages) that the underlying data which they manipulate has changed.

   The IETF's Web Intermediaries working group (WEBI) has been chartered to develop a protocol based on the requirements gathered here. For the reasons described above, we refer to an abstract Resource Update Protocol (RUP, or simply 'the protocol') whose functionality will initially be limited to simple invalidation of cached objects, but which will also provide a basis for future extensions to richer functionality. Note that RUP is at least conceptually a new protocol, but may in practice be based wholly or partly on existing protocols.

2. Terminology

   This document uses terms defined and explained in the WREC Taxonomy [4] and the HTTP/1.1 specification [5]. The reader should be familiar with both of these documents.

   In this document, the term "surrogate" is shorthand for a demand-driven surrogate origin server, unless explicitly stated otherwise. Similarly, "origin server" refers to a surrogate's master origin server.

   Cache coherence and invalidation are discussed in detail in the caching literature; see e.g. [7] and [8] for background information.

3. Design Goals

   1.  The protocol must be simple and extensible, and it should be possible to use it to transport unforeseen payloads without breaking existing implementations. The need for an extension mechanism (even in a deliberately simple protocol), and the messy consequences of not providing one, have been seen in a number of widely implemented and deployed protocols, e.g. syslog, with its hard-coded priority and facility code bitfields.

   2.  The protocol must itself be widely deployable on the Internet, and should leverage existing technologies (e.g. XML, HTTP, URIs) as much as possible. This means that the installed base and developer experience can be exploited, reducing the cost of entry for new implementors and would-be deployers of the protocol. Where work is proposed in an area where mature technologies already exist, it must be justified in comparison with the work involved in simply re-using the existing technology.

   3.  The protocol should be easy to integrate into applications such as content management engines and Web server-side software components. At the time of writing, proprietary resource update protocols were in use in some commercial systems. The IETF's Resource Update Protocol should be capable of being used in this role, so as to facilitate open content exchange.

4. Use Cases

   Please note that the protocol-level details discussed here are only hypothetical at this stage, but are necessary to support the examples. RUP is expected to be used among Internet proxies, surrogates, CDNs, and intranet proxies/surrogates, as well as in non-intermediary settings.

   1.  Server-driven invalidation: in this scenario the RUP server would send object or resource group invalidations to the RUP clients according to its own scheduling configuration. The connection between client and server could be established by either party, and could be persistent, so as to facilitate monitoring of the update guarantee through heartbeat packets. If an automated discovery mechanism were used to let clients detect servers (or vice versa), it would raise security concerns which would need to be addressed.

   2.  Client-driven validation: in this scenario the RUP client would take the lead, querying the RUP server for the freshness status of an object or group of objects (denoted by a URI). The RUP server would reply with the latest changes since the last time the client asked, based on information such as ETag, timestamp, and/or version number. Whether and when the client queries the server is determined by the consistency guarantee the client is committed to provide, and should follow the semantic rules defined by RUP. The URI of a particular group of resources could be manually configured, sent as header information in HTTP responses from the origin server, or distributed via a separate out-of-band mechanism.

   3.  Content update redirects: in this scenario the RUP server would, in addition to cache invalidation, use "update redirects", notifying the RUP clients that an object is to be updated and that the full update is to be fetched from another source, e.g. a regular Web server, the parent caching proxy, or a multicast object distribution channel.
       In this particular example there are related efforts which could be leveraged, such as SDP and SIP.

   A use case that RUP will not address in its first standard proposal is "content updates". In this scenario, instead of sending a cache invalidation and/or update redirect signal to the client, the RUP server would send the RUP client either the full content of a modified object or a delta update showing changes against the previous revision. There are two reasons for not supporting this now.

   First, relative to cache invalidation, the kinds of content updates RUP may need are much less well understood. In particular, mixing signaling with data leads to problems, including scaling, object consistency and security issues, that are not well understood.

   Second, there are existing mechanisms addressing content retrieval, e.g. HTTP [5] and Delta Encoding [6], which also demonstrate the high complexity of such functionality. RUP will, however, provide hooks for "content updates" through "content update redirects" (see use case 3). This allows RUP to leverage existing mechanisms instead of reinventing them.

5. Functional Requirements

5.1 Network and Host Environment

   The protocol should be usable both in a surrogate/origin server relationship and in a traditional caching proxy/origin server relationship. The protocol should also be general enough to be usable in content delivery network (CDN) environments to allow freshness control of CDN delivery nodes. This will provide proxies with a low-latency mechanism for cache coherence, obviating the need for cumbersome proxy revalidation.

   It must be possible for the protocol to be used in an environment where some or all communications are mediated through a firewall and/or Network Address Translation (NAT) device.
   The protocol design must identify issues involved in firewall/NAT traversal and provide ways by which these may be avoided or circumvented. These need not be explicitly security-related concerns, e.g. working around any problems caused by the use of Network Address Translation.

5.2 Inter-box Communication

   The protocol should leverage existing technologies (e.g. XML, HTTP, URIs) as much as possible. The protocol should layer cleanly and independently on top of the underlying communication layers, e.g. TCP, HTTP, BEEP, or SOAP, so that the protocol semantics and message formats are self-contained and easily transplanted.

   The protocol should make it possible for the information transferred between (for example) origin server and surrogate to be authenticated and, if necessary, encrypted. This is an area where off-the-shelf solutions such as TLS and SASL exist; the developers of the protocol will need to determine how best to make use of these.

   A mechanism providing for discovery of channels may be desirable, if a channel-based model is adopted. This should neither preclude nor be a prerequisite for development of the protocol per se: entities supporting RUP must also be capable of being configured by hand.

5.3 Client-Server Interaction

   The protocol must define a client/server relationship. It should be possible for either the server or the client to initiate a round of update message exchange.

   We anticipate that the primary RUP clients and servers will be intermediaries (speaking HTTP) and origin servers, although the protocol should not be designed so as to preclude use by other entities. For example, the origin server or servers may delegate the role of RUP server to a CDN which operates dedicated content signaling channels and servers.
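   The two exchange patterns above (server-initiated invalidation and client-initiated validation, as in use cases 1 and 2) over a self-contained message format can be sketched as follows. This is a minimal illustration under loudly stated assumptions: the message fields (kind, group, base, seq, resources), the JSON encoding, the per-group revision counter, and the classes OriginServer and SurrogateCache are all hypothetical and are not defined by RUP or any other specification. The encoder produces bytes that any carrier (TCP, HTTP, BEEP, SOAP) could transport without interpreting them.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


# Hypothetical RUP message. Field names and the JSON encoding are
# illustrative assumptions; RUP defines no wire format at this stage.
@dataclass
class RupMessage:
    kind: str   # "invalidate" or "validate" (assumed names)
    group: str  # resource group URI shared by provider and surrogates
    base: int   # group revision the sender assumes the receiver holds
    seq: int    # group revision reached once this message is applied
    resources: List[str] = field(default_factory=list)


def encode(msg: RupMessage) -> bytes:
    # Self-contained bytes: the carrier never needs to parse them.
    return json.dumps(asdict(msg), sort_keys=True).encode("utf-8")


def decode(data: bytes) -> RupMessage:
    return RupMessage(**json.loads(data.decode("utf-8")))


class OriginServer:
    """Hypothetical RUP server keeping an ordered change log for one group."""

    def __init__(self, group: str) -> None:
        self.group = group
        self.log: List[str] = []  # log[i] is the URI changed in revision i + 1

    def publish_change(self, uri: str) -> bytes:
        # Server-initiated round: record the change, build an invalidation to push.
        base = len(self.log)
        self.log.append(uri)
        return encode(RupMessage("invalidate", self.group, base, len(self.log), [uri]))

    def handle(self, data: bytes) -> bytes:
        # Client-initiated round: answer a validation query with every
        # resource changed since the revision the client reports.
        query = decode(data)
        if query.kind != "validate" or query.group != self.group:
            raise ValueError("unexpected message")
        changed = sorted(set(self.log[query.seq:]))
        return encode(RupMessage("invalidate", self.group, query.seq, len(self.log), changed))


class SurrogateCache:
    """Hypothetical RUP client: a cache that drops invalidated entries."""

    def __init__(self, group: str) -> None:
        self.group = group
        self.seq = 0  # last group revision known to be fully applied
        self.store: Dict[str, bytes] = {}

    def validation_query(self) -> bytes:
        return encode(RupMessage("validate", self.group, self.seq, self.seq))

    def handle(self, data: bytes) -> bool:
        """Apply an invalidation; return True if earlier updates were missed."""
        msg = decode(data)
        if msg.kind != "invalidate" or msg.group != self.group:
            return False
        for uri in msg.resources:
            self.store.pop(uri, None)  # dropping an entry is always safe
        if msg.base > self.seq:
            return True  # gap: keep the old revision and re-validate
        self.seq = max(self.seq, msg.seq)
        return False
```

   In this sketch a cache applies every invalidation it receives, since removing an entry can only cause a refetch, never stale content. It advances its revision only when a message starts at or before the revision it already holds. A push that skips revisions therefore signals that earlier messages were lost, and the cache falls back to a client-initiated validation query to catch up.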
   The protocol should be designed to scale to systems where a given origin server has a large number (more than 10,000) of surrogates. This can be done either directly or through multiple levels of intermediary relay points.

   The protocol should be capable of operating efficiently over a wide variety of underlying media; high-latency satellite links in particular will need to be considered.

   The protocol must allow for the integration of commonly accepted standards for authentication, authorization and encryption.

   It must be possible to determine whether resource update messages have been missed, e.g. due to a client or server being down or unreachable. There must be a feedback mechanism which enables the origin server to determine the extent to which resource updates have propagated to surrogates. It should be possible to replay or batch updates so that a consistent state is reached on all surrogates of a given origin server and collection of resources, i.e. update can effectively be guaranteed within a group of cooperating RUP clients and servers if they are prepared to work to achieve it.

5.4 Naming and Framing

   The protocol must enable communication regarding an arbitrary group of resources, each identified by a unique URI. The protocol must be able to invalidate multiple resources with one message.

   It must be possible to group resources together under some unique identifier such as a URI, which can be widely shared by a content provider with its surrogates. RUP resource group URIs can be designed to be unique, whereas URIs in the more general sense may not be.

   The grouping of resources is done outside of RUP, e.g. by the content provider, CDN operator, or traffic analysis tools. RUP is not required to provide dynamic negotiation between the RUP server and client over the composition of a resource group.
   This is because of complexity and scalability concerns about servers (and clients) having to negotiate and maintain individual views of resource groups for all the clients (and servers) they speak to. It is anticipated that predefined resource groups or channels will fit the majority of RUP deployment cases (surrogates, mirror sites, and CDNs) well.

   The protocol must define an extensible format for RUP messages which is capable of carrying a variety of payloads. Possible payloads include (1) cache invalidations, (2) content update redirects, (3) small object or delta updates, and (4) arbitrary events for content management (e.g. in content peering). While these payloads may share the same RUP mechanism, the initial protocol is not required to address all of them. Specifically, the initial RUP is only required to provide the "cache invalidation" and "content update redirect" payloads.

5.5 Coherence Model

   The protocol should make strong consistency possible, e.g. by returning a positive acknowledgement upon receiving an invalidation message, to signal the completion of cache invalidation or even of content retrieval. Note that strong consistency may also be achieved using HTTP expiration information (the Expires header or the Cache-Control max-age directive), if the expiration time is known in advance. In cases where the expiration time becomes known only after the content has been cached elsewhere, e.g. when scheduling the roll-out of a Web site to production, the content provider may use RUP to invalidate previously cached copies so that newly retrieved copies carry the roll-out time as their expiration time.

   The protocol should also make loose consistency available for applications that do not require tight coupling, e.g. traditional batch-mode mirroring applications.
   In particular, the protocol should allow the user to control the level of looseness, e.g. by specifying the best-case and/or worst-case latency for update delivery, and should provide update guarantees based on the user's specification.

   Resource update guarantees must propagate correctly through the scaling mechanisms even if multiple levels of intermediary are used.

   It is essential that the protocol support revision control of updates, e.g. so that a surrogate can identify whether any updates the origin has notified it about are outstanding. Related efforts such as WebDAV/DeltaV should be investigated, since they potentially provide an efficient bulk transfer system for the actual resource contents.

6. Security Considerations

   Intermediaries open up a large number of new security problems which do not exist in the classical end-to-end model of the Internet, by introducing a 'Man In The Middle' by design. It is therefore essential that protocol-level work on intermediaries takes care to devise means by which the integrity of the resources being updated can be preserved, or at least tested. The major risks associated with the protocol should be quantified and specifically addressed by the protocol design.

7. Acknowledgements

   Thanks to Mark Nottingham, Oskar Batuner, Mark Day, Phil Rzewski, Fred Douglis, Lisa Dusseault, Ted Hardie, Joe Touch, Brad Cain, Joseph Hui, Alex Rousskov, Mike Dahlin, Stephane Perret, Darren New, Renu Tewari, and the rest of the WEBI mailing list for their contributions.

   The JANET Web Cache Service is funded by the Joint Information Systems Committee of the UK Higher and Further Education Funding Councils (JISC).

References

   [1]  Li, D., Cao, P. and M. Dahlin, "WCIP: Web Cache Invalidation Protocol", draft-danli-wrec-wcip-00.txt (work in progress), November 2000.

   [2]  Krishnamurthy, B. and C. Wills, "Piggyback server invalidation for proxy cache coherency", In Computer Networks and ISDN Systems, Volume 30, 1998.

   [3]  Dilley, J., Arlitt, M., Perret, S. and T. Jin, "The Distributed Object Consistency Protocol", Technical Report HPL-1999-109, September 1999.

   [4]  Cooper, I., Melve, I. and G. Tomlinson, "Replication and Caching Taxonomy", RFC 3040, January 2001.

   [5]  Fielding, R., Gettys, J., Mogul, J., Nielsen, H., Masinter, L., Leach, P. and T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

   [6]  Mogul, J., Krishnamurthy, B., Douglis, F., Feldmann, A., Goland, Y., van Hoff, A. and D. Hellerstein, "Delta encoding in HTTP", draft-mogul-http-delta-08.txt (work in progress), March 2001.

   [7]  Belloum, A. and L. Hertzberger, "Maintaining Web cache coherency", In Information Research, Volume 6, No. 1, October 2000.

   [8]  Gwertzman, J. and M. Seltzer, "World-Wide Web Cache Consistency", In Proceedings of the 1996 USENIX Technical Conference, January 1996.

Authors' Addresses

   Martin Hamilton
   JANET Web Cache Service
   Computing Services
   Loughborough University
   Loughborough, Leics  LE11 3TU
   UK

   Phone: +44 1509 263171
   EMail: martin@wwwcache.ja.net


   Ian Cooper
   Equinix Inc.
   2450 Bayshore Parkway
   Mountain View, CA  94043
   USA

   Phone: +1 650 316-6065
   EMail: icooper@equinix.com


   Dan Li
   Cisco Systems, Inc.
   170 W. Tasman Dr.
   San Jose, CA  94043
   USA

   Phone: +1 650 823 2362
   EMail: lidan@cisco.com


Full Copyright Statement

   Copyright (C) The Internet Society (2001). All Rights Reserved.
   This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

   The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

   Funding for the RFC Editor function is currently provided by the Internet Society.