EPSFW processing models

Since the September workshop I've been meaning to start a discussion
on the high-level processing model proposed in the EPSFW draft, as well
as some alternatives.  This note is intended to get the ball rolling,
but it is certainly far from complete.  Below I briefly examine
three processing models: the Rule Model, the Script Model, and the
Streams Model.  The Rule Model, which is the processing model
proposed in the draft, is given the most attention.

Regards,
Steve

-----------------------------------------------------------------------------


Rule Model
----------

The Rule Model is the processing model defined in the draft.  To save
some typing in the following discussion, I use these acronyms for terms
defined in that document:

PSEE = proxy service execution environment
PL   = proxylet library (language bindings for the PSEE API)
RM   = rule module

An RM defines a set of patterns that, when matched, cause actions
to be fired.  The actions can be calls to proxylets, the PL, or
remote callouts, all of which can modify the PSEE state in general,
and the relevant message in particular.
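
To make the pattern/action idea concrete, here is a minimal sketch in
Python.  It is not the draft's rule syntax: the Rule and RuleModule
classes, the representation of a message as a flat header dictionary,
and the two example actions are all assumptions made for illustration.
Each action stands in for a proxylet, a PL call, or a remote callout.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumption: a message is modeled as a flat map of header name to value.
Message = Dict[str, str]


@dataclass
class Rule:
    header: str                        # header the pattern is matched against
    pattern: str                       # regular expression to search for
    action: Callable[[Message], None]  # stand-in for a proxylet/PL/callout


class RuleModule:
    """Hypothetical RM: an owner plus an ordered list of rules."""

    def __init__(self, owner: str, rules: List[Rule]):
        self.owner = owner
        self.rules = rules

    def process(self, msg: Message) -> int:
        """Fire the action of every rule whose pattern matches; return the count."""
        fired = 0
        for rule in self.rules:
            if re.search(rule.pattern, msg.get(rule.header, "")):
                rule.action(msg)
                fired += 1
        return fired


def add_via(msg: Message) -> None:
    msg["Via"] = "1.1 epsfw-proxy"


def strip_cookie(msg: Message) -> None:
    msg.pop("Cookie", None)


rm = RuleModule("steve", [
    Rule("Host", r"\.example\.com$", add_via),
    Rule("User-Agent", r"bot", strip_cookie),
])

msg = {"Host": "www.example.com", "User-Agent": "Mozilla", "Cookie": "a=1"}
fired = rm.process(msg)
```

Only the first rule matches here, so the message gains a Via header
and keeps its Cookie header.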

Below are various thoughts/observations/questions regarding this
processing model and/or its definition in the draft.

1) Clearly an RM is essentially an application.  What is less clear
in the draft is whether a proxylet is bound to a particular
RM (that is, it is part of a specific application) or if a proxylet
is instead accessible from any RM.  There are both usability and
security implications here.

From a usability standpoint, it would be convenient if proxylets
were independent of RMs.  This would allow proxylets to be called
from multiple RMs and to implement utility functions that can be
called from other proxylets.  Viewing proxylets as a shared resource,
rather than as part of an application, allows them to generically
extend the PL.

From a security standpoint, binding proxylets to RMs results in
a simpler security model.  The draft states that "there must be a
way to associate a [RM] with a single owner", so there is already
a defined security context for RMs.  If an RM and a set of
proxylets are all part of the same application, then a proxylet
always executes in the same security context as the RM to which it
is bound.  Otherwise, if proxylets are independent of RMs, then a
proxylet executes in the security context of the calling RM.  In this
case there also needs to be a security model to determine which RMs
can call which proxylets (directly or via other proxylets).

Another security/resource/management issue with shared proxylets is
that if the PSEE decides that a proxylet is hung, malicious, or
otherwise bad, then ejecting/restarting the proxylet will affect
multiple applications (RMs).


2) The draft defines a coarse-grained message flow lifecycle that
contains four points at which rules can be triggered.  I would argue
that a more fine-grained lifecycle model will be needed to
support multiple applications concurrently and to provide a basis
for resolving rule conflicts.

While I'll leave the details of this for a later discussion, I will
note that at a high level this lifecycle model will have to be
reflected in the RM syntax.  Either patterns or actions will need to
specify the lifecycle phase in which they operate.  For several
reasons, associating a pattern with a lifecycle phase makes the most
sense, not the least of which is that it seems a more natural way
to code RMs.

3)  The processing model should support the ability for
applications (RMs) to introduce requests into the system to be
processed as if generated by a user agent, but not require that
the message pass through the full lifecycle.  This supports the
ability to implement such operations as prefetch at the application
level.

4) The processing model must make clear which
operations are the domain of the base caching proxy and which are the
domain of the PSEE.  Section 5.2.2 of the draft indirectly defines
some of this, but a more definitive statement is required.  For example,
it is not clear to me if actions called from an RM can satisfy a request;
e.g., via RMI, database query, conjuring, etc.  I think that perhaps
section 5.2.2.2 bullet 4 addresses this, but I'm not sure.

5) There is a general need to better define the processing model and
the embodied data flow.  For example, the draft isn't clear on the
relationship between the message parsers and the rule processor.
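
As an illustration of point 2 above, the sketch below binds each
pattern to a lifecycle phase, so that the rule processor only
evaluates the patterns registered for the phase a message is currently
in.  Everything here is an assumption made for the example: the phase
names are invented and are not the four trigger points in the draft,
and PhasedRuleModule is a hypothetical class, not a proposed syntax.

```python
import re
from collections import defaultdict
from enum import Enum


class Phase(Enum):
    # Hypothetical phase names, NOT taken from the draft.
    CLIENT_REQUEST = 1
    CACHE_LOOKUP = 2
    ORIGIN_RESPONSE = 3
    CLIENT_RESPONSE = 4


class PhasedRuleModule:
    """Hypothetical RM in which every pattern names the phase it belongs to."""

    def __init__(self):
        self._rules = defaultdict(list)   # phase -> [(header, regex, action)]

    def add(self, phase, header, pattern, action):
        self._rules[phase].append((header, re.compile(pattern), action))

    def run(self, phase, msg):
        """Evaluate only the rules of `phase`; return names of fired actions."""
        fired = []
        for header, regex, action in self._rules[phase]:
            if regex.search(msg.get(header, "")):
                action(msg)
                fired.append(action.__name__)
        return fired


def mark_blocked(msg):
    msg["X-Blocked"] = "yes"


def mark_transcode(msg):
    msg["X-Transcode"] = "yes"


rm = PhasedRuleModule()
rm.add(Phase.CLIENT_REQUEST, "Host", r"^ads\.", mark_blocked)
rm.add(Phase.ORIGIN_RESPONSE, "Content-Type", r"^image/", mark_transcode)

msg = {"Host": "ads.example.com", "Content-Type": "image/png"}
fired_request = rm.run(Phase.CLIENT_REQUEST, msg)
fired_lookup = rm.run(Phase.CACHE_LOOKUP, msg)
fired_response = rm.run(Phase.ORIGIN_RESPONSE, msg)
```

Keying rules by phase also gives a natural place to resolve conflicts,
since only rules registered for the same phase can compete for a
message.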



Script Model
------------

The Script Model is an extension of the Rule Model.  In this case,
a (possibly special purpose) scripting language replaces the simple
rule language, and each application is an executable script rather
than the simple awk-like rules in an RM.  Other high-level concepts
remain the same.

I have not yet given this processing model sufficient thought, but
here are some random observations.

1) The requirement for a full interpreter will impact performance.
2) The scripting language is likely to increase the difficulty of
integrating multiple applications.
3) It is not immediately obvious how difficult it will be to map
this onto a well-defined message flow lifecycle, which I believe
is important as discussed above.
4) This may actually be isomorphic to the Rule Model with script
code embedded in proxylets.


Streams Model
-------------

The Streams Model is conceptually similar to the UNIX SVR4 STREAMS
facility.  To facilitate the description, I'll make the following
definitions:

transform module (TM) - an executable module that sits in the message
data flow and performs transformations of messages.

utility module (UM) - an executable module that provides utility
functions callable from TMs and UMs.

This processing model works as follows.  Messages flow through TMs:
either user agent requests only, or both user agent requests and
content server replies.  The TMs can modify the request data, generate reply
data, or modify reply data, as appropriate.  Each TM operates in
exactly one phase of the message flow lifecycle.  TMs can call functions
in UMs.  TMs can register interest in only seeing messages that
meet certain criteria, in a manner similar to, but perhaps less
fine-grained than, the pattern matching in the Rule Model.

We'll define an application to be a collection of TMs and optionally UMs
that implement a logical service.  We'll define an extension library to
be a collection of UMs that provide utility functions to applications.
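
The following Python sketch shows one way the pieces described above
could fit together.  It is only an illustration: the TransformModule
and Stream classes, the integer phase numbers, and the example
modules are all assumptions, and a UM is modeled as nothing more than
a plain function.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Assumption: a message is modeled as a flat map of field name to value.
Message = Dict[str, str]


def um_normalize_host(host: str) -> str:
    """Hypothetical UM: a utility function callable from any TM."""
    return host.strip().lower()


@dataclass
class TransformModule:
    name: str
    phase: int                               # exactly one lifecycle phase
    interested: Callable[[Message], bool]    # registration criteria
    transform: Callable[[Message], Message]  # the transformation itself


class Stream:
    """Pushes a message through the TMs in phase order."""

    def __init__(self, modules: List[TransformModule]):
        self.modules = sorted(modules, key=lambda tm: tm.phase)

    def push(self, msg: Message) -> Tuple[Message, List[str]]:
        trace = []
        for tm in self.modules:
            if tm.interested(msg):           # TM sees only messages it asked for
                msg = tm.transform(msg)
                trace.append(tm.name)
        return msg, trace


def normalize(msg: Message) -> Message:
    return {**msg, "Host": um_normalize_host(msg["Host"])}


def tag_image(msg: Message) -> Message:
    return {**msg, "X-Transcode": "yes"}


stream = Stream([
    TransformModule("tag_image", 2, lambda m: m["Path"].endswith(".png"), tag_image),
    TransformModule("normalize", 1, lambda m: True, normalize),
])

image_msg, image_trace = stream.push({"Host": " WWW.Example.COM ", "Path": "/logo.png"})
page_msg, page_trace = stream.push({"Host": "a.org", "Path": "/index.html"})
```

In this sketch an application is simply the list of TMs handed to
Stream, together with whatever UM functions those TMs call.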

In the Streams Model, an application executes in a given security
context, modifying messages and calling utility library functions.  The
security issues here are completely analogous to those discussed for the
Rule Model with shared proxylets.
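
A minimal sketch of the kind of access check this implies, for shared
UMs and shared proxylets alike: the shared function runs in the
caller's security context, and a call made indirectly through another
shared function is checked against the original caller.  The
UtilityLibrary class, the AccessDenied exception, and the owner names
are assumptions for illustration only; nothing like this is defined in
the draft.

```python
class AccessDenied(Exception):
    pass


class UtilityLibrary:
    """Hypothetical registry of shared functions with per-function ACLs."""

    def __init__(self):
        self._funcs = {}
        self._allowed = {}

    def register(self, name, func, allowed_owners):
        self._funcs[name] = func
        self._allowed[name] = set(allowed_owners)

    def call(self, owner, name, msg):
        """Run `name` in the security context of `owner`, if permitted."""
        if owner not in self._allowed.get(name, set()):
            raise AccessDenied(f"{owner} may not call {name}")
        # The callee gets a callback bound to the SAME owner, so any nested
        # call it makes is checked against the original caller.
        return self._funcs[name](msg, lambda n, m: self.call(owner, n, m))


def log(msg, call):
    msg["logged"] = "yes"
    return msg


def compress(msg, call):
    msg["encoding"] = "gzip"
    return msg


def audit_and_compress(msg, call):
    call("log", msg)
    return call("compress", msg)


lib = UtilityLibrary()
lib.register("log", log, {"alice"})
lib.register("compress", compress, {"alice", "bob"})
lib.register("audit_and_compress", audit_and_compress, {"alice", "bob"})

alice_result = lib.call("alice", "audit_and_compress", {})

try:
    lib.call("bob", "audit_and_compress", {})
    bob_denied = False
except AccessDenied:
    bob_denied = True   # bob may call the wrapper, but not "log" beneath it
```

Here bob is allowed to call audit_and_compress directly, yet the call
fails because it reaches log, which bob is not permitted to use.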


Here are a few random observations on this processing model.

1) All interesting system behavior can be defined in terms of TMs and
UMs.  There is no meaningful separation of base caching proxy and PSEE,
in contrast to the Rule and Script Models.

2) The use of a fine-grained message lifecycle is very natural in
this model.  It may be less so in the Rule and Script Models.

3) Introducing requests into the system to implement prefetching
applications, etc., as discussed earlier, can be done in a very
natural fashion with this model.  It may be less so in the Rule and
Script Models.
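
To illustrate observation 3, here is a rough Python sketch of a
reply-phase TM that injects prefetch requests back into the stream at
a later point than a user agent request would enter.  The three phase
names, the TMS table, and the run() driver are hypothetical and exist
only to make the idea concrete; each TM returns the list of messages
it wants injected.

```python
from collections import deque

# Hypothetical, simplified lifecycle; not taken from the draft.
PHASES = ["request", "fetch", "reply"]


def authenticate(msg):
    msg["authenticated"] = True        # applies only to user agent requests
    return []


def fetch(msg):
    msg["body"] = "content of " + msg["url"]
    return []


def prefetch_links(msg):
    # Do not prefetch on behalf of a message that is itself a prefetch.
    if msg.get("prefetch"):
        return []
    return [{"url": u, "prefetch": True} for u in msg.get("links", [])]


TMS = {"request": [authenticate], "fetch": [fetch], "reply": [prefetch_links]}


def run(first_msg, inject_at="fetch"):
    """Process a message; injected messages enter the stream at `inject_at`."""
    queue = deque([(first_msg, 0)])    # (message, index of first phase to run)
    done = []
    while queue:
        msg, start = queue.popleft()
        for phase in PHASES[start:]:
            for tm in TMS[phase]:
                for new_msg in tm(msg):
                    queue.append((new_msg, PHASES.index(inject_at)))
        done.append(msg)
    return done


done = run({"url": "/index.html", "links": ["/a.css", "/b.js"]})
```

The injected requests skip the "request" phase entirely, which is the
"not the full lifecycle" behavior discussed for the Rule Model.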