comments on draft-beck-opes-irml-02



* Abstract / Section 2 - IRML is introduced as being targeted at
Web Services. Although it's fairly new, WS is widely held to be the
set of technologies being standardised in the W3C (e.g., SOAP,
WSDL, etc.). Are IRML's Web Services the same ones?

SOAP is architected so that all directives to intermediaries are
contained in the message (using Header blocks). While it's possible
to control the behaviour of intermediaries using out-of-band
techniques (like rulesets), this is generally unexplored territory
for SOAP people.

I'd suggest rewriting the abstract & introduction to de-emphasise
WS; really, they're just an application that can be used with OPES,
rather than the driving factor (yes, WS is about distributed
applications, etc., but it's a term that's used in a very vague
sense, and mixing it with OPES at this level only muddies the
waters). If WS is important to OPES, it would be good to
(somewhere) reference the WS work, and to explain the relationship
between OPES rulesets and SOAP Headers (I can help come up with
some prose here if it would help).

* Section 3 - Is there any reason a public identifier was chosen
vs. a system identifier? Also, if namespaces are used, they should
always be used, whether the document is a whole ruleset or a
fragment.

* Section 3.1 - 'The conditions within a rule refer to message 
properties in the request or response message of a given content
transaction.' -> 'The conditions of a rule can refer to the
properties of messages passing through an OPES intermediary.' (so
it's not tied to request/response).

* Section 3.3.4 - why not just mandate a URI? You get all of the
properties of uniqueness out of the box (it can be a uuid:... or a
mailto:...).

* Section 3.4.2 - 'In self-authored rule modules, the authorizing 
endpoint MUST be identical with the rule module author.' -> 'When
the content of the type attribute of the author element is 'self',
the content of the authorized-by element's name and id child
elements MUST be identical to that of the corresponding children of
the author element.'
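
To make the suggested wording concrete, here is a rough sketch of a
self-authored module header that would satisfy it. The nesting, the
irml: prefix (assumed to be bound by the enclosing ruleset), and the
name and id values are my assumptions for illustration, not the
draft's actual syntax:

```xml
<!-- Hypothetical fragment: values are placeholders, layout is assumed -->
<irml:author type="self">
  <irml:name>Example Publisher</irml:name>
  <irml:id>mailto:rules@example.com</irml:id>
</irml:author>
<!-- Because type is 'self', name and id repeat the author's values exactly -->
<irml:authorized-by>
  <irml:name>Example Publisher</irml:name>
  <irml:id>mailto:rules@example.com</irml:id>
</irml:authorized-by>
```

A rule processor could then check the constraint with a simple
comparison of the two pairs of child elements.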

Is the authorized-by element intended to capture the authorisation
state of the ruleset (e.g., the rule processor appends it to the
ruleset upon discovering and verifying the identity and rights of
the submitting process), or is it a placeholder for a mechanism
like XML Digital Signatures?

* Section 3.4.2 - missing caption on table (identify as authorized-by's
attributes)

* Section 3.4.3 - perhaps it would be better to use a URI instead
of an acronym? Also, is it always the case that a group of rules
will apply to exactly one protocol? I'd argue that it would be
better to associate a protocol with an individual rule.

* Section 3.5.1 - the first sentence is confusing; rules are specified
as having one or more conditions, and then it's said that they can
have zero conditions.

* Section 3.5.1 - processing points - this seems unnecessarily
restrictive; 1|2|3|4 is based on a request/response pattern, and
even then it's very possible that more processing points will be
desirable.

Instead, why not, after identifying the protocol by a URI, identify
the protocol-scoped message as a URI as well? Then, you can
identify message-specific processing points with (drum roll) a URI.

For example:

<rule protocol="http://ietf-opes.org/protocol/rfc/2616"
      message="http://ietf-opes.org/protocol/rfc/2616/Request"
      processing-point="http://ietf-opes.org/protocol/rfc/2616/Request/Client">


These are just example URIs; the real ones might use urn:ietf, or
be somewhere else (e.g., the SOAP HTTP binding defines URIs that
would be useful when talking about SOAP).

Note that these shouldn't be inferred from one URI, because then
one would have to understand the structure of the URI, which
defeats the benefits of using them (extensibility, uniqueness,
etc.).
 
This also makes it possible to refer to the application of
individual rules as processing points, so you can describe
relationships between rules.

Would there be any case where specifying a port is necessary?

* Section 3.6.1 - just curious, why 'property' instead of
'condition' (which is what it's referred to as throughout the rest
of the text)?

re: context; same arguments as above re: URI. Rather than making
things like request-URI, response status, protocol version, etc.
protocol-specific system variables, why not make them
message-specific context? For that matter, I wonder if it would be
good to make everything matchable (including protocol, etc.); this
seems better than building in privileged conditions, considering
that OPES is supposed to be a generic framework.

re: matches; regex is awfully expensive (and encourages people to
cut corners in parsing). Could some room for extensibility be left
here, so that other means of matching can be specified? I suppose
that regex could be the default, and other means of matching could
be specified by an attribute (there needs to be room for
expressions here anyway; it would be good to be able to do
"response.headers.age < 30", for example).

Match seems to have the beginnings of boolean logic embedded; I see
AND (multiple matches specified) and NOT (not-matches) and even OR
(just flattened out to multiple rules). IMHO it would be good if
this were explicit; e.g., <and> ... </and> <or> ... </or> <not> ...
</not> wrappers around matches.
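
As a rough sketch of what the explicit wrappers might look like: the
and/or/not element names come from the suggestion above, while the
irml: prefix (assumed to be bound by the enclosing ruleset), the
attribute layout of property, and the header names and patterns are
assumptions made up for illustration:

```xml
<!-- Hypothetical: (language is fr OR de) AND NOT (user agent looks like a bot) -->
<irml:and>
  <irml:or>
    <irml:property name="Accept-Language" matches="fr"/>
    <irml:property name="Accept-Language" matches="de"/>
  </irml:or>
  <irml:not>
    <irml:property name="User-Agent" matches=".*bot.*"/>
  </irml:not>
</irml:and>
```

Written this way, the OR no longer has to be flattened out into two
separate rules, and the negation is visible in the structure rather
than in a differently named attribute.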

* Section 3.7.4 - This looks good, but the syntax is a bit
cluttered; there's the potential for a lot of rules, so we want
them to be as compact and readable as possible.

I'd suggest
   - identifying 'any' as a unique URI (e.g., 
     http://ietf-opes.org/irml/service/any) so that it can be
     collapsed into 'uri'
   - making 'uri' an attribute of 'service' and using a more descriptive
     name (perhaps 'type', moving the current semantics of type to
     something more descriptive like 'fail_target')
   - if parameters are desired, they can be unqualified child
     elements of service
   - if dynamic values are desired in parameters, call them out with
     a mechanism like XSLT's value-of

This gives you:
  <irml:service name="Foo" type="http://..." fail="..."/>
or, if you want parameters,
  <irml:service name="Bar" type="http://..." fail="...">
    <param1>a</param1>
    <param2><irml:value-of var="whatever"/></param2>
  </irml:service>
which is much easier to read if there are multiple services or
parameters.

An issue regarding specification of failure is idempotency; there
may be situations where failing over to another service may cause a
request to be repeated. In HTTP, idempotency is theoretically easy
to determine (except for cases where people use GET with side
effects, but that IMHO isn't our problem). Other protocols may not
make it so easy.

I don't see anything about differentiating between local and remote
services. How does that happen (or does it in the rule language)?

-- 
Mark Nottingham
http://www.mnot.net/