Let me summarize the relevant issues for reference purposes:
1. Choice of the OPES application model.
The OPES architecture provides two possibilities for placing
application modules - an OPES service application on the same computer
as the OPES dispatcher, or a callout server. The first case (initially
called proxylet) comes as a natural extension of caching proxies.
Some commercially available caches have proprietary APIs for adding
application logic, like filtering capabilities. The proprietary
nature of such extensions prevented extensive deployment, and I
believe the whole OPES idea started as an attempt to standardize
triggers (rules) and proxylet API, environment and deployment.
The callout server comes either as a natural extension of the first
model to offload application processing and create a scalable
application structure or as a way to use a different class of
devices - fast L7 switches - as an application building platform.
What is common to both models is the central point, the dispatcher,
which is assigned the role of policy enforcement point.
OPES facilities should not prefer one model over another, and this
may be achieved by keeping the OPES processor as the main representation
center. It should be responsible for complying with tracing and
other OPES requirements. It does not mean that it always has to
keep persistent information, but when it does not, the callout protocol
should support directives for tracing control. The callout protocol
may also support negotiation about inserting tracing information
into the message. The OPES processor should be able either to request
necessary information from the callout server or to issue directives
for information insertion and verify that the directive is accepted.
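A rough Python sketch of the two options just described (issue an insertion directive and verify it is accepted, or request the information and insert it at the processor). Everything here is made up for illustration: the class names, the TraceMode values and the "X-Trace" header are assumptions, not part of any OPES or callout protocol definition.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class TraceMode(Enum):
    REPORT = "report"   # callout server hands trace data back to the processor
    INSERT = "insert"   # callout server writes trace data into the message


@dataclass
class TraceDirective:
    mode: TraceMode
    service_id: str


@dataclass
class CalloutReply:
    accepted: bool
    message: str
    trace: List[str] = field(default_factory=list)


class CalloutServer:
    """Stand-in for a remote callout server (hypothetical interface)."""

    def __init__(self, supports_insert: bool):
        self.supports_insert = supports_insert

    def process(self, message: str, directive: TraceDirective) -> CalloutReply:
        entry = f"{directive.service_id}: applied"
        if directive.mode is TraceMode.INSERT:
            if not self.supports_insert:
                # Directive refused; the message is returned untouched.
                return CalloutReply(accepted=False, message=message)
            return CalloutReply(accepted=True, message=f"{message}\nX-Trace: {entry}")
        return CalloutReply(accepted=True, message=message, trace=[entry])


class OpesProcessor:
    """Stays responsible for tracing whichever way the data gets inserted."""

    def __init__(self, callout: CalloutServer):
        self.callout = callout

    def handle(self, message: str, service_id: str) -> str:
        # Option 1: issue an insertion directive and verify it was accepted.
        reply = self.callout.process(message, TraceDirective(TraceMode.INSERT, service_id))
        if reply.accepted:
            return reply.message
        # Option 2: request the information and insert it here.
        reply = self.callout.process(message, TraceDirective(TraceMode.REPORT, service_id))
        return "\n".join([reply.message] + [f"X-Trace: {t}" for t in reply.trace])
```

Either way the end point sees the same trace line, so the choice of model stays invisible outside the OPES processor.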
The reason I'm going into this lengthy discussion is that I got
the impression that there is a shift to the second model that is
causing some misunderstanding. Maybe I'm wrong.
2. Tracing information granularity and persistence levels.
The information may be:
- message-related, e.g. "virus checking done - passed", "content
filtering applied", "translated from quibbish to danqush". Such
information should be supplied with each message and indicate
that a specific action was taken. All data that describes specific
actions performed for the message should be provided with that
message, as there is no other way to find message-level details
later. The OPES application (including the OPES processor and all
application modules and callout servers involved) is not
supposed to keep volatile information after request
processing is done.
- session-related. Session knowledge may not be directly
supported by the protocol, as is the case for HTTP. In this
situation the OPES processor is responsible for keeping the
session context. Session-related information may be provided
once per session; some details may be replaced by an id or a
reference for subsequent information retrieval.
Session-level data must be preserved for the duration of
the session. The OPES processor is responsible for inserting
notifications if session-level information changes.
Examples of session-related information are "virus checker
abcd build 123 enabled", "OPES server id=xyz present".
- log information id. This may be used e.g. for accounting
and non-repudiation purposes. Detailed information referenced
by this id may not be available online but can be retrieved
later by some off-line procedure.
- server-related persistent information, e.g. "OPES center of
authority <URI>", "privacy policy <URI>". It may also be
presented once per session and it does not change between
sessions.
- end-point related data: what profile is activated (profile ID),
where to get profile details, where to set preferences. I'm not
sure how far we should go in this direction.
I see other work going on in this area
(e.g. [draft-barbir-opes-spcs-03.txt]). I think OPES should provide
a framework for such development by defining flexible and extensible
tracing and informational facilities.
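To illustrate the persistence levels above, here is a small Python sketch of how an OPES processor might decide what goes in-band with each message. The names (Scope, TraceItem, SessionTracer) are invented for this example, and treating log ids as per-message data is my assumption; the list above does not say how often they are sent.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Scope(Enum):
    MESSAGE = "message"      # e.g. "virus checking done - passed"
    SESSION = "session"      # e.g. "virus checker abcd build 123 enabled"
    LOG_ID = "log-id"        # opaque id for later off-line retrieval
    SERVER = "server"        # e.g. "privacy policy <URI>"
    ENDPOINT = "endpoint"    # e.g. active profile id


# Assumption: log ids travel with every message, like message-related data.
PER_MESSAGE = {Scope.MESSAGE, Scope.LOG_ID}


@dataclass(frozen=True)
class TraceItem:
    scope: Scope
    name: str
    value: str


class SessionTracer:
    """Selects the trace items to send in-band for one session."""

    def __init__(self) -> None:
        self._sent: Dict[str, str] = {}   # name -> last value sent this session

    def select(self, items: List[TraceItem]) -> List[TraceItem]:
        out = []
        for item in items:
            if item.scope in PER_MESSAGE:
                out.append(item)                    # always sent with the message
            elif self._sent.get(item.name) != item.value:
                self._sent[item.name] = item.value  # first time, or value changed
                out.append(item)
        return out


tracer = SessionTracer()
checker = TraceItem(Scope.SESSION, "virus-checker", "abcd build 123 enabled")
passed = TraceItem(Scope.MESSAGE, "virus-check", "passed")
first = tracer.select([checker, passed])    # session item and message item
second = tracer.select([checker, passed])   # message item only
```

The session context lives only in the tracer, which matches the idea that the OPES processor keeps it when the protocol (like HTTP) does not.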
3. Some terminology.
Can we develop a few example scenarios that illustrate the various
concepts of "information points", "reference points", "identifier",
etc., and how they play together?
- REFERENCE POINT - a reference that may be used out-of-band to
perform a specific function.
An example may be the URI of the privacy policy, the center of authority
URI, a server address, etc. Usually no protocol is provided to access
the reference point.
- INFORMATION POINT - implies the presence of a protocol to access
detailed information at this point. An example may be a URI to get
a certificate for a virus checker or content filter, or to examine
and set profile settings and active preferences.
- IDENTIFIER - provides a unique binding to detailed persistent
information. For example "transformation-applied : fe123" gives
a participant the ability to enquire about (and maybe cache) details of
the transformation fe123. Use of such (opaque) identifiers
does not require prior knowledge and does not create a burden
of storing additional information - this is just a tag for
persistent information (not message-specific).
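A short Python sketch of how the three terms could play together. The type names, the parse_trace_line helper and the fetch callback are all hypothetical; the fetch callback just stands in for whatever protocol an information point would offer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class ReferencePoint:
    """Out-of-band reference; no access protocol is implied."""
    purpose: str
    uri: str


@dataclass(frozen=True)
class InformationPoint:
    """A point that can be queried, so the protocol is part of its description."""
    purpose: str
    uri: str
    protocol: str


class IdentifierResolver:
    """Resolves opaque identifiers to persistent details and caches them."""

    def __init__(self, fetch: Callable[[str], str]):
        self._fetch = fetch               # stand-in for a query to an information point
        self._cache: Dict[str, str] = {}

    def details(self, identifier: str) -> str:
        # Details are persistent (not message-specific), so caching is safe.
        if identifier not in self._cache:
            self._cache[identifier] = self._fetch(identifier)
        return self._cache[identifier]


def parse_trace_line(line: str) -> Tuple[str, str]:
    """Split a trace line of the form 'name : value' into its two parts."""
    name, _, value = line.partition(":")
    return name.strip(), value.strip()


name, ident = parse_trace_line("transformation-applied : fe123")
resolver = IdentifierResolver(lambda i: f"details of transformation {i}")
info = resolver.details(ident)   # fetched once, served from the cache afterwards
```

The identifier itself stays opaque to the participant; only the resolver needs to know where and how the details can be obtained.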
4. Using discretion about which points should be exposed.
If we don't identify the exact server, how would a service provider
trace a problem I report to him? How would he know which server to
check, if I tell him that something went wrong and ask him to check
this? With email, for example, I know exactly which mail servers have
been on the path, thus being able to trace down to the exact server.
Which servers should be exposed is the choice of the service provider.
For example, currently if pictures coming from some site are distorted
or data is corrupted, it is extremely difficult and often even impossible
to tell what front-end or back-end servers are malfunctioning, especially
in the presence of dynamically addressed CDNs and multi-tier backend
applications. Usually a notification containing the main URL and request
parameters should be sufficient.
A mail server is also a good example: you may see only a representative
of a server farm, while some processing, like virus checking or spam
filtering, may be performed by invisible back-end servers. Still, servers
that are directly identified in the headers give reasonable information
for problem analysis.
I'd recommend minimising the number of points exposed - in order to hide
application complexity and dynamic reconfiguration - but providing separate
logical places for information requests and references. In most cases
the OPES processor should hide the underlying application structure and
carry the burden of relaying some requests (both in-line and out-of-band)
to callout processors. This does not require storage of additional
data - at each moment the OPES processor knows all underlying configuration
details and can determine what callout processor should answer the
request.
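As a minimal sketch of this relaying idea in Python: the OPES processor is the only exposed point and forwards information requests according to its current configuration. OpesFrontEnd, configure and info_request are names I made up, and the handlers stand in for real callout processors.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]


class OpesFrontEnd:
    """The only exposed point; callout processors stay hidden behind it."""

    def __init__(self) -> None:
        # service id -> callout processor currently responsible for it
        self._routes: Dict[str, Handler] = {}

    def configure(self, service_id: str, handler: Handler) -> None:
        # Reconfiguration (failover, scaling) only changes this table.
        self._routes[service_id] = handler

    def info_request(self, service_id: str, query: str) -> str:
        handler = self._routes.get(service_id)
        if handler is None:
            return f"unknown service: {service_id}"
        return handler(query)


front = OpesFrontEnd()
front.configure("virus-check", lambda q: f"server-a: {q}")
before = front.info_request("virus-check", "version")
front.configure("virus-check", lambda q: f"server-b: {q}")   # failover
after = front.info_request("virus-check", "version")
```

The requester uses the same service id before and after the failover, and no per-request state is stored at the front end.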
5. Additional protocol and schema definitions.
Do we have to decide on a specific protocol to be used for this
purpose, or can we leave this open and just indicate the protocol to
be used (e.g. within the embedded URI)?
As we are building the OPES framework from top to bottom we understandably
delay introducing details until we are at the appropriate level of
specification. But at some point these specifics have to be defined. If we
define all HTTP extensions necessary to implement an HTTP-based OPES system
but for the information point only a URI is defined, then interoperation of
different implementations may become a problem.
Oskar
-----Original Message-----
From: owner-ietf-openproxy(_at_)mail(_dot_)imc(_dot_)org
[mailto:owner-ietf-openproxy(_at_)mail(_dot_)imc(_dot_)org]On Behalf Of
Markus Hofmann
Sent: Tuesday, April 01, 2003 1:31 PM
To: OPES Group
Subject: Re: Need to look at tracing and debuggig
Oskar,
great input, thanks. Just some quick, minor comments:
1. Tracing information has to be provided in-band, I see no
other way to satisfy current architecture requirements. The
OPES architecture states that:
I agree with that. The question is, though, whether the callout
protocol itself will also carry some tracing information, or whether
the callout server will embed possible tracing information directly
into the application message.
2. We have to decide on the in-band - out-of-band balance for tracing
facilities. Two extreme approaches are:
- in-band data provides only a reference to the facility where the
tracing information may be obtained;
- include all information in-band.
Advantage of the first approach: a) Tracing information may be provided
in application protocol independent manner. b) Level of details is
determined by direct request, lengthy descriptions may be provided
without an impact on the application protocol efficiency.
Disadvantages: a complex identification mechanism is needed to retrieve
application message specific information (like "virus checking applied"),
and getting such a simple notification will involve the overhead of
creating or keeping an additional connection.
Nice comments. I was tempted to lean towards the first approach with a
reference to a point for retrieval, but.... Where would such a point of
tracing information retrieval be? Would you assume that, for example,
I can retrieve detailed tracing information about a provided OPES
service directly from the callout server the service has been
executed on? Such a callout server might very well sit behind a NAT and
might have a non-routable IP address... If it's not directly on the
callout server, how would the callout server provide the required
tracing information to the other entity?
The second approach provides a kind of opposite advantages and
disadvantages: simple per-message information binding, but the
application protocol may be cluttered with excess information that is
not relevant to the current session (maybe due to the lack of interest
of the participants).
An additional advantage of the in-band approach is end-to-end coverage:
tracing information and related directives are available to all OPES
flow participants, no topology knowledge is required.
Yup, good point, have to agree.
A reasonable combination may include message-related information
in band and a reference to the session-related information. To adjust
the in-band information level to the needs of session participants,
some trace control directives should be defined (as application
protocol extensions).
Could you provide a specific example for session-related information
(as opposed to message-related)?
The purpose of identification (exposure) should be defined by the
intended use. Only the information points (where some participant
may call for additional information) and reference points (those
that should be identified in a related request, e.g. to the center of
authority) should be exposed. If there are points that may accept
directives (e.g. privacy directives) - they should be exposed.
Also session participants should be notified about the center of
authority for the OPES server.
> [...]
Can we develop a few example scenarios that illustrate the various
concepts of "information points", "reference points", "identifier",
etc., and how they play together?
Any such explicitly identified point may be on the path or out of
the path, this should not be a factor. Points on the path are exposed
by IP, as according to IAB requirements connections are terminated at
these points.
The IAB considerations state that "the OPES intermediary must be
explicitly addressed at the IP layer by the end user", which
means that (only) the first "hop", i.e. the first OPES
intermediary on the path, has to be explicitly addressable. Others
might very well sit behind a NAT or so.
Information about services should be provided in-band and should uniquely
identify the service provider and service type, but not the service point
(OPES server). This makes service information location independent and
facilitates system reconfiguration, including failover and recovery:
agreements and parameters may be transferred to another OPES server
(within the same trust domain) without renegotiation with end-point.
If we don't identify the exact server, how would a service provider
trace a problem I report to him? How would he know which server to
check, if I tell him that something went wrong and ask him to check
this? With email, for example, I know exactly which mail servers have
been on the path, thus being able to trace down to the exact server.
Direct point exposure (let's say by inclusion of URL into the tracing
information) raises a question about out-of-band protocol used to access
this point. SOAP looks like a good candidate for that.
Do we have to decide on a specific protocol to be used for this
purpose, or can we leave this open and just indicate the protocol to
be used (e.g. within the embedded URI)?
-Markus