ietf-openproxy

RE: Need to look at tracing and debugging

2003-04-02 00:12:21

Let me summarize the relevant issues for reference purposes:

1. Choice of the OPES application model.

The OPES architecture provides two possibilities for placing 
application modules: an OPES service application on the same 
computer as the OPES dispatcher, or a callout server. The first case 
(initially called proxylet) comes as a natural extension of caching 
proxies. Some commercially available caches have proprietary APIs 
for adding application logic, such as filtering capabilities. The 
proprietary nature of such extensions prevented extensive 
deployment, and I believe the whole OPES idea started as an attempt 
to standardize triggers (rules) and the proxylet API, environment 
and deployment. 

The callout server comes either as a natural extension of the first 
model, to offload application processing and create a scalable 
application structure, or as a way to use a different class of 
devices - fast L7 switches - as an application building platform. 

What is common to both models is the central point, the dispatcher, 
which is assigned the role of policy enforcement point. 

OPES facilities should not prefer one model over another, and this 
may be achieved by keeping the OPES processor as the main 
representation center. It should be responsible for complying with 
tracing and other OPES requirements. This does not mean that it 
always has to keep persistent information itself, but if it does 
not, the callout protocol should support directives for tracing 
control. The callout protocol may also support negotiation about 
inserting tracing information into the message. The OPES processor 
should be able either to request the necessary information from the 
callout server or to issue directives for information insertion and 
verify that the directive is accepted.

The reason I'm going into this lengthy discussion is that I got 
the impression that there is a shift to the second model that is 
causing some misunderstanding. Maybe I'm wrong.

2. Tracing information granularity and persistence levels. 
The information may be:

- message-related, e.g. "virus checking done - passed", "content 
filtering applied", "translated from quibbish to danqush". Such 
information should be supplied with each message and indicate 
that a specific action was taken. All data that describes specific 
actions performed for the message should be provided with that 
message, as there is no other way to find message-level details 
later. The OPES application (including the OPES processor and all 
application modules and callout servers involved) is not 
supposed to keep volatile information after request 
processing is done. 

- session-related. Session knowledge may not be directly 
supported by the protocol, as is the case for HTTP. In this 
situation the OPES processor is responsible for keeping the 
session context. Session-related information may be provided 
once per session; some details may be replaced by an id or a 
reference for subsequent information retrieval.

Session-level data must be preserved for the duration of 
the session. The OPES processor is responsible for inserting 
notifications if session-level information changes. 

Examples of session-related information are "virus checker 
abcd build 123 enabled" and "OPES server id=xyz present". 

- log information id. This may be used e.g. for accounting 
and non-repudiation purposes. Detailed information referenced 
by this id may not be available online but can be retrieved 
later by some off-line procedure.

- server-related persistent information, e.g. "OPES center of 
authority <URI>", "privacy policy <URI>". It may also be 
presented once per session, and it does not change between 
sessions.

- end-point related data: what profile is activated (profile ID), 
where to get profile details, where to set preferences. I'm not 
sure how far we should go in this direction. 

I see other work going on in this area 
(e.g. [draft-barbir-opes-spcs-03.txt]). I think OPES should provide 
a framework for such development by defining flexible and 
extensible tracing and informational facilities.
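
The granularity levels above could be modelled roughly as follows. This 
is only a sketch under my own assumptions: the class names and the 
"X-Trace-..." field names are hypothetical and are not taken from any 
OPES draft. It shows message-level facts travelling with every message, 
while session-level facts are sent once and afterwards replaced by a 
reference, until they change.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SessionTrace:
    """Session-level facts, kept for the lifetime of the session."""
    session_id: str
    facts: List[str] = field(default_factory=list)
    announced: bool = False  # True once the facts were sent in-band


@dataclass
class MessageTrace:
    """Per-message facts; they must travel with the message itself."""
    actions: List[str]
    log_id: Optional[str] = None  # opaque id for later off-line retrieval


def trace_headers(msg: MessageTrace, session: SessionTrace) -> Dict[str, str]:
    """Build in-band trace fields for one message (hypothetical field names)."""
    headers = {
        "X-Trace-Actions": "; ".join(msg.actions),
        "X-Trace-Session": session.session_id,
    }
    if msg.log_id is not None:
        headers["X-Trace-Log-Id"] = msg.log_id
    if not session.announced:
        # First message after start or change: send the full session facts.
        headers["X-Trace-Session-Info"] = "; ".join(session.facts)
        session.announced = True
    return headers


def update_session(session: SessionTrace, facts: List[str]) -> None:
    """A session-level change forces a new notification on the next message."""
    session.facts = list(facts)
    session.announced = False
```

Nothing message-specific is stored after trace_headers() returns; only 
the session context survives between messages.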

3. Some terminology.

Can we develop a few example scenarios that illustrate the various 
concepts of "information points", "reference points", "identifier", 
etc., and how they play together?

- REFERENCE POINT - a reference that may be used out-of-band to 
  perform a specific function. 

  Examples are a URI for the privacy policy, a center of authority 
  URI, a server address, etc. Usually no protocol is provided to 
  access the reference point.
  
- INFORMATION POINT - implies the presence of a protocol to access 
  detailed information at this point. Examples are a URI to get 
  a certificate for a virus checker or content filter, or to examine 
  and set profile settings and active preferences.

- IDENTIFIER - provides a unique binding to detailed persistent 
  information. For example, "transformation-applied : fe123" gives 
  a participant the ability to enquire about (and maybe cache) the 
  details of the transformation fe123. Use of such (opaque) 
  identifiers does not require prior knowledge and does not create 
  a burden of storing additional information - this is just a tag 
  for persistent information (not message-specific).
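
To make the IDENTIFIER case concrete, here is a small sketch of how a 
participant might handle the "transformation-applied : fe123" example. 
The fetch function stands in for whatever protocol an information point 
would offer; it is a placeholder of mine, not a defined interface. 
Because the details behind an identifier are persistent, they can be 
looked up once and cached.

```python
from typing import Callable, Dict, Tuple


def parse_trace_field(line: str) -> Tuple[str, str]:
    """Split 'name : value' into (name, value)."""
    name, _, value = line.partition(":")
    return name.strip(), value.strip()


class IdentifierCache:
    """Resolve opaque identifiers to their details, caching each result."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch  # hypothetical query to an information point
        self._cache: Dict[str, str] = {}
        self.lookups = 0     # number of real queries performed

    def details(self, identifier: str) -> str:
        if identifier not in self._cache:
            self.lookups += 1
            self._cache[identifier] = self._fetch(identifier)
        return self._cache[identifier]
```

The participant never interprets "fe123" itself; it only uses it as a 
key, which is what makes the identifier opaque.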

4. Using discretion about which points should be exposed.
 
If we don't identify the exact server, how would a service provider 
trace a problem I report to him? How would he know which server to 
check, if I tell him that something went wrong and ask him to check 
this? With email, for example, I know exactly which mail servers have 
been on the path, thus being able to trace down to the exact server.

It is the service provider's choice which servers should be exposed. 
For example, currently if pictures coming from some site are distorted 
or data is corrupted, it is extremely difficult and often even impossible 
to tell which front-end or back-end servers are malfunctioning, especially 
in the presence of a dynamically addressed CDN and a multi-tier backend 
application. Usually a notification containing the main URL and request 
parameters should be sufficient. 

The mail server is also a good example: you may see only a 
representative of a server farm, while some processing, like virus 
checking or spam filtering, may be performed by invisible back-end 
servers. Still, the servers that are directly identified in the headers 
give reasonable information for problem analysis. 

I'd recommend minimising the number of points exposed, in order to hide 
application complexity and dynamic reconfiguration, while providing separate 
logical places for information requests and references. In most cases the 
OPES processor should hide the underlying application structure and carry the 
burden of relaying some requests (both in-line and out-of-band) to callout 
processors. This does not require storage of additional 
data - at each moment the OPES processor knows all underlying configuration 
details and can determine which callout processor should answer the 
request.
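
A minimal sketch of that relaying role, with all names being mine and 
purely illustrative: the OPES processor is the only exposed point, and it 
forwards each information request to whichever callout server its 
current configuration names for the service.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]


class OpesProcessor:
    """Single exposed point that relays requests to callout servers."""

    def __init__(self) -> None:
        # service name -> handler of the callout server currently running it
        self._routes: Dict[str, Handler] = {}

    def configure(self, service: str, handler: Handler) -> None:
        """Record (or replace) the callout server behind a service."""
        self._routes[service] = handler

    def relay(self, service: str, request: str) -> str:
        """Forward a request using the configuration known at this moment."""
        handler = self._routes.get(service)
        if handler is None:
            return "unknown service: " + service
        return handler(request)
```

Reconfiguration (say, failover from one callout server to another) is 
just another configure() call; the requester keeps addressing the same 
point and needs no knowledge of the change.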

5. Additional protocol and schema definitions.

Do we have to decide on a specific protocol to be used for this 
purpose, or can we leave this open and just indicate the protocol to 
be used (e.g. within the embedded URI)?

As we are building the OPES framework from top to bottom, we understandably 
delay introducing details until we are at the appropriate level of 
specification. But at some point these specifics have to be defined. If we 
define all HTTP extensions necessary to implement an HTTP-based OPES system 
but define only a URI for the information point, then interoperation of 
different implementations may become a problem.
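
The interoperation concern can be illustrated with a short sketch. It 
assumes a client that reads the access protocol from the URI scheme and 
implements only a fixed set of protocols; the SUPPORTED set and the 
example URIs in the note below are made up for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical set of protocols one particular client implements.
SUPPORTED = {"http", "https"}


def access_protocol(uri: str) -> str:
    """Return the protocol named by the URI scheme, or raise ValueError."""
    scheme = urlsplit(uri).scheme.lower()
    if not scheme:
        raise ValueError("URI carries no protocol indication: " + uri)
    if scheme not in SUPPORTED:
        raise ValueError("unsupported protocol: " + scheme)
    return scheme
```

A URI such as "http://opes.example.org/info" is accepted by this 
client, while one with a scheme outside its set is rejected. If the 
specification leaves the protocol entirely open, two conforming 
implementations can still fail to talk to each other in exactly this 
way.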

Oskar

-----Original Message-----
From: owner-ietf-openproxy(_at_)mail(_dot_)imc(_dot_)org
[mailto:owner-ietf-openproxy(_at_)mail(_dot_)imc(_dot_)org]On Behalf Of 
Markus Hofmann
Sent: Tuesday, April 01, 2003 1:31 PM
To: OPES Group
Subject: Re: Need to look at tracing and debugging



Oskar,

great input, thanks. Just some quick, minor comments:

1. Tracing information has to be provided in-band, I see no
other way to satisfy current architecture requirements. The
OPES architecture states that:

I agree with that. The question is, though, whether the callout 
protocol itself will also carry some tracing information, or whether 
the callout server will embed possible tracing information directly 
into the application message.

2. We have to decide on the in-band - out-of-band balance for tracing
facilities. Two extreme approaches are:

- in-band data provides only a reference to the facility where
the tracing information may be obtained;

- include all information in-band.

Advantages of the first approach: a) Tracing information may be provided
in an application-protocol-independent manner. b) The level of detail is
determined by direct request; lengthy descriptions may be provided
without an impact on the application protocol efficiency.
Disadvantages: a complex identification mechanism is needed to retrieve
application message specific information (like "virus checking applied"),
and getting such a simple notification will involve the overhead of
creating or keeping an additional connection.

Nice comments. I was tempted to lean towards the first approach with a 
reference to a point for retrieval, but.... Where would such a point of 
tracing information retrieval be? Would you assume that, for example, 
I can retrieve detailed tracing information about a provided OPES 
service directly from the callout server the service has been 
executed on? Such a callout server might very well sit behind a NAT and 
might have a non-routable IP address... If it's not directly on the 
callout server, how would the callout server provide the required 
tracing information to the other entity?

The second approach provides the opposite advantages and 
disadvantages: simple per-message information binding, but the 
application protocol may be cluttered with excess information that is 
not relevant to the current session (maybe due to the lack of interest 
of the participants).

An additional advantage of the in-band approach is end-to-end 
coverage: tracing information and related directives are available to 
all OPES flow participants, no topology knowledge is required.

Yup, good point, have to agree.

A reasonable combination may include message-related information 
in band and a reference to the session-related information. To adjust 
the in-band information level to the needs of session participants, some 
trace control directives should be defined (as application protocol 
extensions).

Could you provide a specific example for session-related information 
(as opposed to message-related)?

The purpose of identification (exposure) should be defined by the
intended use.  Only the information points (where some participant
may call for additional information) and reference points (those 
that should be identified in a related request, e.g. to the center of 
authority) should be exposed. If there are points that may accept 
directives (e.g. privacy directives) - they should be exposed.

Also session participants should be notified about the center of
authority for the OPES server.
 > [...]

Can we develop a few example scenarios that illustrate the various 
concepts of "information points", "reference points", "identifier", 
etc., and how they play together?

Any such explicitly identified point may be on the path or out of 
the path; this should not be a factor. Points on the path are exposed by IP,
as according to IAB requirements connections are terminated at 
these points.

The IAB considerations state that "the OPES intermediary must be 
explicitly addressed at the IP layer by the end user", which 
translates to the requirement that (only) the first "hop", i.e. the 
first OPES intermediary on the path, has to be explicitly addressable. 
Others might very well sit behind a NAT or so.


Information about services should be provided in-band and should uniquely
identify the service provider and service type, but not the service point
(OPES server). This makes service information location independent and 
facilitates system reconfiguration, including failover and recovery:
agreements and parameters may be transferred to another OPES server
(within the same trust domain) without renegotiation with the end-point.

If we don't identify the exact server, how would a service provider 
trace a problem I report to him? How would he know which server to 
check, if I tell him that something went wrong and ask him to check 
this? With email, for example, I know exactly which mail servers have 
been on the path, thus being able to trace down to the exact server.

Direct point exposure (let's say by inclusion of a URL into the tracing
information) raises a question about the out-of-band protocol used to
access this point. SOAP looks like a good candidate for that.

Do we have to decide on a specific protocol to be used for this 
purpose, or can we leave this open and just indicate the protocol to 
be used (e.g. within the embedded URI)?

-Markus