RE: Need to look at tracing and debugging



On Wed, 2 Apr 2003, Oskar Batuner wrote:

Let me summarize the relevant issues for reference purposes:

1. Choice of the OPES application model.

The OPES architecture provides two possibilities for placing
application modules - OPES service application on the same computer
as the OPES dispatcher and callout server. The first case (initially
called proxylet) comes as a natural extension of caching proxies.
Some commercially available caches have proprietary APIs for adding
application logic, like filtering capabilities. The proprietary
nature of such extensions prevented extensive deployment, and I
believe the whole OPES idea started as an attempt to standardize
triggers (rules) and proxylet API, environment and deployment.

Callout server comes either as a natural extension of the first
model to offload application processing and create a scalable
application structure or as a way to use a different class of
devices - fast L7 switches - as an application building platform.

What is common to both models is the central point, the dispatcher,
which is assigned the role of policy enforcement point.


Actually, the two models are equivalent except for communication costs
between OCP agents. I cannot think of any other difference between
"same computer" and "other computer" placements, not to mention that
"same computer" is very difficult to define (same CPU? same processor
block? same server farm? same owner? etc.)

Thus, I would suggest that we keep varying communication costs in mind
but make no other assumptions or distinctions.

OPES facilities should not prefer one model over another, and this
may be achieved by keeping OPES processor as a main representation
center. It should be responsible for complying with tracing and
other OPES requirements. It does not mean that it always has to
keep persistent information, but in this case callout protocol
should support directives for tracing control. Callout protocol
may also support negotiation about inserting tracing information
into the message. OPES processor should be able either to request
necessary information from callout server or to issue directives
for information insertion and verify that the directive is accepted.

I'm going into this lengthy discussion because I got the
impression that there is a shift to the second model that is
causing some misunderstanding. Maybe I'm wrong.


I do not see a clear connection between "OPES facilities should not
prefer one model over another" and the rest of the paragraphs. I do
agree with the opening statement. However, I think it is premature to
make conclusions regarding OCP involvement.

I would suggest that we agree on what tracing information must be
supported first, and only after that talk about how to support it (in
OCP or elsewhere).

2. Tracing information granularity and persistence levels.
The information may be:

- message-related, e.g. "virus checking done - passed", "content
filtering applied", "translated from quibbish to danqush". Such
information should be supplied with each message and indicate that
specific action was taken. All data that describes specific actions
performed for the message should be provided with that message, as
there is no other way to find message level details later. OPES
application (including OPES processor and all application modules
and callout servers involved) is not supposed to keep volatile
information after request processing is done.


Agreed.

- session related. Session knowledge may not be directly
supported by the protocol, as is the case for HTTP. In this
situation OPES processor is responsible for keeping the
session context. Session related information may be provided
once per session; some details may be replaced by an id or a
reference for subsequent information retrieval.

Session level data must be preserved for the duration of
the session. OPES processor is responsible for inserting
notifications if session-level information changes.

Examples of session-related information are "virus checker
abcd build 123 enabled", "OPES server id=xyz present".


I am not convinced we have to support this kind of tracing. The end
does not usually care whether "virus checker abcd build 123 is
enabled"; it cares only whether that virus checker has seen or modified
the application message, which is already covered by "message-related"
bullet above. Same for "OPES server id=xyz present".

What is a session? What are session boundaries? How do those
boundaries correspond to message/connection boundaries? And, finally,
why should we care about anything that does not affect our application
message?

- log information id. This may be used e.g. for accounting
and non-repudiation purposes. Detailed information referenced
by this id may not be available online but can be retrieved
later by some off-line procedure.


This belongs to the "message-related" category, IMO. For example,

        OPES-Actions: "virus checker FooBar applied
                (result=clear, logid=34341, version=123)"
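To make the per-message idea concrete, here is a minimal Python sketch
that builds and parses one such entry. The OPES-Actions name and the
"description (key=value, ...)" layout are only my illustration above,
not an agreed format, and format_action/parse_action are hypothetical
helper names.

```python
import re

# Hypothetical layout, taken from the illustration in this message:
#   <description> (key=value, key=value, ...)
_ACTION_RE = re.compile(r'^(?P<desc>[^()]*?)\s*\((?P<params>[^()]*)\)$')


def format_action(description, **params):
    """Build one trace entry from a description and its parameters."""
    joined = ", ".join(f"{key}={value}" for key, value in params.items())
    return f"{description} ({joined})"


def parse_action(value):
    """Split a trace entry into its description and a dict of parameters."""
    match = _ACTION_RE.match(value.strip().strip('"'))
    if match is None:
        raise ValueError(f"unrecognized trace entry: {value!r}")
    params = {}
    for item in match.group("params").split(","):
        if not item.strip():
            continue  # tolerate an empty parameter list
        key, _, val = item.partition("=")
        params[key.strip()] = val.strip()
    return match.group("desc"), params


entry = format_action("virus checker FooBar applied",
                      result="clear", logid=34341, version=123)
description, params = parse_action(entry)
```

Note that the log id travels with the message like any other
parameter, so no separate category is needed to carry it.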

- server related persistent information, e.g. "OPES center of
authority <URI>", "privacy policy <URI>". It may also be
presented once per session and it does not change between
sessions.


This has to be per-message unless you somehow can define sessions so
that the end-user can distinguish them. For example, two pipelined
HTTP requests on the same TCP connection (from the end-user point of view)
may pass through very different OPES intermediaries and reach
different content providers. How are you going to maintain sessions if
not on a per-message basis (which makes session concept unnecessary)?
Please give an example of a session in this context.

- end-point related data: what profile is activated (profile ID),
where to get profile details, where to set preferences. I'm not sure
how far we should go in this direction.

OK.

I see other work going on in this area (e.g.
[draft-barbir-opes-spcs-03.txt]). I think OPES should provide a
framework for such development by defining flexible and extensible
tracing and informational facilities.


I agree, but I hope we can limit ourselves to per-message facilities
only (no "sessions").

3. Some terminology.

Can we develop a few example scenarios that illustrate the various
concepts of "information points", "reference points", "identifier",


- REFERENCE POINT - a reference that may be used out-of-band to
  perform a specific function.

  An example may be URI for the privacy policy, center of authority
  URI, server address, etc. Usually no protocol is provided to access
  the reference point.


If a reference point is a URI (and it probably should be), then the
scheme part of the URI (e.g., "http") usually determines how to
access the information.
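As a small illustration of that point, the following Python sketch
pulls the scheme out of a reference with the standard urllib.parse
module. The sample URI and the describe_point helper are made up for
this example; a bare tag such as "fe123" simply comes back with an
empty scheme, i.e. with no hint about how to access it.

```python
from urllib.parse import parse_qs, urlsplit


def describe_point(reference):
    """Return the parts of a reference that hint at how to access it."""
    parts = urlsplit(reference)
    return {
        "scheme": parts.scheme,          # "" when the reference is a bare tag
        "host": parts.hostname,          # None when there is no authority part
        "query": parse_qs(parts.query),  # parameters, possibly message specific
    }


uri_point = describe_point("http://service.org/explain?msgkind=foobar")
bare_tag = describe_point("fe123")
```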

- INFORMATION POINT - implies presence of the protocol to access
  detailed information at this point. An example may be a URI to get
  a certificate for a virus checker or content filter, or to examine
  and set profile settings and active preferences.


I see no difference with the "REFERENCE POINT". The protocol
distinction is too vague. Can you give a reference point example that
lacks protocol? Do we need to distinguish the two points?

- IDENTIFIER - provides a unique binding to detailed persistent
  information. For example "transformation-applied : fe123" gives
  a participant the ability to enquire (and maybe cache) details of
  the transformation fe123.


Identifier can be an information point as well, right? I guess I am
missing the motivation behind this terminology and these distinctions.

  Use of such (opaque) identifiers
  does not require prior knowledge and does not create a burden
  of storing additional information - this is just a tag for
  persistent information (not message-specific).


The same is true for REFERENCE and INFORMATION POINTS, right?

Why cannot an IDENTIFIER be message specific? For example,

  transformation-applied: http://service.org/explain?msgkind=foobar

4. Using discretion of what points should be exposed.

If we don't identify the exact server, how would a service provider
trace a problem I report to him? How would he know which server to
check, if I tell him that something went wrong and ask him to check
this? With email, for example, I know exactly which mail servers have
been on the path, thus being able to trace down to the exact server.


It is the choice of the service provider - what servers should be exposed.


I agree. Same for HTTP's Via headers. I would also add "what servers
should be exposed, if any".

For example, currently, if pictures coming from some site are distorted
or data is corrupted, it is extremely difficult and often even impossible
to tell what front-end or back-end servers are malfunctioning, especially
in the presence of dynamically addressed CDN and multi-tier backend
application. Usually notification containing the main URL and request
parameters should be sufficient.

Mail server is also a good example: you may see only a representative
of a server farm; some processing, like virus checking or spam
filtering, may be performed by invisible back-end servers. Still, servers
that are directly identified in the headers give reasonable information
for problem analysis.


Exactly. OPES should not be responsible for solving general
finger-pointing problems on the Internet.

I'd recommend minimising the number of points exposed - in order to
hide application complexity and dynamic reconfiguration - but providing
separate logical places for information requests and references.


I would suggest that we leave this choice to the administrator. We
should provide means to expose every OPES point. The administrator
should be able to configure their intermediaries to expose just what
they want/need.
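A rough Python sketch of what I mean, under the assumption that every
point on the path is known internally and that the administrator's
configuration is just a set of point kinds to expose. The TracePoint
type, the kind names, and the identifiers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TracePoint:
    kind: str   # hypothetical kinds: "processor", "callout", "service"
    ident: str  # whatever identifier the operator assigns to the point


def exposed_trace(points, exposed_kinds):
    """Keep only the points whose kind the administrator chose to expose."""
    return [point for point in points if point.kind in exposed_kinds]


path = [
    TracePoint("processor", "opes.example.com"),
    TracePoint("callout", "av-backend-7"),
    TracePoint("service", "virus-checker"),
]

# One administrator hides back-end callout servers; another exposes nothing.
public = exposed_trace(path, {"processor", "service"})
nothing = exposed_trace(path, set())
```

The mechanism can expose every point; the policy of what is actually
exposed, if anything, stays with the administrator.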

In most cases OPES processor should hide underlying application
structure and carry the burden of relaying some requests (both in-line
and out-of-band) to callout processors. This does not require
storage of additional data - at each moment OPES processor knows all
underlying configuration details and can determine what callout
processor should answer the request.


If we only trace OPES processors and service IDs, then OPES processor
should be responsible for adding tracing (it can use a trace-adding
callout service but that should not be required). A good example is a
text translation callout service. Such a service can be implemented in
an application protocol-agnostic manner and, hence, will not be able
to trace itself. OPES processor will always be able to add tracing
information though.
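Here is a minimal Python sketch of that split, assuming the callout
service sees only the text and the OPES processor owns the
application-level headers. The translate stand-in, the process_message
helper, the service id, and the OPES-Actions header name are all
hypothetical.

```python
def translate(text):
    """Stand-in for a protocol-agnostic callout service: it sees only text."""
    return text.upper()


def process_message(headers, body, service, service_id):
    """OPES-processor side: apply the service, then add the trace entry."""
    new_body = service(body)
    traced = dict(headers)  # do not modify the caller's headers
    previous = traced.get("OPES-Actions")
    entry = f"{service_id} applied"
    traced["OPES-Actions"] = f"{previous}, {entry}" if previous else entry
    return traced, new_body


headers, body = process_message(
    {"Content-Type": "text/plain"}, "hello", translate, "translator-demo")
```

The service never touches the headers, yet the message still leaves
the processor with a trace entry for it.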

Alex.