RE: Need to look at tracing and debugging



On Thu, 3 Apr 2003, Oskar Batuner wrote:

Thus, I would suggest that we keep varying communication costs in mind
but make no other assumptions or distinctions.


This is a kind of extreme approach. It implies that no matter where the
application module resides all interaction with dispatcher goes through
OCP. This is possible, but initially we had some attempts to define a
proxylet environment and API without regard to protocol. In this approach
"same computer" means something that supports a fast and reliable (usually
proprietary) method to get information from the OPES dispatcher to the
application module.


Oh, that's perfectly fine. I was only talking about the tracing impact
on OCP. Other protocols or interfaces may have different models, of
course.

- session related. The session knowledge may not be directly
supported by the protocol, as is the case for HTTP. In this
situation the OPES processor is responsible for keeping the
session context. Session related information may be provided
once per session, some details may be replaced by id or a
reference for subsequent information retrieval.

Session level data must be preserved for the duration of
the session. OPES processor is responsible for inserting
notifications if session-level information changes.

Examples of session-related information are "virus checker
abcd build 123 enabled", "OPES server id=xyz present".


I am not convinced we have to support this kind of tracing. The end
does not usually care whether "virus checker abcd build 123 is
enabled"; it cares only whether that virus checker has seen or modified
the application message, which is already covered by the "message-related"
bullet above. Same for "OPES server id=xyz present".


Taking virus checker as an example I can see two situations when
detailed information is important. The end user might wish to verify
virus checker signature and certificate. Very reasonable thing to do
if one is going to rely on the "check passed" notification. Another
possibility is a second OPES processor with a virus checker present
in the OPES flow. When a message signed as "passed" reaches this
second processor it may be interested not only in the first virus
checker's credentials but also in type and version information, in
order to adapt its actions.


I agree! However, I think this information should be transmitted on a
per-message basis because the application protocol we care about most
(HTTP) does not have sessions. Not sure about SMTP.

What is a session? What are session boundaries? How do those
boundaries correspond to message/connection boundaries? And, finally,
why should we care about anything that does not affect our application
message?


I do not have a "standard" session definition at hand, so for our
purposes I'd define session as an application dependent persistent
context that propagates certain parameters to all messages belonging
to that session. In this sense session notion does affect our
application messages: if there was a specific OPES application
participating in the session its credentials and functional
abilities extend to all messages within that session. Reintroducing
and re-verifying complete list of all participating OPES servers'
credentials, settings and functional capabilities with each message
may have a serious performance impact.


I agree. However, HTTP does not have sessions. Nothing in HTTP
"propagates certain parameters to all messages belonging to X" because
HTTP messages do not belong to any end-to-end X. The closest you can
come is an HTTP persistent connection, but since it is hop-by-hop, it
cannot be used as a session.

- server related persistent information, e.g. "OPES center of
authority <URI>", "privacy policy <URI>". It may be also
presented once per session and it does not change between
sessions.


This has to be per-message unless you somehow can define sessions so
that the end-user can distinguish them. For example, two pipelined
HTTP requests on the same TCP connection (from end-user point of view)
may pass through very different OPES intermediaries and reach
different content providers. How are you going to maintain sessions if
not on a per-message basis (which makes session concept unnecessary)?
Please give an example of a session in this context.


Let's say a series of requests from the same user to the same site.
It is application dependent, so I'd leave determination of the session
boundaries to the application. This creates a possibility that different
participants may define session differently, but a) they may add a session
id or use the existing one (application defined), b) they may do certain
things that are safe even if session context is lost, e.g. put unique
identifiers instead of descriptions. If the end user loses the related
description, he/she may just send a kind of whois request for that identifier.

I do not understand your statement about two requests in the same
connection passing through different OPES servers to different content
providers. My understanding was that the requirement of explicit
IP exposure and per-hop connection termination prevents such situations.
Please explain.


Yes, that statement is in the core of our disagreement. Here is an
example:

        - Client issues two requests A and B. Both requests
          are pipelined on the same HTTP/TCP connection to
          client-side proxy P0. The requests may be to the
          same origin server, but do not have to be.

        - Proxy P0 reads both requests and sends them off
          using two HTTP/TCP connections: request A
          is forwarded to proxy PA, and request B
          is forwarded to proxy PB. There are many reasons
          to "split" two requests: caching, processing
          rules, peering relationships, etc.

        - Both proxy PA and PB have OPES processors.
          Processors belong to different OPES systems,
          with different privacy policies, etc. Each
          forwards the request to its origin server.

It is not possible to establish a real "session" between a client and
proxy PA because the client would not be able to distinguish that
session from the "session" established by proxy PB. Moreover, the same
request A sent 5 seconds later may go to PB and not PA (perhaps P0
load balances or whatever).
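
The split in the example above can be sketched in a few lines. This is
only an illustration: the routing table, the host names "a.example" and
"b.example", and the function names are hypothetical, and a real proxy
would decide on caching, rules, peering, or load rather than on a fixed
table.

```python
from urllib.parse import urlsplit

# Hypothetical routing table for client-side proxy P0:
# origin host -> next-hop proxy. Not taken from any real deployment.
ROUTES = {"a.example": "PA", "b.example": "PB"}


def next_hop(request_url):
    """Pick the next hop for one request, independent of the client connection."""
    host = urlsplit(request_url).hostname
    return ROUTES.get(host, "direct")


def forward_pipelined(request_urls):
    """Route each pipelined request on its own; return (url, next hop) pairs."""
    return [(url, next_hop(url)) for url in request_urls]


if __name__ == "__main__":
    # Requests A and B arrive on the same client connection to P0.
    for url, hop in forward_pipelined(["http://a.example/x", "http://b.example/y"]):
        print(url, "->", hop)
```

The client sees one connection, while each request takes its own path, so
nothing on the client side marks where one "session" ends and another
begins.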

The only option to maintain session information is to send some kind
of session ID with every response so that the client can distinguish
sessions on a per-message basis and maintain its own session state.
This becomes not a session but rather a "temporary alias"
optimization:

first message:
    OPES-Agent-Details: a very long header ... (id=1231)

next messages:
    OPES-Agent-Details: see info with id 1231, if you still have it

Is that something you are after?
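
On the client side, the alias idea above might look roughly like this
sketch. The header value formats are modeled on the two example lines
and are not a defined syntax; the class and method names are
hypothetical, and the sketch assumes ids are unique across all OPES
agents the client talks to, which nothing so far guarantees.

```python
import re

# Hypothetical value formats, modeled on the two example header lines above.
FULL = re.compile(r"^(?P<details>.*)\(id=(?P<id>\d+)\)\s*$")
ALIAS = re.compile(r"^see info with id (?P<id>\d+)")


class AgentDetailsCache:
    """Per-client cache mapping alias ids to previously seen agent details."""

    def __init__(self):
        self._by_id = {}

    def resolve(self, header_value):
        """Return the agent details for a header value.

        Returns None when the value is an alias the client no longer has,
        in which case the client would need to ask for the details again.
        """
        value = header_value.strip()
        alias = ALIAS.match(value)
        if alias:
            return self._by_id.get(alias.group("id"))
        full = FULL.match(value)
        if full:
            details = full.group("details").strip()
            self._by_id[full.group("id")] = details
            return details
        # No id present: nothing to cache, pass the value through.
        return value
```

Every message still carries the header, so the state lives entirely in
the client's cache and not in any shared session.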

Knowledge of information persistence permits replacement of detailed
data (which is required for volatile information) by references.
Sure, a server URI is a better defined object than a session. We may
further clarify what kind of persistency should be taken into
consideration, but considering all information as volatile
(per-message) looks like too strong a simplification.


"replacement of detailed data by references" smells like an
OPES-Agent-Details example above. I would not call it a session-based
approach. This is more like caching or temporary aliasing. If that is
something you are after, perhaps we can rename it to avoid "session"
confusion?

3. Some terminology.

Can we develop a few example scenarios that illustrate the various
concepts of "information points", "reference points", "identifier",


- REFERENCE POINT - a reference that may be used out-of-band to
  perform a specific function.

  An example may be URI for the privacy policy, center of authority
  URI, server address, etc. Usually no protocol is provided to access
  the reference point.


If reference point is a URI (and it probably should be), then the
schema part of the URI (e.g., "http") usually determines ways to
access the information.
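
A minimal sketch of that point, using Python's standard URL parser. The
function name and both URIs are made up for illustration.

```python
from urllib.parse import urlsplit


def access_scheme(reference_point):
    """Return the URI scheme, which hints at how the reference can be accessed."""
    return urlsplit(reference_point).scheme


if __name__ == "__main__":
    # Hypothetical reference points.
    print(access_scheme("http://opes.example/privacy-policy"))
    print(access_scheme("mailto:abuse@ad-insertion.example"))
```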

- INFORMATION POINT - implies presence of the protocol to access
  detailed information at this point. Example may be URI to get
  a certificate for virus checker or content filter, examine
  and set profile setting and active preferences.


I see no difference with the "REFERENCE POINT". The protocol
distinction is too vague. Can you give a reference point example that
lacks protocol? Do we need to distinguish the two points?


abuse(_at_)ad-insertion(_dot_)com looks to me like a reference point. The
main difference is the presence of a programmatic facility for information
retrieval. Such a facility permits removing detailed information from the
protocol even if this information may be needed by the
participating applications.


To me, abuse(_at_)ad-insertion(_dot_)com implies SMTP (e-mail) protocol to access
the information so it becomes an INFORMATION POINT. I can send an
e-mail and expect an automated or personal response with more info.

I guess we can ignore this confusion until the difference between
REFERENCE and INFORMATION POINTs becomes clear or important.

Alex.