RE: Need to look at tracing and debugging



More thoughts on tracing and debugging.

1. I could not find any reference to "debugging" in the architecture
   and OCP requirements drafts. Shouldn't we talk only about
   tracing/reporting? I am not sure what exactly others mean by
   debugging, but since it is not required, we do not have to
   worry about it. Tracing helps with debugging, of course,
   but let's concentrate on immediate goals. Please correct me
   if I missed any debugging requirements.

2. It is very important to keep in mind that tracing is limited
   to the trust domain. Anybody outside of that domain may or may not
   see any traces, depending on domain policies/configuration. In
   other words, we are not talking about mandatory end-to-end tracing
   facility here. For example, if an OPES system is on the content
   provider "side", end-users are not guaranteed any traces. If an
   OPES system is working inside the end-user domain, the origin server is
   not guaranteed any traces related to user requests.
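
   For example, a processor at the domain boundary might apply such a
   policy as in the sketch below. It is only an illustration under
   assumptions: the "<trust-domain> <opaque-token>" entry format, the
   `expose` flag, and the `filter_at_egress` name are hypothetical and
   not defined by any draft.

```python
def filter_at_egress(entries: list[str], domain: str, expose: bool) -> list[str]:
    """Decide which trace entries leave `domain` together with the message.

    Hypothetical entry format: "<trust-domain> <opaque-token>". If the domain's
    policy hides its traces, its own entries are dropped. Entries added by other
    domains are passed through unchanged (an assumption of this sketch).
    """
    if expose:
        return list(entries)
    return [e for e in entries if e.partition(" ")[0] != domain]
```

   With `expose=False`, an end outside `isp.example` sees nothing from
   that domain, which matches the point that tracing is not a mandatory
   end-to-end facility.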

3. There are two distinct purposes/uses of traces. The first is to
   enable the "end" (content producer or consumer) to detect the
   presence of an OPES processor within the end's trust domain. Such an
   "end" should be able to see a trace entry, but does not need to be
   able to interpret it beyond identifying the trust domain(s).

   The second is to serve the domain administrator. The administrator
   should be able to take a trace entry (possibly supplied by an "end"
   as an opaque string) and interpret it. The administrator must be
   able to identify the OPES processor(s) involved and may be able to
   identify the applied adaptation services along with other
   message-specific information. That information should help explain
   which OPES agent(s) were involved and what they did, but it is
   impractical to provide all the relevant information in all cases. A
   trace record is a hint, not an exhaustive audit.

   Moreover, since trust domains and their administration vary a lot,
   I would argue that we must give implementors a lot of freedom in
   what to put in trace records and how to format them. Trace records
   should be easy to extend beyond basic OPES requirements. Trace
   management algorithms should treat trace records as opaque data to
   the extent possible.

   Markus asked for tracing use cases. If we start collecting those, I
   would suggest clearly documenting the "end" and "admin" roles in
   each use case.
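
   The two roles can be sketched as follows, assuming a hypothetical
   entry format of "<trust-domain> <opaque-token>": the "end" reads only
   the domain part, while the domain administrator resolves the token
   against records kept inside the domain. `TraceLog`, `trust_domain_of`,
   and the format itself are illustrative, not proposed syntax.

```python
import secrets
from typing import Optional


class TraceLog:
    """Hypothetical admin-side store mapping opaque tokens to trace details."""

    def __init__(self, domain: str) -> None:
        self.domain = domain  # assumed to contain no spaces
        self._details: dict = {}

    def new_entry(self, details: dict) -> str:
        # The trust domain is in the clear for the "end"; the token is opaque.
        token = secrets.token_hex(8)
        self._details[token] = details
        return f"{self.domain} {token}"

    def interpret(self, entry: str) -> Optional[dict]:
        # Admin role: resolve an entry, perhaps pasted back by an "end".
        domain, _, token = entry.partition(" ")
        if domain != self.domain:
            return None  # another trust domain; nothing we can say about it
        return self._details.get(token)


def trust_domain_of(entry: str) -> str:
    # "End" role: identify the trust domain and nothing more.
    return entry.partition(" ")[0]
```

   Keeping the details behind a token also leaves implementors free to
   record whatever their domain needs without changing what the "end"
   sees.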

4. (This is a consequence of #2 and #3 above) We should not expect
   entities in one trust domain to be able to get any OPES-related
   feedback from entities in other trust domains. For example, if I am
   an end-user, and I think that the page I am getting is corrupted by
   a callout service, I should not expect to be able to identify
   that service, contact its owner, or debug it _unless_ the
   service is within my trust domain. This is no different from
   today's situation, where it is generally impossible to know the
   contact person for an origin server application that generates
   broken HTML; and even if that person is known, one should not, in
   general, expect that person to respond to end-user queries.

5. We know that traces must be in-band. This [very reasonable]
   requirement limits both the number of application protocols that
   OPES can adapt and the amount of detail a trace record can carry.

   The former limitation must be clearly documented somewhere so that
   folks do not try to apply OPES to unsupported applications only to
   find out months later that they cannot trace them.

   Some of us may want to supply additional information out-of-band to
   address the second limitation. Since the architecture and protocol
   requirements drafts do not require support for out-of-band tracing
   details, I suggest that the WG not spend much time on them and
   treat them as possible extensions to the tracing facility.
   Let's concentrate on in-band tracing for now.
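
   Both limitations can be illustrated with a small sketch. It assumes a
   hypothetical table of application protocols that offer an in-band
   slot for trace entries (with made-up size budgets) and a hypothetical
   `X-OPES-Trace` header name; neither comes from the drafts.

```python
# Hypothetical table: application protocols with an in-band place for trace
# entries, mapped to made-up per-entry size budgets (in characters).
IN_BAND_BUDGET = {"http": 512, "smtp": 512}


class UntraceableProtocol(Exception):
    """Raised when a protocol has no in-band slot for trace entries."""


def attach_trace(protocol: str, headers: list[tuple[str, str]], entry: str) -> None:
    budget = IN_BAND_BUDGET.get(protocol)
    if budget is None:
        # First limitation: without an in-band slot, OPES adaptation of this
        # protocol could not be traced, so the protocol is unsupported.
        raise UntraceableProtocol(protocol)
    # Second limitation: in-band space is scarce, so extra detail is cut off.
    headers.append(("X-OPES-Trace", entry[:budget]))
```

   Whatever does not fit the in-band budget is exactly what an optional
   out-of-band extension would have to carry.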

6. There is a question of whether the OPES processor (OCP client) or
   the callout server (OCP server) should be responsible for adding
   trace records to application messages. I am not 100% sure, but I
   would suggest that we try to assign this task exclusively to the
   OPES processor. Here are my reasons:

        a) Exclusive assignment simplifies the protocol.

        b) There are use cases where callout services adapt payload
        regardless of the application protocol in use and leave header
        adjustment to the OPES processor or other services. For example,
        think of a generic text translation or image modification
        service; such services require payload encoding knowledge but
        can be application-independent if the OPES processor supplies
        them with just the payload.

        c) The OPES processor is always _able_ to trace its own
        invocation and service execution because it must understand
        the application protocol. Assigning these tracing tasks to
        callout servers is just an optimization for cases where callout
        servers manipulate application message headers.

        d) We are not required to trace services, just processors,
        AFAIK.

        e) It makes OPES compliance checks easier when remote 3rd
        party callout servers are used.

        f) Servers or services MAY add their own OPES trace records,
        of course.
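
   The division of labor proposed above might look like this in a
   minimal sketch: the callout service sees only the payload, and the
   OPES processor adds the trace record to the application headers
   itself. `AppMessage`, `opes_process`, and the `X-OPES-Trace` header
   are hypothetical names used for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AppMessage:
    # Hypothetical application message: protocol headers plus payload bytes.
    headers: list[tuple[str, str]] = field(default_factory=list)
    payload: bytes = b""


# A payload-only callout service (e.g. generic translation): it needs to know
# the payload encoding but not the application protocol.
PayloadService = Callable[[bytes], bytes]


def opes_process(msg: AppMessage, service: PayloadService, trace_entry: str) -> AppMessage:
    # The processor understands the application protocol, so it can always
    # record its own invocation; the callout server never touches headers.
    adapted = service(msg.payload)
    headers = list(msg.headers)  # copy; leave the original message untouched
    headers.append(("X-OPES-Trace", trace_entry))  # hypothetical header name
    return AppMessage(headers=headers, payload=adapted)
```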

7. Assigning tracing exclusively to OPES processors, as suggested
   above, implies that tracing is out of OCP scope! :-)

8. How tracing is added is application protocol-specific and may be
   documented in separate RFCs/drafts. Here we can only document what
   tracing information is required and, perhaps, some common tracing
   elements.
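
   One way to split the common part from the protocol-specific part is
   sketched below, under assumptions: a hypothetical `TraceRecord` with a
   few common elements plus an implementor-defined `extensions` map, and
   a registry of per-protocol bindings. Only a made-up HTTP encoding is
   shown; other bindings would be defined in their own documents.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class TraceRecord:
    # Hypothetical common elements; the required set is for the WG to decide.
    trust_domain: str
    processor: str
    services: list[str] = field(default_factory=list)
    # Implementor-defined additions, carried opaquely.
    extensions: dict[str, str] = field(default_factory=dict)


def encode_http(rec: TraceRecord) -> str:
    # Made-up HTTP header syntax, for illustration only.
    parts = [rec.trust_domain, rec.processor, *rec.services]
    parts += [f"{k}={v}" for k, v in sorted(rec.extensions.items())]
    return "X-OPES-Trace: " + "; ".join(parts)


# Application-specific bindings; e.g. an SMTP binding would register here.
BINDINGS: Dict[str, Callable[[TraceRecord], str]] = {"http": encode_http}


def encode(protocol: str, rec: TraceRecord) -> str:
    try:
        binding = BINDINGS[protocol]
    except KeyError:
        raise ValueError(f"no tracing binding for {protocol!r}") from None
    return binding(rec)
```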


HTH,

Alex.