More thoughts on tracing and debugging.
1. I could not find any reference to "debugging" in architecture
and OCP requirements drafts. Shouldn't we talk just about
tracing/reporting? I am not sure what exactly others mean by
debugging, but since it is not required, we do not have to
worry about it. Tracing helps with debugging, of course,
but let's concentrate on immediate goals. Please correct me
if I missed debugging requirements.
2. It is very important to keep in mind that tracing is limited
to the trust domain. Anybody outside of that domain may or may not
see any traces, depending on domain policies/configuration. In
other words, we are not talking about mandatory end-to-end tracing
facility here. For example, if an OPES system is on the content
provider "side", end-users are not guaranteed any traces. If an
OPES system is working inside end-user domain, the origin server is
not guaranteed any traces related to user requests.
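
To make #2 concrete, here is a minimal sketch of a domain boundary that
decides whether traces leave the trust domain. Everything in it is my
assumption for illustration: the header name, the function name, and the
single boolean policy flag do not come from any OPES draft.

```python
# Hypothetical sketch; names are illustrative, not from any OPES draft.
TRACE_HEADER = "X-Example-Trace"  # assumed in-band trace header name


def forward_across_boundary(headers, expose_traces):
    """Return the headers that are allowed to leave the trust domain.

    Traces survive only if the domain policy (expose_traces) says so;
    all other headers pass through unchanged.
    """
    if expose_traces:
        return dict(headers)
    return {k: v for k, v in headers.items() if k != TRACE_HEADER}
```

An outside party that receives the result of
forward_across_boundary(headers, False) sees no trace at all, which is
exactly the "may or may not see any traces" case above.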
3. There are two distinct purposes/uses of traces. The first is to
enable the "end" (content producer or consumer) to detect OPES
processor presence within the end's trust domain. Such an "end" should
be able to see a trace entry, but does not need to be able to
interpret it beyond identification of the trust domain(s).
The second is to serve the domain administrator. The administrator should be
able to take a trace entry (possibly supplied by an "end" as an
opaque string) and interpret it. The administrator must be able to
identify OPES processor(s) involved and may be able to identify
applied adaptation services along with other message-specific
information. That information should help to explain what OPES
agent(s) were involved and what they did, but it is impractical to
provide all the required information in all cases. A trace record
is a hint, not an exhaustive audit.
Moreover, since trust domains and their administration vary a lot,
I would argue that we must give implementors a lot of freedom in
what to put in trace records and how to format them. Trace records
should be easy to extend beyond basic OPES requirements. Trace
management algorithms should treat trace records as opaque data to
the extent possible.
Markus asked for tracing use cases. If we start collecting those, I
would suggest clearly documenting the "end" and "admin" roles in each
use case.
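
One possible shape for the two roles in #3, as a rough sketch only. The
"domain;blob" layout, the base64/JSON encoding, and all function names
are my assumptions; implementors would be free to pick any format.

```python
import base64
import json

# Hypothetical sketch; the entry layout and names are illustrative only.


def make_entry(domain, details):
    """Build a trace entry: a readable trust-domain id followed by
    implementor-defined details packed into an opaque blob."""
    blob = base64.urlsafe_b64encode(json.dumps(details).encode()).decode()
    return f"{domain};{blob}"


def end_view(entry):
    """All an "end" needs to extract: the trust domain."""
    return entry.split(";", 1)[0]


def admin_view(entry):
    """What the domain administrator can recover from an entry,
    possibly one supplied by an "end" as an opaque string."""
    domain, blob = entry.split(";", 1)
    return domain, json.loads(base64.urlsafe_b64decode(blob))
```

Code that merely forwards or stores entries never needs to call
admin_view(), so the blob can be extended without touching trace
management.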
4. (This is a consequence of #2 and #3 above) We should not expect
entities in one trust domain to be able to get any OPES-related
feedback from entities in other trust domains. For example, if I am
an end-user, and I think that the page I am getting is corrupted by
a callout service, I should not expect to be able to identify
that service, contact its owner, or debug it _unless_ the
service is within my trust domain. This is no different from
the current situation where it is impossible, in general, to know
the contact person for an application on an origin server that
generates broken HTML; and even if the person is known, one should
not expect that person to respond to end-user queries (in general).
4. We know that traces must be in-band. This [very reasonable]
requirement limits both the number of application protocols that
OPES can adapt and the amount of detail a trace record can carry.
The former limitation must be clearly documented somewhere so that
folks do not try to apply OPES to unsupported applications only to
find out months later that they cannot trace them.
Some of us may want to supply additional information out-of-band to
address the second limitation. Since architecture and protocol
requirements drafts do not require support for out-of-band tracing
details, I suggest that the WG should not spend much time on them
and treat them as possible extensions to the tracing facility.
Let's concentrate on in-band tracing for now.
5. There is a question of whether the OPES processor (OCP client) or
the callout server (OCP server) must be responsible for adding trace
records to application messages. I am not 100% sure, but I would
suggest that we try to assign this task to the OPES processor exclusively.
Here are my reasons:
a) Exclusive assignment simplifies the protocol.
b) There are use cases where callout services adapt payload
regardless of the application protocol in use and leave header
adjustment to OPES processor or other services. For example,
think of a generic text translation or image modification
service; such services require payload encoding knowledge but
can be application-independent if OPES processor can supply
them with just the payload.
c) OPES processor is always _able_ to trace its own invocation
and service(s) execution because OPES processor must understand
the application protocol. Assigning these tracing tasks to
callout servers is just an optimization in cases where callout
servers manipulate application message headers.
d) We are not required to trace services, just processors,
AFAIK.
e) It makes OPES compliance checks easier when remote 3rd
party callout servers are used.
f) Servers or services MAY add their own OPES trace records,
of course.
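
A minimal sketch of reasons (b) and (c): the callout service sees only
the payload, and the OPES processor does the header work, including the
trace. The header name, the function names, and the comma-separated
trace format are all assumptions made up for this example.

```python
# Hypothetical sketch; names and formats are illustrative only.
TRACE_HEADER = "X-Example-Trace"  # assumed in-band trace header name


def uppercase_service(payload):
    """Stand-in for an application-independent callout service.
    It receives and returns payload bytes and never sees headers."""
    return payload.upper()


def opes_process(headers, payload, service, processor_id):
    """OPES processor: invoke the service on the payload, then adjust
    headers and append its own trace record."""
    new_payload = service(payload)
    new_headers = dict(headers)
    prior = new_headers.get(TRACE_HEADER)
    new_headers[TRACE_HEADER] = (
        f"{prior}, {processor_id}" if prior else processor_id
    )
    new_headers["Content-Length"] = str(len(new_payload))
    return new_headers, new_payload
```

The service here would work unchanged under any application protocol,
since only opes_process() knows that headers exist.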
6. Suggestion #5 implies that tracing is out of OCP scope! :-)
7. How tracing is added is application protocol-specific and may be
documented in separate RFCs/drafts. We can only document what tracing
information is required and, perhaps, some common tracing elements.
HTH,
Alex.