Re: Tracing Draft version-0004082003



On Tue, 8 Apr 2003, Abbie Barbir wrote:

Attached is the -00 version of the tracing draft.
Please keep in mind it is work in progress.
Feedback is required.


Abbie,

        This is a great start, especially given the very short time you
had to write this first version of the draft! Specific comments are
inlined. I did not review the overall structure of the draft because
I think it is premature to do that (need more "meat" and it is not
yet clear whether some of the current sections are going to stay).


  N.B. Please format using 72 character line length if possible -- it
       makes it easier (for some of us) to quote and comment. Thank you.

                      OPES tracing facility


How about just "OPES Tracing"? Does "Facility" mean something specific?
Will there be other (non "facility") OPES tracing drafts?

                      draft-ietf-opes-tracing

1. Introduction

The Open Pluggable Edge Services (OPES) architecture enables cooperative
application services (OPES services) between a data provider, a data consumer,
and zero or more OPES processors.  The application services under
consideration analyze and possibly transform application-level messages
exchanged between the data provider and the data consumer.

The execution of such services is governed by a set of rules installed on the
OPES processor.  The rules enforcement can trigger the execution of service
applications local to the OPES processor.

Alternatively, the OPES processor can distribute the responsibility of service
execution by communicating and collaborating with one or more remote callout
servers. As described in [], an OPES processor communicates with and invokes
services on a callout server by using a callout protocol.

In [], the IAB has required the OPES working group to support tracing and
notification. This document addresses these IAB requirements.


IAB (RFC 3238) does not require support of anything. It lists considerations
that the WG should _address_:

   "The purpose of this document is not to recommend specific solutions
   for OPES, or even to mandate specific functional requirements....
   Instead, these are recommendations on issues
   that any OPES solutions standardized in the IETF should be required
   to address, similar to the "Security Considerations" currently
   required in IETF documents [RFC2316].  As an example, one way to
   address security issues is to show that appropriate security
   mechanisms have been provided in the protocol, and another way to
   address security issues is to demonstrate that no security issues
   apply to this particular protocol."

There is a huge difference between "requiring support" and enumerating "issues
that OPES solutions should be required to address". Let's not shoot ourselves
in the foot. :-)

Also, RFC 3238 does not contain the word "trace" or "tracing", just
"notification".

I would suggest saying something like this:

    IAB has required OPES solutions to address end user and
    content provider notification concerns. This document
    specifies tracing mechanisms that address those concerns.

It would be nice to explain somewhere why we are not calling this
document OPES Notification [Facility].

The document examines the effect of tracing and notification
requirements on OPES architecture and callout protocol []. In
particular, the work identifies traceable entities in an OPES flow
and how this information is relayed to end points.


what information?

As per the architecture document [], there is a requirement of
relaying tracing information in-band. The document investigates this
possibility and discusses possible methods that could be used to
detect faulty OPES processors by end points on an OPES flow.


What about faulty callout servers? Do we have a term that describes
OPES system (processor + callout servers + whatever else is out there
that is OPES-related)?

The document is organized as follows: Section 2 considers ? Section
3? etc.


2. Basic Definitions

- REFERENCE POINT - a reference that may be used out-of-band to
  perform a specific function.

  An example may be URI for the privacy policy, center of authority
  URI, server address, etc. Usually no protocol is provided to
  access the reference point.

- INFORMATION POINT - implies presence of the protocol to access
  detailed information at this point. Example may be URI to get
  a certificate for virus checker or content filter, examine
  and set profile setting and active preferences.

- IDENTIFIER - provides a unique binding to detailed persistent
  information. For example "transformation-applied : fe123" gives a
  participant ability to enquire (and maybe cache) details of the
  transformation fe123. Use of such (opaque) identifiers does not
  require prior knowledge and does not create a burden of storing
  additional information - this is just a tag for persistent
  information (not message-specific).


The above classification seems like a result of protocol
over-engineering to me. Would it be possible to avoid introducing any
classifications/terms until the draft starts actually _using_ them for
a specific purpose?  This will save us a lot of time -- there is no
reason (and it is very difficult) to discuss something that is not
used (yet).

3. Requirements for Notification in an OPES Flow


This section takes a look at the IAB requirements (3.1) and (3.2) and how they
relate to notification

3.1 Notification Requirements

There are requirements on the architecture [] to assist content provider
applications in detecting and responding to data consumer applications actions
by OPES intermediaries that are deemed inappropriate by the content provider.
This is referred to as notification.

In general, notification goes in opposite direction of tracing and cannot be
attached to application messages that it notifies about.


If we compare notification with tracing like that, we should talk
about/define tracing first and only then provide a comparison.

An "opposite direction" illustration (figure) would be nice here!

This can be done


"This has to be done" ?

out-band and may require the development of a new protocol. In general, this
opposite-direction, outside-of-message scheme is difficult to support.


What does "difficult to support" mean? Consider removing that
sentence (it's OK in a conversation, but not in a spec). This text is
of great importance because we are, essentially, saying that the
"ideal" scheme that IAB folks envisioned is not practical. We need to
be as specific as possible here.

NOTE: When would a content provider issue such request?


What request?

How would such
mechanism be used?  Randomly, or on a statistical basis?  Or manually? Is such
a scheme of practical relevance?


In the above, there is no definition of the "mechanism" detailed enough to
answer these questions.

3.1.1 Notification Concerns

A major concern with notification is scalability. For example, it is not
practical to assume that a content provider is interested in receiving a
notification for every HTTP response sent out. As such, a mechanism for
explicit request of notification may be required.


Why is it not practical?! Some content providers would love to know exactly
what their clients are doing with their content. They would be willing to
double server capacity to handle the load.

"Not scalable" usually implies non-linear (hopefully exponential) growth with
the number of messages or notification-generation points. You need to show
such growth (or something similar) if you want to play the scalability card.
What does not scale and when?

Privacy is another concern. Maybe a user doesn't want to reveal to any content
provider all the OPES services that have been applied on her behalf. For
example, why should every content provider know what exact virus scanner a
user is using?


Consider rephrasing to something like this:

    End point privacy is a concern. An end user may consider information
    about OPES services applied on her behalf as private.  For example, if
    translation for a braille device has been applied, it can be concluded
    that the user has eyesight problems; such information may be
    misused if the user is applying for a job online. Similarly, a content
    provider may consider information about its OPES services private.
    For example, use of a specific OPES intermediary by a high traffic
    volume site may indicate business alliances that have not been publicly
    announced yet.

Also consider adding something like this:

    Security is a concern. An attacker may benefit from knowledge
    of internal OPES services layout, execution order, software
    versions and other information likely to be present in
    automated notifications.

Also consider adding something like this:

    The level of available details in notifications versus content
    provider interest in supporting notification is a concern.  Experience
    shows that content providers often require very detailed information
    about user actions to be interested in notifications at all. For
    example, Hit Metering protocol (RFC XXX) has been designed to supply
    content providers with proxy cache hit counts, in an effort to reduce
    cache busting behavior which was caused by content providers' desire to
    get accurate site "access counts". The Hit Metering protocol is not
    widely deployed today because it turns out that content providers are
    not interested enough in "just hit counts"; only knowing things like
    client IP addresses, browser versions, or cookies would make
    providers interested enough to support cache hit notifications.  Hit
    Metering experience is very relevant because Hit Metering protocol was
    designed to do for HTTP caching intermediaries what OPES notifications
    are meant to do for OPES intermediaries.

    (We would need to verify the above info with Hit Metering
    authors, but to the best of my knowledge it is correct)

3.2 How to Fulfill Notifications Requirements

IAB consideration (3.1) [] suggests that the overall OPES framework needs to
assist content providers in detecting and responding to client-centric actions
by OPES intermediaries that are deemed inappropriate by the content provider.

This requirement is hard to implement since most client-centric actions happen


What do we mean by "implement"? Write a spec? Code it up? Deploy? Other?
Consider rephrasing to something like:

    To address this requirement directly, one would have to ...

and then finish with a statement that we are addressing it indirectly by
providing tracing mechanisms that assist interested providers in detecting and
responding to inappropriate OPES actions. Say how they assist (you already do
the latter now, below).

_after_ the application message left the content provider(s) and, thus,
notifications cannot be piggy-backed to application messages and have to
travel in the opposite direction of traces.

Note: Need to explain more here.

IAB consideration (3.2) [] can be satisfied by the development of a tracing
facility. In this regard, it is recommended that tracing SHOULD be always-on,
just like HTTP Via headers now. This should eliminate notification as a
separate requirement.


Why not MUST be always-on? We are talking about interoperability here (a
broken intermediary that does not use Via-OPES headers is an interoperability
problem because it cannot be bypassed).

If the OPES end points cooperate then notification can be supported by
tracing. It is recommended that content providers that suspect or experience
difficulties do the following:


"Recommended" is too strong, IMO. "For example, ..." or "providers could ...",
would be more appropriate.

      1. Check whether requests they receive pass through
        OPES intermediaries. Presence of OPES tracing info
        will determine that. This check is only possible for
        request/response protocols. For other protocols (e.g.,
        broadcast or push), the provider would have to assume
        that OPES intermediaries are involved until proven
        otherwise.

      2. If OPES intermediaries are suspected,
        request OPES traces from potentially affected user(s).
        The trace will be a part of the application message
        received by the user software. If users cooperate,
        the provider(s) have all the information they need.
        If users do not cooperate, the provider(s) cannot
        do much about it (they might be able to deny service
        to uncooperative users in some cases).

      3. Some traces may indicate that more information
        is available by accessing certain resources on the
        specified OPES intermediary or elsewhere. Content
        providers may query for more information in that
        case.

      4. If everything else fails, providers can enforce
         no-adaptation policy using appropriate OPES
         bypass mechanisms and/or end-to-end mechanisms.
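
Step 1 of the list above can be sketched as follows. This is only a
minimal illustration, not something the draft specifies: the header
name "Via-OPES" is an assumption (no trace header is defined yet), and
the function names are hypothetical. Entries are split on commas by
analogy with HTTP Via.

```python
# Hypothetical header name; the draft does not define a trace header yet.
TRACE_HEADER = "via-opes"


def opes_trace_entries(headers):
    """Return OPES trace entries found in a request, in order of appearance.

    `headers` is a list of (name, value) pairs. Names are compared
    case-insensitively and comma-separated values are split into
    individual entries.
    """
    entries = []
    for name, value in headers:
        if name.strip().lower() == TRACE_HEADER:
            entries.extend(v.strip() for v in value.split(",") if v.strip())
    return entries


def passed_through_opes(headers):
    """True if at least one OPES trace entry is present in the request."""
    return bool(opes_trace_entries(headers))
```

A provider receiving no such entries learns nothing for broadcast or
push protocols, which is why the list above falls back to assuming
OPES involvement in those cases.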




4. Requirements for Tracing in an OPES Flow


In [], the IAB has required that the OPES architecture provide tracing and
debugging facilities. From [], the OPES architecture SHOULD assist consumer
application in detecting the behavior of OPES processors and callout servers
to potentially allow them to identify imperfect or compromised operations.

The OPES architecture document [] has addressed these concerns at a higher
level. The architecture requires that tracing be feasible on the OPES flow per
OPES processor using in-band annotation. This requirement provides a
participant with the ability to detect OPES intermediaries in the course of
normal interaction.

4.1 What is traceable?

End OPES points must be able to trace the following:


Consider more accurate "The following entities can be identified in a trace"

1. OPES processors that are involved.
2. OPES services (including callout services) that were performed on a request
or response.


"... performed on an application message"

3. TBD


Also, we need to add MUST/SHOULD/MAY to each traceable entity, I guess.


4.2 Tracing and Trust Domains

Tracing is limited to trust domain. Entities outside of that domain may or may
not see any traces, depending on domain policies or configuration. Therefore,
there is no need for mandatory end-to-end tracing facility. For example, if an
OPES system is on the content provider "side", end-users are not guaranteed
any traces. If an OPES system is working inside end-user domain, the origin
server is not guaranteed any traces related to user requests.


I am not sure about the above. It contradicts our statement that we are
addressing IAB concerns. If there is no trace, we are not.  I think it is
reasonable to say that there MUST be at least one trace entry per "system".
(A trust domain may include several such systems/entities, see the trust
domain definition).
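
To illustrate what "at least one trace entry per system" could mean in
a compliance test, here is a minimal sketch. It assumes the tester
already knows which systems handled the message and that each trace
entry carries a system identifier; both the (system_id, details) pair
layout and the function name are hypothetical.

```python
def systems_missing_trace(systems_on_path, trace_entries):
    """Return the systems that handled the message but left no trace entry.

    `systems_on_path` is a list of system identifiers known to the tester.
    `trace_entries` is a list of (system_id, opaque_details) pairs taken
    from the message; the details are not inspected.
    """
    traced = {system_id for system_id, _ in trace_entries}
    return [s for s in systems_on_path if s not in traced]
```

An empty result means every system on the path is represented in the
trace, regardless of how many entries each one added.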

There are two distinct uses of traces. First, a trace SHOULD enable the "end"
(content producer or consumer) to detect OPES processor presence within the
end's trust domain. Such an "end" should be able to see a trace entry, but does
not need to be able to interpret it beyond identification of the trust domain(s).

Second, the domain administrator SHOULD be able to take a trace entry
(possibly supplied by an "end" as an opaque string) and interpret it. The
administrator must be able to identify OPES processor(s) involved and may be
able to identify applied adaptation services along with other message-specific
information. That information SHOULD help to explain what OPES agent(s) were
involved and what they did. It may be impractical to provide all the required
information in all cases. This document views a trace record as a hint, as
opposed to an exhaustive audit.

Since the administrators of various trust domains can have various ways of
looking into tracing, they MAY require freedom of choice in what to put in
trace records and how to format them. Trace records should be easy to extend
beyond basic OPES requirements. Trace management algorithms should treat trace
records as opaque data to the extent possible.
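
The "opaque data" idea above can be sketched as follows. This is a
minimal illustration under stated assumptions, not a proposed format:
the TraceRecord type, its two fields, and the function names are all
hypothetical. The only field an "end" interprets is the trust domain;
the rest is carried along without being parsed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceRecord:
    domain: str  # trust domain identifier; the only part an "end" interprets
    opaque: str  # administrator-defined payload; never parsed by this code


def append_record(trace, record):
    """Return a new trace with `record` added; existing records are untouched."""
    return trace + [record]


def domains_seen(trace):
    """Return the trust domains that left a record, in first-seen order."""
    seen = []
    for record in trace:
        if record.domain not in seen:
            seen.append(record.domain)
    return seen
```

Because the payload is never parsed, an administrator can change its
format without affecting anything that merely forwards or lists
records.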

Entities in one trust domain are not expected to be able to get all
OPES-related feedback from entities in other trust domains. For example, if an
end-user suspects that served content is corrupted by a callout service, there
is no guarantee that the user will be able to identify that service, contact its
owner, or debug it _unless_ the service is within the user's trust domain. This is no
different from the current situation where it is impossible, in general, to
know the contact person for an application on an origin server that generates
corrupted HTML; and even if the person is known, one should not expect that
person to respond to end-user queries.


The above should have "system" granularity, not "domain" granularity because
there can be different privacy policies within one trust domain.

4.3 In-Band Tracing

The architecture [] states that traces must be in-band. This requirement limits
the number of application protocols that OPES can adapt and the amount of
details a trace record can convey.

The set of protocols that can support tracing for OPES Flow must be clearly
documented. The architecture does not prevent implementers from developing
out-of-band protocols and techniques to address the above limitation.


We should not (cannot) document the set of supported protocols directly, IMO.
We should document _requirements_ for application protocols that want to
support OPES traces. This is similar to OCP application bindings.

4.3.1 Tracing information granularity and persistence levels

The information may be:

- message-related, e.g. "virus checking done - passed", "content filtering
applied", "translated from quibbish to danqush". Such information should be
supplied with each message and indicate that specific action was taken. All
data that describes specific actions performed for the message should be
provided with that message, as there is no other way to find message level
details later. OPES application (including OPES processor and all application
modules and callout servers involved) is not supposed to keep volatile
information after request processing is done.

- session related.
TBD

Session level data must be preserved for the duration of the session. OPES
processor is responsible for inserting notifications if session-level
information changes.

Examples of session-related information are "virus checker abcd build 123
enabled", "OPES server id=xyz present".

- log information id. This may be used e.g. for accounting and non-repudiation
purposes. Detailed information referenced by this id may not be available
online but can be retrieved later by some off-line procedure.

- server related persistent information, e.g. "OPES center of authority
<URI>", "privacy policy <URI>". It may be also presented once per session and
it does not change between sessions.

- end-point related data: what profile is activated (profile ID), where to get
profile details, where to set preferences. I'm not sure how far we should go
in this direction.


The above classification seems like a result of protocol
over-engineering to me. Would it be possible to avoid introducing any
classifications/terms until the draft starts actually _using_ them for
a specific purpose?  This will save us a lot of time -- there is no
reason (and it is very difficult) to discuss something that is not
used (yet).

4.4 OCP Support for Tracing

It is the task of an OPES processor to add trace records to application
messages. In this case, OCP protocol is not affected by tracing requirements
for the following reasons:


Either say "If it is the task..." or remove "In this case, " :-)

a) Exclusive assignment simplifies the protocol.

b) There are use cases where callout services adapt payload regardless of the
application protocol in use and leave header adjustment to OPES processor or
other services. For example, think of a generic text translation or image
modification service; such services require payload encoding knowledge but can
be application-independent if OPES processor can supply them with just the
payload.

c) OPES processor is always _able_ to trace its own invocation and service(s)
execution because OPES processor must understand the application protocol.
Assigning these tracing tasks to callout servers is just an optimization in
cases where callout servers manipulate application message headers.

d) May not be able to trace all services that are done at the callout server.

e) It makes OPES compliance checks easier when remote third party callout
servers are used.

f) Servers or services MAY add their own OPES trace records, of course.
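
Points (b) and (c) above can be sketched as follows: the service sees
only the payload, and the processor, which understands the application
protocol, adds the trace record. This is a minimal illustration only.
The "Via-OPES" header name, the record layout, and the function name
are assumptions, and a real callout service would be reached over OCP
rather than called as a local function.

```python
def run_services(message, services, processor_id):
    """Apply payload-only services and let the processor record each one.

    `message` is a dict with "headers" (list of (name, value) pairs) and
    "body" (bytes). `services` is a list of (name, callable) pairs; each
    callable takes and returns a payload and never sees the headers.
    """
    headers = list(message["headers"])  # copy so the input is not modified
    body = message["body"]
    for name, adapt in services:
        body = adapt(body)
        # The processor, not the service, writes the trace record.
        headers.append(("Via-OPES", f"{processor_id}; service={name}"))
    return {"headers": headers, "body": body}
```

For example, passing [("upper", bytes.upper)] as the service list
adapts the body and leaves one trace record per service in the
headers, without the service knowing anything about headers.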


I wonder if it is appropriate for the draft to explain the motivation
behind a decision, at such lengths? Should we just state requirements
instead?


4.5 Protocol Binding to Tracing

How tracing is added is application protocol-specific and will be documented
in separate drafts. This work documents what tracing information is required
and some common tracing elements.

5. Security Considerations




I would suggest adding rules from the message below (or something
similar). They are very specific things we can discuss/fix/polish, and
they actually shape the conventions/intent of many tracing draft sections.
The draft should be mostly about specific requirements, not our motivation
or reasoning about what we might do and what design alternatives we have
available.
        http://www.imc.org/ietf-openproxy/mail-archive/msg01875.html
Please note that I am not saying the above rules are perfect! I am just saying
we need more specific "bones" to grow the draft "meat" around, or we will
never exit the jelly state.

Thank you,

Alex.