RE: Tracing Draft version-0004082003

Alex,

Thanks.
We will fix and resubmit.

abbie

-----Original Message-----
From: Alex Rousskov [mailto:rousskov(_at_)measurement-factory(_dot_)com] 
Sent: Wednesday, April 09, 2003 2:17 AM
To: ietf-openproxy(_at_)imc(_dot_)org
Subject: Re: Tracing Draft version-0004082003



On Tue, 8 Apr 2003, Abbie Barbir wrote:

Attached is the -00 version of the tracing draft.
Please keep in mind it is work in progress.
Feedback is required.


Abbie,

      This is a great start, especially given the very short time
you had to write this first version of the draft! Specific comments
are inlined. I did not review the overall structure of the draft
because I think it is premature to do that (it needs more "meat" and
it is not yet clear whether some of the current sections are going to
stay).


  N.B. Please format using 72 character line length if possible -- it
       makes it easier (for some of us) to quote and comment.
       Thank you.

                      OPES tracing facility


How about just "OPES Tracing"? Does "Facility" mean something 
specific? Will there be other (non "facility") OPES tracing drafts?

                      draft-ietf-opes-tracing

1. Introduction

The Open Pluggable Edge Services (OPES) architecture enables 
cooperative application services (OPES services) between a data 
provider, a data consumer, and zero or more OPES processors.  The 
application services under consideration analyze and possibly 
transform application-level messages exchanged between the data 
provider and the data consumer.

The execution of such services is governed by a set of rules
installed on the OPES processor.  The rules enforcement can trigger
the execution of service applications local to the OPES processor.
Alternatively, the OPES processor can distribute the responsibility
of service execution by communicating and collaborating with one or
more remote callout servers. As described in [], an OPES processor
communicates with and invokes services on a callout server by using
a callout protocol.

In [], the IAB has required the OPES working group to support
tracing and notification. This document addresses these IAB
requirements.


IAB (RFC 3238) does not require support of anything. It lists 
considerations that the WG should _address_:

   "The purpose of this document is not to recommend specific 
   solutions for OPES, or even to mandate specific functional
   requirements....
   Instead, these are recommendations on issues
   that any OPES solutions standardized in the IETF should be required
   to address, similar to the "Security Considerations" currently
   required in IETF documents [RFC2316].  As an example, one way to
   address security issues is to show that appropriate security
   mechanisms have been provided in the protocol, and another way to
   address security issues is to demonstrate that no security issues
   apply to this particular protocol.

There is a huge difference between "requiring support" and 
enumerating "issues that OPES solutions should be required to 
address". Let's not shoot ourselves in the foot. :-)

Also, RFC 3238 does not contain the word "trace" or 
"tracing", just "notification".

I would suggest saying something like this:

    IAB has required OPES solutions to address end user and
    content provider notification concerns. This document
    specifies tracing mechanisms that address those concerns.

It would be nice to explain somewhere why we are not calling 
this document OPES Notification [Facility].

The document examines the effect of tracing and notification
requirements on OPES architecture and callout protocol []. In
particular, the work identifies traceable entities in an OPES flow
and how this information is relayed to end points.


What information?

As per the architecture document [], there is a requirement of
relaying tracing information in-band. The document investigates this
possibility and discusses possible methods that could be used to
detect faulty OPES processors by end points on an OPES flow.


What about faulty callout servers? Do we have a term that
describes an OPES system (processor + callout servers + whatever
else is out there that is OPES-related)?

The document is organized as follows: Section 2 considers ?

Section 3?

etc.


2. Basic Definitions

- REFERENCE POINT - a reference that may be used out-of-band to
  perform a specific function.

  An example may be URI for the privacy policy, center of authority
  URI, server address, etc. Usually no protocol is provided to
  access the reference point.

- INFORMATION POINT - implies presence of the protocol to access
  detailed information at this point. Example may be URI to get
  a certificate for virus checker or content filter, examine
  and set profile setting and active preferences.

- IDENTIFIER - provides a unique binding to detailed persistent
  information. For example "transformation-applied : fe123" gives a
  participant ability to enquire (and maybe cache) details of the
  transformation fe123. Use of such (opaque) identifiers does not
  require prior knowledge and does not create a burden of storing
  additional information - this is just a tag for persistent
  information (not message-specific).
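
A minimal sketch of how a participant might consume such an
identifier, assuming the "name : value" form of the example above.
The resolver callback and the cache class are hypothetical; they
stand in for whatever lookup an information point would offer, which
the draft does not define.

```python
from typing import Callable, Dict, Tuple


def parse_tag(line: str) -> Tuple[str, str]:
    """Split a 'name : value' tag such as 'transformation-applied : fe123'."""
    name, sep, value = line.partition(":")
    if not sep or not name.strip() or not value.strip():
        raise ValueError(f"not a 'name : value' tag: {line!r}")
    return name.strip(), value.strip()


class IdentifierCache:
    """Caches details for opaque identifiers (hypothetical helper).

    The identifier is never interpreted; it is only used as a lookup key.
    """

    def __init__(self, resolve: Callable[[str], str]) -> None:
        self._resolve = resolve            # hypothetical out-of-band lookup
        self._details: Dict[str, str] = {}
        self.lookups = 0                   # number of calls made to resolve()

    def details(self, identifier: str) -> str:
        # Persistent information can be fetched once and reused.
        if identifier not in self._details:
            self.lookups += 1
            self._details[identifier] = self._resolve(identifier)
        return self._details[identifier]
```

Because the identifier names persistent (not message-specific)
information, a second message carrying the same tag costs no extra
lookup.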


The above classification seems like a result of protocol 
over-engineering to me. Would it be possible to avoid 
introducing any classifications/terms until the draft starts 
actually _using_ them for a specific purpose?  This will save 
us a lot of time -- there is no reason (and it is very 
difficult) to discuss something that is not used (yet).

3. Requirements for Notification in an OPES Flow


This section takes a look at the IAB requirements (3.1) and (3.2)
and how they relate to notification.

3.1 Notification Requirements

There are requirements on the architecture [] to assist content 
provider applications in detecting and responding to data consumer 
applications actions by OPES intermediaries that are deemed 
inappropriate by the content provider. This is referred to as 
notification.

In general, notification goes in opposite direction of tracing and 
cannot be attached to application messages that it notifies about.


If we compare notification with tracing like that, we should 
talk about/define tracing first and only then provide a comparison.

An "opposite direction" illustration (figure) would be nice here!

This can be done


"This has to be done" ?

out-band and may require the development of a new protocol. In 
general, this opposite-direction, outside-of-message scheme is 
difficult to support.


What does it mean "difficult to support"? Consider removing 
that sentence. (it's OK in a conversation, but not in a spec) 
This text is of great importance because we are, essentially, 
saying that the "ideal" scheme that IAB folks envisioned is 
not practical. We need to be as specific as possible here.

NOTE: When would a content provider issue such request?


What request?

How would such
mechanism be used?  Randomly, or on a statistical basis?

Or manually?

Is such a scheme of practical relevance?


In the above, there is no definition of the "mechanism" 
detailed enough to answer these questions.

3.1.1 Notification Concerns

A major concern with notification is scalability. For example, it is
not practical to assume that a content provider is interested in
receiving a notification for every HTTP response sent out. As such, a
mechanism for explicit request of notification may be required.


Why is it not practical?! Some content providers would love 
to know exactly what their clients are doing with their 
content. They would be willing to double server capacity to 
handle the load.

"Not scalable" usually implies non-linear (hopefully 
exponential) growth with the number of messages or 
notification-generation points. You need to show such growth 
(or something similar) if you want to play the scalability 
card. What does not scale and when?

Privacy is another concern. Maybe a user doesn't want to reveal to
any content provider all the OPES services that have been applied on
her behalf. For example, why should every content provider know what
exact virus scanner a user is using?


Consider rephrasing to something like this:

    End point privacy is a concern. An end user may consider
    information about OPES services applied on her behalf as private.
    For example, if translation for braille device has been applied,
    it can be concluded that the user is having eyesight problems;
    such information may be misused if the user is applying for a job
    online. Similarly, a content provider may consider information
    about its OPES services private. For example, use of a specific
    OPES intermediary by a high traffic volume site may indicate
    business alliances that have not been publicly announced yet.

Also consider adding something like this:

    Security is a concern. An attacker may benefit from knowledge
    of internal OPES services layout, execution order, software
    versions and other information likely to be present in
    automated notifications.

Also consider adding something like this:

    The level of available details in notifications versus content
    provider interest in supporting notification is a concern.
    Experience shows that content providers often require very
    detailed information about user actions to be interested in
    notifications at all. For example, Hit Metering protocol (RFC XXX)
    has been designed to supply content providers with proxy cache hit
    counts, in an effort to reduce cache busting behavior which was
    caused by content providers' desire to get accurate site "access
    counts". The Hit Metering protocol is not widely deployed today
    because it turns out that content providers are not interested
    enough in "just hit counts"; only knowing things like each
    client's IP address, browser version, or cookies would make
    providers interested enough to support cache hit notifications.
    Hit Metering experience is very relevant because Hit Metering
    protocol was designed to do for HTTP caching intermediaries what
    OPES notifications are meant to do for OPES intermediaries.

    (We would need to verify the above info with Hit Metering
    authors, but to the best of my knowledge it is correct)

3.2 How to Fulfill Notifications Requirements

IAB consideration (3.1) [] suggests that the overall OPES framework 
needs to assist content providers in detecting and responding to 
client-centric actions by OPES intermediaries that are deemed 
inappropriate by the content provider.

This requirement is hard to implement since most client-centric 
actions happen


What do we mean by "implement"? Write a spec? Code it up? 
Deploy? Other? Consider rephrasing to something like:

    To address this requirement directly, one would have to ...

and then finish with a statement that we are addressing it 
indirectly by providing tracing mechanisms that assist 
interested providers in detecting and responding to 
inappropriate OPES actions. Say how they assist (you already 
do the latter now, below).

_after_ the application message left the content provider(s) and,
thus, notifications cannot be piggy-backed to application messages
and have to travel in the opposite direction of traces.

Note: Need to explain more here.

IAB consideration (3.2) [] can be satisfied by the development of a 
tracing facility. In this regard, it is recommended that tracing 
SHOULD be always-on, just like HTTP Via headers now. This should 
eliminate notification as a separate requirement.


Why not MUST be always-on? We are talking about 
interoperability here (a broken intermediary that does not 
use Via-OPES headers is an interoperability problem because 
it cannot be bypassed).

If the OPES end points cooperate then notification can be supported
by tracing. It is recommended that content providers that suspect or
experience difficulties do the following:


Recommended is too strong, IMO. "For example, ..." or 
"providers could ...", would be more appropriate.

    1. Check whether requests they receive pass through
      OPES intermediaries. Presence of OPES tracing info
      will determine that. This check is only possible for
      request/response protocols. For other protocols (e.g.,
      broadcast or push), the provider would have to assume
      that OPES intermediaries are involved until proven
      otherwise.

    2. If OPES intermediaries are suspected,
      request OPES traces from potentially affected user(s).
      The trace will be a part of the application message
      received by the user software. If users cooperate,
      the provider(s) have all the information they need.
      If users do not cooperate, the provider(s) cannot
      do much about it (they might be able to deny service
      to uncooperative users in some cases).

    3. Some traces may indicate that more information
      is available by accessing certain resources on the
      specified OPES intermediary or elsewhere. Content
      providers may query for more information in that
      case.

    4. If everything else fails, providers can enforce
       no-adaptation policy using appropriate OPES
       bypass mechanisms and/or end-to-end mechanisms.
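
Step 1 above can be sketched roughly as follows. This assumes an
HTTP-like request whose headers are available as (name, value) pairs
and a trace header called "Via-OPES"; that name is only a placeholder
taken from this discussion, not something the draft has defined. The
function names are likewise made up for illustration.

```python
from typing import Iterable, List, Tuple

# Assumed header name; the draft has not defined one yet.
TRACE_HEADER = "via-opes"


def opes_trace_entries(headers: Iterable[Tuple[str, str]]) -> List[str]:
    """Return all trace entries found in the request headers, in order.

    Header names are compared case-insensitively, and one header value
    is assumed to carry comma-separated entries, as HTTP Via does.
    """
    entries: List[str] = []
    for name, value in headers:
        if name.strip().lower() != TRACE_HEADER:
            continue
        entries.extend(p.strip() for p in value.split(",") if p.strip())
    return entries


def passed_through_opes(headers: Iterable[Tuple[str, str]]) -> bool:
    """Step 1: did this request carry any OPES tracing information?"""
    return bool(opes_trace_entries(headers))
```

Note that a False result only says that no trace was seen; it does
not prove that no OPES intermediary touched the request.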




4. Requirements for Tracing in an OPES Flow


In [], the IAB has required that the OPES architecture provide
tracing and debugging facilities. From [], the OPES architecture
SHOULD assist consumer application in detecting the behavior of OPES
processors and callout servers to potentially allow them to identify
imperfect or compromised operations.

The OPES architecture document [] has addressed these concerns at a 
higher level. The architecture requires that tracing be feasible on 
the OPES flow per OPES processor using in-band annotation. This 
requirement provides a participant with the ability to detect OPES 
intermediaries in the course of normal interaction.

4.1 What is traceable?

End OPES points must be able to trace the following:


Consider more accurate "The following entities can be 
identified in a trace"

1. OPES processors that are involved.
2. OPES services (including callout services) that were performed on
a request or response.


"... performed on an application message"

3. TBD


Also, we need to add MUST/SHOULD/MAY to each traceable 
entity, I guess.


4.2 Tracing and Trust Domains

Tracing is limited to trust domain. Entities outside of that domain
may or may not see any traces, depending on domain policies or
configuration. Therefore, there is no need for mandatory end-to-end
tracing facility. For example, if an OPES system is on the content
provider "side", end-users are not guaranteed any traces. If an OPES
system is working inside end-user domain, the origin server is not
guaranteed any traces related to user requests.


I am not sure about the above. It contradicts our statement 
that we are addressing IAB concerns. If there is no trace, we 
are not.  I think it is reasonable to say that there MUST be 
at least one trace entry per "system". (A trust domain may 
include several such systems/entities, see the trust domain 
definition).

There are two distinct uses of traces. First, a trace SHOULD enable
the "end" (content producer or consumer) to detect OPES processor
presence within the end's trust domain. Such an "end" should be able
to see a trace entry, but does not need to be able to interpret it
beyond identification of the trust domain(s).

Second, the domain administrator SHOULD be able to take a trace entry
(possibly supplied by an "end" as an opaque string) and interpret it.
The administrator must be able to identify OPES processor(s) involved
and may be able to identify applied adaptation services along with
other message-specific information. That information SHOULD help to
explain what OPES agent(s) were involved and what they did. It may be
impractical to provide all the required information in all cases.
This document views a trace record as a hint, as opposed to an
exhaustive audit.

Since the administrators of various trust domains can have various
ways of looking into tracing, they MAY require freedom of choice in
what to put in trace records and how to format them. Trace records
should be easy to extend beyond basic OPES requirements. Trace
management algorithms should treat trace records as opaque data to
the extent possible.
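
The two uses of a trace described above might look like this in
code. The entry layout (a trust-domain name, a ";" separator, then an
opaque string) is purely an assumption for illustration, as are all
function names; the draft does not define a format.

```python
from typing import Callable, Dict, Iterable, List, NamedTuple


class TraceEntry(NamedTuple):
    domain: str   # the only part an "end" needs to understand
    opaque: str   # left untouched; meaningful to the administrator only


def parse_entry(raw: str) -> TraceEntry:
    """Split an assumed 'domain;opaque' entry at the first ';' only."""
    domain, _, opaque = raw.partition(";")
    domain = domain.strip()
    if not domain:
        raise ValueError(f"trace entry has no domain: {raw!r}")
    return TraceEntry(domain, opaque)


def domains_seen(raw_entries: Iterable[str]) -> List[str]:
    """First use: an end lists the trust domains present in a trace."""
    return [parse_entry(raw).domain for raw in raw_entries]


def interpret(entry: TraceEntry,
              decoders: Dict[str, Callable[[str], str]]) -> str:
    """Second use: an administrator decodes entries of its own domain.

    Entries from other domains stay opaque and are only identified.
    """
    decode = decoders.get(entry.domain)
    if decode is None:
        return "not our domain: " + entry.domain
    return decode(entry.opaque)
```

Keeping the opaque part as an uninterpreted string is what lets each
domain extend its own records without affecting anyone else.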

Entities in one trust domain are not expected to be able to get all
OPES-related feedback from entities in other trust domains. For
example, if an end-user suspects that served content is corrupted by
a callout service, there is no guarantee that the user will be able
to identify that service, contact its owner, or debug it _unless_ the
service is within the user's trust domain. This is no different from
the current situation where it is impossible, in general, to know the
contact person for an application on an origin server that generates
corrupted HTML; and even if the person is known, one should not
expect that person to respond to end-user queries.


The above should have "system" granularity, not "domain" 
granularity because there can be different privacy policies 
within one trust domain.

4.3 In-Band Tracing

The architecture [] states that traces must be in-band. This
requirement limits the number of application protocols that OPES can
adapt and the amount of details a trace record can convey.

The set of protocols that can support tracing for OPES Flow must be
clearly documented. The architecture does not prevent implementers
from developing out-of-band protocols and techniques to address the
above limitation.


We should not (cannot) document the set of supported 
protocols directly, IMO. We should document _requirements_ 
for application protocols that want to support OPES traces. 
This is similar to OCP application bindings.

4.3.1 Tracing information granularity and persistence levels

The information may be:

- message-related, e.g. "virus checking done - passed", "content
filtering applied", "translated from quibbish to danqush". Such
information should be supplied with each message and indicate that
specific action was taken. All data that describes specific actions
performed for the message should be provided with that message, as
there is no other way to find message level details later. OPES
application (including OPES processor and all application modules and
callout servers involved) is not supposed to keep volatile
information after request processing is done.

- session related.
TBD

Session level data must be preserved for the duration of the session.
OPES processor is responsible for inserting notifications if
session-level information changes.

Examples of session-related information are "virus checker abcd build
123 enabled", "OPES server id=xyz present".

- log information id. This may be used e.g. for accounting and
non-repudiation purposes. Detailed information referenced by this id
may be not available online but can be retrieved later by some
off-line procedure.

- server related persistent information, e.g. "OPES center of 
authority <URI>", "privacy policy <URI>". It may be also presented 
once per session and it does not change between sessions.

- end-point related data: what profile is activated (profile ID),
where to get profile details, where to set preferences. I'm not sure
how far we should go in this direction.


The above classification seems like a result of protocol 
over-engineering to me. Would it be possible to avoid 
introducing any classifications/terms until the draft starts 
actually _using_ them for a specific purpose?  This will save 
us a lot of time -- there is no reason (and it is very 
difficult) to discuss something that is not used (yet).

4.4 OCP Support for Tracing

It is the task of an OPES processor to add trace records to 
application messages. In this case, OCP protocol is not affected by 
tracing requirements for the following reasons:


Either say "If it is the task..." or remove "In this case, " :-)

a) Exclusive assignment simplifies the protocol.

b) There are use cases where callout services adapt payload
regardless of the application protocol in use and leave header
adjustment to OPES processor or other services. For example, think of
a generic text translation or image modification service; such
services require payload encoding knowledge but can be
application-independent if OPES processor can supply them with just
the payload.

c) OPES processor is always _able_ to trace its own invocation and
service(s) execution because OPES processor must understand the
application protocol. Assigning these tracing tasks to callout
servers is just an optimization in cases where callout servers
manipulate application message headers.

d) May not be able to trace all services that are done at the callout
server.

e) It makes OPES compliance checks easier when remote third party 
callout servers are used.

f) Servers or services MAY add their own OPES trace records, of 
course.
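
The division of labor in (b) and (c) could be sketched like this:
callout services see only the payload, and the OPES processor, which
understands the application protocol, adds the trace record itself.
The header name "Via-OPES", the entry format, and the run_services
function are assumptions made for the example, not part of the draft
or of OCP.

```python
from typing import Callable, Dict, List, Tuple

Headers = List[Tuple[str, str]]
PayloadService = Callable[[bytes], bytes]   # sees the payload only

TRACE_HEADER = "Via-OPES"   # assumed name, not defined by the draft


def run_services(headers: Headers, payload: bytes, processor_id: str,
                 services: Dict[str, PayloadService]) -> Tuple[Headers, bytes]:
    """Apply payload-only services, then let the processor add the trace."""
    applied: List[str] = []
    for name, service in services.items():
        payload = service(payload)      # the service never sees the headers
        applied.append(name)
    entry = processor_id
    if applied:
        entry += " (" + " ".join(applied) + ")"
    # Only the processor touches the headers; the caller's list is not
    # modified, a new list with the trace record appended is returned.
    return headers + [(TRACE_HEADER, entry)], payload
```

With this split, an application-independent service such as a text
transformer needs no knowledge of how traces are encoded in a given
application protocol.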


I wonder if it is appropriate for the draft to explain the 
motivation behind a decision, at such lengths? Should we just 
state requirements instead?


4.5 Protocol Binding to Tracing

How tracing is added is application protocol-specific and will be 
documented in separate drafts. This work documents what tracing 
information is required and some common tracing elements.

5. Security Considerations




I would suggest adding rules from the message below (or 
something similar). They are very specific things we can 
discuss/fix/polish, and they actually shape the 
conventions/intent of many tracing draft sections. The draft 
should be mostly about specific requirements, not our 
motivation or reasoning about what we might do and what 
design alternatives we have available.
      http://www.imc.org/ietf-openproxy/mail-archive/msg01875.html
Please note that I am not saying the above rules are perfect! 
I am just saying we need more specific "bones" to grow the 
draft "meat" around, or we will never exit the jelly state.

Thank you,

Alex.