Re: Tracing Draft version-0004082003


At 08:17 09/04/03, Alex Rousskov wrote:

On Tue, 8 Apr 2003, Abbie Barbir wrote:
Abbie,
        This is a great start, especially given the very short time you
had to write this first version of the draft! Specific comments are
inline.


Agree.

 I did not review the overall structure of the draft because
I think it is premature to do that (need more "meat" and it is not
yet clear whether some of the current sections are going to stay).


This is the problem with your way of thinking from the leaf to the
trunk. Quite confusing to me sometimes.

>                       OPES tracing facility

How about just "OPES Tracing"? Does "Facility" mean something specific?
Will there be other (non "facility") OPES tracing drafts?


You talk about notification as specific and sometimes separate from a blocked
tracing. Also, the IAB expects a paper on tracing.

>                       draft-ietf-opes-tracing
>
> 1. Introduction
>
> The Open Pluggable Edge Services (OPES) architecture enables cooperative
> application services (OPES services) between a data provider, a data
> consumer, and zero or more OPES processors.  The application services under
> consideration analyze and possibly transform application-level messages
> exchanged between the data provider and the data consumer.
>
> The execution of such services is governed by a set of rules installed on
> the OPES processor.  The rules enforcement can trigger the execution of
> service applications local to the OPES processor.
>
> Alternatively, the OPES processor can distribute the responsibility of
> service execution by communicating and collaborating with one or more remote
> callout servers. As described in [], an OPES processor communicates with and
> invokes services on a callout server by using a callout protocol.
>
> In [], the IAB has required the OPES working group to support tracing and
> notification. This document addresses these IAB requirements.

IAB (RFC 3238) does not require support of anything. It lists considerations
that the WG should _address_:

   "The purpose of this document is not to recommend specific solutions
   for OPES, or even to mandate specific functional requirements....
   Instead, these are recommendations on issues
   that any OPES solutions standardized in the IETF should be required
   to address, similar to the "Security Considerations" currently
   required in IETF documents [RFC2316].  As an example, one way to
   address security issues is to show that appropriate security
   mechanisms have been provided in the protocol, and another way to
   address security issues is to demonstrate that no security issues
   apply to this particular protocol."

There is a huge difference between "requiring support" and enumerating "issues
that OPES solutions should be required to address". Let's not shoot ourselves
in the foot. :-)

Also, RFC 3238 does not contain the word "trace" or "tracing", just
"notification".

I would suggest saying something like this:

    IAB has required OPES solutions to address end user and
    content provider notification concerns. This document
    specifies tracing mechanisms that address those concerns.

It would be nice to explain somewhere why we are not calling this
document OPES Notification [Facility].

> The document examines the effect of tracing and notification
> requirements on OPES architecture and callout protocol []. In
> particular, the work identifies traceable entities in an OPES flow
> and how this information is relayed to end points.

what information?

> As per the architecture document [], there is a requirement of
> relaying tracing information in-band. The document investigates this
> possibility and discusses possible methods that could be used to
> detect faulty OPES processors by end points on an OPES flow.

What about faulty callout servers? Do we have a term that describes
OPES system (processor + callout servers + whatever else is out there
that is OPES-related)?


In Abbie's context here, with "OPES Processor" actually being used
as a term for the whole system manager, I fully accept the wording.
An OPES Processor can be a process in a box or the supervisor of a
large bunch of callout/non-callout servers.

> The document is organized as follows: Section 2 considers ? Section
> 3? etc.
>
>
> 2. Basic Definitions
>
> - REFERENCE POINT - a reference that may be used out-of-band to
>   perform a specific function.
>
>   An example may be URI for the privacy policy, center of authority
>   URI, server address, etc. Usually no protocol is provided to
>   access the reference point.
>
> - INFORMATION POINT - implies presence of the protocol to access
>   detailed information at this point. Example may be URI to get
>   a certificate for virus checker or content filter, examine
>   and set profile setting and active preferences.
>
> - IDENTIFIER - provides a unique binding to detailed persistent
>   information. For example "transformation-applied : fe123" gives a
>   participant ability to enquire (and maybe cache) details of the
>   transformation fe123. Use of such (opaque) identifiers does not
>   require prior knowledge and does not create a burden of storing
>   additional information - this is just a tag for persistent
>   information (not message-specific).

The above classification seems like a result of protocol
over-engineering to me. Would it be possible to avoid introducing any
classifications/terms until the draft starts actually _using_ them for
a specific purpose?  This will save us a lot of time -- there is no
reason (and it is very difficult) to discuss something that is not
used (yet).


Leaf to trunk (Basica vs C) approach. I do support that initial lexical
referencing. Skip the reading if you dislike it and refer to it further
on. Reading does not have to be linear. Keep a note of where the
definition is, so you don't have to master it at the first shot. Makes life
easier, reading simpler and debates quicker. Our main problem here is word
definition, and where the words fit in a commonly accepted model.

> 3. Requirements for Notification in an OPES Flow
>

> This section takes a look at the IAB requirements (3.1) and (3.2) and how they
> relate to notification
>
> 3.1 Notification Requirements
>
> There are requirements on the architecture [] to assist content provider
> applications in detecting and responding to data consumer applications actions
> by OPES intermediaries that are deemed inappropriate by the content provider.
> This is referred to as notification.
>

> In general, notification goes in opposite direction of tracing and cannot be
> attached to application messages that it notifies about.

If we compare notification with tracing like that, we should talk
about/define tracing first and only then provide a comparison.

An "opposite direction" illustration (figure) would be nice here!

> This can be done

"This has to be done" ?

> out-band and may require the development of a new protocol. In general, this
> opposite-direction, outside-of-message scheme is difficult to support.

What does "difficult to support" mean? Consider removing that
sentence (it's OK in a conversation, but not in a spec). This text is
of great importance because we are, essentially, saying that the
"ideal" scheme that IAB folks envisioned is not practical. We need to
be as specific as possible here.

> NOTE: When would a content provider issue such request?

What request?

> How would such
> mechanism be used? Randomly, or on a statistical basis? Or manually? Is such
> a scheme of practical relevance?

In the above, there is no definition of the "mechanism" detailed enough to
answer these questions.

> 3.1.1 Notification Concerns
>
> A major concern with notification is scalability. For example, it is not
> practical to assume that a content provider is interested in receiving a
> notification for every HTTP response sent out. As such, a mechanism for
> explicit request of notification may be required.

Why is it not practical?! Some content providers would love to know exactly
what their clients are doing with their content. They would be willing to
double server capacity to handle the load.

"Not scalable" usually implies non-linear (hopefully exponential) growth with
the number of messages or notification-generation points. You need to show
such growth (or something similar) if you want to play the scalability card.
What does not scale and when?

> Privacy is another concern. Maybe a user doesn't want to reveal to any content
> provider all the OPES services that have been applied on her behalf. For
> example, why should every content provider know what exact virus scanner a
> user is using?

Consider rephrasing to something like this:

    End point privacy is a concern. An end user may consider information
    about OPES services applied on her behalf as private.  For example, if
    translation for a braille device has been applied, it can be concluded
    that the user has eyesight problems; such information may be
    misused if the user is applying for a job online. Similarly, a content
    provider may consider information about its OPES services private.
    For example, use of a specific OPES intermediary by a high traffic
    volume site may indicate business alliances that have not been publicly
    announced yet.

Also consider adding something like this:

    Security is a concern. An attacker may benefit from knowledge
    of internal OPES services layout, execution order, software
    versions and other information likely to be present in
    automated notifications.

Also consider adding something like this:

    The level of available details in notifications versus content
    provider interest in supporting notification is a concern.  Experience
    shows that content providers often require very detailed information
    about user actions to be interested in notifications at all. For
    example, Hit Metering protocol (RFC XXX) has been designed to supply
    content providers with proxy cache hit counts, in an effort to reduce
    cache-busting behavior, which was caused by content providers' desire to
    get accurate site "access counts". The Hit Metering protocol is not
    widely deployed today because it turns out that content providers are
    not interested enough in "just hit counts"; only knowing things like
    each client's IP address, browser version, or cookies would make
    providers interested enough to support cache hit notifications.  Hit
    Metering experience is very relevant because Hit Metering protocol was
    designed to do for HTTP caching intermediaries what OPES notifications
    are meant to do for OPES intermediaries.

    (We would need to verify the above info with Hit Metering
    authors, but to the best of my knowledge it is correct)


Is it necessary to motivate a decision so extensively? It would be helpful
in footnotes. Should notes be permitted?

> 3.2 How to Fulfill Notifications Requirements
>

> IAB consideration (3.1) [] suggests that the overall OPES framework needs to
> assist content providers in detecting and responding to client-centric actions
> by OPES intermediaries that are deemed inappropriate by the content provider.
> This requirement is hard to implement since most client-centric actions happen


What do we mean by "implement"? Write a spec? Code it up? Deploy? Other?
Consider rephrasing to something like:

    To address this requirement directly, one would have to ...

and then finish with a statement that we are addressing it indirectly by
providing tracing mechanisms that assist interested providers in detecting and
responding to inappropriate OPES actions. Say how they assist (you already do
the latter now, below).

> _after_ the application message left the content provider(s) and, thus,
> notifications cannot be piggy-backed to application messages and have to
> travel in the opposite direction of traces.
>
> Note: Need to explain more here.

> IAB consideration (3.2) [] can be satisfied by the development of a tracing
> facility. In this regard, it is recommended that tracing SHOULD be always-on,
> just like HTTP Via headers now. This should eliminate notification as a
> separate requirement.

Why not MUST be always-on? We are talking about interoperability here (a
broken intermediary that does not use Via-OPES headers is an interoperability
problem because it cannot be bypassed).


If HTTP Via is SHOULD it can only be SHOULD.

> If the OPES end points cooperate then notification can be supported by
> tracing. It is recommended that content providers that suspect or experience
> difficulties do the following:

Recommended is too strong, IMO. "For example, ..." or "providers could ...",
would be more appropriate.

>       1. Check whether requests they receive pass through
>         OPES intermediaries. Presence of OPES tracing info
>         will determine that. This check is only possible for
>         request/response protocols. For other protocols (e.g.,
>         broadcast or push), the provider would have to assume
>         that OPES intermediaries are involved until proven
>         otherwise.
>
>       2. If OPES intermediaries are suspected,
>         request OPES traces from potentially affected user(s).
>         The trace will be a part of the application message
>         received by the user software. If users cooperate,
>         the provider(s) have all the information they need.
>         If users do not cooperate, the provider(s) cannot
>         do much about it (they might be able to deny service
>         to uncooperative users in some cases).
>
>       3. Some traces may indicate that more information
>         is available by accessing certain resources on the
>         specified OPES intermediary or elsewhere. Content
>         providers may query for more information in that
>         case.
>
>       4. If everything else fails, providers can enforce
>          no-adaptation policy using appropriate OPES
>          bypass mechanisms and/or end-to-end mechanisms.
>
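
Steps 1 and 2 of the list above could be sketched roughly as follows. This is only an illustration in Python: the header name `OPES-Via`, its comma-separated value format, and both function names are assumptions made for the example, since the draft does not define a trace header or syntax yet.

```python
# Hypothetical header name; the draft does not define one yet.
TRACE_HEADER = "opes-via"


def opes_trace_entries(headers):
    """Return the trace entries found in a dict of message headers.

    Assumes (for illustration only) that entries are comma-separated
    values of a single header, in the style of HTTP Via.
    """
    for name, value in headers.items():
        if name.lower() == TRACE_HEADER:
            return [entry.strip() for entry in value.split(",") if entry.strip()]
    return []


def passed_through_opes(headers):
    """Step 1: presence of tracing info reveals OPES intermediaries."""
    return bool(opes_trace_entries(headers))


request = {"Host": "www.example.com",
           "OPES-Via": "proc1.example.net, proc2.example.net"}
print(passed_through_opes(request))   # True
print(opes_trace_entries(request))    # ['proc1.example.net', 'proc2.example.net']
```

The same parsing function would apply in step 2 to the headers of a response that a cooperating user forwards to the provider.
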
> 4. Requirements for Tracing in an OPES Flow
>
>
> In [], the IAB has required that the OPES architecture provide tracing and
> debugging facilities. From [], the OPES architecture SHOULD assist consumer
> application in detecting the behavior of OPES processors and callout servers
> to potentially allow them to identify imperfect or compromised operations.
>
> The OPES architecture document [] has addressed these concerns at a higher
> level. The architecture requires that tracing be feasible on the OPES flow per
> OPES processor using in-band annotation. This requirement provides a
> participant with the ability to detect OPES intermediaries in the course of
> normal interaction.
>
> 4.1 What is traceable?
>
> End OPES points must be able to trace the following:

Consider more accurate "The following entities can be identified in a trace"

> 1. OPES processors that are involved.
> 2. OPES services (including callout services) that were performed on a request
> or response.

"... performed on an application message"

> 3. TBD

Also, we need to add MUST/SHOULD/MAY to each traceable entity, I guess.


MAY, to be consistent with your wording.

> 4.2 Tracing and Trust Domains
>

> Tracing is limited to trust domain. Entities outside of that domain may or may
> not see any traces, depending on domain policies or configuration. Therefore,
> there is no need for mandatory end-to-end tracing facility. For example, if an
> OPES system is on the content provider "side", end-users are not guaranteed
> any traces. If an OPES system is working inside end-user domain, the origin
> server is not guaranteed any traces related to user requests.

I am not sure about the above. It contradicts our statement that we are
addressing IAB concerns. If there is no trace, we are not.  I think it is
reasonable to say that there MUST be at least one trace entry per "system".
(A trust domain may include several such systems/entities, see the trust
domain definition).


I think we should go with an option to turn it off. Privacy must be permitted.

> There are two distinct uses of traces. First, a trace SHOULD enable the "end"
> (content producer or consumer) to detect OPES processor presence within the
> end's trust domain. Such an "end" should be able to see a trace entry, but does
> not need to be able to interpret it beyond identification of the trust
> domain(s). Second, the domain administrator SHOULD be able to take a trace
> entry (possibly supplied by an "end" as an opaque string) and interpret it. The
> administrator must be able to identify the OPES processor(s) involved and may
> be able to identify applied adaptation services along with other
> message-specific information. That information SHOULD help to explain what
> OPES agent(s) were involved and what they did. It may be impractical to provide
> all the required information in all cases. This document views a trace record
> as a hint, as opposed to an exhaustive audit.
>
> Since the administrators of various trust domains can have various ways of
> looking into tracing, they MAY require freedom of choice in what to put in
> trace records and how to format them. Trace records should be easy to extend
> beyond basic OPES requirements. Trace management algorithms should treat trace
> records as opaque data to the extent possible.

> Entities in one trust domain are not expected to be able to get all
> OPES-related feedback from entities in other trust domains. For example, if an
> end-user suspects that a served message is corrupted by a callout service,
> there is no guarantee that the user will be able to identify that service,
> contact its owner, or debug it _unless_ the service is within the user's trust
> domain. This is no different from the current situation where it is
> impossible, in general, to know the contact person for an application on an
> origin server that generates corrupted HTML; and even if the person is known,
> one should not expect that person to respond to end-user queries.

The above should have "system" granularity, not "domain" granularity because
there can be different privacy policies within one trust domain.
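
To make the two uses of a trace and the "system" granularity suggestion concrete, here is a minimal Python sketch. The `TraceEntry` fields, the example domain names, and the helper functions are all hypothetical; the draft does not define a trace record layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceEntry:
    trust_domain: str   # all an "end" needs to understand
    system_id: str      # one entry per OPES system (the suggested granularity)
    details: str        # opaque to everyone but the system's administrator


def domains_seen(trace):
    """What an "end" can learn: which trust domains touched the message."""
    return sorted({entry.trust_domain for entry in trace})


def entries_for_admin(trace, trust_domain):
    """What a domain administrator would pick out for interpretation."""
    return [e for e in trace if e.trust_domain == trust_domain]


trace = [
    TraceEntry("isp.example", "edge-1", "svc=fe123"),
    TraceEntry("isp.example", "edge-2", "svc=ab456"),
    TraceEntry("cdn.example", "pop-7", "svc=zz999"),
]
print(domains_seen(trace))                            # ['cdn.example', 'isp.example']
print(len(entries_for_admin(trace, "isp.example")))   # 2
```

Keeping `details` as an uninterpreted string is one way to let two systems inside the same trust domain apply different privacy policies to what they disclose.
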

> 4.3 In-Band Tracing
>
> The architecture [] states that traces must be in-band. This requirement
> limits the number of application protocols that OPES can adapt and the amount
> of detail a trace record can convey.
>
> The set of protocols that can support tracing for OPES Flow must be clearly
> documented. The architecture does not prevent implementers from developing
> out-of-band protocols and techniques to address the above limitation.

We should not (cannot) document the set of supported protocols directly, IMO.
We should document _requirements_ for application protocols that want to
support OPES traces. This is similar to OCP application bindings.

> 4.3.1 Tracing information granularity and persistence levels
>
> The information may be:
>
> - message-related, e.g. "virus checking done - passed", "content filtering
> applied", "translated from quibbish to danqush". Such information should be
> supplied with each message and indicate that specific action was taken. All
> data that describes specific actions performed for the message should be
> provided with that message, as there is no other way to find message level
> details later. OPES application (including OPES processor and all application
> modules and callout servers involved) is not supposed to keep volatile
> information after request processing is done.
>
> - session related.
> TBD
>
> Session level data must be preserved for the duration of the session. OPES
> processor is responsible for inserting notifications if session-level
> information changes.
>
> Examples of session-related information are "virus checker abcd build 123
> enabled", "OPES server id=xyz present".
>
> - log information id. This may be used e.g. for accounting and non-repudiation
> purposes. Detailed information referenced by this id may not be available
> online but can be retrieved later by some off-line procedure.
>
> - server related persistent information, e.g. "OPES center of authority
> <URI>", "privacy policy <URI>". It may also be presented once per session, and
> it does not change between sessions.
>
> - end-point related data: what profile is activated (profile ID), where to
> get profile details, where to set preferences. I'm not sure how far we should
> go in this direction.

The above classification seems like a result of protocol
over-engineering to me. Would it be possible to avoid introducing any
classifications/terms until the draft starts actually _using_ them for
a specific purpose?  This will save us a lot of time -- there is no
reason (and it is very difficult) to discuss something that is not
used (yet).


Pasted remark. Same comment.
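
For reference while discussing whether the classification in the quoted section 4.3.1 is needed, its levels could be modeled as below. This is a sketch only: the `Scope` and `TraceInfo` names and the helper are assumptions for illustration, not anything the draft specifies.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    MESSAGE = "message"      # must travel with each message
    SESSION = "session"      # kept for the duration of the session
    LOG_ID = "log-id"        # pointer to details retrievable off-line
    SERVER = "server"        # persistent, e.g. privacy policy URI
    END_POINT = "end-point"  # profile ID, where to set preferences


@dataclass
class TraceInfo:
    scope: Scope
    text: str


def must_accompany_message(info):
    """Only message-related data has to be supplied with every message,
    since the draft says it cannot be recovered later."""
    return info.scope is Scope.MESSAGE


items = [
    TraceInfo(Scope.MESSAGE, "virus checking done - passed"),
    TraceInfo(Scope.SESSION, "virus checker abcd build 123 enabled"),
]
print([must_accompany_message(i) for i in items])   # [True, False]
```
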

> 4.4 OCP Support for Tracing
>
> It is the task of an OPES processor to add trace records to application
> messages. In this case, OCP protocol is not affected by tracing requirements
> for the following reasons:

Either say "If it is the task..." or remove "In this case, " :-)

> a) Exclusive assignment simplifies the protocol.
>
> b) There are use cases where callout services adapt payload regardless of the
> application protocol in use and leave header adjustment to OPES processor or
> other services. For example, think of a generic text translation or image
> modification service; such services require payload encoding knowledge but can
> be application-independent if OPES processor can supply them with just the
> payload.
>
> c) OPES processor is always _able_ to trace its own invocation and service(s)
> execution because OPES processor must understand the application protocol.
> Assigning these tracing tasks to callout servers is just an optimization in
> cases where callout servers manipulate application message headers.
>
> d) May not be able to trace all services that are done at the callout server.
>
> e) It makes OPES compliance checks easier when remote third party callout
> servers are used.
>
> f) Servers or services MAY add their own OPES trace records, of course.

I wonder if it is appropriate for the draft to explain the motivation
behind a decision at such length? Should we just state requirements
instead?


Footnotes?
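
The division of labor described in points (b) and (c) of the quoted section 4.4 can be illustrated with a short Python sketch: the callout service sees only the payload, and the OPES processor adds the trace record. The `OPES-Via` header name, the entry format, and the function names are assumptions for the example, not part of the draft or of OCP.

```python
def uppercase_service(payload):
    """Stand-in callout service: adapts the payload, knows nothing of headers."""
    return payload.upper()


def process_message(headers, payload, services, processor_id):
    """OPES processor: run payload-only services and add the trace itself.

    Returns a new (headers, payload) pair; the input headers are not mutated.
    """
    applied = []
    for name, service in services:
        payload = service(payload)
        applied.append(name)
    entry = processor_id + " (" + ", ".join(applied) + ")"
    new_headers = dict(headers)
    previous = new_headers.get("OPES-Via")
    new_headers["OPES-Via"] = previous + ", " + entry if previous else entry
    return new_headers, payload


headers, body = process_message(
    {"Content-Type": "text/plain"},
    "hello",
    [("uppercase", uppercase_service)],
    "proc1.example.net",
)
print(headers["OPES-Via"])   # proc1.example.net (uppercase)
print(body)                  # HELLO
```

Because the service never touches headers, nothing about tracing has to cross the callout protocol in this arrangement.
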

> 4.5 Protocol Binding to Tracing
>

> How tracing is added is application protocol-specific and will be documented
> in separate drafts. This work documents what tracing information is required
> and some common tracing elements.
>
> 5. Security Considerations

I would suggest adding rules from the message below (or something
similar). They are very specific things we can discuss/fix/polish, and
they actually shape the conventions/intent of many tracing draft sections.
The draft should be mostly about specific requirements, not our motivation
or reasoning about what we might do and what design alternatives we have
available.
        http://www.imc.org/ietf-openproxy/mail-archive/msg01875.html
Please note that I am not saying the above rules are perfect! I am just saying
we need more specific "bones" to grow the draft "meat" around, or we will
never exit the jelly state.


Thank you both for the work achieved.
jfc