Re: Comments on ICAP I-D

From: "Issac Goldstand" <neoi(_at_)writeme(_dot_)com>
To: <ietf-openproxy(_at_)imc(_dot_)org>
Subject: Comments on ICAP I-D
Date: Thu, 15 Feb 2001 13:24:22 +0200

To whom it may concern (J. Elson et al.):

At Jeremy's request, I am forwarding this to the entire mailing list.

After reading your proposed draft for the ICAP standard, I came across a few
points that I thought might be worth mentioning.  Before I start I want to
mention that in these comments, I will refer only to HTTP requests/responses
for the sake of simplicity;  this does not mean that they do not apply to
other protocols.

Firstly, the draft mentions modifying HTTP request and response headers, but
what about modifying POST request payloads?  For example, let us say that
some ICAP server has a program that can act as a preprocessor for certain
types of forms; let's say that there is some process.cgi script on HTTP
server foo.bar that takes, as part of its arguments, a number of email
addresses.  Rather than requiring this, and similar, scripts to verify the
existence of these email addresses, we can have an ICAP server,
icap.server.net, which has a ValidateEmail service running which will remove
invalid email addresses from the submission, by parsing the POST (which it
will obviously have to know how to read, but the same applies for HTTP
response payloads, so I think this is still valid in that respect) and
removing invalid mail addresses, before forwarding it to the HTTP server.
Additionally, it can return an error to the client if, let's say, it removes
all the addresses, rendering the form incomplete.  This error can be
returned to the HTTP client in the same way that errors coming from request
header modifications are returned.
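
A minimal sketch of the ValidateEmail idea described above, assuming the
POST body is application/x-www-form-urlencoded and that the addresses
arrive in a field named "email" (the field name, the function name and
the loose syntax check are all assumptions made for illustration; the
draft defines none of them, and a real service would presumably verify
that the mailbox exists rather than just check its shape):

```python
import re
from urllib.parse import parse_qsl, urlencode

# Deliberately loose syntactic check, used here only as a stand-in for
# whatever validation the ICAP service would really perform.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def filter_emails(body, field="email"):
    """Drop invalid addresses from a form-encoded POST body.

    Returns (new_body, ok). ok is False when no address survives, in
    which case the service would return an error to the HTTP client
    instead of forwarding the request.
    """
    pairs = parse_qsl(body, keep_blank_values=True)
    kept = [(k, v) for k, v in pairs if k != field or EMAIL_RE.match(v)]
    ok = any(k == field for k, _ in kept)
    return urlencode(kept), ok
```

Because the body length changes, whoever forwards the modified request
to the HTTP server also has to recompute Content-Length (or re-chunk
the body).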


I believe this is allowed by the spec, and we had extensive
discussions about how modification of POST requests is explicitly
allowed.  The Request line, be it "GET" or "POST" or "HEAD" or
whatever, is sent along with the original ICAP request.  In particular,
for a language translation service we probably need to allow POSTings
to be modified.

I know that the NetApp implementation allows POST _responses_ to be
modified, so that viruses cannot be downloaded as the result of a POSTing.

Secondly: This is just a small superficial modification, but in section 4.2,
you might want to move the lines beginning with "Typical data flow" to the
next page simply for readability purposes.

Thirdly:  In section 5.1, when talking about the reasoning behind the
mandatory use of "chunking", you state:

"Chunking is mandatory for three reasons.  First, efficiency is
important, and the chunked encoding allows both the client and server to
keep the transport-layer connection open for later reuse."

This does not appear to me to be a valid reason for demanding chunking, as
it can easily be accomplished by using the Connection HTTP header.  Perhaps
I misunderstood your intent, but if not, it hardly seems fair to include
this as a _requirement_ for chunking.  Note that I'm not specifically
criticizing the inclusion of this as a reason to USE chunking; but rather
object to its inclusion as a REQUIREMENT (e.g. MAY or SHOULD vs. MUST).


Actually, this is not true, because origin servers running a CGI
script will close the connection in order to indicate the end of the
message.  In our 0.95 implementation at NetApp, this TCP-close was
proxied over to the ICAP service, destroying our hard-won ICAP
connection.  The only other encapsulation that a TCP-close could be
proxied to efficiently (without storing up the entire response, at
needless cost in disk space and a time bubble in the fetch pipeline)
is a chunked response.
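
To illustrate the point, here is a rough sketch (not taken from the
NetApp implementation; the function name is made up) of how a stream
that ends with a TCP-close can be re-framed as a chunked body.  Each
block is forwarded as soon as it is read, and the upstream close maps
onto the terminating zero-length chunk, so the downstream connection
can stay open:

```python
def chunked(parts):
    """Yield HTTP/1.1 chunked-encoding frames for an iterable of bytes.

    `parts` stands for the blocks read from the origin server until it
    closes its connection; nothing is buffered beyond one block.
    """
    for part in parts:
        if part:  # a zero-length chunk would end the body early
            yield b"%x\r\n" % len(part) + part + b"\r\n"
    # last-chunk with no trailers: marks end of message without a close
    yield b"0\r\n\r\n"
```

For example, b"".join(chunked([b"hello", b"world!"])) produces
b"5\r\nhello\r\n6\r\nworld!\r\n0\r\n\r\n".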

Next, in section 5.3.1, you write:

"Note in particular that the "Transfer-Encoding" option is not allowed.
All ICAP messages MUST use the "chunked" transfer-encoding."

Now it appears to me that it is foolish to forbid the use of
"Transfer-Encoding" and just automatically assume "chunked".  What if some
other intermediate service needs this information, and is unaware of ICAP
standards?  Might it not be better to remove the first line, thus implying
that all ICAP requests contain the header "Transfer-Encoding: chunked"?
That still insists on chunked, but now provides other hosts with that bit of
information.


Since everything is chunked in ICAP, the "Transfer-Encoding" header is
redundant.  A good protocol has a very high degree of "entropy",
i.e. there is no redundant information.  Since we no longer run on
port 80 - we have port 1344 - it is silly to keep any redundant vestiges
of the old HTTP protocol in ICAP.  This header will not be missed.

It was our understanding that most HTTP proxies either reject methods
they do not understand, or else they just connect the two sockets and
stop interpreting the stream.  In both cases, a lack of
"Transfer-Encoding: chunked" would not hinder this behavior.

Below this, you write:
"The Via header MUST be treated the same as in standard HTTP.  ICAP
clients and servers should modify Via as an HTTP proxy would, with
respect to the HTTP message being encapsulated.  The Via header added by
an ICAP server should specify protocol as ICAP/1.0."

And a few paragraphs later, you write:
"Note that, in most applications, it is useful for ICAP clients acting in
their HTTP client roles to add ICAP headers to HTTP requests.  This
notifies the HTTP origin server that it is speaking with a client that
is ICAP-enabled.  Origin servers can use this header to determine that
the client is ICAP-enabled, and modify its response accordingly.  For
example, an origin server may choose not to insert an advertisement into
a page if it knows that a downstream ICAP server can insert the ad
instead.

The additional headers in an HTTP (not ICAP) connection to an origin
server that are needed to support this type of decision are application
specific and beyond the scope of this document.  However, such headers
(if used) SHOULD start with "X-ICAP".  Applications SHOULD include an
"X-ICAP-Version: 1.0" header along with their application-specific
headers."

RFC 2616 (Hypertext Transfer Protocol 1.1) specifies the following format
for the Via header:

     Via =  "Via" ":" 1#( received-protocol received-by [ comment ] )
      received-protocol = [ protocol-name "/" ] protocol-version
      protocol-name     = token
      protocol-version  = token
      received-by       = ( host [ ":" port ] ) | pseudonym
      pseudonym         = token

Furthermore, the RFC goes on to say:
"Comments MAY be used in the Via header field to identify the software
 of the recipient proxy or gateway, analogous to the User-Agent and
 Server header fields."

Now, it seems to me that it might be a bit easier to simply have all ICAP
aware clients add their ICAP abilities to the Via line, in the provided
comments area. That is, after all, what the Via line is there for.  Perhaps
the last paragraph should be removed, and replaced by the line:

"Therefore, all such ICAP clients SHOULD include information about their
ICAP capabilities in the Via field.  If included, the header SHOULD include
the ICAP version supported by the client (e.g. ICAP/1.0)."

The only situation that this doesn't work for is if the original HTTP client
is also the ICAP client, but the draft doesn't seem to take that into
consideration anywhere else...


The Via: header is much like the "Received-From:" header - it is
computer generated and human-interpreted.  Since it is unlikely that
any automatons could make use of this header (just as "Received-From:"
is never used except during human debugging of mail routing loops), we
should be free to insert a header that conforms to the syntax, and
spirit, (but not the exact semantics) of the HTTP "Via" header.  I
would be in favor of adding a comment to indicate that a proxy was an
ICAP proxy.
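
As a rough sketch of what such a comment could look like, the helpers
below format one Via element following the RFC 2616 grammar quoted
earlier and append it to an existing Via value.  The function names,
the host name and the choice of "ICAP/1.0" as the comment text are
assumptions for illustration only; neither the draft nor this thread
fixes a comment format.

```python
def via_entry(version, received_by, protocol_name=None, comment=None):
    """Format one Via element: received-protocol received-by [comment]."""
    protocol = f"{protocol_name}/{version}" if protocol_name else version
    entry = f"{protocol} {received_by}"
    if comment:
        # No escaping is done here; the comment must not contain
        # unbalanced parentheses.
        entry += f" ({comment})"
    return entry


def append_via(existing, entry):
    """Append an element to an existing Via value (None or '' if absent)."""
    return f"{existing}, {entry}" if existing else entry
```

With these, append_via("1.0 fred", via_entry("1.1", "proxy.example.com",
comment="ICAP/1.0")) gives "1.0 fred, 1.1 proxy.example.com (ICAP/1.0)".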


In section 5.3.2, you go on to list request-specific headers allowed in ICAP
requests.  Among them is the Host header. However, it is unclear at first
glance whether the expected contents of the Host field should be the
hostname of the ICAP server or that of the HTTP server to whom the original
request was made.  I assume that the former is true, but it should be
explicitly stated in the draft to avoid possible confusion.

In section 5.7.3, in the second example, a very dangerous use of ICAP is
brought to light.  In this example, an ICAP server adds the image/gif format
to the Accept line of an HTTP request.  This shows an extremely dangerous
application of ICAP.  The purpose of the original HTTP client's headers is
to provide the HTTP server with information of what kind of information it
can accept.  The setting of the headers should be allowed to be done ONLY by
the original HTTP client.  Therefore, while an ICAP server can safely REMOVE
information from such headers, it should not modify them in any other way
(with the exception of REDUCING the q [qvalue]  parameter where applicable),
and should certainly not add fields or parameters to them.  This can lead to
the original HTTP client receiving information that it (or worse, the
end-user) doesn't know how to interpret, which I think is the ultimate
"no-no".  The general problem, therefore, is when an ICAP server modifies
headers that provide information about the client's abilities to the HTTP
server.

Headers that fall under this risk include:

Accept
Accept-Charset
Accept-Encoding
Accept-Language
Accept-Range
Connection
Expect
TE
Upgrade

Other fields that could theoretically become problematic when modified
include:

If-Match
If-Modified-Since
If-None-Match
If-Range
If-Unmodified-Since
Max-Forwards
User-Agent

The distinction I made when dividing these two groups is that the first
group consists of headers which, if modified, could cause the client to
receive information that it cannot process, while the second group consists
of headers which will, at worst, result in a possibly unexpected response
(which it will _understand_ in any case).
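
The rule proposed for the first group (entries may be removed and
qvalues reduced, but nothing may be added) can be sketched as a check
on Accept-style header values.  This is only an illustration with
made-up function names: it ignores wildcards such as "*/*" and every
parameter other than q, so it errs on the side of rejecting a change.

```python
def parse_accept(value):
    """Map each listed item (media range, charset, ...) to its qvalue."""
    items = {}
    for element in value.split(","):
        element = element.strip()
        if not element:
            continue
        name, *params = [p.strip() for p in element.split(";")]
        q = 1.0  # default qvalue when none is given
        for param in params:
            key, _, val = param.partition("=")
            if key.strip().lower() == "q":
                q = float(val)
        items[name.lower()] = q
    return items


def is_safe_narrowing(original, modified):
    """True if `modified` only removes entries or lowers qvalues."""
    before, after = parse_accept(original), parse_accept(modified)
    return all(name in before and q <= before[name]
               for name, q in after.items())
```

Under this check, the example from section 5.7.3 fails: adding
image/gif to an Accept line that did not list it makes
is_safe_narrowing("text/html", "text/html, image/gif") return False.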

Finally, on the same note, it is advisable for ICAP servers to NOT modify a
Warning header, should they come across one.



People get upset frequently about the difference between what "Is
possible" and what "Is useful, or intelligent" to do with ICAP.  Our
interest as protocol designers is to just stick to what is possible.
Since we are competing with existing practice, where applications are
embedded in dedicated proxy engines, and _anything_ is possible, we
are not inclined to place or recommend restrictions in the protocol.
However, the example is probably ill-chosen and should probably be
redesigned, since it does something un-useful.

One last point, which I'll be brief about, is how non-ICAP-aware HTTP
proxies should treat ICAP requests/responses, but I'm really not sure that
it's really applicable, so I'll refrain from addressing it specifically
now...

I think that about wraps it up for now.  I think that there's a lot of
promise to the idea, and there seems to be a great interest just now in
modifying parts of requests rather than the whole (see also
http://www.ietf.org/internet-drafts/draft-mogul-http-delta-07.txt which is
currently in Last Call for Proposed Standard).  I'd also appreciate being
added to any mailing lists related to ICAP and look forward to continuing to
assist the protocol's development.

Sincerely,
  Issac Goldstand

Internet is a wonderful mechanism for making a fool of
yourself in front of a very large audience.
  --Anonymous

Moving the mouse won't get you into trouble...  Clicking it might.
  --Anonymous

PGP Key 0xE0FA561B - Fingerprint:
7E18 C018 D623 A57B 7F37 D902 8C84 7675 E0FA 561B



Don Gillies - gillies(_at_)netapp(_dot_)com - Network Appliance, Inc.
Adjunct Professor of Computer Engineering, UBC, Vancouver BC Canada V6T 1Z4