Alex,
My strong preference is 2.
Reasons:
1. Better chance of acceptance. As you said yourself,
"... ICAP and HTTP folks would feel very comfortable".
Protocol acceptance depends on those folks, so
we should help them, not parsers :).
The main goal of OCP development is to get it widely adopted;
otherwise it does not matter how optimal the protocol is.
2. You are right to turn away from binary encoding; an
additional reason is the lack of flexibility - you have to get
it absolutely right the very first time. Not that I am saying
we cannot - but there are too many changing requirements and
unknown factors.
3. Optimized MIME is not radical enough to justify its
introduction. One pass through the headers is enough anyway.
That is not a big deal, and you can save this pass only on
"irrelevant message parts". How many of them do you expect
to see in an average message? This may be essential for a
widely used protocol with a wide application area, a long
history, and heavy backward compatibility requirements,
like HTTP.
OCP is much more focused - it does not carry application
semantics, and it is not intended for deployment by end
users. I do not expect to see a lot of irrelevant metadata,
and again, this is just about one pass through that data.
As for optimization - the short aliases you mentioned in
another message look like a very good idea. They help with
all headers (not just those one wants to ignore) and may
result in much bigger overall savings. And the best
thing about them: they are optional!
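A tiny sketch of how such optional aliases might work (the short names below are made up for illustration; the real set would have to be defined by the OCP spec):

```python
# Hypothetical alias table -- these short names are invented for
# illustration only, not taken from any OCP draft.
ALIASES = {b"CL": b"Content-Length", b"TE": b"Transfer-Encoding"}

def expand(name: bytes) -> bytes:
    # An unrecognized token passes through unchanged, which is what
    # makes the optimization optional for both senders and receivers.
    return ALIASES.get(name, name)

print(expand(b"CL"))     # known alias expands to the full header name
print(expand(b"X-Foo"))  # unknown names are left alone
```

A receiver that never learned the aliases still interoperates, since unknown tokens are treated as ordinary header names.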
Oskar
-----Original Message-----
From: owner-ietf-openproxy@mail.imc.org
[mailto:owner-ietf-openproxy@mail.imc.org] On Behalf Of Alex
Rousskov
Sent: Thursday, May 15, 2003 5:31 PM
To: ietf-openproxy@imc.org
Subject: OCP header encoding
Let's define OCP "headers" as everything transmitted using OCP except
for application message data and metadata. Application message data
and metadata is, essentially, OCP payload.
I can think of four basic ways to encode OCP headers. I will mention
all four below and then indicate my current choice. If you disagree or
have any related insights, please let me know.
1. Binary encoding: All headers are encoded using
well-defined binary structures. Often, binary
headers have fixed length. They are easy/fast to
"parse" but difficult to debug. Some binary
protocols allow for zero-copy implementations
on network-order machines with appropriate word
size. Irrelevant message parts or extensions
are usually easy to skip without much parsing.
Extensions are usually difficult to support.
Examples are IP, TCP, DNS, ICP/DHCP, WebMUX, and
application protocols using XDR (External Data
Representation) standard. There is no Single True
Standard for binary headers; everybody reinvents
the wheel.
2. MIME: MIME headers usually consist of a "special"
first line followed by name-value pairs formatted
following one of the MIME-like standards. Canonical
examples are easy to parse, but 100% compliant
implementations are probably non-existent due to
complexity and mess in MIME-related standards.
Parsing performance is so-so. Debugging and
tracing is easy. Extensions are easy to add but
difficult to ignore without parsing them first.
Examples are HTTP, SMTP, ICAP, BEEP, SIP. There is no
Single True Standard for MIME headers; everybody
reinvents the wheel (by altering basic MIME
requirements and by inventing their own "special"
first lines).
3. Optimized MIME (for the lack of a better name):
This approach is similar to MIME, but it optimizes
encoding to be easily parsable by documenting a
simple and rigid format. The performance is
optimized by providing explicit length for
variable-length structures. Known length makes
skipping extensions fast. This is still a text-based
approach so it is not as fast as binary encoding.
Debugging and tracing is relatively easy, but
typing a raw message by hand using telnet is difficult.
I could not find any examples, though several protocols
use the elements of the above approach, such as
NetStrings and alike. Here is an illustration:
123-command parameter parameter CRLF
34-name1: value1 CRLF
45-name2: value21 value22 CRLF
...
CRLF
Where 123, 34, and 45 are the lengths of the corresponding
lines. An implementation can ignore a line without
parsing most of its content because the size is known
in advance.
This approach can be extended to encode the entire
header so that the size of the entire header is
known in advance. This approach can be scaled down
by using known-sizes for certain string values
only, and not for all headers. Etc.
4. XML (not discussed here since we want to avoid it).
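To make the #3 illustration concrete, here is a minimal sketch of splitting such a length-prefixed header block. It assumes the decimal prefix counts the bytes after the "-" up to and including the CRLF, and that a bare CRLF ends the block; both conventions are my assumptions, since the exact format is still open.

```python
def split_lines(buf: bytes) -> list:
    """Split a length-prefixed header block into its raw lines.

    Assumed format: each line is '<len>-<content>CRLF', where <len>
    counts the bytes of <content> plus the trailing CRLF, and a bare
    CRLF terminates the block (these conventions are assumptions).
    """
    pos = 0
    lines = []
    while not buf.startswith(b"\r\n", pos):
        dash = buf.index(b"-", pos)          # end of the decimal prefix
        length = int(buf[pos:dash])
        content = buf[dash + 1 : dash + 1 + length]
        lines.append(content.rstrip(b"\r\n"))
        pos = dash + 1 + length              # jump past the line without
                                             # scanning its content
    return lines

block = b"17-OPTIONS example\r\n15-name1: value1\r\n\r\n"
print(split_lines(block))  # [b'OPTIONS example', b'name1: value1']
```

Note that a receiver that does not care about "name1" could advance past the second line without ever touching its bytes - that is exactly the skipping optimization being weighed.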
My current preference is #3. I would consider going binary instead,
but I think that will scare too many ICAP folks off. I think MIME must
not be used "as is" because it is virtually impossible to support
fully and efficiently.
However, I am not quite sure how far we should go in #3 to help
parsers. If we remove most of the lengths, then ICAP and HTTP folks
would feel very comfortable. We can just use a strict grammar for line
formats instead. On the other hand, knowing header sizes in advance
and skipping unknown extensions is an attractive optimization. Some
even argue that it improves security because of fewer buffer overruns,
but I am not sure that's a valid statement.
Any comments? What would be your preference? We must keep it simple,
but should we try to make it almost identical to HTTP/ICAP or should
we optimize further?
Thanks,
Alex.