RE: OCP header encoding


On Fri, 16 May 2003, Martin Stecher wrote:

While MIME optimization may be worth looking at, I don't think that
your optimization helps a lot.

After parsing the header size you still need to search for the
colon, extract the header name and check it by case-insensitive
string comparison against your list of supported headers in order to
know whether you can skip them or not. If you then decide to skip,
you have the size to forward to the next header.


I disagree. Here is an optimized version:

        1. Parse the length (one scan until you find a non-digit).
        2. Check characters at two or three positions within the
           now-isolated header (while traversing your precompiled
           "known headers" tree that tells you which positions to
           look at).
        3. If any position does not match, you are done -- this
           is not a header you care about. If all positions match,
           you know the unique name of the header you care about,
           proceed.
        4. Check the candidate header name using strncmp (one scan).

I do not know whether the above optimization is common, but we use it
successfully in Web Polygraph for HTTP headers. If you take a list of
all headers you care about and build an optimized decision tree for
that list, you will see that one to three character lookups are
usually all it takes to identify any known candidate, even for a long
name list. This is because header name strings are rather "long",
while the information they carry is very "short".
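
Steps 2-4 above can be sketched in Python roughly as follows. This is
only an illustration, not the Web Polygraph code: build_tree(),
identify(), the greedy choice of probe position, and the header list
are all made up for the example. It assumes case-sensitive names that
contain no colon, and it indexes each name together with its trailing
colon so that no key is a prefix of another.

```python
def build_tree(keys):
    """Build a decision tree over character positions (hypothetical helper).

    A leaf is the single remaining key; an inner node is a tuple
    (position, {character: subtree}).
    """
    if len(keys) == 1:
        return keys[0]

    def split_at(pos):
        groups = {}
        for key in keys:
            groups.setdefault(key[pos:pos + 1], []).append(key)
        return groups

    # Only probe positions that exist in every candidate, and greedily
    # pick the one that separates the candidates into the most groups.
    shortest = min(len(key) for key in keys)
    best = max(range(shortest), key=lambda pos: len(split_at(pos)))
    groups = split_at(best)
    if len(groups) < 2:
        raise ValueError("keys must be distinct, none a prefix of another")
    return best, {ch: build_tree(group) for ch, group in groups.items()}


def identify(tree, field):
    """Return the known header name that `field` starts with, or None."""
    node = tree
    while isinstance(node, tuple):
        pos, branches = node
        node = branches.get(field[pos:pos + 1])  # one character probe
        if node is None:
            return None  # step 3: not a header we care about
    # Step 4: one full comparison confirms the single candidate.
    return node[:-1] if field.startswith(node) else None


# Illustrative list of headers an agent might care about (an assumption).
KNOWN = [b"Content-Length", b"Content-Type", b"Connection",
         b"Host", b"Transfer-Encoding"]
TREE = build_tree([name + b":" for name in KNOWN])
```

For this five-name list the tree probes position 0 first and, for the
three names starting with "C", position 8 next, so two character
lookups are enough before the final comparison. An unknown header
such as "X-Foo" is rejected after the first probe.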

What makes it slower in HTTP is that you still have to parse any MIME
header you decided to skip.

Furthermore, we can define all headers to be case-sensitive (I do not
see why not) and come up with very short names. Actually, a few
MIME-based protocols (e.g., SIP) even allow for short "aliases"! For
example, "From" is equivalent to "f", "Call-ID" is equivalent to "i".

But the effort to determine the skip-decision is already 95% of the
work. The lookup of the CRLF characters in normal MIME headers is
only a minor task in header parsing.


The above optimization makes the skip decision very cheap. Also, if an
implementation just looks for CRLF to find the end of a MIME header
field, that implementation violates MIME. CRLF alone does not
terminate a field ("CR LF not-space" does, but only in simple,
canonical cases).
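
To illustrate the "CR LF not-space" rule, here is a minimal Python
sketch. The function name field_end() and the -1 return convention
are made up for the example, and it covers only the simple case: a
CRLF followed by a space or tab is treated as a continuation of the
same field, and anything else ends the field.

```python
def field_end(buf, start):
    """Return the offset just past the CRLF that ends the field at `start`.

    Returns -1 if the buffer ends before we can tell whether the field
    continues on the next line. (Hypothetical helper, simple case only.)
    """
    pos = start
    while True:
        crlf = buf.find(b"\r\n", pos)
        if crlf < 0:
            return -1
        nxt = crlf + 2
        if nxt >= len(buf):
            return -1  # need one more byte to rule out a continuation
        if buf[nxt] not in b" \t":
            return nxt  # "CR LF not-space": the field really ends here
        pos = nxt  # folded line: keep scanning the same field


HEADERS = b"Subject: a\r\n b\r\nFrom: x\r\n\r\n"
```

With the sample buffer, a parser that stops at the first CRLF would
cut the Subject field off at offset 12, in the middle of the folded
value, while field_end(HEADERS, 0) gives 16, the start of "From".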

If optimizing MIME, the headers or at least standard headers should
come with an (optional) header ID which makes it fast to determine
which header it is. If that ID is then combined with the size, it
could really help. On the other hand, you will need to deal with
some sort of header registration service and solve the customized
extension header problem.


See above for a solution that does not require a registration service
or IDs while providing almost equivalent benefits.

And will this then still look like MIME or is it then already close
to a binary format?


The only visual difference is that every "line" starts with a number.
It is still a true text-based format.
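
Since the format is not published yet, the following Python sketch
uses a purely hypothetical layout, "<decimal length> <field> CRLF",
just to show what skipping by length could look like. The names
encode_field() and iter_fields() and the exact layout are assumptions
for illustration, not the proposed OCP encoding.

```python
def encode_field(field):
    """Prefix a field with its byte length (hypothetical layout)."""
    return b"%d %s\r\n" % (len(field), field)


def iter_fields(buf):
    """Yield each field from a buffer of '<length> <field>\\r\\n' records."""
    pos = 0
    while pos < len(buf):
        # One scan until the first non-digit gives the length.
        digits_end = pos
        while digits_end < len(buf) and 48 <= buf[digits_end] <= 57:
            digits_end += 1
        if digits_end == pos or buf[digits_end:digits_end + 1] != b" ":
            raise ValueError("malformed length prefix at offset %d" % pos)
        start = digits_end + 1
        stop = start + int(buf[pos:digits_end])
        if buf[stop:stop + 2] != b"\r\n":
            raise ValueError("length mismatch at offset %d" % pos)
        yield buf[start:stop]
        pos = stop + 2  # jump to the next record without scanning the field


FIELDS = [b"Host: example.com", b"X-Note: a\r\n b"]
BUFFER = b"".join(encode_field(f) for f in FIELDS)
```

Here list(iter_fields(BUFFER)) returns the two fields unchanged, and
the folded value in the second one needs no special handling because
the reader never searches for the end of a field.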

The application message meta data in the OCP payload will often be
MIME headers. Often this meta data is quite long in today's typical
applications.


True.

I expect the OCP metadata to be much less.


I agree, at least for the bulk of OCP messages.

So, if the application message's meta data needs to be parsed as
MIME, does it make much sense to introduce optimized MIME for the
OCP metadata?


It may make sense because:
        - not all OCP agents would care or even know about MIME
          metadata, while all OCP agents have to care about OCP headers
        - OCP headers are used with all OCP messages while MIME
          metadata is used with just a few (those meta-have
          messages that pass metadata to the other side).
        - using MIME "as is" will produce many OCP implementations
          that will not pass compliance tests

Finally, it is probably possible to make the new format parsable by
old MIME code with no or minimal changes (depending on the old code
interface). Working on it...

Although I have some sympathy for optimized MIME and binary formats,
I currently prefer to stick with MIME headers for OCP metadata.


Noted. Perhaps some of the above arguments may tilt your preference in
the other direction. If not, perhaps the simplicity of the format will
(when it is published). If not, we can still go back to the bad old
MIME once optimization details are known and considered.

Thank you for the prompt feedback!

Alex.