While MIME optimization may be worth looking at, I don't think that your
optimization helps much.
After parsing the header size you still need to search for the colon, extract
the header name and check it by case-insensitive string comparison against your
list of supported headers in order to know whether you can skip them or not.
If you then decide to skip, you can use the size to jump to the next header.
But the effort to determine the skip-decision is already 95% of the work. The
lookup of the CRLF characters in normal MIME headers is only a minor task in
If optimizing MIME, the headers or at least standard headers should come with
an (optional) header ID which makes it fast to determine which header it is. If
that ID is then combined with the size, it could really help.
On the other hand, you will need to deal with some sort of header registration
service and solve the customized extension header problem.
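
A rough sketch of the ID-plus-size idea, in Python. Everything concrete here
is an assumption made for illustration only: the "<id>/<size>-" prefix, the
rule that the size counts the header text plus its CRLF, the use of ID 0 for
unregistered extension headers, and the function names.

```python
def encode_header(header_id, text):
    """Build one header line as "<id>/<size>-<text>CRLF" (hypothetical format)."""
    body = text.encode("ascii") + b"\r\n"
    prefix = "%d/%d-" % (header_id, len(body))
    return prefix.encode("ascii") + body


def select_headers(buf, wanted_ids):
    """Return the headers whose ID is wanted; skip all others by size alone."""
    selected = []
    pos = 0
    while pos < len(buf):
        dash = buf.index(b"-", pos)              # end of the "<id>/<size>" prefix
        header_id, _, size = buf[pos:dash].partition(b"/")
        start = dash + 1
        end = start + int(size)                  # size includes the trailing CRLF
        if int(header_id) in wanted_ids:
            selected.append(buf[start:end - 2])  # keep the text, drop the CRLF
        pos = end                                # no colon search, no name compare
    return selected
```

With such a prefix the skip decision is an integer lookup. Headers carrying
the "unregistered" ID would still need the colon search and the
case-insensitive name comparison.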
And will this then still look like MIME or is it then already close to a binary
The application message metadata in the OCP payload will often be MIME
headers. In today's typical applications this metadata is often quite long.
I expect the OCP metadata to be much shorter.
So, if the application message's metadata needs to be parsed as MIME, does it
make much sense to introduce optimized MIME for the OCP metadata?
Although I have some sympathy for optimized MIME and binary formats, I
currently prefer to stick with MIME headers for OCP metadata.
From: Alex Rousskov [mailto:rousskov(_at_)measurement-factory(_dot_)com]
Sent: Thursday, May 15, 2003 11:31 PM
Subject: OCP header encoding
Let's define OCP "headers" as everything transmitted using OCP except
for application message data and metadata. Application message data
and metadata is, essentially, OCP payload.
I can think of four basic ways to encode OCP headers. I will mention
all four below and then indicate my current choice. If you disagree or
have any related insights, please let me know.
1. Binary encoding: All headers are encoded using
well-defined binary structures. Often, binary
headers have fixed length. They are easy/fast to
"parse" but difficult to debug. Some binary
protocols allow for zero-copy implementations
on network-order machines with appropriate word
size. Irrelevant message parts or extensions
are usually easy to skip without much parsing.
Extensions are usually difficult to support.
Examples are IP, TCP, DNS, ICP/DHCP, WebMUX, and
application protocols using XDR (External Data
Representation) standard. There is no Single True
Standard for binary headers; everybody reinvents the wheel.
2. MIME: MIME headers usually consist of a "special"
first line followed by name-value pairs formatted
following one of the MIME-like standards. Canonical
examples are easy to parse, but 100% compliant
implementations are probably non-existent due to
complexity and mess in MIME-related standards.
Parsing performance is so-so. Debugging and
tracing is easy. Extensions are easy to add but
difficult to ignore without parsing them first.
Examples are HTTP, SMTP, ICAP, BEEP, SIP. There is no
Single True Standard for MIME headers; everybody
reinvents the wheel (by altering basic MIME
requirements and by inventing their own "special"
3. Optimized MIME (for lack of a better name):
This approach is similar to MIME, but it optimizes
encoding to be easily parsable by documenting a
simple and rigid format. The performance is
optimized by providing explicit length for
variable-length structures. Known length makes
skipping extensions fast. This is still a text-based
approach so it is not as fast as binary encoding.
Debugging and tracing is relatively easy, but
typing a raw message by hand using telnet is difficult.
I could not find any examples, though several protocols
use the elements of the above approach, such as
NetStrings and the like. Here is an illustration:
123-command parameter parameter CRLF
34-name1: value1 CRLF
45-name2: value21 value22 CRLF
Where 123, 34, and 45 are lengths of the corresponding
lines. An implementation can ignore the line without
parsing most of its content because the size is known
This approach can be extended to encode the entire
header so that the size of the entire header is
known in advance. This approach can be scaled down
by using known-sizes for certain string values
only, and not for all headers. Etc.
4. XML (not discussed here since we want to avoid it).
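
The length-prefixed lines of approach #3 can be sketched in Python as
follows. This is only an illustration under stated assumptions: the length
is taken to count the bytes after the dash, including the trailing CRLF,
and the function names are made up for this example.

```python
def encode_line(text):
    """Prefix one line with the byte count of its content plus CRLF."""
    body = text.encode("ascii") + b"\r\n"
    return str(len(body)).encode("ascii") + b"-" + body


def iter_lines(buf):
    """Yield each line's content (without CRLF) using only the length prefix."""
    pos = 0
    while pos < len(buf):
        dash = buf.index(b"-", pos)      # end of the decimal length
        size = int(buf[pos:dash])        # number of bytes after the dash
        start = dash + 1
        end = start + size
        if end > len(buf) or buf[end - 2:end] != b"\r\n":
            raise ValueError("truncated or malformed line")
        yield buf[start:end - 2]
        pos = end                        # jump straight to the next line


message = (encode_line("command parameter parameter")
           + encode_line("name1: value1")
           + encode_line("name2: value21 value22"))
lines = list(iter_lines(message))
```

The reader never scans line content for CRLF; it reads the digits up to the
dash and then jumps. A line that is not of interest costs one integer parse.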
My current preference is #3. I would consider going binary instead,
but I think that will scare too many ICAP folks off. I think MIME must
not be used "as is" because it is virtually impossible to support
fully and efficiently.
However I am not quite sure how far we should go in #3 to help
parsers. If we remove most of the lengths, then ICAP and HTTP folks
would feel very comfortable. We can just use a strict grammar for line
formats instead. On the other hand, knowing header sizes in advance
and skipping unknown extensions is an attractive optimization. Some
even argue that it improves security because of fewer buffer overruns,
but I am not sure that's a valid statement.
Any comments? What would be your preference? We must keep it simple,
but should we try to make it almost identical to HTTP/ICAP or should
we optimize further?