Re: (out of the blue) OCP header encoding issues

Thank you for the detailed and thoughtful comments! I hope you will continue
to review our work from time to time. As you may know by now, we have
decided to take the first step towards a text-based protocol. The
first rough draft has been posted [1]. The current version of the BNF
(RFC 2234) is quoted below.


the first thing I notice is that there's no provision for structure
within a 'value'.  my guess is, sooner or later, you'll need it.
if RFC 822 had had a uniform way of representing lists of things in
message headers (especially if those 'things' could themselves be
lists), we probably would not have ended up with such a baroque
assortment of header field syntaxes today.
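For illustration only, here is a minimal Python sketch of what a uniform, nestable list syntax inside a value could look like. The parenthesized, space-separated syntax and the name `parse_value` are hypothetical; they are not taken from the OCP draft or from RFC 822.

```python
from typing import List, Union

# A value is either an atom (str) or a list of values, which may nest.
Value = Union[str, List["Value"]]


def parse_value(text: str) -> Value:
    """Parse a value written in a hypothetical '(a (b c) d)' list syntax."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse() -> Value:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of value")
        tok = tokens[pos]
        pos += 1
        if tok == ")":
            raise ValueError("unexpected ')'")
        if tok != "(":
            return tok  # a plain atom
        items: List[Value] = []
        while pos < len(tokens) and tokens[pos] != ")":
            items.append(parse())  # elements may themselves be lists
        if pos >= len(tokens):
            raise ValueError("missing ')'")
        pos += 1  # consume the closing ')'
        return items

    result = parse()
    if pos != len(tokens):
        raise ValueError("trailing data after value")
    return result


print(parse_value("(text/html (charset utf-8) (q 0.9))"))
# ['text/html', ['charset', 'utf-8'], ['q', '0.9']]
```

Because lists nest, a new sub-element can be added inside a value without inventing a new field syntax for it.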

I don't think the Hollerith constants help much :)  if they're short,
you probably don't benefit much from the count; if they're potentially
long (say more than 1000 bytes), the whole use of record terminators
needs to be re-thought.
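A minimal sketch of count-prefixed ("Hollerith") values, assuming a `<decimal length>:<bytes>` layout like the `3:GET` example further down; `encode_counted` and `decode_counted` are hypothetical names. The decoder slices each value out by its count instead of scanning for a terminator, which is the main payoff of the count and matters mostly for long values.

```python
def encode_counted(values: list[bytes]) -> bytes:
    """Concatenate values as <decimal length>:<raw bytes>."""
    return b"".join(str(len(v)).encode("ascii") + b":" + v for v in values)


def decode_counted(data: bytes) -> list[bytes]:
    """Split a buffer of <length>:<bytes> items back into values."""
    values = []
    pos = 0
    while pos < len(data):
        colon = data.index(b":", pos)   # ValueError if no colon remains
        length = int(data[pos:colon])   # decimal byte count
        start = colon + 1
        end = start + length
        if length < 0 or end > len(data):
            raise ValueError("bad length prefix")
        values.append(data[start:end])  # sliced by count; contents never scanned
        pos = end
    return values


wire = encode_counted([b"GET", b"/", b"HTTP/1.0"])
print(wire)                  # b'3:GET1:/8:HTTP/1.0'
print(decode_counted(wire))  # [b'GET', b'/', b'HTTP/1.0']
```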

I do like the idea to have both named parameters and positional
parameters.  I've considered adding a similar feature to BLOB.
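A rough sketch of a message carrying both kinds of parameters, under an assumed layout: a command line with positional parameters, followed by MIME-looking `Name: value` lines up to a blank line. The layout, the command name `DATA`, and the parameter names are illustrative assumptions, not taken from the OCP draft or from BLOB.

```python
def parse_message(text: str) -> tuple[str, list[str], dict[str, str]]:
    """Split a message into command, positional and named parameters.

    Assumed (hypothetical) layout:
        COMMAND pos1 pos2 ...<CRLF>
        Name: value<CRLF>
        ...
        <CRLF>
    """
    lines = text.split("\r\n")
    command, *positional = lines[0].split(" ")
    named: dict[str, str] = {}
    for line in lines[1:]:
        if line == "":
            break  # blank line ends the named parameters
        name, sep, value = line.partition(": ")
        if not sep:
            raise ValueError(f"malformed named parameter: {line!r}")
        named[name.lower()] = value  # case-insensitive names (an assumption)
    return command, positional, named


print(parse_message("DATA 12 7\r\nPart: body\r\nSize: 1024\r\n\r\n"))
# ('DATA', ['12', '7'], {'part': 'body', 'size': '1024'})
```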

One of the major differences between our protocol and protocols like
HTTP is that we have many very small "control" messages in addition to
a few possibly large messages that carry payloads. HTTP has,
essentially, one shot: a request message has to contain all
information about client desires and a response message has to contain
everything about the server reaction. With SMTP, there are a few
control messages but they are pretty much limited to initial
negotiations. We have a bidirectional pipeline of control and "data"
messages.


that's fine, but it doesn't affect the presentation layer much,
unless you need to chunk either control messages or data and
multiplex between them.
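If chunking and multiplexing do become necessary, the framing layer has to cut messages into pieces and interleave them. The sketch below assumes per-channel messages and a fixed chunk size (both hypothetical) and shows only the scheduling and reassembly; encoding each frame's channel id and length on the wire is left out.

```python
from collections import defaultdict


def chunk_and_interleave(messages: dict[str, bytes], chunk_size: int) -> list[tuple[str, bytes]]:
    """Cut each channel's message into chunks and interleave them round-robin."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    pending = {
        ch: [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
        for ch, data in messages.items()
    }
    frames = []
    while any(pending.values()):
        for ch, chunks in pending.items():
            if chunks:
                frames.append((ch, chunks.pop(0)))  # one chunk per channel per round
    return frames


def reassemble(frames: list[tuple[str, bytes]]) -> dict[str, bytes]:
    """Concatenate each channel's chunks in arrival order."""
    out: dict[str, bytes] = defaultdict(bytes)
    for ch, chunk in frames:
        out[ch] += chunk
    return dict(out)


frames = chunk_and_interleave({"data": b"x" * 10, "ctl": b"stop"}, chunk_size=4)
print([ch for ch, _ in frames])  # ['data', 'ctl', 'data', 'data']
print(reassemble(frames) == {"data": b"x" * 10, "ctl": b"stop"})  # True
```

Here the control message gets through after the first data chunk rather than waiting for the whole payload.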

End-of-record delimiters are attractive in that you don't have to
know the length of a record before you start writing it


True. However, as far as basic protocol elements are concerned, in my
experience, you always know the length in advance except when
writing numbers.


it can be a pain, because you can no longer "just print" the text and
the delimiters; you now need to emit byte-counted text.
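A small sketch of the difference, assuming a space-separated, CRLF-terminated record on one side and a hypothetical `<length>:<body>` record on the other. The delimited writer can emit fields as they are produced, while the counted writer has to buffer the whole body first to learn its length.

```python
import io
from typing import BinaryIO, Iterable, Iterator


def write_delimited(out: BinaryIO, fields: Iterable[bytes]) -> None:
    """Emit each field as soon as it is produced; CRLF terminates the record."""
    for i, field in enumerate(fields):
        if i:
            out.write(b" ")
        out.write(field)
    out.write(b"\r\n")


def write_counted(out: BinaryIO, fields: Iterable[bytes]) -> None:
    """Hypothetical <length>:<body> record: the body is buffered to learn its length."""
    body = b" ".join(fields)
    out.write(str(len(body)).encode("ascii") + b":")
    out.write(body)


def produce() -> Iterator[bytes]:
    # Stand-in for fields that are computed one at a time.
    yield b"GET"
    yield b"/"
    yield b"HTTP/1.0"


delimited, counted = io.BytesIO(), io.BytesIO()
write_delimited(delimited, produce())
write_counted(counted, produce())
print(delimited.getvalue())  # b'GET / HTTP/1.0\r\n'
print(counted.getvalue())    # b'14:GET / HTTP/1.0'
```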

If you do not know the length, something else is broken in the
design (e.g., protocol lacks chunking support for raw data).

IMO, the primary practical feature (some would say advantage) of
delimiters is that they allow for human-friendly syntax. For example,

      GET / HTTP/1.0 CRLF

is much more friendly to a human than the equivalent

      3:GET1:/4:HTTP1:/3:1.02:CRLF

or something of that kind. Note that computer "preferences" are quite
the opposite: the second example leaves fewer possibilities for
errors in a general context.


actually, I disagree: the potential for programmer error is far higher
for the second example, and you'll have far more problems with
programming bugs than with transmissions getting corrupted. if
you're going to use a text format, best to keep it simple.
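One example of the kind of programmer error meant here, sketched in Python under the assumption of a `<length>:<bytes>` encoding carrying UTF-8 text: counting characters instead of encoded bytes. The helper names are hypothetical. For ASCII values the bug is invisible; the first non-ASCII value desynchronizes the reader.

```python
def counted_buggy(value: str) -> bytes:
    # BUG: len() of a str counts code points, not the UTF-8 bytes actually sent.
    return f"{len(value)}:{value}".encode("utf-8")


def counted_fixed(value: str) -> bytes:
    raw = value.encode("utf-8")
    return str(len(raw)).encode("ascii") + b":" + raw  # count the encoded bytes


def read_counted(data: bytes) -> tuple[bytes, bytes]:
    """Return (value, remaining bytes) for one <length>:<bytes> item."""
    colon = data.index(b":")
    n = int(data[:colon])
    start = colon + 1
    return data[start:start + n], data[start + n:]


name = "caf\u00e9"  # 4 characters, 5 bytes in UTF-8
print(read_counted(counted_buggy(name) + b"1:/"))  # (b'caf\xc3', b'\xa91:/')
print(read_counted(counted_fixed(name) + b"1:/"))  # (b'caf\xc3\xa9', b'1:/')
```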

But they do have some disadvantages: you don't know the length of
a record before you start reading it, either.


This is usually not a problem for performance-sensitive protocols
because their implementations read using raw data buffers anyway.


depends on whether the data elements are smaller than the buffers.
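A sketch of a buffered reader for CRLF-terminated records, assuming a blocking byte stream; `read_records` is a hypothetical helper and the tiny default buffer size is only there to make the effect visible. Records smaller than the buffer come out of a single read, while a record longer than the buffer forces the reader to accumulate data across reads until the terminator arrives.

```python
import io
from typing import BinaryIO, Iterator


def read_records(stream: BinaryIO, bufsize: int = 8) -> Iterator[bytes]:
    """Yield CRLF-terminated records from a stream read in fixed-size chunks."""
    pending = b""
    while True:
        chunk = stream.read(bufsize)
        if not chunk:
            break
        pending += chunk  # a record longer than bufsize keeps accumulating here
        while True:
            end = pending.find(b"\r\n")
            if end < 0:
                break  # terminator not seen yet (it may be split across reads)
            yield pending[:end]
            pending = pending[end + 2:]
    if pending:
        raise ValueError("stream ended inside a record")


data = io.BytesIO(b"OK\r\nA-record-longer-than-the-buffer\r\n")
print(list(read_records(data)))  # [b'OK', b'A-record-longer-than-the-buffer']
```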

Extensibility

Sometimes it's really useful if you can add additional protocol
elements to a record (say to extend a protocol) without resulting in
an incompatible record structure.  (822 headers are extensible in
that you can add new fields without changing the meaning of existing
fields; however, it's hard to add new data elements within a field.)


On a syntax level, our protocol has a similar property: it is easy to
add new fields ('named-parameter' above), but not new elements within
a known field. The design assumption is that each field represents an
atomic "thing" that should not need more data elements. However, I am
sure there will be cases when what was perceived as a complete atom
becomes a collection of smaller particles that need more elements for
completeness.
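A hypothetical "version 1" receiver, sketched to show the asymmetry described above: it ignores unknown named parameters, so a new field is harmless, but it treats `size` as a single atom, so adding a second element inside that field breaks it. The field names are illustrative assumptions.

```python
def old_parser(named: dict[str, str]) -> dict[str, object]:
    """Hypothetical 'version 1' receiver: knows only 'size', ignores the rest."""
    known: dict[str, object] = {}
    for name, value in named.items():
        if name == "size":
            known["size"] = int(value)  # assumes the field is a single atom
        # unknown names are skipped, so new fields are harmless
    return known


# A new named parameter: the old receiver still works.
print(old_parser({"size": "1024", "checksum": "abc123"}))  # {'size': 1024}

# A new element inside a known field: the old receiver breaks.
try:
    old_parser({"size": "1024 512"})
except ValueError as e:
    print("rejected:", e)
```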


that's my guess also.

If some of your protocol engines need to pass data from one peer to
another without examining it themselves, it's useful if the protocol
can treat that chunk of data as "opaque" - merely copying it from
one peer to the other without decoding and re-encoding it (and
potentially changing its representation).  Also, if an inner
protocol element is malformed, it's useful if this doesn't break
parsing of the outer protocol element.


I think our current NetString-like approach for data and metadata
passing works well here.
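A sketch of opaque forwarding with netstrings (`<length>:<bytes>,`, as defined by D. J. Bernstein); whether OCP's encoding matches this exactly is an assumption. A relay can locate each element from its framing alone and copy it verbatim, so a malformed payload does not affect parsing of the elements around it. `split_netstring` is a hypothetical helper.

```python
def split_netstring(data: bytes) -> tuple[bytes, bytes]:
    """Return (one complete netstring with its framing, remaining bytes).

    Only the framing is checked; the payload is never interpreted.
    """
    colon = data.index(b":")          # ValueError if there is no length prefix
    length = int(data[:colon])
    if length < 0:
        raise ValueError("negative length")
    end = colon + 1 + length          # index where the trailing ',' must be
    if data[end:end + 1] != b",":
        raise ValueError("bad netstring framing")
    return data[:end + 1], data[end + 1:]


# The first payload is malformed as far as an inner protocol is concerned,
# but the relay never looks inside it and the next element still frames correctly.
stream = b"10:{not valid,2:ok,"
first, rest = split_netstring(stream)
second, rest = split_netstring(rest)
print(first, second, rest)  # b'10:{not valid,' b'2:ok,' b''
```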


you don't have the problem with individual atoms so much as with
aggregates, and I don't see how your proposal supports those.
(for that matter, I don't know whether OPES needs them.)

Similarly, if you have protocol elements that are going to be
subjected to digital signatures and/or integrity checks, it's useful
if the application can treat those protocol elements as 'opaque' for
the purpose of signing/verification and not always have to deal with
them in decoded form.  (this has been difficult in 822, since
there's no clear distinction between things that are changeable in
transit and things that are not)


Good point! Signing payloads should be OK. I think we do not have any
variability in the header syntax, except that a value can be quoted
even if it does not need to be.


can the order of fields be varied?  is there ever a need to group
several fields together and treat them as an aggregate?
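To illustrate why both optional quoting and field order matter for signatures, here is a sketch that assumes a hypothetical rule allowing any value to be wrapped in double quotes. Three headers that parse to the same fields still hash to three different SHA-256 digests, so a signature has to cover the exact bytes as sent, or both sides need an agreed canonical form.

```python
import hashlib


def unquote(value: str) -> str:
    """Hypothetical rule: any value may optionally be wrapped in double quotes."""
    if len(value) >= 2 and value[0] == value[-1] == '"':
        return value[1:-1]
    return value


def parse(raw: bytes) -> dict[str, str]:
    """Decode 'Name: value' lines into a dict (field order is lost)."""
    fields = {}
    for line in raw.decode("ascii").split("\r\n"):
        if line:
            name, _, value = line.partition(": ")
            fields[name] = unquote(value)
    return fields


def digest(raw: bytes) -> str:
    return hashlib.sha256(raw).hexdigest()


plain = b"Name: value\r\nSize: 10\r\n"
quoted = b'Name: "value"\r\nSize: 10\r\n'   # same meaning, optional quotes used
reordered = b"Size: 10\r\nName: value\r\n"  # same meaning, different field order

print(parse(plain) == parse(quoted) == parse(reordered))  # True
print(len({digest(plain), digest(quoted), digest(reordered)}))  # 3
```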

822/MIME/HTTP headers are familiar, but they are also fairly
irregular.  I have written a lot of C code to handle them:
routines to parse dates, address lists (with comments), content-type
fields, content-disposition fields, encoded-words, addresses, etc.
IMHO, their apparent simplicity is somewhat of an illusion.
Another problem with having 822 headers appear so simple is that
syntax errors are fairly common.


Our current syntax is very strict, but message headers "look like"
canonical MIME. It remains to be seen whether we struck the right
balance.


be aware that having things "look like" MIME means that people will
treat them like MIME, and expect to be able to use MIME headers from
other protocols, wrap long lines like MIME does, add comments, 
use encoded-words, etc.  there's a camel attached to that nose.
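A sketch of one such expectation, RFC 822 header folding, where a CRLF followed by a space or tab continues the previous line. The strict parser here is a hypothetical stand-in for a one-field-per-line syntax: it rejects a folded header that a MIME-minded user might reasonably send, while unfolding first recovers the intended field.

```python
import re


def unfold(raw: str) -> str:
    """RFC 822-style unfolding: drop any CRLF immediately followed by a space or tab."""
    return re.sub(r"\r\n(?=[ \t])", "", raw)


def strict_fields(raw: str) -> list[tuple[str, str]]:
    """Hypothetical strict parser: exactly one 'Name: value' per line."""
    out = []
    for line in raw.split("\r\n"):
        if not line:
            continue
        name, sep, value = line.partition(": ")
        # A leading space marks a continuation line, which this parser rejects.
        if not sep or name != name.strip():
            raise ValueError(f"malformed line: {line!r}")
        out.append((name, value))
    return out


folded = "Note: a long value that a MIME-minded user\r\n wrapped onto two lines\r\n"
try:
    strict_fields(folded)
except ValueError as e:
    print("strict parser:", e)
print(strict_fields(unfold(folded)))
# [('Note', 'a long value that a MIME-minded user wrapped onto two lines')]
```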

I think we did what you are suggesting, except there is no support for
"nesting". Extensions are supported by adding more 'named-parameters'
to a message. Do you know of any text-based protocol that is not
XML-based but supports nesting?


seems like I've seen a couple, but don't have references offhand.

And if you want to consider a reasonably-complete non-text
alternative, you might take a look at BLOB:
http://www.cs.utk.edu/~moore/draft-moore-rescap-blob-02.txt


Thanks a lot for the pointer (the URL you really meant was [2])!


thanks for the correction.

BLOB is certainly an interesting animal. If nothing else, it looks
simpler and more straightforward than the XDR approach, and may become a
candidate if we decide to switch to the binary path. Are there any
production-quality protocols built on top of BLOB?


not to my knowledge.  (then again, the same is true of what you're
proposing...).  and who knows, BLOB might just be too ugly or too
unfamiliar.  it's hard for me to tell, since I'm the one who designed
it.  mostly I offer it as an example of where you might end up if you
try to deal with the considerations I listed.

[2] http://www.cs.utk.edu/~moore/blob/draft-moore-rescap-blob-02.txt


Keith