Re: Last Call: <draft-bormann-cbor-04.txt> (Concise Binary Object Representation (CBOR)) to Proposed Standard

On Aug 13, 2013, at 2:11 PM, Yaron Sheffer 
<yaronf(_dot_)ietf(_at_)gmail(_dot_)com> wrote:

sorry I'm submitting these comments after the end of the LC period. I hope 
they can still be of use.


No problem, and they are. Some answers below.

- The "diagnostic notation" can be used very effectively for things like 
configuration files, e.g. if an application already uses CBOR on the wire. 
Therefore I would suggest to formalize it a bit more, so that we also have 
interoperability at this level.


Based on other things we heard, we went sort of in the other direction in the 
next draft: saying that an implementer might consider a different notation for 
config files, such as YAML.

- And since this notation is not meant as a JSON extension, this is a good 
time to introduce comments (e.g. with an initial '#') into the notation.


Comments are not needed in diagnostics. :-)

- The positive vs. negative encoding means that the parser actually deals 
with 9-, 17-, 33- and 65-bit integers. I don't think this makes it easier to 
write parsers.


If you primarily think about signed integers, that may be one way of viewing 
it.  CBOR is primarily addressing unsigned integers.  Negative integers are 
clearly separated out, and indeed do require watching the sign bit.  There is 
no way to handle this that is painless for all applications.

Multiple people have implemented parsers in multiple languages (JavaScript, 
Python, Ruby, ...), and we found it was pretty easy, and very little code.
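
To make the integer discussion concrete, here is a minimal sketch of decoding 
a single CBOR integer in Python. It assumes the head layout as later published 
in RFC 7049 (major type in the top three bits, additional information 24..27 
selecting a 1-, 2-, 4- or 8-byte argument); the draft under discussion may 
differ in detail, and the name decode_int is purely illustrative.

```python
def decode_int(data: bytes) -> int:
    """Decode one CBOR integer (major type 0 or 1) from the start of data."""
    if not data:
        raise ValueError("empty input")
    initial = data[0]
    major, info = initial >> 5, initial & 0x1F
    if major not in (0, 1):
        raise ValueError("not an integer item")
    if info < 24:
        # Small values live directly in the initial byte.
        arg = info
    elif info <= 27:
        size = 1 << (info - 24)  # 1, 2, 4 or 8 argument bytes
        if len(data) < 1 + size:
            raise ValueError("truncated input")
        arg = int.from_bytes(data[1:1 + size], "big")
    else:
        raise ValueError("invalid additional information for an integer")
    # Major type 0 is the unsigned value itself; major type 1 encodes -1 - arg.
    return arg if major == 0 else -1 - arg
```

The argument is always read as an unsigned number, and the sign is applied 
afterwards. With an 8-byte argument the most negative value is -2**64, which 
is the "65-bit" case raised in the comment; Python's unbounded integers hide 
that, while a language with fixed-width integers would need an explicit check.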

- Arrays are prefixed by the number of elements but not by their length in 
bytes. And elements need not be all of the same size. So you cannot skip the 
array without fully parsing every last element. IIRC this is a major 
disadvantage compared to ASN.1 encodings.


Correct. Counting items and not bytes is indeed a fundamental design decision 
in CBOR that has advantages and disadvantages. We believe the advantages 
outweigh the disadvantages.

Specifically, we are not assuming that "skip over an array/map" is a common 
desire for a decoder. Forcing an encoder to marshal all the bytes before it 
could start emitting an array/map causes overhead for the encoder that is 
avoided if it just knows the number of elements it will emit.
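
As an illustration of what "counting items and not bytes" means for a decoder, 
the following Python sketch skips one data item. It assumes the major-type 
layout as later published in RFC 7049 and ignores indefinite-length items; 
read_head and skip_item are hypothetical helper names, not part of any 
particular library.

```python
def read_head(data: bytes, pos: int):
    """Return (major, argument, next_pos) for the item head starting at pos."""
    if pos >= len(data):
        raise ValueError("truncated input")
    initial = data[pos]
    major, info = initial >> 5, initial & 0x1F
    pos += 1
    if info < 24:
        return major, info, pos
    if info <= 27:
        size = 1 << (info - 24)  # 1, 2, 4 or 8 argument bytes
        if pos + size > len(data):
            raise ValueError("truncated input")
        return major, int.from_bytes(data[pos:pos + size], "big"), pos + size
    raise ValueError("indefinite-length and reserved heads are not handled here")


def skip_item(data: bytes, pos: int = 0) -> int:
    """Return the position just past the data item that starts at pos."""
    major, arg, pos = read_head(data, pos)
    if major in (2, 3):
        # Byte and text strings carry a byte length, so they skip in one step.
        if pos + arg > len(data):
            raise ValueError("truncated input")
        return pos + arg
    if major == 4:
        # Arrays carry an element count: every element has to be walked.
        for _ in range(arg):
            pos = skip_item(data, pos)
        return pos
    if major == 5:
        # Maps carry a pair count: walk each key and each value.
        for _ in range(2 * arg):
            pos = skip_item(data, pos)
        return pos
    if major == 6:
        # A tag is followed by exactly one tagged item.
        return skip_item(data, pos)
    # Integers, simple values and floats: read_head already consumed them.
    return pos
```

Strings can be skipped in constant time because their argument is a byte 
length, while arrays and maps need one recursive call per contained item. The 
same walk also notices input that ends before the announced number of elements 
has been seen.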

- A puzzling change from JSON, and one that probably complicates 
implementations quite a bit, is that a map's index can be of any type, not 
just a string. And this includes mixed index types for the same map.


We have looked at this, and do not think that it complicates many 
implementations. It does not complicate encoders at all. It does not complicate 
decoders, even those that are looking for duplicate keys (which will be an 
error). It only complicates decoders that are also sorting the keys. We assume 
that the latter does not need to happen in the parser, but in the application 
that needs the keys sorted by semantics it knows about.
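
A rough Python sketch of duplicate-key detection over mixed key types, working 
on already-decoded (key, value) pairs. Treating two keys as equal only when 
both their type and their value match is an assumption made for this 
illustration, not a rule taken from the draft, and find_duplicate_keys is a 
hypothetical name. The sketch handles hashable keys only (integers, floats, 
strings, byte strings).

```python
def find_duplicate_keys(pairs):
    """Return the keys that occur more than once in a list of (key, value) pairs."""
    seen = set()
    duplicates = []
    for key, _value in pairs:
        # Pair the key with its type name so that 1, 1.0 and True, which
        # compare equal in Python, are not mistaken for the same key.
        marker = (type(key).__name__, key)
        if marker in seen:
            duplicates.append(key)
        else:
            seen.add(marker)
    return duplicates
```

For example, find_duplicate_keys([(1, "a"), ("1", "b"), (1, "c")]) reports the 
integer key 1 and leaves the text key "1" alone. No ordering between keys of 
different types is needed for this check, which matches the point that only 
sorting is made harder by mixed key types.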

- And similarly to arrays, you cannot skip a map element without deep parsing 
of the element.


Ditto from above.

- I think many of the tag values are too specific, and are best left to 
applications. For example, why should the format care if the app encodes a 
UTF-8 string in base64?


Because a generic parser could then do the Base64 decoding. This will always be 
an iffy space: CBOR will either force too much knowledge on the application, or 
it will do too much. We picked somewhere on that continuum, knowing that it 
will make no one happy. FWIW, Carsten and I disagreed with each other on this 
to varying degrees during the different drafts, and different people proposed 
different places on the continuum where CBOR "should" be.
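
A sketch of what "a generic parser could then do the Base64 decoding" might 
look like in Python. The tag numbers 33 (base64url) and 34 (base64) are the 
ones assigned in the later published RFC 7049 and are an assumption with 
respect to the draft being discussed; apply_tag is a hypothetical hook that a 
decoder would call after decoding the tagged item.

```python
import base64

TAG_BASE64URL = 33  # text string holding base64url data (RFC 7049 numbering)
TAG_BASE64 = 34     # text string holding base64 data (RFC 7049 numbering)


def apply_tag(tag: int, value):
    """Decode tags this sketch knows about; hand anything else back untouched."""
    if tag == TAG_BASE64 and isinstance(value, str):
        return base64.b64decode(value, validate=True)
    if tag == TAG_BASE64URL and isinstance(value, str):
        # base64url text often omits padding; restore it before decoding.
        padded = value + "=" * (-len(value) % 4)
        return base64.urlsafe_b64decode(padded)
    # Unknown tag: return the pair so the application can decide what to do.
    return (tag, value)
```

With this hook, apply_tag(34, "aGVsbG8=") yields the bytes b"hello", and an 
unrecognized tag comes back as a (tag, value) pair for the application to 
interpret.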

OTOH, I would reserve a part of the tag space for "private" 
application-specific allocations.


One of the always-repeating IETF rat-holes. We didn't go there, nor will we say 
you're wrong for wanting to go there. The first-come-first-served space should 
enable such allocations. If you just want to play around in the lab, the range 
up to 18446744073709551615 should allow reliably avoiding collisions.
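
The upper bound quoted above is 2**64 - 1, the largest value that fits in an 
8-byte argument. The following Python sketch encodes the head of a tag under 
the layout later published in RFC 7049 (major type 6), which is assumed here; 
encode_tag_head is an illustrative name only.

```python
def encode_tag_head(tag: int) -> bytes:
    """Encode the head bytes that introduce a tag (major type 6)."""
    if not 0 <= tag <= 2 ** 64 - 1:
        raise ValueError("tag does not fit in a 64-bit argument")
    if tag < 24:
        # Small tags fit into the initial byte itself.
        return bytes([0xC0 | tag])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if tag < 1 << (8 * size):
            return bytes([0xC0 | info]) + tag.to_bytes(size, "big")
    raise AssertionError("unreachable: the range was checked above")
```

Large tag numbers cost more bytes on the wire: a tag below 24 takes one byte, 
while one near the top of the range takes nine.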

- One tag value you may want to consider adding is "critical" in the security 
sense of the word, i.e., an application is required to fail if it does not 
understand the value (probably best applied to map keys).


That is a lovely proposal for an extension from a developer who is using CBOR 
in such an environment so that they can write the three or four paragraphs that 
would be needed to explain that in detail.

- In the "diagnostic notation", I suggest to use symbolic values rather than 
integers for tags, e.g. TAG_URI.


You seem to like the diagnostic notation more than we do. :-) If CBOR becomes 
at all popular, having a second, more thought-out diagnostic notation seems 
like a reasonable project.

- Sec. 3: because of the need for deep parsing mentioned above, a wire 
protocol cannot be extended by adding an element that uses a new data type 
(e.g. double precision FP) unless all potential recipients understand the 
type, even though they might not need to use the data element.


Correct. We are adding more text to make that clear.

- Type restrictions for tags should be spelled out more clearly. E.g. in 
2.4.4.2, please clarify that when this tag applies to an array or map, *all* 
the items (and potentially items of nested arrays/maps?) MUST be byte strings.


A CBOR protocol can choose to do just that.

IMHO this just adds complexity and it's best to only tag the atomic item.


Noted. Some of the tags imply more restrictions than we would have hoped, but 
if it isn't specified in the tag, the complexity just gets pushed to all the 
applications.

- Text such as this (for unknown simple types): "might issue a warning, might 
stop processing altogether, might handle the error by making the unknown 
value available to the application as such, or take some other type of 
action." is a security disaster waiting to happen. Also, it does not allow 
extensibility. Even though the encoding format is nominally extensible, in 
reality you cannot add stuff because the behavior of existing implementations 
when faced with it is unpredictable.


You (and the other folks who made similar comments about the looseness in this 
section) should be much happier with the next draft.

- Similarly for unknown tags (which IMHO should be ignored). Note that 
"unknown" includes currently specified tags, because implementations are not 
required to implement all current tags.


Not sure what you mean here.

- Another security issue, for incomplete arrays: "a parser may completely 
fail the decoding, or substitute the missing data and data items using an 
decoder-specific convention. " This is a buffer overflow vulnerability by a 
different name.


How is that different from, say, JSON, where the decoder might never see the 
final "}" or "]"?
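
To illustrate the comparison, here is a small Python sketch that puts a 
truncated JSON array next to a truncated CBOR array. The CBOR half is 
deliberately tiny and assumes the RFC 7049 layout (major type 4, short count 
in the initial byte, one-byte unsigned elements); both function names are 
hypothetical. Rejecting truncated input outright is one possible policy, 
chosen here for simplicity.

```python
import json


def parse_json_or_none(text: str):
    """Return the decoded JSON value, or None if the text is cut short or invalid."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


def decode_small_uint_array(data: bytes):
    """Decode a CBOR array of fewer than 24 unsigned integers in the range 0..23."""
    if not data or data[0] >> 5 != 4 or (data[0] & 0x1F) >= 24:
        raise ValueError("not a short definite-length array")
    count = data[0] & 0x1F
    items = data[1:1 + count]
    if len(items) < count:
        # The announced element count is checked before any element is used.
        raise ValueError(f"array announces {count} items, only {len(items)} present")
    if any(b >= 24 for b in items):
        raise ValueError("element is not a one-byte unsigned integer")
    return list(items)
```

In both formats the decoder learns that the input is incomplete only when the 
bytes run out: JSON because the closing bracket never arrives, CBOR because 
fewer items arrive than were announced.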

- And by the way the entire Sec. 3 is non-normative. I suggest to use 
normative language for parser behavior, to ensure it is deterministic.


We think we have done so in the next version.

--Paul Hoffman

Re: Last Call: <draft-bormann-cbor-04.txt> (Concise Binary Object Representation (CBOR)) to Proposed Standard