On Aug 13, 2013, at 2:11 PM, Yaron Sheffer
<yaronf(_dot_)ietf(_at_)gmail(_dot_)com> wrote:
sorry I'm submitting these comments after the end of the LC period. I hope
they can still be of use.
No problem, and the are. Some answers below.
- The "diagnostic notation" can be used very effectively for things like
configuration files, e.g. if an application already uses CBOR on the wire.
Therefore I would suggest to formalize it a bit more, so that we also have
interoperability at this level.
Based on other things we heard, we went sort of in the other direction in the
next draft: saying that an implementer might consider a different notation for
config files, such as YAML.
- And since this notation is not meant as a JSON extension, this is a good
time to introduce comments (e.g. with an initial '#') into the notation.
Comments are not needed in diagnostics. :-)
- The positive vs. negative encoding means that the parser actually deals
with 9-, 17-, 33- and 65-bit integers. I don't think this makes it easier to
write parsers.
If you primarily think about signed integers, that may be one way of viewing
it. CBOR is primarily addressing unsigned integers. Negative integers are
clearly separated out, and indeed do require watching the sign bit. There is
no way to handle this that is painless for all applications.
Multiple people have implemented parsers in multiple languages (JavaScript,
Python, Ruby, ...), and we found it was pretty easy, and very little code.
- Arrays are prefixed by the number of elements but not by their length in
bytes. And elements need not be all of the same size. So you cannot skip the
array without fully parsing every last element. IIRC this is a major
disadvantage compared to ASN.1 encodings.
Correct. Counting items and not bytes is indeed a fundamental design decision
in CBOR that has advantages and disadvantages. We believe the advantages
outweigh the disadvantages.
In specific here, we are not assuming that "skip over an array/map" is a common
desire for a decoder. Forcing an encoder to marshall all the bytes before it
could start emitting an array/map causes overhead for the encoder that is
avoided if it just knows the number of elements it will emit.
- A puzzling change from JSON, and one that probably complicates
implementations quite a bit, is that a map's index can be of any type, not
just a string. And this includes mixed index types for the same map.
We have looked at this, and do not think that it complicates many
implementations. It does not complicate encoders at all. It does not complicate
decoders, even those that are looking for duplicate keys (which will be an
error). It only complicates decoders that are also sorting the keys. We assume
that the latter does not need to happen in the parser, but in the application
that needs the keys sorted by semantics it knows about.
- And similarly to arrays, you cannot skip a map element without deep parsing
of the element.
Ditto from above.
- I think many of the tag values are too specific, and are best left to
applications. For example, why should the format care if the app encodes a
UTF-8 string in base64?
Because a generic parser could then do the Base64 decoding. This will always be
an iffy space: CBOR will either force too much knowledge on the application, or
it will do too much. We picked somewhere on that continuum, knowing that it
will make no one happy. FWIW, Carsten and I disagreed with each other on this
to varying degrees during the different drafts, and different people proposed
different places on the continuum where CBOR "should" be.
OTOH, I would reserve a part of the tag space for "private"
application-specific allocations.
One of the always-repeating IETF rat-hole. We didn't go there, nor will we say
you're wrong for wanting to go there. The first-come-first-served space should
enable such allocations. If you just want to play around in the lab, the range
up to 18446744073709551615 should allow reliably avoiding collisions.
- One tag value you may want to consider adding is "critical" in the security
sense of the word, i.e., an application is required to fail if it does not
understand the value (probably best applied to map keys).
That is a lovely proposal for an extension from a developer who is using CBOR
in such an environment so that they can write the three or four paragraphs that
would be needed to explain that in detail.
- In the "diagnostic notation", I suggest to use symbolic values rather than
integers for tags, e.g. TAG_URI.
You seem to like the diagnostic notation more than we do. :-) If CBOR becomes
at all popular, having a second, more thought-out diagnostic notation seems
like a reasonable project.
- Sec. 3: because of the need for deep parsing mentioned above, a wire
protocol cannot be extended by adding an element that uses a new data type
(e.g. double precision FP) unless all potential recipients understand the
type, even though they might not need to use the data element.
Correct. We are adding more text to make that clear.
- Type restrictions for tags should be spelled out more clearly. E.g. in
2.4.4.2, please clarify that when this tag applies to an array or map, *all*
the items (and potentially items of nested arrays/maps?) MUST be byte strings.
A CBOR protocol can choose to do just that.
IMHO this just adds complexity and it's best to only tag the atomic item.
Noted. Some of the tags imply more restrictions that we would have hoped, but
if it isn't specified in the tag, the complexity just gets pushed to all the
applications.
- Text such as this (for unknown simple types): "might issue a warning, might
stop processing altogether, might handle the error by making the unknown
value available to the application as such, or take some other type of
action." is a security disaster waiting to happen. Also, it does not allow
extensibility. Even though the encoding format is nominally extensible, in
reality you cannot add stuff because the behavior of existing implementations
when faced with it is unpredictable.
You (and the other folks who made similar comments about the looseness in this
section) should be much happier with the next draft.
- Similarly for unknown tags (which IMHO should be ignored). Note that
"unknown" includes currently specified tags, because implementations are not
required to implement all current tags.
Not sure what you mean here.
- Another security issue, for incomplete arrays: "a parser may completely
fail the decoding, or substitute the missing data and data items using an
decoder-specific convention. " This is a buffer overflow vulnerability by a
different name.
How is that different than, say JSON, where the decoder might never see the
final "}" or "]"?
- And by the way the entire Sec. 3 is non-normative. I suggest to use
normative language for parser behavior, to ensure it is deterministic.
We think we have done so in the next version.
--Paul Hoffman