Re: Last Call: <draft-bormann-cbor-04.txt> (Concise Binary Object Representation (CBOR)) to Proposed Standard

2013-08-06 14:11:55
2) No support for tag compression.

(I assume this was about map keys, not about tags.)

That's an interesting requirement, and one that I think could be added to
the design if there were others that felt motivated to help.  I think I
can see a way that it could be added later: create a new tag that precedes
a map of string-to-int conversions.

I'd probably do it the other way around:

        tagN([{1: "foo", 2: "bar"}, ...abbreviated data item...])
Where an abbreviated data item of the form
        [1, 2, 3, {1: "beer", 2: "wine", "baz": 1}, 5, 6]
would then be interpreted as
        [1, 2, 3, {"foo": "beer", "bar": "wine", "baz": 1}, 5, 6]

Yes, processing of this kind is easy to add as a tag.
If the first parameter is instead a URI (preferably using the ni: scheme),
that would avoid carrying around a large dictionary.
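
A minimal Python sketch of the expansion step in the example above, assuming the data item has already been decoded into lists and dicts. The name `expand_keys` is made up for illustration, and tagN remains unassigned; only map keys are rewritten, values are left alone.

```python
def expand_keys(abbreviations, item):
    """Recursively replace abbreviated integer map keys with their strings.

    `abbreviations` is the first element of the tagged array, e.g.
    {1: "foo", 2: "bar"}; `item` is the abbreviated data item.
    """
    if isinstance(item, list):
        return [expand_keys(abbreviations, element) for element in item]
    if isinstance(item, dict):
        expanded = {}
        for key, value in item.items():
            # Only plain integers are treated as abbreviations
            # (bool is excluded because True == 1 in Python).
            if type(key) is int and key in abbreviations:
                key = abbreviations[key]
            expanded[key] = expand_keys(abbreviations, value)
        return expanded
    return item


abbreviations = {1: "foo", 2: "bar"}
abbreviated = [1, 2, 3, {1: "beer", 2: "wine", "baz": 1}, 5, 6]
expanded = expand_keys(abbreviations, abbreviated)
```

Here `expanded` has the same shape as the interpreted form shown above: the integers outside the map and the value under "baz" are untouched.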

However, my intuition is that this wouldn't have radically better behavior
than gzip, and so I'd like to see some numbers to prove that the
complexity was worthwhile.

I share that intuition.  CBOR is also intended to be useful in those
environments where running a full compression algorithm is impractical;
there, such a scheme could still have benefits.

The first one is my main complaint. I want to be able to use the binary
and text JSON encodings interchangeably and not have the upper layers
bother with it at all.

(The applications I have in mind use media types, but:)

I think I understand this.  I could see where my CBOR event-based parser
could also take JSON in, and generate the exact same events.  I might even
do that as a proof of concept.  Could you say more about what in CBOR you
think violates this?

Well, if you don't have a media type, and don't know whether you'll get a
JSON text or a CBOR data item, you may need to mechanically distinguish them.
E.g., the following six characters can occur at the start of a JSON text.
Each is also valid as the first (or only) byte of a CBOR data item:

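Byte    JSON meaning                CBOR interpretation

%x20  ; Space                       negative integer -1
%x09  ; Horizontal tab              unsigned integer 9
%x0A  ; Line feed or New line       unsigned integer 10
%x0D  ; Carriage return             unsigned integer 13
%x5B  ; [ left square bracket       starts byte string (8-byte length follows)
%x7B  ; { left curly bracket        starts UTF-8 string (8-byte length follows)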

(Well, for any valid JSON text, heuristics might tell you that the string
data items a CBOR parser sees are unrealistically large.)
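
The table can be checked mechanically. The following Python sketch splits a CBOR initial byte into its major type (top three bits) and additional information (low five bits); the helper names are made up for illustration. `claimed_length` shows the kind of heuristic hinted at above: for a text that starts with `[` or `{`, a CBOR parser would read the next eight bytes as a string length.

```python
MAJOR_TYPES = (
    "unsigned integer", "negative integer", "byte string", "text string",
    "array", "map", "tag", "simple/float",
)


def split_initial_byte(b):
    """Return (major type, additional information) for a CBOR initial byte."""
    return b >> 5, b & 0x1F


def describe(b):
    """Describe what a CBOR parser sees in initial byte `b`."""
    major, info = split_initial_byte(b)
    if major == 0 and info < 24:
        return "unsigned integer %d" % info
    if major == 1 and info < 24:
        return "negative integer %d" % (-1 - info)
    if info == 27:
        return "%s, 8-byte argument follows" % MAJOR_TYPES[major]
    return "%s, additional information %d" % (MAJOR_TYPES[major], info)


def claimed_length(data):
    """Length a CBOR parser would read for a string with an 8-byte length."""
    major, info = split_initial_byte(data[0])
    if major in (2, 3) and info == 27 and len(data) >= 9:
        return int.from_bytes(data[1:9], "big")
    return None


for byte in (0x20, 0x09, 0x0A, 0x0D, 0x5B, 0x7B):
    print("%%x%02X" % byte, describe(byte))
```

For a short JSON text such as `{"key": 1}`, `claimed_length` returns a value above 2**60, which is the sort of unrealistically large string a heuristic could reject.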

If a CBOR application does require initial signature bytes for
self-description purposes, I would suggest using something like

        0xd8 0xf8 ...data item...

which decodes as tag248(data item); we could define 248 as a no-op tag.
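
A small Python sketch of how an application might add and check such a signature. Tag 248 is only the value suggested in this message, not an assigned tag, and the function names are made up for illustration.

```python
# 0xD8 = major type 6 (tag) with additional information 24, meaning the
# tag number is in the next byte; 0xF8 = 248.
# Assumption: 248 is the no-op tag proposed above, not a registered value.
SIGNATURE = bytes([0xD8, 0xF8])


def add_signature(encoded_item):
    """Prefix an already-encoded CBOR data item with the signature tag."""
    return SIGNATURE + encoded_item


def has_signature(data):
    """True if `data` starts with the two signature bytes."""
    return data[:2] == SIGNATURE


def strip_signature(data):
    """Return the inner encoded data item, or `data` unchanged if untagged."""
    return data[2:] if has_signature(data) else data


tagged = add_signature(b"\x01")  # tag248(1)
```

None of the six JSON start bytes listed above is 0xD8, so a receiver expecting either format can test the first two bytes before choosing a parser.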

(I'm still working on your other message -- lots of juicy input, thank you!)

Grüße, Carsten

