2) No support for tag compression.
(I assume this was about map keys, not about tags.)
That's an interesting requirement, and one that I think could be added to
the design if there were others that felt motivated to help. I think I
can see a way that it could be added later: create a new tag that precedes
a map of string-to-int conversions.
I'd probably do it the other way around:
tagN([{1: "foo", 2: "bar"}, ...abbreviated data item...])
Where an abbreviated data item of the form
[1, 2, 3, {1: "beer", 2: "wine", "baz": 1}, 5, 6]
would then be interpreted as
[1, 2, 3, {"foo": "beer", "bar": "wine", "baz": 1}, 5, 6]
Yes, processing of this kind is easy to add as a tag.
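To make the proposed expansion concrete, here is a minimal Python sketch that works on already-decoded data, with Python lists and dicts standing in for CBOR arrays and maps. The function names are hypothetical, and the tag number stays open as in tagN. The sketch assumes that only integer map keys listed in the dictionary are replaced, at any nesting depth. Unlisted keys, array elements and map values are left untouched, matching the example above.

```python
from typing import Any, Dict, List


# Hypothetical helper: the message leaves the tag number open ("tagN").
def expand_keys(item: Any, dictionary: Dict[int, str]) -> Any:
    """Recursively replace integer map keys that appear in `dictionary`."""
    if isinstance(item, list):
        return [expand_keys(element, dictionary) for element in item]
    if isinstance(item, dict):
        expanded = {}
        for key, value in item.items():
            # bool is a subclass of int in Python; CBOR true/false are not expanded.
            if isinstance(key, int) and not isinstance(key, bool):
                key = dictionary.get(key, key)
            expanded[key] = expand_keys(value, dictionary)
        return expanded
    # Scalars (including integer values such as "baz": 1) are left as-is.
    return item


def decode_abbreviation_tag(content: List[Any]) -> Any:
    """Interpret the content of tagN([dictionary, abbreviated data item])."""
    dictionary, abbreviated = content
    return expand_keys(abbreviated, dictionary)


print(decode_abbreviation_tag(
    [{1: "foo", 2: "bar"}, [1, 2, 3, {1: "beer", 2: "wine", "baz": 1}, 5, 6]]
))
# [1, 2, 3, {'foo': 'beer', 'bar': 'wine', 'baz': 1}, 5, 6]
```

Keeping the expansion inside a tag means a decoder that does not know the tag still sees a well-formed array; it just does not get the string keys back.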
If the first parameter is instead a URI (preferably an ni: URI), it could save
carrying around a large dictionary.
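The URI variant could look roughly like the following Python sketch. It assumes the dictionary is named by the SHA-256 digest of its serialized bytes, written as an RFC 6920 ni: URI. It also assumes the receiver keeps a local table of dictionaries it already knows. How the serialized bytes are produced, the table, and the helper names are all assumptions for illustration.

```python
import base64
import hashlib
from typing import Any, Dict


def ni_uri(serialized_dictionary: bytes) -> str:
    """Name a dictionary by its SHA-256 digest, in the ni: form of RFC 6920."""
    digest = hashlib.sha256(serialized_dictionary).digest()
    # RFC 6920 encodes the hash value as base64url without padding.
    value = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return "ni:///sha-256;" + value


# Hypothetical local store of dictionaries this endpoint already has.
_known_dictionaries: Dict[str, Dict[int, str]] = {}


def register_dictionary(serialized: bytes, dictionary: Dict[int, str]) -> str:
    """Remember a dictionary under the ni: URI of its serialized form."""
    uri = ni_uri(serialized)
    _known_dictionaries[uri] = dictionary
    return uri


def resolve_first_parameter(param: Any) -> Dict[int, str]:
    """Accept either an inline dictionary or an ni: URI naming one."""
    if isinstance(param, dict):
        return param
    if isinstance(param, str) and param.startswith("ni:"):
        return _known_dictionaries[param]  # KeyError: dictionary not available locally
    raise TypeError("first parameter must be a map or an ni: URI")
```

Because the name is a hash of the content, a receiver that finds the URI in its table can trust that it has the right dictionary without fetching anything.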
However, my intuition is that this wouldn't have radically better behavior
than gzip, and so I'd like to see some numbers to prove that the
complexity was worthwhile.
I share that intuition. CBOR is intended to be useful also in
environments where running a full compression algorithm is impractical; there,
such a scheme could still have benefits.
The first one is my main complaint. I want to be able to use the binary
and text JSON encodings interchangeably without the upper layers having
to bother with it at all.
(The applications I have in mind use media types, but:)
I think I understand this. I could see where my CBOR event-based parser
could also take JSON in, and generate the exact same events. I might even
do that as a proof of concept. Could you say more about what in CBOR you
think violates this?
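As a rough illustration of the "same events from either syntax" idea, here is a Python sketch that turns a JSON text into a stream of generic data-model events. The event names are hypothetical and do not come from any existing parser. For brevity, the sketch walks the tree produced by json.loads instead of tokenizing incrementally. A CBOR front end would emit the same tuples from its own decoder.

```python
import json
from typing import Any, Iterator, Tuple

Event = Tuple[Any, ...]


def events(item: Any) -> Iterator[Event]:
    """Emit parser-style events for an already-decoded data item."""
    if isinstance(item, bool):  # check bool before int: bool subclasses int
        yield ("bool", item)
    elif item is None:
        yield ("null", None)
    elif isinstance(item, int):
        yield ("int", item)
    elif isinstance(item, float):
        yield ("float", item)
    elif isinstance(item, str):
        yield ("text", item)
    elif isinstance(item, list):
        yield ("array_start", len(item))
        for element in item:
            yield from events(element)
        yield ("array_end",)
    elif isinstance(item, dict):
        yield ("map_start", len(item))
        for key, value in item.items():
            yield from events(key)
            yield from events(value)
        yield ("map_end",)
    else:
        raise TypeError(f"unsupported type: {type(item).__name__}")


def json_events(text: str) -> Iterator[Event]:
    """Events for a JSON text; a CBOR decoder would feed events() the same way."""
    return events(json.loads(text))


for event in json_events('[1, {"a": true}]'):
    print(event)
```

Upper layers that consume only these events cannot tell which syntax the data arrived in. That property is the one asked for above.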
Well, if you don't have a media type, and don't know whether you'll get a JSON
text or a CBOR data item, you may need to mechanically distinguish them.
E.g., the following six characters can occur at the start of a JSON text.
All are valid as the initial (or only) byte of a CBOR data item:
   Byte   JSON meaning              CBOR interpretation
   %x20   Space                     -1
   %x09   Horizontal tab            9
   %x0A   Line feed or New line     10
   %x0D   Carriage return           13
   %x5B   [ left square bracket     starts byte string (8-byte length follows)
   %x7B   { left curly bracket      starts UTF-8 string (8-byte length follows)
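The CBOR column can be checked mechanically. The high three bits of the initial byte give the major type, and the low five bits give the additional information. The following is a minimal Python sketch; the function name is made up for illustration.

```python
MAJOR_TYPES = {
    0: "unsigned integer",
    1: "negative integer",
    2: "byte string",
    3: "text string",
    4: "array",
    5: "map",
    6: "tag",
    7: "simple value or float",
}


def describe_initial_byte(b: int) -> str:
    """Describe what a CBOR decoder makes of a single initial byte."""
    major, info = b >> 5, b & 0x1F
    name = MAJOR_TYPES[major]
    if info < 24:
        if major == 0:
            return f"{name} {info}"  # complete item: value is in the byte itself
        if major == 1:
            return f"{name} {-1 - info}"  # complete item: encodes -1 - info
        return f"{name}, argument {info}"
    if info <= 27:
        # 24..27 mean a 1-, 2-, 4- or 8-byte argument follows
        return f"starts {name}, {1 << (info - 24)}-byte argument follows"
    if info == 31:
        return "break" if major == 7 else f"starts {name}, indefinite length"
    return "reserved"


for byte in (0x20, 0x09, 0x0A, 0x0D, 0x5B, 0x7B):
    print(f"%x{byte:02X}: {describe_initial_byte(byte)}")
```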
(Well, for JSON texts starting with [ or {, heuristics might tell you that the
string data items a CBOR parser would see are unrealistically large.)
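The heuristic in the parenthetical can be written down directly. For 0x5B and 0x7B, the additional information is 27, so the next eight bytes are read as a big-endian length. The Python sketch below (function name hypothetical) only checks whether that declared length fits in the input. It deliberately ignores the whitespace bytes, which decode to complete small integers. It also assumes the JSON text is UTF-8.

```python
def plausible_cbor_string(data: bytes) -> bool:
    """Does a leading 0x5B/0x7B declare a string length that fits in `data`?"""
    if not data or data[0] not in (0x5B, 0x7B):
        return False
    # Additional information 27: the length is the next 8 bytes, big-endian.
    if len(data) < 9:
        return False
    declared = int.from_bytes(data[1:9], "big")
    # UTF-8 JSON text contains no NUL bytes, so for JSON the first of these bytes
    # is nonzero and the declared length is at least 2**56: far beyond the input.
    return declared <= len(data) - 9


print(plausible_cbor_string(b'[1, 2, 3]'))                                      # False
print(plausible_cbor_string(bytes([0x5B]) + (3).to_bytes(8, "big") + b"abc"))   # True
```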
If a CBOR application does require initial signature bytes for self-description
purposes, I would suggest using something like
0xd8 0xf8 ...data item...
which decodes as tag248(data item); we could define 248 as a no-op tag.
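Here is a sketch of how a receiver might recognize the proposed signature. 0xd8 is major type 6 (tag) with additional information 24, so the tag number is in the next byte: 0xf8 = 248. The Python below assumes the no-op tag 248 suggested here, which is a proposal and not an assigned tag. The function names are hypothetical.

```python
from typing import Optional, Tuple

SIGNATURE = bytes([0xD8, 0xF8])  # tag 248 with a 1-byte tag number, as proposed


def read_tag(data: bytes) -> Optional[Tuple[int, bytes]]:
    """If data starts with a CBOR tag head, return (tag number, remaining bytes)."""
    if not data or data[0] >> 5 != 6:  # major type 6 = tag
        return None
    info = data[0] & 0x1F
    if info < 24:
        return info, data[1:]
    if info <= 27:
        size = 1 << (info - 24)  # 24 -> 1 byte, 25 -> 2, 26 -> 4, 27 -> 8
        if len(data) < 1 + size:
            return None
        return int.from_bytes(data[1:1 + size], "big"), data[1 + size:]
    return None


def has_signature(data: bytes) -> bool:
    """True if data starts with the proposed no-op tag 248."""
    tag = read_tag(data)
    return tag is not None and tag[0] == 248


def add_signature(encoded_item: bytes) -> bytes:
    """Prefix an already-encoded CBOR data item with the signature bytes."""
    return SIGNATURE + encoded_item
```

Since 0xd8 is not among the six bytes listed above, a receiver can test for the signature before deciding which parser to run. Because the tag is a no-op, the remaining bytes are the original data item.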
(I'm still working on your other message -- lots of juicy input, thank you!)
Regards, Carsten