RFC 6690 says:
In order to convert an HTTP Link Header field to this link format, first
the "Link:" HTTP header is removed, any linear whitespace (LWS) is
removed, the header value is converted to UTF-8, and any
percent-encodings are decoded.
Well, that's broken.
OK, let me start typing that errata report then.
coap://example.com?stupid%3Dkey=4711
is not distinguishable from
coap://example.com?stupid=key=4711
(The typical reaction of an implementer is “then don’t do that!” [1,2].)
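To make the collision concrete, here is a minimal Python sketch. It assumes that the RFC 6690 conversion step amounts to percent-decoding the whole reference before anything else looks at it. The helper names are made up for illustration, and splitting the query at the first "=" is just one plausible application convention, not something RFC 3986 defines.

```python
from urllib.parse import unquote, urlsplit

def query_after_early_decode(uri: str) -> str:
    """Percent-decode the whole reference first, then extract the query."""
    return urlsplit(unquote(uri)).query

def split_then_decode(uri: str) -> tuple[str, str]:
    """Extract the query, split it at the first '=', then decode each part.

    The key=value convention is an assumption about the application.
    """
    key, _, value = urlsplit(uri).query.partition("=")
    return unquote(key), unquote(value)

a = "coap://example.com?stupid%3Dkey=4711"
b = "coap://example.com?stupid=key=4711"

# Decoding first: both references yield the same query string.
print(query_after_early_decode(a) == query_after_early_decode(b))  # True

# Splitting first: the two references stay distinct.
print(split_then_decode(a))  # ('stupid=key', '4711')
print(split_then_decode(b))  # ('stupid', 'key=4711')
```

Once the decoding has happened, nothing downstream can tell which "=" was the delimiter.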
That isn't a "limitation".
For RFC 6690 users, it pretty much is, because certain URIs don’t work.
Implementers tend to design their URIs in such a way that they do work,
probably more because these designs come naturally to them than because they
are fully aware of that limitation.
It's a bug to decode pct-encoded octets in
a URI before decomposing the reference into its parts.
Well, percent-encoding is playing two roles in RFC 3986: hiding characters
within syntactic elements from their delimiter roles, and encoding non-ASCII
(and C0 etc.) characters.
The passage I cited from RFC 6690 nicely got rid of the latter, and broke the
former(*).
ASCII is already
in UTF-8. Decoding a pct-encoding doesn't make it "more UTF-8"; it just
means the string is no longer a URI reference. That's broken. So utterly
broken that it obviously wasn't reviewed by the right people.
So what should I write into the errata report?
Or more generally speaking, how should we fix RFC 6690, without creating a need
for constrained nodes to do full URI processing?
Maybe it is sufficient to document the limitation in the errata, for now?
And, more to the point of the subject line, how should we handle this on the
JSON/CBOR level?
There definitely will be a round-tripping problem with RFC 6690 if the URIs
collide with the above limitation of RFC 6690. But that’s OK because that
defines the subset.
To be more general, not doing any percent-decoding of URIs when creating
JSON/CBOR from scratch is probably the easy way, but it means that when we want
to phase out RFC 6690 on the constrained level by replacing it with JSON/CBOR,
there is additional complexity. Horribile dictu, but maybe IRIs are the right
thing to do here.
Grüße, Carsten
(*) It may be worth pointing out that the amount of breakage here is much
larger than for CoAP itself, which does the percent-decoding only after
decomposing a URI into what CoAP considers to be its components, so the URI
parsing works properly — coap://example.com/foo%2fbar has one path segment,
“foo/bar”.
But the application semantics of hiding application delimiters, which my
example above is breaking, is not supported in CoAP either.
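A rough Python sketch of that difference, under the assumption that CoAP's decomposition can be approximated as "split the path at / and the query at &, then percent-decode each piece". This is a simplification of what RFC 7252 specifies (no host or port handling, no normalization), and the function names are hypothetical.

```python
from urllib.parse import unquote, urlsplit

def path_segments(uri: str) -> list[str]:
    """Split the path into segments first, then percent-decode each one."""
    return [unquote(seg) for seg in urlsplit(uri).path.split("/")[1:]]

def query_options(uri: str) -> list[str]:
    """Split the query at '&' first, then percent-decode each piece."""
    query = urlsplit(uri).query
    return [unquote(part) for part in query.split("&")] if query else []

# Decomposing before decoding keeps the encoded slash inside one segment.
print(path_segments("coap://example.com/foo%2fbar"))  # ['foo/bar']
print(path_segments("coap://example.com/foo/bar"))    # ['foo', 'bar']

# The application-level '=' is still lost: both queries decode identically.
print(query_options("coap://example.com?stupid%3Dkey=4711"))  # ['stupid=key=4711']
print(query_options("coap://example.com?stupid=key=4711"))    # ['stupid=key=4711']
```

So the URI-level delimiters survive, while delimiters that only the application knows about do not.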
Some people think that URIs should be carried around in that decomposed form
throughout the constrained space, and I can’t blame them.
I don’t have data on how many URI libraries in active use in the non-constrained
space get this particular detail right, either.