Re: [core] tsv-dir review of draft-ietf-core-coap-14

Michael,

thank you for this thoughtful and extensive review.

We have turned nine of the items below into eight tickets, #287 to
#294 (see in-line references below), that will be processed along with
the other IETF last-call tickets and turned into
draft-ietf-core-coap-15 in the next few days.

Below, please find a detailed discussion of the review comments.

Grüße, Carsten


        Review - good news:

        I have reviewed few selected aspects in draft-ietf-core-coap-09
        (http://www.ietf.org/mail-archive/web/core/current/msg03280.html). I
        confirm that those past concerns are sufficiently addressed by this
        document.

Thank you, that is good to know.

        Review - not so good news:

        * General: In a nutshell, this document proposes a rather lightweight
         protocol that provides a subset of TCP/HTTP transport.

The authors don't agree with this view at all.
CoAP is an application-layer transfer protocol.
It provides very few transport features.
It is not very productive to compare it to TCP.
It can be used for certain applications that could theoretically also
be supported by HTTP over TCP, if the complexity of these protocols
didn't get in the way.
We sense very little danger that Web developers will run out and
abandon HTTP and TCP in droves when CoAP is published.

         In order to
         reduce message sizes and implementation complexity, the protocol
         sacrifices many TCP/HTTP features. But I really had a hard time
         figuring out what the protocol as defined in this document does
         *not* provide, in particular in the message layer. At first sight,
         the CoAP protocol as defined in this document lacks features such as:

         - Support for messages exceeding the path MTU

         - Byte stream transport with segmentation and reassembly

         - Flow control

         - Congestion control for non-confirmable messages (this must IMHO be
           fixed)

         Further typical TCP features are pretty much left to the
         implementation or to extensions which will make the protocol more
         complex (imho, as complex as TCP):

         - In-order delivery of unconfirmed messages (for confirmed messages,
           delivery seems to be in-order right now if the implementation
           indeed complies to the mandated limit of one out-standing
           transaction per destination, but any application requiring a data
           transfer of more than 1KB will need something better)

         - Strong protection against message duplication (in particular if
           some checks are disabled based on cross-layer assumptions, which
           is allowed by this spec)

         - Non-trivial transport features for multicast

         - Security and DoS protection (mostly out-of-scope of this review)

         These aspects are further detailed below with specific text
         references.

         => I think that the document needs a disclaimer in Section 1 that
         explicitly explains what users cannot expect from the CoAP base
         protocol (say, compared to a light-weight HTTP/TCP implementation
         with HTTP compression and the absolute minimum set of TCP features).

Disagree.  That is not the job of a protocol specification.

RFC 2616 does not have a disclaimer that explains how it is inferior
to FTP, even though both can be used for related applications and FTP
clearly has a number of desirable features that HTTP lacks (some of
which were later defined in additional protocols on top of HTTP, e.g.,
WebDAV, and some that apparently weren't actually needed by HTTP users).

Creating such a section would already fail at the definition of the
protocol stack against which CoAP is to be compared, as a
"light-weight HTTP/TCP implementation with HTTP compression" is
neither standardized nor even possible within the confines of the
constrained nodes addressed by CoAP.

Once written, such a section would almost immediately be dated as
implementers find new ways to solve old problems, and as additional
specifications are written (cf. WebDAV above).

        * General: The message layer, which basically provides a transport
         protocol service, is in parts only vaguely specified, and many
         transport-related protocol features can be overwritten by
         implementation or environment-specific settings, or by future
         extensions drafts. This makes it very difficult to review the
         protocol regarding completeness and robustness, such as atypical
         packet arrival patterns, reordering, and other corner cases that
         fundamentally matter for a transport protocol design. I believe that
         the protocol is simple enough that a full description of the state
         engine and event processing would be possible (like RFC 793 Section
         3.9. Event Processing). But without a rigorous specification, it is
         difficult to figure out what a CoAP implementation would do in many
         corner cases, and if interoperable implementations would interpret
         the spec in the same way.

As is apparent from the interops, this has been less of a problem for
implementers so far.  We have drafts that discuss implementation
approaches in LWIG, and these will be developed further.  I'm not a
big fan of an FDT (formal description technique) requirement for an
initial protocol specification; this was part of what killed OSI.

          => The following list of open issues is almost certainly
          incomplete; other TSV experts might identify further problems.

        * Section 4.2

             A CoAP endpoint that sent a Confirmable message MAY give up in
             attempting to obtain an ACK even before the MAX_RETRANSMIT
             counter value is reached: E.g., the application has canceled the
             request as it no longer needs a response, or there is some other
             indication that the CON message did arrive.  In particular, a
             CoAP request message may have elicited a separate response, in
             which case it is clear to the requester that only the ACK was
             lost and a retransmission of the request would serve no purpose.
             However, a responder MUST NOT in turn rely on this cross-layer
             behavior from a requester, i.e. it SHOULD retain the state to
             create the ACK for the request, if needed, even if a Confirmable
             response was already acknowledged by the requester.

         => I think that this situation can also occur during an attack with
         spoofed addresses, i. e., it is not "clear" that the ACK was
         lost. In that case, retransmitting the request may even be the
         better alternative, in order to identify the attack. As already
         mentioned, state diagrams and a clear event handling would help to
         identify such corner cases (there may be more than this specific
         one). This would also simplify discussion when it is indeed safe to
         release state information.

I don't understand the concern.  This is a comment on a common
implementation optimization.  There are multiple ways to counter an
attack, the preferable one of which is to use DTLS security.

        * Section 4.3

             At the CoAP level, there is no way for the sender to detect if a
             Non-confirmable message was received or not.  A sender MAY
             choose to transmit multiple copies of a Non-confirmable message
             within MAX_TRANSMIT_SPAN, or the network may duplicate the
             message in transit.

          => This section lacks any guidance on how frequently
          non-confirmable messages may be sent. Section 4.7 mandates a
          maximum PROBING_RATE for congestion control. With the default
          parameters, MAX_TRANSMIT_SPAN is 45s, and PROBING_RATE is 1
          Byte/second, i. e., for messages larger than 45 Byte, the limit for
          multiple copies is given by MESSAGE_SIZE/PROBING_RATE, not by
          MAX_TRANSMIT_SPAN.

Indeed, a reference to 4.7 wouldn't hurt here. [#288] (Sending multiple
copies that are spaced more than MAX_TRANSMIT_SPAN apart may defeat the
duplicate detection.  Messages that benefit from saturation are often
short discovery requests, so the extremely low value we chose for
PROBING_RATE may not hurt in practice.)
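
To make the arithmetic concrete, here is a minimal sketch (Python, not
part of the draft).  It assumes that PROBING_RATE is read as an average
rate over the copies of one message; the function names are made up for
illustration.

```python
# Values quoted in the review for the default parameters.
MAX_TRANSMIT_SPAN = 45.0  # seconds
PROBING_RATE = 1.0        # bytes per second


def min_spacing(message_size: int, probing_rate: float = PROBING_RATE) -> float:
    """Seconds between copies so that the average rate stays at probing_rate.

    Assumption: the rate limit is applied as message_size / spacing.
    """
    return message_size / probing_rate


def copies_within_span(message_size: int,
                       span: float = MAX_TRANSMIT_SPAN,
                       probing_rate: float = PROBING_RATE) -> int:
    """Copies that fit into span: the first at t=0, then one per spacing."""
    return 1 + int(span // min_spacing(message_size, probing_rate))


if __name__ == "__main__":
    for size in (20, 45, 46, 100):
        print(size, "bytes:", copies_within_span(size), "copies")
```

With these numbers, a 20-byte discovery request could go out three
times within MAX_TRANSMIT_SPAN, while anything above 45 bytes gets a
single copy.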

        * Section 4.5

          o A constrained server MAY even want to relax this requirement for
             certain non-idempotent requests if the application semantics
             make this trade-off favorable.  For example, if the result of a
             POST request is just the creation of some short-lived state at
             the server, it may be less expensive to incur this effort
             multiple times for a request than keeping track of whether a
             previous transmission of the same request already was processed.

         => I think that this section must stronger state that both endpoints
         must agree on those modified semantics. Otherwise, it is not clear
         to me whether the client and server implementations would indeed be
         interoperable, in particular, if they are implemented independently
         and thus make different assumptions. The client here asked for
         reliable transfer, but the server actually ignores that requests for
         reliable transfer, right?

It is the purview of the server how it implements its resources.  The
server can unilaterally decide that its resources don't need this
processing.  There is no interoperability concern with this.  The
keyword is "if the application semantics make this trade-off
favorable".  This kind of latitude may be unusual for a protocol
specification, but it is often what makes a constrained implementation
possible.

        * Section 4.6

              Message sizes are also of considerable importance to
              implementations on constrained nodes.  Many implementations
              will need to allocate a buffer for incoming messages.  If an
              implementation is too constrained to allow for allocating the
              above-mentioned upper bound, it could apply the following
              implementation strategy: Implementations receiving a datagram
              into a buffer that is too small are usually able to determine
              if the trailing portion of a datagram was discarded and to
              retrieve the initial portion.  So, if not all of the payload,
              at least the CoAP header and options are likely to fit within
              the buffer.  A server can thus fully interpret a request and
              return a 4.13 (Request Entity Too Large) response code if the
              payload was truncated.  A client sending an idempotent request
              and receiving a response larger than would fit in the buffer
              can repeat the request with a suitable value for the Block
              Option [I-D.ietf-core-block].

         => This document must include a discussion on flow control, i. e.,
         what happens if the receiver's receive buffer is full or if an
         application stalls and does not consume data for longer time
         (exceeding the retransmission timeout).

         Explanation:

         For constrained devices with small receive buffers and
         communication with more than one endpoint, it seems to me pretty
         likely that at some points in time no receive buffer is
         available. The protocol spec does not discuss what happens if the
         buffer is too small to process even the header, and what the behavior
         of the receiver should be (silently dropping the incoming message?

Yes.

         sending a RST? does the behavior depend on whether it is CON or
         NON?). I think that this spec must provide guidance how the protocol
         deals with buffer shortage.

That is an implementation concern.

         TCP's solution to this kind of situations is flow control by the
         receive window.

Actually, TCP's solution is dropping the SYN.

         In CoAP, there seems to be an implicit assumption
         that messages can either always be "somehow" processed by a receiver
         or safely be dropped. As long as the protocol allows only one
         outstanding transaction per destination, and allocates dedicated
         receive buffer for a full CoAP packet for each destination,
         out-of-my head this indeed seems to work without deadlocks because
         we basically have the alternating-bit-protocol. But in more complex
         situations with small buffer sizes (e. g., multiple
         transactions/applications sharing one buffer, or insequence-delivery
         for more than one transaction), I think that the protocol could run
         into deadlocks because it cannot prevent a sender from sending or
         retransmitting data into a receiver not having any receive buffer.

         I am not an expert on formal protocol verification, i. e., I cannot
         provide an exact specification for the minimum set of implementation
         requirements that safely prevents deadlock (also see my other
         remarks on state engine specification). But I am really concerned
         that the document does not even mention the terms "flow control",
         "buffer sizing", etc.

There is no way to define the CoAP protocol such that it prevents
implementers from painting themselves into a corner.

Again, I would expect much of this discussion to turn up in future
LWIG documents.  Work is being done on some of these issues in
draft-ietf-lwig-guidance, draft-kovatsch-lwig-class1-coap, and the
(currently expired) draft-castellani-lwig-coap-separate-responses.  In
Orlando, there was considerable energy between the authors of these
documents to get more documentation done; a first authors' draft is
already in the LWIG SVN and is actively being worked on.

        * Section 4.7

             In order not to cause congestion, Clients (including proxies)
             MUST strictly limit the number of simultaneous outstanding
             interactions that they maintain to a given server (including
             proxies) to NSTART.  An outstanding interaction is either a CON
             for which an ACK has not yet been received but is still expected
             (message layer) or a request for which neither a response nor an
             Acknowledgment message has yet been received but is still
             expected (which may both occur at the same time, counting as one
             outstanding interaction).  The default value of NSTART for this
             specification is 1.

         => This section MUST clarify congestion control for non-confirmable
         messages. I miss a clear recommendation how frequently a sender is
         allowed to send non-confirmable messages if there is no other
         feedback. I think that a maximum data rate of PROBING_RATE would be
         reasonable and safe, but I recall some discussion on other proposals
         (e. g., mandating a confirmable message every X non-confirmable
         messages, etc.).

That specific discussion would be part of the -observe extension.  The
base protocol is stuck at PROBING_RATE.

        * Section 4.8.2

                  o PROCESSING_DELAY is the time a node takes to turn around
                     a Confirmable message into an acknowledgement.  We
                     assume the node will attempt to send an ACK before
                     having the sender time out, so as a conservative
                     assumption we set it equal to ACK_TIMEOUT.

         => I assume that the spec wants to say "a receiver MUST have sent an
         ACK after PROCESSING_DELAY"? I have not found that requirement
         elsewhere in the document. If it is not a MUST requirement, the
         calculations involving PROCESSING_DELAY seem to be not the worst
         case and are therefore not really useful for worst-case analysis.

The worst-case analysis is about as robust as that for TCP 2MSL.
There is absolutely no assurance in IP that data won't be in the
network longer than MSL (and I sure have measured latencies way
higher).  The analysis describes what we try to guard against, not the
absolute worst case.

But I agree that the discussion of when to piggy-back (5.2.1) could
include a mention of PROCESSING_DELAY. [#290]
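
For readers who want to see where PROCESSING_DELAY enters the derived
time values, a minimal sketch follows.  The default values and the
formulas are written down here as I recall them from Section 4.8 and
should be treated as assumptions of this example, not as normative
text; the function name is made up.

```python
def derived_parameters(ack_timeout=2.0, ack_random_factor=1.5,
                       max_retransmit=4, max_latency=100.0):
    """Derive time values (in seconds) from the transmission parameters.

    Assumed defaults: ACK_TIMEOUT = 2 s, ACK_RANDOM_FACTOR = 1.5,
    MAX_RETRANSMIT = 4, MAX_LATENCY = 100 s.
    """
    # Conservative assumption from the text: equal to ACK_TIMEOUT.
    processing_delay = ack_timeout
    # Time from the first transmission to the last retransmission.
    max_transmit_span = (ack_timeout * (2 ** max_retransmit - 1)
                         * ack_random_factor)
    # Round trip: two network traversals plus the turnaround time.
    max_rtt = 2 * max_latency + processing_delay
    # How long message state is assumed to be worth keeping.
    exchange_lifetime = max_transmit_span + 2 * max_latency + processing_delay
    return {
        "PROCESSING_DELAY": processing_delay,
        "MAX_TRANSMIT_SPAN": max_transmit_span,
        "MAX_RTT": max_rtt,
        "EXCHANGE_LIFETIME": exchange_lifetime,
    }


if __name__ == "__main__":
    for name, value in derived_parameters().items():
        print(f"{name} = {value:g} s")
```

A node that takes longer than PROCESSING_DELAY to turn a message around
simply falls outside what these values guard against.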

        * Section 5.3.1

             A token is intended for use as a client-local identifier for
             differentiating between concurrent requests (see Section 5.3);
             it could have been called a "request ID".

         => In my understanding, concurrent requests are not allowed by this
         spec, i. e., why does this document not recommend to use an empty
         token as long as NSTART=1? It apparently just wastes scarce bandwidth
         if there is only one allowed request to a destination. As an
         editorial note, this reference to Section 5.3 is strange here; this
         is the only paragraph in the document where concurrent requests are
         mentioned.

Concurrent requests are allowed as long as there has been a return
message.  E.g., a client could send a CON GET, get an empty ACK
(resetting the outstanding count to zero), and then decide to do
another CON GET (and receive another empty ACK) before the response
CON for the first request arrives.  Which request does the response
reply to?
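
A minimal sketch of the client-side bookkeeping this implies (Python,
purely illustrative; the class, the method names and the 2-byte token
length are assumptions of this example, not anything the draft
prescribes):

```python
import secrets


class Client:
    """Tracks requests that were ACKed empty and still await a response."""

    def __init__(self):
        # (server, token) -> description of the request
        self.pending = {}

    def send_request(self, server, description):
        """Register a request and return the token that goes on the wire."""
        token = secrets.token_bytes(2)
        # Keep tokens unique per server while they are in use.
        while (server, token) in self.pending:
            token = secrets.token_bytes(2)
        self.pending[(server, token)] = description
        return token

    def on_separate_response(self, server, token):
        """Match a separate response by source endpoint and token only."""
        return self.pending.pop((server, token), None)


if __name__ == "__main__":
    client = Client()
    t1 = client.send_request("server", "GET /a")  # empty ACK arrives
    t2 = client.send_request("server", "GET /b")  # empty ACK arrives
    # The responses may come back in either order; the token disambiguates.
    print(client.on_separate_response("server", t2))
    print(client.on_separate_response("server", t1))
```

With an empty token, both pending entries would collide and the second
question above could not be answered.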

        * Section 5.3.2

             The exact rules for matching a response to a request are as
             follows:

             1.  The source endpoint of the response MUST be the same as the
                 destination endpoint of the original request.

             2.  In a piggy-backed response, both the Message ID of the
                 Confirmable request and the Acknowledgement, and the token
                 of the response and original request MUST match.  In a
                 separate response, just the token of the response and
                 original request MUST match.

             In case a message carrying a response is unexpected (the client
             is not waiting for a response from the identified endpoint, at
             the endpoint addressed, and/or with the given token), the
             response is rejected (Section 4.2, Section 4.3).

         => To me, the CoAP message processing seems underspecified. What
         really happens if either the msg and token mismatch (two entirely
         different cases), i. e., what will the endpoint put into the RST
         message? Section 4.2 states "The Acknowledgement message MUST echo
         the Message ID of the Confirmable message, and MUST carry a response
         or be empty (see Section 5.2.1 and Section 5.2.2)."; based on text I
         cannot figure out what the response would be. For interoperability
         between implementations, this sort of events matter.

There is some discussion of rejecting messages in 4.2.
There is relatively little need for details here as there is no
connection state to maintain or fix up once a message has been rejected.

         => Would it be allowed to send back a response both by a CON and a
         NON message, with the same token, but different message IDs? If so,
         how would the matching deal with this?

Yes.  This is one of the things that -observe might be doing.
(Note that a message can have more than one response.)

        * Section 5.3.2

          Implementation Note: A client that receives a response in a CON
             message may want to clean up the message state right after
             sending the ACK.  If that ACK is lost and the server retransmits
             the CON, the client may no longer have any state to correlate
             this response to, making the retransmission an unexpected
             message; the client may send a Reset message so it does not
             receive any more retransmissions.  This behavior is normal and
             not an indication of an error.  (Clients that are not
             aggressively optimized in their state memory usage will still
             have message state that will identify the second CON as a
             retransmission.  Clients that actually expect more messages from
             the server [I-D.ietf-core-observe] will have to keep state in
             any case.)

         => I am confused by this sort of argument of removing state. This
         statement probably refers to Token state, since some kind of Message
         ID state has to be kept at least for MAX_LATENCY according to
         Section 4.8.2? Again, I'd expect the protocol specification to
         clearly state what the minimum requirements on keeping state are.

We try to minimize those minimum requirements.
The specific optimization described here has the unfortunate property
that a client may seem like it just has been rebooted.
We cannot disallow rebooting in the protocol.

        * Section 5.4 and Section 5.10

         => The maximum size of these options, in particular if more than one
         is used at the same time, can easily exceed the IPv6 MTU of 1280
         bytes. In other words, a single non-fragmented IP packet will not
         only have not enough space for payload if options are used, possibly
         a single packet will not even be sufficient to transport all
         required options? What does the CoAP base protocol do in that case?
         Discard that request/response and return an application error? Why
         does Section 5 not have any guidance on size/segmentation issues if
         options are (too) large?

There is a SHOULD about message sizes in 4.6.

        * Section 8.1

          A multicast request is characterized by being transported in a CoAP
          message that is addressed to an IP multicast address instead of a
          CoAP endpoint.  Such multicast requests MUST be Non-confirmable.

         => A normative statement on congestion control for *sending* to
         multicast addresses is missing. I think that a slow-speed network
         can get very easily congested by multicast messages, i. e., this
         matters for the main CoAP use cases. I believe that sending 1
         Byte/second is safe for multicast destinations.

As the text you cite says, multicast requests are governed by NON rules.

It is indeed easy to congest a constrained node network using
multicast (if that is implemented at all, usually by flooding a
broadcast message).  Multicast application protocol specifications for
more powerful networks, such as RFC 6762 (which does not contain the
word "congestion" and only alludes to "rate-limiting"), have not even
tried to codify the limits of reasonableness here: it is nearly
impossible to give limits that aren't both ridiculously high for some
environments and too conservative for others.  Multicast congestion
control for wireless constrained node networks is significantly less
well understood because of the dynamics of the flooding protocols
used; it also has to contend with memory limits in the router nodes.

In summary, CoAP implementers are likely to be extremely careful about
using multicast.  And that is about the level of advice we could give
them: "Be very, very careful about using multicast".

        * Section 8.1

          When a server is aware that a request arrived via multicast, it
          MUST NOT return a RST in reply to NON.  If it is not aware, it MAY
          return a RST in reply to NON as usual.  Because such a Reset
          message will look identical to an RST for a unicast message from
          the sender, the sender MUST avoid using a Message ID that is also
          still active from this endpoint with any unicast endpoint that
          might receive the multicast message.

         => Why is a RST forbidden by a MUST? I would understand the
         motivation for a SHOULD, but if a server is overloaded by multicast
         requests and runs out of processing resources for multicast
         requests, isn't there a need to tell the sender that it has to stop
         using this multicast group?

There is no flow control implication of RST, so I'm not quite sure I
understand the scenario.  Sending back an RST to a multicast is not
going to help an overloaded server (there are no retransmissions to be
curbed).
(The specific case discussed in the rest of that paragraph is that of
a server stuck with a pre-RFC 3542 POSIX socket API, so it is unaware
that it just received a multicast.  This is an ugly situation, and
there is not really that much that can be done to clean it up.)

        * Section 8.2

          When matching a response to a multicast request, only the token
          MUST match; the source endpoint of the response does not need to
          (and will not) be the same as the destination endpoint of the
          original request.

         => So, the token is the only way to deal with packets that are
         duplicated in the network? Then, this section must IMHO expand
         further on how to select token IDs for multicast transfer. For use
         in multicast, Section 5.3.1. "The client SHOULD generate tokens in
         such a way that tokens currently in use for a given
         source/destination endpoint pair are unique." is not sufficient; the
         token must in addition be unique during MAX_LATENCY, right?

The text is about response matching, not about removing duplicates.
Duplicate removal will be done based on message-IDs.
That state (if duplicate removal is desired) indeed needs to live on.
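
To illustrate the split, here is a minimal sketch of a duplicate filter
keyed on source endpoint and message-ID, independent of any token
(Python; the class name and the lifetime value used in the example are
placeholders of this sketch, not values taken from the draft):

```python
class DuplicateFilter:
    """Remembers (source endpoint, message-ID) pairs for a limited time."""

    def __init__(self, lifetime):
        self.lifetime = lifetime  # seconds to remember a message-ID
        self.seen = {}            # (source, message_id) -> expiry time

    def is_duplicate(self, source, message_id, now):
        """Return True if this message was already seen from this source."""
        # Forget entries whose lifetime has passed.
        self.seen = {k: t for k, t in self.seen.items() if t > now}
        key = (source, message_id)
        if key in self.seen:
            return True
        self.seen[key] = now + self.lifetime
        return False


if __name__ == "__main__":
    dedup = DuplicateFilter(lifetime=100.0)  # placeholder value
    server_a = ("2001:db8::a", 5683)
    server_b = ("2001:db8::b", 5683)
    print(dedup.is_duplicate(server_a, 7, now=0.0))   # first copy
    print(dedup.is_duplicate(server_a, 7, now=10.0))  # network duplicate
    print(dedup.is_duplicate(server_b, 7, now=10.0))  # other responder
```

Responses from different servers to one multicast request all carry the
same token, so the token only says which request they belong to; the
filter above is what would discard a copy duplicated in the network.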

        * Section 8.2

            If a server does decide to respond to a multicast request, it
            should not respond immediately.

         => The spec leaves open if a server is allowed to respond with
         confirmed message. If a large number of servers respond, the ACK
         traffic for many CONs could be an issue, right? But if only NON is
         allowed, what happens if the server wants that its message is indeed
         delivered reliably to the requester?

5.2.3 says a NON SHOULD be answered by a NON.
Now: "if the server wants that its message is indeed delivered
reliably" -- why?  There was no guarantee the requesting NON would have
reached it in the first place, so why is delivering the response now
so important?
But if it indeed is, that might be a good reason to send a CON.  The
ACK traffic will indeed compound the network load.  So, as with any
SHOULD, there should be a strong reason to depart from 5.2.3.

        * Section 8.2

          E.g., for a multicast request with link-local scope on an 2.4 GHz
          IEEE 802.15.4 (6LoWPAN) network, G could be (relatively
          conservatively) set to 100, S to 100 bytes, and the target rate to
          a conservative 8 kbit/s = 1 kB/s.  The resulting lower bound for
          the Leisure is 10 seconds.

         => While I like the idea of randomizing the response time to avoid
         in-cast problems, according to Section 4.8, a conservative
         assumption about the allowed data rate in a potentially congested
         network is PROBING_RATE = 1 Byte/second. 1 kB/s might be realistic
         in a specific application scenario if the network does not have any
         other traffic, but the attribute "conservative" should not be used
         here, because reality with cross-traffic could be entirely
         different.

Indeed.   We should strike "conservative".  [#291]
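
For reference, the computation behind the numbers in the quoted text,
as a minimal sketch (Python).  The lower bound is taken as S * G / R,
which reproduces the 10 seconds of the example; the function names and
the uniform random choice of the response time are assumptions of this
sketch.

```python
import random


def leisure_lower_bound(group_size, response_size, target_rate):
    """Lower bound for Leisure in seconds: S * G / R.

    group_size:    G, estimated number of responders
    response_size: S, estimated response size in bytes
    target_rate:   R, target data rate in bytes per second
    """
    return response_size * group_size / target_rate


def pick_response_delay(leisure, rng=random):
    """Pick a random point in time within the Leisure period."""
    return rng.uniform(0.0, leisure)


if __name__ == "__main__":
    # Numbers from the draft's example: G = 100, S = 100 bytes, R = 1 kB/s.
    print(leisure_lower_bound(100, 100, 1000.0), "s at 1 kB/s")
    # Same group at PROBING_RATE = 1 byte/s.
    print(leisure_lower_bound(100, 100, 1.0), "s at 1 byte/s")
```

At PROBING_RATE, the same group would need a Leisure of 10000 seconds,
which shows how far 1 kB/s is from a conservative assumption.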

        * Section 8.2

          If a CoAP endpoint does not have suitable data to compute a value
          for Leisure, it MAY resort to DEFAULT_LEISURE.

         => With this vague specification of leisure time the client has no
         means to know whether *any* response will ever arrive. The servers
         could, for instance, err on the size of the group and just pick all
         a large random leisure time. I think it would make sense to define
         an upper limit on the leisure time, to allow some interpretation on
         the client side. If this upper limit significantly exceeds the rate
         PROBING_RATE, servers may just randomly decide not to reply, instead
         of waiting for a long time.

That is indeed a problem, but it strikes me as more of a
quality-of-implementation problem.  (I don't understand the last
sentence, though.  How does a time limit exceed a rate?)

        * Section 8.2.2

          When a forward-proxy receives a request with a Proxy-Uri or URI
          constructed from Proxy-Scheme that indicates a multicast address,
          the proxy obtains a set of responses as described above and sends
          all responses (both cached-still-fresh and new) back to the
          original client.

          => I don't understand from the document how this works. For
          instance, will these responses all have the same token?

Yes.

          How can a
          client process this if it expects only one response from the proxy?

Indeed, there is a problem if the authority contains indirection
(e.g., is a DNS name), and maps to a multicast address unbeknownst to
the client.

          My general impression is that the multicast mode of CoAP would
          require a more rigorous specification for being included in a PS
          document.

There is continuing discussion of this in draft-ietf-core-groupcomm.
The fundamental mechanism works well for the problem we are trying to
solve using multicast (local service discovery), but, as the text
says, e.g., multicasting through a proxy may need additional
mechanisms.

        * Section 9

          DTLS is not applicable to group keying (multicast communication);
          however, it may be a component in a future group key management
          protocol.

         => I am not really familiar with DTLS. But communication to
         multicast addresses by CoAP cannot be secured by DTLS, right? If so,
         why is there not a big warning sign "DTLS is not available for
         multicast CoAP"?

This sentence is the warning sign.

In today's usage of CoAP, multicast is mainly used for discovery, and
that often takes place as a prerequisite to establishing security.
So there is no practical problem today.  But we want to be able to
secure multicast CoAP, so we are working on multicast extensions to
DTLS.  These will take some time.

        * Section 11

         => This section IMHO lacks the description of two further attacks:

            (a) The equivalent of a SYN flooding attack on TCP would be
            sending complex queries with CON to a server. Given that the cost
            of a CON request is small, this attack can easily be
            executed. Also, if the server responds with CONs, it will have to
            allocate buffer and retransmission logic for each request, and it
            will likely run out of resources. A simple remedy is rate
            limiting as mentioned in Section 4.7; this counter-measure should
            be repeated here.

Yes. [#292]
(There is a whole set of battery depletion attacks that are hard to
guard against without some form of network segregation.)

            (b) A subtle attack with spoofed addresses could possibly exploit
            the lack of congestion control in CoAP. Due to NSTART=1, a tricky
            attacker could prevent a server to communicate with a legitimate
            client, because only one transaction is allowed to one
            destination address. The attacker could try to always occupy this
            "slot".

Good point. [#293]

            Both attacks are due to the lack of a three-way handshake like in
            TCP.

Yes.  This is a design feature, and we are fully aware of its cost.
(The real answer, of course, is global deployment of BCP38.  But I
won't open that can of worms.)

        * Section 11

         => This section IMHO needs a discussion on minimum requirements on
         how to select Message ID and Tokens. Both are a means to protect
         against "hijacking" of transactions / falsification of responses,
         but if an attacker can guess these values, an attacker can inject
         wrong data into a CoAP communication. Compare e.g. to a TCP receiver
         that carefully checks whether sequence numbers are valid, i.e.,
         within the receive window.

The Token length was chosen to be as big as it is to enable this kind
of protection, if desired.  More likely, we will simply use DTLS
security where that matters.  But a short discussion of this could be
added. [#294] Note that draft-ietf-lwig-guidance recommends allocating
Message IDs sequentially; for a constrained system, this may be the
only way to keep enough message state for reliable duplicate removal.
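
To make the Token trade-off concrete, here is a minimal sketch of choosing
Tokens at random and estimating an off-path attacker's chance of guessing
one. The function names `new_token` and `guess_probability` are
hypothetical, and the estimate assumes uniformly random Tokens and distinct
blind guesses against a single outstanding request.

```python
import secrets


def new_token(length: int = 8) -> bytes:
    """Return a random Token of 0 to 8 bytes from the OS CSPRNG."""
    if not 0 <= length <= 8:
        raise ValueError("Token length must be between 0 and 8 bytes")
    return secrets.token_bytes(length)


def guess_probability(length: int, attempts: int = 1) -> float:
    """Chance that `attempts` distinct blind guesses hit one Token."""
    return min(1.0, attempts / 2 ** (8 * length))
```

With a zero-length Token there is nothing to guess, so the probability is 1;
each additional byte divides the attacker's chance by 256, at the cost of
one more byte in every message.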

        Editorial nits:

        * Section 2.2

                    CoAP makes use of GET, PUT, POST and DELETE methods in a
                    similar manner to HTTP, with the semantics specified in
                    Section 5.8.  (Note that the detailed semantics of CoAP
                    methods are "almost, but not entirely unlike" those of
                    HTTP methods:

         => s/unlike/like/ ?

Ah, sorry, that is an inside joke for Douglas Adams fans.
Reference added (SVN [1272]).

        * Section 3

             Following the header, token, and options, if any, comes the
             optional payload.  If present and of non-zero length, it is
             prefixed by a fixed, one-byte Payload Marker (0xFF) which
             indicates the end of options and the start of the payload.  The
             payload data extends from after the marker to the end of the UDP
             datagram, i.e., the Payload Length is calculated from the
             datagram size.  The absence of the Payload Marker denotes a
             zero-length payload.  The presence of a marker followed by a
             zero-length payload MUST be processed as a message format error.

          => I think that the term "payload marker" is kind of dangerous; it
          would be better to use a term like "end-of-option option". When I
          first read this section, I wondered whether a CoAP implementation
          could just scan through the packet to find the begin of the payload
          by the first occence of 0xFF after the default CoAP
          header. However, this would require 0xFF to be masked in all
          options. Masking is realized in Section 3.1, but apparently not in
          Sections 3.2 and Section 5.4.

The payload marker is not by itself an option.  Also, it is only
present if there is payload.  Instead of changing the name, maybe we
should add a note that byte-wise scanning for 0xFF is not a viable
technique for finding the payload.  [#287]
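
A minimal sketch of why scanning does not work and what a parser does
instead, assuming the option format of Section 3.1 (4-bit delta and length
nibbles, with nibble values 13 and 14 selecting one or two extension bytes,
and 15 reserved). The function name `find_payload` is hypothetical.

```python
PAYLOAD_MARKER = 0xFF
EXT_BYTES = {13: 1, 14: 2}   # nibble value -> number of extension bytes


def find_payload(datagram: bytes) -> bytes:
    """Walk header, token, and options; return the payload (maybe empty)."""
    if len(datagram) < 4:
        raise ValueError("shorter than the 4-byte fixed header")
    tkl = datagram[0] & 0x0F
    if tkl > 8:
        raise ValueError("token lengths 9-15 are reserved")
    pos = 4 + tkl                       # skip the token; it may contain 0xFF
    if pos > len(datagram):
        raise ValueError("truncated token")
    while pos < len(datagram):
        first = datagram[pos]
        pos += 1
        if first == PAYLOAD_MARKER:
            if pos == len(datagram):
                raise ValueError("payload marker followed by empty payload")
            return datagram[pos:]
        delta_nibble, length_nibble = first >> 4, first & 0x0F
        if 15 in (delta_nibble, length_nibble):
            raise ValueError("nibble value 15 is reserved")
        pos += EXT_BYTES.get(delta_nibble, 0)     # skip extended delta
        ext = EXT_BYTES.get(length_nibble, 0)
        if pos + ext > len(datagram):
            raise ValueError("truncated option header")
        length = length_nibble
        if ext:
            base = 13 if ext == 1 else 269
            length = base + int.from_bytes(datagram[pos:pos + ext], "big")
        pos += ext + length                       # skip the option value
        if pos > len(datagram):
            raise ValueError("truncated option value")
    return b""                                    # no marker: empty payload
```

For example, a message whose single option value is the two bytes 0xFF 0xFF
is parsed correctly by the walk above, while a naive search for the first
0xFF after the header would stop inside the option value.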

        * Section 4.4

             The same Message ID MUST NOT be re-used (in communicating with
             the same endpoint) within the EXCHANGE_LIFETIME (Section 4.8.2).

             Implementation Note: Several implementation strategies can be
                employed for generating Message IDs.  In the simplest case a
                CoAP endpoint generates Message IDs by keeping a single
                Message ID variable, which is changed each time a new
                Confirmable or Non-confirmable message is sent regardless of
                the destination address or port.  Endpoints dealing with
                large numbers of transactions could keep multiple Message ID
                variables, for example per prefix or destination address.
                The initial variable value should be randomized.

         => Using a single Message ID variable IMHO is only possible if there
         is only a single message outstanding to any address, because the
         Message ID has to be kept for verifying responses. Which implies
         that even in the "simplest case" there is also one Message ID
         variable per address. I wonder whether the Implementation Note
         should be something of the sort "implementations will typically
         store Message IDs per destination, but they may use a single counter
         to ensure uniqueness among several destinations".

The state per outstanding request is separate from this.  Simple
implementations will have one counter and one (or a very small number
of) outstanding requests, so they will indeed not keep per-destination
state.
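
The separation can be sketched as follows. This is a minimal illustration,
not a recommended implementation: the class `Endpoint`, its method names,
and the dictionary keyed by (peer, Message ID) are assumptions made for the
example, and it does not enforce the EXCHANGE_LIFETIME re-use rule.

```python
import secrets


class Endpoint:
    """Hypothetical endpoint: one Message ID counter, separate request state."""

    def __init__(self):
        # Single 16-bit counter shared by all destinations, random start.
        self.next_mid = secrets.randbelow(1 << 16)
        # State per outstanding request, independent of the counter.
        self.outstanding = {}

    def send(self, peer, request):
        """Allocate the next Message ID and remember the request."""
        mid = self.next_mid
        self.next_mid = (self.next_mid + 1) & 0xFFFF   # wrap at 16 bits
        self.outstanding[(peer, mid)] = request
        return mid

    def on_ack(self, peer, mid):
        """Return the matching request, or None for an unexpected ACK."""
        return self.outstanding.pop((peer, mid), None)
```

A constrained node would replace the dictionary by a fixed array of one or
a few slots; the counter itself stays a single variable either way.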

        * Section 4.6

                  header and options are likely to fit within the buffer.  A
                  server can thus fully interpret a request and return a 4.13
                  (Request Entity Too Large) response code if the payload was
                  truncated.  A

          => The syntax "4.13" is not introduced at this stage; it could make
          sense to add a brief sentence early in the document to explain the
          response code format

Or just a forward reference to 5.9.2.9. [#289]


        * Section 4.8

                Message transmission is controlled by the following
                parameters:

         => At least DEFAULT_LEISURE is not defined in the text until this
         table (and it is not really self-explaining).

This is one of the places where I would expect curious readers to make
use of the search function of their display device...

        * Section 4.8.2

         => The whole section on time values derived from transmission
         parameters is pretty hard to parse. Instead of organizing it
         according parameters, it would be better to highlight the subset of
         parameters that actually matter for an implementation, and what is
         exactly the event at the beginning and end of that duration.

I'd rather write an LWIG document with many more details about these
parameters than try to shoehorn all this information in here.
The specification is already heavy on implementation information.
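
For readers who want the numbers at a glance, the derived values can be
computed in a few lines. This is a sketch that assumes the default
transmission parameters (ACK_TIMEOUT = 2 s, ACK_RANDOM_FACTOR = 1.5,
MAX_RETRANSMIT = 4), a MAX_LATENCY of 100 s, and the formulas as I read
them in Section 4.8.2; check them against the draft before relying on them.

```python
# Assumed default transmission parameters (seconds where applicable).
ACK_TIMEOUT = 2.0
ACK_RANDOM_FACTOR = 1.5
MAX_RETRANSMIT = 4
MAX_LATENCY = 100.0
PROCESSING_DELAY = ACK_TIMEOUT

# First transmission of a CON until its last retransmission.
MAX_TRANSMIT_SPAN = ACK_TIMEOUT * (2 ** MAX_RETRANSMIT - 1) * ACK_RANDOM_FACTOR

# First transmission until the sender gives up waiting for an ACK.
MAX_TRANSMIT_WAIT = (
    ACK_TIMEOUT * (2 ** (MAX_RETRANSMIT + 1) - 1) * ACK_RANDOM_FACTOR
)

# Worst-case round trip, including processing at the peer.
MAX_RTT = 2 * MAX_LATENCY + PROCESSING_DELAY

# How long a Message ID must not be re-used after sending a CON.
EXCHANGE_LIFETIME = MAX_TRANSMIT_SPAN + 2 * MAX_LATENCY + PROCESSING_DELAY

# The corresponding duration for a NON.
NON_LIFETIME = MAX_TRANSMIT_SPAN + MAX_LATENCY
```

Under these assumptions the values come out as 45, 93, 202, 247, and 145
seconds, respectively.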

        * Section 5.3.1

          The Token is used to match a response with a request.  The token
          value is a sequence of 0 to 8 bytes.

         => While CoAP optimizes its protocol fields for single bits, the
         document does not comment at all on reasonable sizes for the
         token. At least some text mentioning the high overhead of a 4 or 8
         byte token compared to the rest of the CoAP headers could be useful.
         Possibly also addressing the security-size tradeoff.

Yes, see [_] above.


        * Section 5.10

         => I don't understand why the Proxy-URI is longer than others, and
         why the length is 1034.

The reason this is much larger than the other options is that it might
be used for interfacing to HTTP, which tends to use much larger URIs
than we normally create in the CoAP world (say, for OAuth).
(The specific value of 1034 is a remnant of an earlier option
encoding; 1023 would do, too, but we never made that change.)