Gen-ART review ofdraft-ietf-mboned-mtrace-v2-08

I am the assigned Gen-ART reviewer for this draft. For background on 
Gen-ART, please see the FAQ at 
<http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>.

Please resolve these comments along with any other Last Call comments 
you may receive.

Document: draft-ietf-mboned-mtrace-v2-08
Reviewer: David L. Black
Review Date: 21 July 2011
IETF LC End Date: 19 July 2011

Summary: This draft is on the right track but has open issues, described in the 
review.

This draft describes version 2 of multicast traceroute.

Major Issues:

This review is late because it took me much longer than expected to work 
through this
draft in order to figure out how the protocol works. While I believe that the 
protocol
does work if correctly implemented, I don't believe that the protocol is 
effectively
implementable from this draft, and hence significant work is required.

The open issues fall into two primary areas:

- Message Structure: Sections 4 and 5 are confusing.
- Message Processing: The most important functionality of this protocol is the 
router
        processing of messages in Section 9.  I found numerous problems there, 
including
        significant missing pieces of the specification (e.g., the material on 
when and
        how to generate Reply messages is incomplete).

In addition there are some missing security considerations around traffic 
amplification.

-- Message Structure --

The text in sections 4 and 5 is confusing.  The problem starts with 
the following sentence in section 4.2:

   An mtrace2 message MUST contain exactly one Mtrace2 Query header.

Because the mtrace2 Query header is not discussed until section 5, this
sentence is easy to misread as a requirement for exactly one Mtrace2
Query TLV.  

It gets worse, because it turns out that the Mtrace2 Query header is actually
the value[V] for the Query, Request and Reply TLVs.  Hence, calling that a Query
header is an invitation to confusion (e.g., where are the Request and Reply
headers?).

I suggest renaming the Mtrace2 Query header to Mtrace2 Trace header and then
changing the above sentence to:

   An mtrace2 message MUST begin with a Query, Request or Reply TLV. The value
   for each of these TLVs is an Mtrace2 Trace Header (see section 5).  These
   three TLVs MUST NOT be used in any position other than the beginning of
   a message.

That is not easy to figure out from the current text.

5.  Mtrace2 Query Header

What's the maximum size of the UDP packet?  This is important because there's
text in Section 9 on that depends on whether another TLV fits into the message
or not. There needs to be a minimum size, as nothing will work if there isn't
room for at least one standard response block after the header.

How does one figure out whether the addresses in the Mtrace2 Query Header
are IPv4 or IPv6?

I suspect that at least one of the multicast address and source address needs
to be valid (i.e., not all 1's for IPv4, not :: for IPv6).  That should be 
stated,
along with what happens when both addresses are invalid.

State that the client address must be valid (not all 1's for IPv4, not :: for 
IPv6).

Add the 24 bits of TLV to the packet structure diagrams in sections 5, 6, 7 and 
8.

Sections 6 and 7 - The IPv4 and IPv6 response blocks use the same TLV code (4),
but have different contents and lengths.  How does the receiver determine which
response block is present for TLV code 4? 

Section 8 Mtrace2 Augmented Response Block

   The augmented response block is always appended to mtrace2 TLV header
   (0x04).

No, definitely not.  Section 4.2 says that the mtrace2 TLV code for an augmented
response block is 5, not 4.  The following would be better:

  The augmented response block uses TLV code 5 and always comes after a standard
  response block TLV (code 4).

-- Message Processing --

Section 9 is confusing and incomplete.  This is the most important functionality
in the document, and it needs to be clear to the reader.  I'm having a very hard
time figuring out exactly what the router is supposed to do, and I found a 
number
of things that are clearly wrong.  It is not sufficient to just fix the things 
noted in this review - Section 9 needs to be clear and cogent about exactly what
the router does for each received message so that code can be written from it.

Section 9.1.1 Packet Verification

   If the router determines that it is not the proper last-hop router,
   or it cannot make that determination, it does one of two things
   depending if the Query was received via multicast or unicast.  If the
   Query was received via multicast, then it MUST be silently dropped.
   If it was received via unicast, a forwarding code of WRONG_LAST_HOP
   is noted and processing continues as in Section 9.2.

The last sentence has a number of problems:
- It needs to say that the TLV type in the first TLV is changed from 1 to 2,
- Section 9.2 requires that there be a response block, but the Query
        message doesn't have one.
- Section 9.2 will forward the Request towards the multicast source,
        even though the Query was not received by a proper last-hop router.
        That seems wrong.
It would be clearer to describe how to return the WRONG_LAST_HOP error to
the client here instead of referencing Section 9.2.

   Duplicate Query messages as identified by the tuple (Mtrace2 Client
   Address, Query ID) SHOULD be ignored.  This MAY be implemented using
   a simple 1-back cache (i.e. remembering the Mtrace2 Client Address
   and Query ID of the previous Query message that was processed, and
   ignoring future messages with the same Mtrace2 Client Address and
   Query ID).  Duplicate Request messages MUST NOT be ignored in this
   manner.

Is a 1-entry cache really sufficient?  What if 2 clients are trying to query
concurrently causing their duplicate messages to be interleaved?

9.1.2.  Normal Processing

   When a router receives an mtrace2 Query and it determines that it is
   the proper last-hop router, it changes the TLV type to 0x2 and
   treats it like an mtrace2 Request and performs the steps listed in
   Section 9.2.

That's problematic - Section 9.2 requires that there be a response block,
but there won't be one.  Unlike the error case above, this mismatch can
probably be addressed in Section 9.2.

9.2.2.  Normal Processing

Step 1 in this section is where the Request message needs to have a response
block.  The problems with the second reference from section 9.1 to section 9.2
might be correctable by explaining why the "If there was no room" condition
can't occur at the router that converts a Query to a Request.  This case needs
to be noted in the initial text in Section 9.2.

Also in step 1, how does the router determine whether there is "room in the
current buffer"?

9.2.3.  Mtrace2 Request Received by Non-Supported Router

Describe how the client matches the ICMP message with its original Query.

9.3.1.  Destination Address

   If the Previous-hop router for the mtrace2 Request is known for this
   request and the number of response blocks is less than the number
   requested (i.e., the "# hops" field in the mtrace2 Query header), the
   packet is sent to that router.  If the Incoming Interface is known
   but the Previous-hop router is not known, the packet is sent to an
   appropriate multicast address on the Incoming Interface.  The
   appropriate multicast address may depend on the routing protocol in
   use, MUST be a link-scoped group (i.e. 224/24 for IPv4, FF02::/16 for
   IPv6), MUST NOT be ALL-SYSTEMS.MCAST.NET (224.0.0.1) for IPv4 and All
   Nodes Address (FF02::1) for IPv6, and MAY be ALL-ROUTERS.MCAST.NET
   (224.0.0.2) for IPv4 or All Routers Address (FF02::2) for IPv6 if the
   routing protocol in use does not define a more appropriate group.
   Otherwise, it is sent to the Mtrace2 Client Address in the header.

There are multiple problems with that paragraph:

1) First line: "response blocks" is wrong.  The treatment of the augmented
response block's count of already returned response blocks is missing.

2) There are no instructions on what to do when the number of response
blocks is equal to the number of hops - the Request has to be converted to
a Reply.

3) The last line sends a Request to a Client.  If that's what was intended,
the Request first needs to be converted to a Reply.

9.3.2.  Source Address

   When the NO_SPACE error occurs, the router sends back the mtrace2
   Reply with contained data and the NO_SPACE error code as in
   Section 9.4, and continues the mtrace2 Query by sending an mtrace2
   Request containing the same mtrace2 Query header and its standard and
   augmented response blocks.  The corresponding augmented response
   block type is "# Mtrace2 Response Blocks Returned" described in
   Section 8.

There are a couple of problems with that paragraph:

1) What does "its" mean? If it refers to the header, that would
send all of the response blocks.  I think only this router's
response block is sent.

2) How does the hop count arithmetic work?  I suspect that the value for number
of response blocks returned is counted against hop count, but that's not
mentioned in section 9.3.1 .

14.  Security Considerations

There are additional security considerations.  I can see at least two ways
in which this protocol appears to offer opportunities for traffic amplification:

1) Suppose the client's address in the Query is a multicast address; the Reply
        will get multicast by the router to a multicast address that the client
        can't reach.  Use of a multicast address as a client address should 
probably
        be prohibited.

2) The proxying and NO_SPACE behaviors result in one Query returning multiple 
        Reply messages.  That's necessary for the protocol to work, but needs 
some
        control to prevent abuse.  Some sort of rate-limiting recommendations 
should
        suffice to prevent this from becoming a problem.

Minor Issues:

Section 1 Introduction

The last 3 paragraphs on p.6 are not clear about how hop count works. The 
statement
in the last paragraph on p.6 that a lower hop count can be used is not 
consistent
with the summary descriptions in the previous two paragraphs that don't discuss
hop count. That needs to be clearer. One possible approach would be:

- Add a sentence to the third to last paragraph (about generation of
        Request from Query) to say that the hop count in the Query is copied
        to the Request.
- Add a new paragraph following that paragraph to say that when a router
        that is not a first-hop router receives a Request message, the router
        adds a standard response block  If there are fewer standard response
        blocks that the hop count, the Request is forwarded.  If the number
        of response blocks matches the hop count or an error condition
        such as "no route" is encountered, a Reply is generated.

There is significant overlap between sections 1 and 3 - consider whether they 
should
be merged.  Of the two sections, the contents of Section 3 would be a better
Introduction than the current section 1.

Nits:

Add a definition of First-hop router to Section 2.

idnits 2.12.12 found a bunch of problems

  Checking nits according to http://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == There are 7 instances of lines with non-RFC2606-compliant FQDNs in the
     document.

  == There are 1 instance of lines with multicast IPv4 addresses in the
     document.  If these are generic example addresses, they should be changed
     to use the 233.252.0.x range defined in RFC 5771

The above 2 errors can be ignored - they FQDNs are MCAST.NET addresses that
this draft needs to use, and the multicast IPv4 address is the IPv4 address
for one of those FQDNs.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords -- however, there's a paragraph with
     a matching beginning. Boilerplate error?

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).
  -- The document date (January 7, 2011) is 195 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: '2' is defined on line 1413, but no explicit reference
     was found in the text

  == Unused Reference: '5' is defined on line 1422, but no explicit reference
     was found in the text

  ** Obsolete normative reference: RFC 2373 (ref. '3') (Obsoleted by RFC 3513)

  ** Obsolete normative reference: RFC 2434 (ref. '4') (Obsoleted by RFC 5226)

  ** Downref: Normative reference to an Unknown state RFC: RFC 1071 (ref. '5')

  == Outdated reference: A later version (-11) exists of
     draft-ietf-mboned-auto-multicast-08

_______________________________________________
Ietf mailing list
Ietf(_at_)ietf(_dot_)org
https://www.ietf.org/mailman/listinfo/ietf