Hi,
Sorry for being late with these IETF Last Call comments. I will partly blame the
ADs for requesting this Transport Directorate review a bit late; the other part
is all mine and the holidays. Anyway, I do hope you will consider these issues
and comments, as I believe I found some serious ones in addition to a number of
clarifications that should be made.
Significant Issues:
1. Congestion Control
This is clearly a tunnel establishment protocol for carrying IP traffic.
Normally the responsibility for congestion control thus lies with the
tunnelled traffic. However, I would like to argue that this does not apply in
this case, primarily due to the nature of the tunnelled traffic, i.e. multicast
traffic, and secondarily due to limitations in the tunnel protocol.
Let's start with the second part. This protocol claims to support ASM but still
does not provide an upstream delivery mechanism, i.e. an ASM receiver is not
capable of sending as it should be. This rules out several existing congestion
control mechanisms in protocols supporting multicast. The first is
using RTCP for congestion control in ASM [RFC3550]; the second is TCP-Friendly
Multicast Congestion Control [RFC4654], which can be used in the RMT suite of
protocols and which I know has been implemented in some NORM implementations.
Thus only strictly receiver-based mechanisms, such as Wave and Equation Based
Rate Control [RFC3738], are available in this context.
Secondly, many multicast usages are in fact deployed without any congestion
control. This is based on the deploying entity controlling the scope of, and
authorization for, requesting multicast delivery. However, those restrictions
do not apply to AMT delivery of multicast. If the gateway can reach the relay
using unicast, it can be delivered the multicast group from the domain the
relay is attached to. Thus, this protocol changes the deployment restrictions
on which much non-congestion-controlled delivery is based. Instead, the
non-congestion-controlled traffic can now be sent over an IP/UDP tunnel across
the Internet, where neither relay nor gateway may have any knowledge about the
path the traffic may take.
Based on this I would like to see two changes to this protocol specification.
First a section discussing the issue of congestion control. Secondly, I think
this protocol should have an applicability statement limiting its deployment to
restricted environments where the relay and gateway deployers can provide
certain resource provisions between the entities to avoid the multicast traffic
affecting other traffic sharing the same bottlenecks in ways not allowed by the
network provider.
2. Security
This protocol is frank about having limited security features and says this
is similar to the IGMP and MLD protocols being used. However, I think this is a
failure to properly consider the threat model. If one uses AMT over the general
Internet, it will run in a network where those deploying the multicast and the
relays no longer control requirements on source address verification or the
possibilities for traffic separation, as they can within the domains where
multicast is currently deployed. The security vulnerabilities in IGMP and MLD
are much more contained and controllable in a LAN environment where one has
chosen to deploy multicast, compared to a Relay exposing them to the whole
Internet. Once more I think there are only two choices here.
A) Beef up the security to match a general Internet threat model, i.e. at a
minimum provide a real model for gateway authentication using identities, not
only return-routability-based verification.
B) Limit the applicability of AMT to managed environments and make it clear
that the relay will need to limit which gateways are allowed to access the
relay based on addressing.
Based on the first significant issue with congestion control, I expect that
there is little point in doing A) unless one is also willing to beef up AMT to
provide congestion control, which I think is not in line with the design wishes
of the protocol designers.
3. Use of Zero Checksum
The AMT specification enables the use of Zero UDP checksum with IPv6, i.e.
draft-ietf-6man-udpchecksums-06<http://datatracker.ietf.org/doc/draft-ietf-6man-udpchecksums/>
and
draft-ietf-6man-udpzero-08<http://datatracker.ietf.org/doc/draft-ietf-6man-udpzero/>.
I have nothing against this in principle. However, I have noticed that AMT
fails to properly address the failure modes of using a zero checksum. AMT is a
typical example of a protocol that actually needs active verification, for each
tunnel, that zero checksum works. This is because AMT is clearly intended to be
deployed with its Gateway part in end-hosts and residential network devices or
routers. This means the tunnel will pass through both firewalls and NATs on its
path between the relay and the gateway. Unless these devices are upgraded to
support zero checksum in UDP for IPv6, the traffic may actually become black
holed. The most likely case is a simple firewall that has a rule for IPv6/UDP
which doesn't allow zero checksum, as it is against RFC 2460. Thus all the
Multicast data packets will disappear en route in the tunnel. There is no
mechanism in the AMT protocol to detect this and negotiate with the relay so
that it will not use zero checksum for this tunnel.
This must be addressed as I see it. If not, AMT will be so brittle that it
can't be used in a large number of its intended deployments.
4. MTU issues
This document totally fails to discuss the issue of MTU black-holing. As the IP
multicast datagrams, as well as the encapsulated IGMP/MLD messages, can with
the added tunnel overhead result in a packet that exceeds the MTU of the
path, these packets could be black holed. This can potentially result in very
intermittent transport behavior for the tunnel. Thus, some discussion of how to
handle the MTU issues in this context should be introduced.
I am willing to discuss methods here, but I guess several alternatives exist,
and which is most appropriate, and the level of AMT support needed for each,
varies. I would like the protocol designers to take a first stab at resolving
this.
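
To make the overhead concern concrete, here is a minimal sketch of the
arithmetic. The function names are my own, and the 2-byte AMT data header is an
assumption that should be checked against the message formats in the draft; the
IP and UDP header sizes are the standard ones without options or extension
headers.

```python
IPV4_HEADER = 20  # bytes, no IP options
IPV6_HEADER = 40  # bytes, no extension headers
UDP_HEADER = 8    # bytes


def tunnel_overhead(outer_ipv6: bool, amt_header: int = 2) -> int:
    """Bytes added in front of the inner datagram by the tunnel.

    amt_header is an assumed size for the AMT data message header.
    """
    outer_ip = IPV6_HEADER if outer_ipv6 else IPV4_HEADER
    return outer_ip + UDP_HEADER + amt_header


def fits_path_mtu(inner_len: int, path_mtu: int,
                  outer_ipv6: bool = False, amt_header: int = 2) -> bool:
    """True if the encapsulated packet can cross the path unfragmented."""
    return inner_len + tunnel_overhead(outer_ipv6, amt_header) <= path_mtu
```

Under these assumptions, a 1500-byte multicast datagram never fits a 1500-byte
path MTU once encapsulated, which is exactly the case that gets black holed if
fragmentation or ICMP feedback does not work.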
Other Issues
======================================================================
A) The table of contents on page 2 should include more levels of headings. Most
likely going down to level 4 is needed to make the TOC usable for finding
content in the document.
B) The claimed ASM support
I would like to better understand how one can claim to support ASM when there
is no upstream path to inject the ASM group participants' traffic. When one
uses ASM, one normally does so for a reason, namely needing the possibility to
inject packets into the group. These limitations need to be clarified.
C) Section 4:
This section indicates in its figures, the ones in Section 4.1.1 and Section
4.1.3.1, that the Router Mode IGMP/MLD functions are outside of AMT. Based
on the requirements in Section 5.3.3.4, this is not accurate. That
implementation must be AMT specific to maintain the
AMT-tunnel-to-group-membership handling.
D) Not all figures have handles that can be used to reference.
E) Section 4.1.5.1:
Similarly, the selection of a unicast Relay address may be source-
dependent, as a relay contacted by a gateway to supply multicast
traffic must have native multicast connectivity to the traffic source
I find this statement confusing. There is no support in the protocol for
including, in the discovery phase, the multicast group(s) which the gateway
would like to receive.
F) AMT Gateway in home router.
One deployment scenario is that the AMT gateway is deployed in the home network
router to provide access to multicast groups provided by the ISP. However, the
startup procedure in this deployment is unclear. The text appears to indicate
that one can have a gateway implementation that, as soon as the router boots,
starts doing discovery and sends requests in order to have Queries to send to
its internal local network. Other text appears to suggest waiting until some
host actually requests to join a group. The protocol specification appears to
do its best to leave a lot of flexibility and will thus produce huge variance
in the market.
G) Figure 3:
I find no discussion of Membership Updates to rejoin the groups after the
tunnel has changed its source address as seen by the relay. I think this should
have some discussion. Yes, it is reasonably clear that you will get traffic
just by sending new membership updates over the new tunnel. However, the timing
between teardown and this membership update should be discussed.
Figure 3 implies it should be sent after the teardown, which I think is correct
due to the traffic volumes to the NAT most likely causing the path change.
H) Figure in Section 4.2.2.2
Propose that the external side of the NAT should be marked as the one having
the "e" addresses.
I) On seeing the figure in Section 4.2.2.3, I immediately wanted to comment on
the address collision issue. This is made somewhat clearer later in the
document. But maybe a dedicated sub-section of Section 4 should discuss the
general issue: multiple left-side hosts can have the same address as hosts
behind other tunnel endpoints, so the Relay needs to hide this from upstream,
accept it, and use the tunnel context to track the different hosts.
J) Section 4.2.2.3:
To avoid placing an undue burden on the relay platform, the protocol
specifically allows zero-valued UDP checksums on the multicast data
messages. This is not an issue in UDP over IPv4 as the UDP checksum
field may be set to zero. However, this is a problem for UDP over
IPv6 as that protocol requires a valid, non-zero checksum in UDP
datagrams [RFC2460]. Messages sent over IPv6 with a UDP checksum of
zero may fail to reach the gateway. This is a well known issue for
UDP-based tunneling protocols that is described
[I-D.ietf-6man-udpzero]. A recommended solution is described in
[I-D.ietf-6man-udpchecksums].
I think this needs reformulating and I don't understand what is intended
with the last sentence.
K) Section 5.1.1.
"Destination UDP Port - The IANA-assigned AMT port number."
I find it strange that the protocol mandates that all traffic is sent
to the IANA-assigned port. Why can't the protocol allow more flexible
handling of the destination port? I find only a single thing in the protocol
that prevents usage of another relay listener port: the Relay Advertisement
would need a port field in addition to the address.
L) Section 5.1.1.4
A 32-bit random value generated by the gateway and echoed by the
relay in a Relay Advertisement message.
Should the text make it clear that the value should preferably be
cryptographically random, as described in RFC 4086?
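
As an illustration of what I mean, a gateway could draw the value from the
operating system's CSPRNG rather than from a general-purpose PRNG. This is only
a sketch; the function name is mine and nothing here is taken from the draft.

```python
import secrets


def new_discovery_nonce() -> int:
    """Return a 32-bit nonce drawn from the OS cryptographic random source."""
    return secrets.randbits(32)
```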
M) Section 5.1.1 lacks a specification of what one does if the version is
different from 0. This is mentioned in Section 5.3.3.1, but not for gateways
and not for all message types.
N) Section 5.1.4.8:
The Querier's Query Interval Code (QQIC) field in the general query
is used by a relay to specify the time offset a gateway should use to
schedule a new three-way handshake to refresh the group membership
state within the relay (current time + Query Interval).
In several places it is not made clear that QQIC and QRV are defined in
the external references for MLD and IGMP.
O) Section 5:
When specifying the bit-fields, please indicate the length of each field in the
text. This is an accessibility question. If you have impaired vision,
interpreting the field lengths in the figures correctly can be difficult.
P) Section 5.2.2.4:
This section defines which retransmission parameters one can potentially
configure. However, the section fails to define the acceptable maximum and
minimum values. Wrongly configured retransmission parameters can have a
significant negative impact on the network by causing bursts or unnecessary
traffic.
Q) Section 5.2.3.3:
The gateway may continue to receive Multicast Data messages long
after the gateway sends a Membership Update message that deletes
existing group subscriptions.
What is "long" in the above sentence? Are we talking about some known number of
seconds, or a TCP MSL, i.e. 2 min?
R) Section 5.2.3.4.3
A gateway MAY retransmit a Relay Discovery message if it does not
receive a matching Relay Advertisement message within some timeout
period. If the gateway retransmits the message multiple times, the
timeout period SHOULD be adjusted to provide an random exponential
back-off. The RECOMMENDED timeout is a random value in the range
[initial_timeout, MIN(initial_timeout * 2^retry_count,
maximum_timeout)], with a RECOMMENDED initial_timeout of 1 second and
a RECOMMENDED maximum_timeout of 120 seconds (which is the
recommended minimum NAT mapping timeout described in [RFC4787]).
I wonder if the above exponential back-off is really what is desired. As it
randomly picks the timeout between the initial timeout value and
2^retry_count * initial_timeout, it will both be biased towards lower values
and be capable of producing timing intervals that don't grow. If one desires a
random timeout to avoid clock synchronization effects, I think a better
algorithm is Td = MIN(initial_timeout * 2^retry_count, maximum_timeout), where
the actual timeout is random * Td and random is a value drawn from the uniform
distribution on the interval [0.5, 1.5]. This ensures both that the timeout
between two retransmissions is never less than half of Td and that it grows,
on average, by a factor of two.
This section also does not define a minimum initial timeout value, or any
method for safely determining a more performant value starting from a safe
initial value. Doing an RTT measurement using the AMT control messages would
require some extensions, but could be a good way of determining a better
initial value than 0.5 seconds, which would be my recommendation for a default
value.
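
The difference between the two timeout rules discussed in R) can be sketched as
follows. The function names and the rng parameter are my own; the default
values are the RECOMMENDED ones quoted from the draft. This is an illustration
of the two formulas, not a proposed implementation.

```python
import random


def draft_timeout(retry_count, initial_timeout=1.0,
                  maximum_timeout=120.0, rng=random):
    """Rule quoted from the draft: uniform in [initial_timeout, cap]."""
    cap = min(initial_timeout * 2 ** retry_count, maximum_timeout)
    return rng.uniform(initial_timeout, cap)


def proposed_timeout(retry_count, initial_timeout=1.0,
                     maximum_timeout=120.0, rng=random):
    """Rule suggested in this review: Td scaled by uniform [0.5, 1.5]."""
    td = min(initial_timeout * 2 ** retry_count, maximum_timeout)
    return rng.uniform(0.5, 1.5) * td


if __name__ == "__main__":
    for n in range(8):
        print(n, round(draft_timeout(n), 2), round(proposed_timeout(n), 2))
```

With the draft's rule the lower bound stays at initial_timeout for every retry,
so a late retry can still fire after about one second. With the suggested rule
the lower bound is Td/2 and doubles with each retry until the cap is reached.
Note that, as written, the suggested rule can produce values up to 1.5 times
maximum_timeout.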
S) Section 5.2.3.4.4
If a gateway executes the relay discovery procedure at the start of
each membership update cycle and the relay address returned in the
latest Relay Advertisement message differs from the address returned
in a previous Relay Advertisement message, then the gateway SHOULD
send a Teardown message (if supported) to the old relay address,
using information from the last Membership Query message received
from that relay, as described in Section 5.2.3.7. This behavior is
illustrated in the following diagram.
This text and the figure after it do not appear to be consistent.
The figure implies a timer that isn't present in the text above. The textual
description appears sensitive to flapping anycast routing. I think
the figure's indication of some longer timeout before redoing Relay discovery
appears much more robust.
T) Section 5.2.3.5.3
See R)
Also, this text appears redundant with previous text. Maybe generalize this
into its own section, to be used for all messages needing retransmission.
U) Section 5.2.3.5.4
Querier's Query Interval Code carried by the general-query. A
gateway MAY use a smaller timer duration if required to refresh a NAT
mapping that would otherwise timeout.
Maybe the protocol would rather need a NAT keep-alive message to be sent
from the gateway to the relay. But maybe the Request/Query cycle is
lightweight enough that this works fine.
V) Section 5.2.3.6.1
o Insert IGMP or MLD datagrams into a queue for transmission after
it receives a Membership Query message.
What assumptions about queue depth exist in the above? Clearly the messages in
this queue should expire if they become too old.
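
To illustrate the kind of behavior I would expect the text to specify, here is
a minimal sketch of a bounded queue whose entries expire. The class name, the
depth of 16 and the 10-second maximum age are placeholder assumptions of mine,
not values from the draft.

```python
import time
from collections import deque


class ExpiringQueue:
    """Bounded FIFO of pending datagrams that drops stale entries."""

    def __init__(self, max_depth=16, max_age=10.0, clock=time.monotonic):
        # deque(maxlen=...) silently discards the oldest entry when full.
        self._items = deque(maxlen=max_depth)
        self._max_age = max_age
        self._clock = clock

    def put(self, datagram: bytes) -> None:
        """Queue a datagram together with its arrival time."""
        self._items.append((self._clock(), datagram))

    def drain(self) -> list:
        """Return the datagrams still fresh enough to send, oldest first."""
        now = self._clock()
        fresh = [d for t, d in self._items if now - t <= self._max_age]
        self._items.clear()
        return fresh
```

A gateway would call put() when an IGMP or MLD datagram arrives and drain()
once the Membership Query has been received.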
X) Section 5.2.3.7
Gateway support for the Teardown message is OPTIONAL but RECOMMENDED.
The above is a very strange usage of RFC 2119 keywords. If you use the
synonyms, then maybe the error of writing it this way becomes clear:
Gateway MAY support for the Teardown message but SHOULD.
Y) The usage of retransmission versus repetition is not always clear.
Some of the messages appear to simply need to be repeated QRV times with some
interval. Others should really be matched with an answer and, if none is
received within a timeout, retransmitted. Can these two cases be made clearer?
Z) Section 5.3.5
The hash function RECOMMENDED for use in computing the Response MAC
is the MD5 hash digest [RFC1321], though hash functions or keyed-hash
functions of greater cryptographic strength may be used.
I think this points to a security vulnerability. I think it needs to be made
clear that the MAC MUST be keyed. If it is just a digest, then an attacker can
calculate the MAC and perform an off-path attack.
This should also be made clear in Section 6.1 as a requirement.
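
The distinction I am making in Z) can be sketched as below. The choice of input
fields (source address, source port, nonce), the truncation to 6 bytes and the
function names are my own assumptions for illustration, and I use HMAC-SHA-256
here rather than the MD5 named in the draft.

```python
import hashlib
import hmac


def _mac_input(src_addr: bytes, src_port: int, nonce: int) -> bytes:
    """Concatenate the (assumed) fields that the MAC covers."""
    return src_addr + src_port.to_bytes(2, "big") + nonce.to_bytes(4, "big")


def unkeyed_digest(src_addr: bytes, src_port: int, nonce: int) -> bytes:
    """Plain digest: anyone who knows the inputs can recompute it."""
    data = _mac_input(src_addr, src_port, nonce)
    return hashlib.sha256(data).digest()[:6]


def keyed_mac(key: bytes, src_addr: bytes, src_port: int, nonce: int) -> bytes:
    """Keyed MAC: only the holder of the relay's secret key can compute it."""
    data = _mac_input(src_addr, src_port, nonce)
    return hmac.new(key, data, hashlib.sha256).digest()[:6]


def verify(key: bytes, src_addr: bytes, src_port: int, nonce: int,
           received_mac: bytes) -> bool:
    """Constant-time comparison of the expected and received MAC."""
    expected = keyed_mac(key, src_addr, src_port, nonce)
    return hmac.compare_digest(expected, received_mac)
```

With unkeyed_digest() an off-path attacker who can guess the address, port and
nonce obtains a valid value; with keyed_mac() the attacker additionally needs
the relay's secret key.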
AA) Section A.1:
Although this proposal has its advantages, I think it might also illustrate a
shortcoming. First of all, 48 bits is quite short for a MAC. I would prefer a
variable-length field.
Secondly, doesn't this actually create more material for an attacker to use in
determining the key used by the relay?
That was all I have found.
Cheers
Magnus