Re: Last Call: draft-ietf-behave-nat-behavior-discovery (NATBehavior Discovery Using STUN) to Experimental RFC

Responding to Cullen's comments on draft-ietf-behave-nat-behavior-discovery

Many of the questions you raise point to the same question of whether
tests or techniques that are known to fail on a certain percentage of
NATs under a certain percentage of operating conditions are
nevertheless valuable.  behavior-discovery has an applicability
statement 
http://tools.ietf.org/html/draft-ietf-behave-nat-behavior-discovery-06#section-1
that discusses those issues in some detail.  I spent enough time
wording that statement and discussing it with various people that I
think it is best to refer to that statement.

You also repeatedly use phrases such as "basically won't work" and
"it might work."  That comes down to the value of "certain percentage"
as used above.  My experience with these techniques, and the
experience of those who have used such techniques recently, is that
they are far more reliable than that, into the 90% range, particularly
when used correctly.  That is not high enough that we could go back to
3489---all techniques require fallbacks because they fail, and 90% is
far, far too low of a success rate---but it is high enough that
applications can make useful decisions based on that information,
provided they have a fallback in cases where the information is wrong.
 And those are the conditions of the experiment.


On Tue, Mar 31, 2009 at 11:08 PM, Dan Wing <dwing(_at_)cisco(_dot_)com> wrote:

Forwarded for those that don't follow the main IETF list.
-----Original Message-----
From: ietf-bounces(_at_)ietf(_dot_)org [mailto:ietf-bounces(_at_)ietf(_dot_)org] On Behalf Of Cullen Jennings
Sent: Tuesday, March 31, 2009 9:53 AM
To: IETF Discussion; IESG IESG
Subject: Re: [BEHAVE] Last Call: draft-ietf-behave-nat-behavior-discovery
(NATBehavior Discovery Using STUN) to Experimental RFC

I was somewhat shocked to see the draft in IETF Last Call. The last
time this draft was discussed at the microphone in Behave, many people
were very concerned that it is not possible to correctly characterize
a NAT


This is not true.  behavior-discovery was briefly presented at IETF71
without any comments and at IETF70 with only minor comments.   The
last time it was discussed at length at the mic was at IETF69, which
was where it was decided to change it from standards track to
experimental.  However, let me address two specific points here
regarding your characterization of that discussion:

"many people were very concerned":  What concerns people had were
about its previous standards-track status.  Subsequent feedback on the
list and at IETF70 has indicated these concerns are resolved.

"that it is not possible to correctly characterize a NAT": First, let
me emphasize that the key distinction between 3489 and
behavior-discovery is that behavior-discovery is very clear that it is
not possible to characterize a NAT, that only snapshots of behavior
for a particular source-destination tuple at an instant in time are possible.

without using more than one address behind the NAT.  The tests
done on NATs by the researchers at MIT did that, so did the
stuff from Cornell, as did draft-jennings-behave-test-results.


Multiple addresses are definitely required to characterize the NAT (to
the extent it's ever possible), but as behavior-discovery is very
clear that it is not trying to replicate that aspect of the 3489
behavior, that requirement is not precisely relevant.

The
reason why this was needed is largely the reason why the IETF invented
ICE. Initially folks thought that STUN alone would be enough to do NAT
traversal. This turned out not to be true, STUN deprecated those parts
and ICE was started. This draft fails to describe the types of test
that have actually been found to work and just reinstates the stuff
that was deployed and failed and then deprecated out of STUN.


This draft makes no claim that it is duplicating or attempting to
mimic the original intention of 3489 or the capabilities of ICE.  It
carefully describes when the tests it includes can be used and
presents examples of how an application might make use of it for
situations that ICE does not address.  The only use by applications
proposed in the draft (as an experiment) is for an application that
uses it for initial mode selection but is capable of adapting to its
actual experience on the network.

Now this draft pays some lip service to the fact that it basically
won't work. You can read section 1 and get the full idea.


This term "basically won't work" is a gross oversimplification.  It's
also not a technical analysis, which makes it difficult to respond to
in a technical way.

More generally, one of the important differences between 3489 and ICE
is that ICE ensures there is always a fallback to TURN, and thus
avoids the problem experienced by 3489-based applications that tried
to determine in advance whether they would need a relay and what their
peer reflexive address will be, which are both impossible.
behavior-discovery requires an application using it to have a
fallback, but unlike ICE's focus on the problems inherent in VoIP
sessions, doesn't assume that it will only be used to establish a
connection between a single pair of machines, and so alternative
fallback mechanisms may make sense.  For example, in a P2P application, it may
be possible to simply switch out of the role where such connections
need to be established, or to select an alternative indirect route if
the peer discovers that in practice, 10% of its connection attempts
fail.
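
To make the "initial mode selection, then adapt" idea concrete, here is
a minimal Python sketch.  It is only an illustration under stated
assumptions: the class name, the mode labels, and the threshold values
are hypothetical and are not taken from the draft.  The discovery
result is treated purely as a hint, and the observed failure rate
overrides it.

```python
from dataclasses import dataclass


@dataclass
class ModeSelector:
    """Pick an initial mode from a discovery hint, then adapt to experience."""

    direct_ok_hint: bool              # what the discovery tests suggested (may be wrong)
    failure_threshold: float = 0.10   # hypothetical cutoff, echoing the 10% figure above
    min_attempts: int = 20            # hypothetical: gather some evidence before switching
    attempts: int = 0
    failures: int = 0

    def record(self, succeeded: bool) -> None:
        """Record the outcome of one real connection attempt."""
        self.attempts += 1
        if not succeeded:
            self.failures += 1

    def failure_rate(self) -> float:
        return self.failures / self.attempts if self.attempts else 0.0

    def mode(self) -> str:
        """Return "direct" or "indirect" (labels are illustrative only)."""
        if not self.direct_ok_hint:
            return "indirect"
        enough_evidence = self.attempts >= self.min_attempts
        if enough_evidence and self.failure_rate() >= self.failure_threshold:
            return "indirect"   # the hint was wrong in practice: fall back
        return "direct"
```

The point of the sketch is that the application never depends on the
hint being right: it starts in the suggested mode and falls back once
its own connection attempts contradict the hint.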

The first
and 2nd paragraphs basically say this won't work. Then paragraph 3 proposes this
as an experiment to find out something we already know the answer to.


The experiment described is so different from 3489's claim that a NAT
can be characterized and labeled, and that all future application
decisions can rely on that label, that it's hard to respond to this.

When this work was chartered, it was about making a way to
characterize NATs and describe them in a controlled lab like
environment.


Here is how the work was chartered in the May 2007 update to the BEHAVE charter:

Sep 2007  Submit standards-track document that describes how an
          application can determine the type of NAT it is behind

So it was not at all chartered for lab analysis, it was chartered for
use by an application.

It was not about resurrecting exactly the part of STUN
that had been tried, failed, and deprecated.


As already stated, it deliberately tries to outline when these
techniques are applicable and when they aren't.


Specific problems with the draft....


For other readers' benefit, the section numbers you use in this
section refer to revision -04 of the draft.  The current revision is
-06.


2.2 - this just won't work. The test described in this draft will not
find out if the node is behind an endpoint independent nat. I have
specific nats where it won't work. I have explained to the authors why
it won't work. The answer I get back is "it might work some of the
time". It's true it might work some of the time but we all agree there
are many NATs for which it will not work.


(I'm not sure what section this text was referring to)

Again, we're not searching for an existence proof of NATs where it doesn't work.

More importantly, don't put words into other people's mouths,
especially when the statement is not true.  You have never received an
"it might work some of the time" response.   The response has always
been of the form that it works most of the time on most NATs.

If you have useful information on the population of NATs that fail
with these techniques a significant amount of the time, please share
that information.  I've asked, but have not received any information.


Other sections that don't work are 3.1, 3.2, 3.3, 3.4, 3.5, 3.5 - uh
all of them actually. I'm glad to provide details on why they don't
work but I have in the past and we're not really debating if they work or
not. The authors believe there is sufficient text at the beginning of
the draft in section 1 that it is OK that these fail in many cases and
don't need to be mentioned again. We're not debating that these work some of
the time but not all the time - everyone agrees on that.

Section 4.1 - The results in here will be just wrong for ports
different than the one the test was run on. The response to this was
to add "use same port when possible". That's not going to exactly
cause applications to work.


First, this is something of a corner case in any event.  Secondly, a
large number of applications do use the same port for all of their
communication.  So, yes, they are perfectly capable of allocating one
port (or a small number of ports), testing, and using that (those)
port(s) for their communication.  And having a draft that points out
the advantages of doing so is, by itself, useful.
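
As a rough Python sketch of "allocate one port, test on it, and keep
using it": the code below opens a single UDP socket, builds a STUN
Binding Request, and decodes XOR-MAPPED-ADDRESS from a response.  The
wire constants follow the STUN specification (RFC 5389); the function
names are my own, and the behavior-discovery attributes are
deliberately left out.  Treat it as an illustration, not as the
draft's procedure.

```python
import os
import socket
import struct
from typing import Optional

MAGIC_COOKIE = 0x2112A442      # fixed value from RFC 5389
BINDING_REQUEST = 0x0001
XOR_MAPPED_ADDRESS = 0x0020


def open_shared_socket(local_port: int = 0) -> socket.socket:
    """Bind the one UDP socket used for both the tests and the application."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", local_port))
    return sock


def build_binding_request() -> tuple[bytes, bytes]:
    """Return (message, transaction_id) for an attribute-less Binding Request."""
    txid = os.urandom(12)
    header = struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE)
    return header + txid, txid


def parse_xor_mapped_address(message: bytes) -> Optional[tuple[str, int]]:
    """Extract the IPv4 reflexive (address, port), or None if absent."""
    if len(message) < 20:
        return None
    _msg_type, length, cookie = struct.unpack("!HHI", message[:8])
    if cookie != MAGIC_COOKIE:
        return None
    pos, end = 20, min(20 + length, len(message))
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", message[pos:pos + 4])
        value = message[pos + 4:pos + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            port = struct.unpack("!H", value[2:4])[0] ^ (MAGIC_COOKIE >> 16)
            addr = struct.unpack("!I", value[4:8])[0] ^ MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", addr)), port
        pos += 4 + attr_len + (-attr_len % 4)   # attributes are padded to 4 bytes
    return None
```

An application would send the request from the socket returned by
open_shared_socket() and then carry its real traffic over that same
socket, so the observed mapping applies to the port it actually uses.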


Section 4.2 - Can't really separate the topic from if UDP is blocked
from if the STUN server is down.


The draft recommends multiple STUN servers for redundancy, but do we
really want to engage in a reduction to the absurd of "it's impossible
to diagnose network behavior because you can never differentiate
between host failure vs network failure in the absence of responses"?
True.  But not interesting.
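
A small Python sketch of how an application might interpret probes sent
to several STUN servers.  The function name and the result labels are
hypothetical.  It makes the limitation explicit: total silence remains
ambiguous, while any answer shows that UDP is not blocked.

```python
def classify_udp_reachability(results: dict[str, bool]) -> str:
    """Summarize probe outcomes, keyed by server name (True = got a response)."""
    if not results:
        return "unknown"
    answered = sum(results.values())
    if answered == len(results):
        return "udp-ok"
    if answered > 0:
        # At least one server answered, so UDP works; the silent servers
        # are more likely down or misconfigured than blocked.
        return "udp-ok-some-servers-unreachable"
    # No responses at all: UDP may be blocked, or every server may be down.
    # The absence of responses alone cannot tell these apart.
    return "no-response"
```

With more independent servers, a "no-response" result becomes less
likely to be caused by server failure alone, which is the reason for
the redundancy.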


Section 4.4 - this fails if the port was recently used for similar
tests from same stun server. There is no way to know this as an
application. This type of test can work in lab conditions where all
traffic on NAT is controlled but in operational networks it will fail.


I believe this question is adequately addressed (and limitations
discussed) in Section 4.1 (of the current draft).  That section was
posted to the mailing list prior to IETF73, and given to you directly,
but I am not aware of receiving any comments from you reflecting its
presence.


It is possible to do timing testing using just the change ip flag. The
RESPONSE-TARGET stuff is not needed and opens up the possibility to
have a STUN server send packets to places that it should not which
causes IDS systems to blacklist all traffic from the STUN server thus
making it unusable for other clients. The ability to tell the STUN
server to send packets to arbitrary locations would be fine for a
system in a lab used to characterize a NAT but is not a good idea for
internet deployed STUN servers.


Please read the draft for the authentication and state required when
using XOR-RESPONSE-TARGET.  Your comments do not apply to the current
(or recent) revisions of the draft.  This issue has been extensively
discussed on the mailing list and in wg sessions, and resolved.  This
question was also addressed in my response to your previous comments
in August (see below).


The bulk of these issues were sent Aug 28 to behave list during the
2nd WGLC. I requested agenda time during IETF 74 to discuss these
issues but it was denied.


I'm including at the bottom of this message a copy of the issues
raised Aug 28 with my responses to them.  Those issues were addressed
in the -05 revision in November.  There has been no subsequent list
discussion of those topics.


In summary - The approaches described in this draft are known to fail
with many NATs. I don't see any evidence of the WG actually having
read this draft much less have consensus on the approach in it.

I think the number of people providing comments both at the mic at the
various sessions and on the mailing list argues against this
statement.  In reviewing these comments, I came across this statement
reviewing whether the applicability statement addressed the concerns
about the draft after it was moved to experimental:

----------------------------------------------------------
To: Behave WG <behave(_at_)ietf(_dot_)org>
From: Cullen Jennings <fluffy(_at_)cisco(_dot_)com>
Date: Thu, 29 Nov 2007 22:13:09 -0800
Subject: [BEHAVE] behave-nat-behavior-discovery

I like the way you scope what this can and can not be used for. It
removed a lot of my concerns about it.

Cullen <with my individual hat on>
----------------------------------------------------------

which makes me wonder what has changed since then?

Bruce

I
think the WG should spend meeting time to discuss the topic and decide
what to do. The key topic in my mind is we are defining a document
that allows us to characterize a NAT in a lab or if we are trying to
make something that works in field and can be used to aid NAT
traversal in applications.

Cullen <in my role as individual contributor and ex chair of behave>





On Mar 10, 2009, at 8:44 AM, The IESG wrote:

The IESG has received a request from the Behavior Engineering for
Hindrance Avoidance WG (behave) to consider the following document:

- 'NAT Behavior Discovery Using STUN '
  <draft-ietf-behave-nat-behavior-discovery-06.txt> as an Experimental
RFC

The IESG plans to make a decision in the next few weeks, and solicits
final comments on this action.  Please send substantive comments to the
ietf(_at_)ietf(_dot_)org mailing lists by 2009-03-31. Exceptionally,
comments may be sent to iesg(_at_)ietf(_dot_)org instead. In either case,
please retain the beginning of the Subject line to allow automated sorting.

The file can be obtained via

http://www.ietf.org/internet-drafts/draft-ietf-behave-nat-behavior-discovery-06.txt



IESG discussion can be tracked via

https://datatracker.ietf.org/public/pidtracker.cgi?command=view_id&dTag=15728&rfc_flag=0


The following IPR Declarations may be related to this I-D:

https://datatracker.ietf.org/ipr/919/
https://datatracker.ietf.org/ipr/945/


_______________________________________________
Behave mailing list
Behave(_at_)ietf(_dot_)org
https://www.ietf.org/mailman/listinfo/behave


_______________________________________________
Ietf mailing list
Ietf(_at_)ietf(_dot_)org
https://www.ietf.org/mailman/listinfo/ietf





Below is the response to Cullen's email of Aug 28, which includes
those questions inline.  The -05 version addressed these concerns,
although frequently in different ways than described below because it
also reflected updates in response to Magnus' AD review.

On Tue, Sep 2, 2008 at 4:35 PM, Bruce Lowekamp <lowekamp(_at_)sipeerior(_dot_)com> wrote:

Sorry, meant to respond to this over the weekend.

I'm sure these won't be the only issues raised that need clarification.

Bruce


Cullen Jennings wrote:


Few comments

Test 1: The first test defined in section 4.1. You have to have a good
way to distinguish no UDP connectivity from the case where the STUN
server is down or someone put in the wrong address.


That text should probably be clarified to remind the reader that the
test applies only to connectivity to the particular STUN server.  (and
that either the client or server could be misconfigured)  In general,
though, that qualifier is at the beginning of the document and applies
to everything in it.

Test 2: In test 4.2, I think it is important to identify that this test
has to be done for every single port the application wants to use
because we know that the results for different ports are often not the same


Will add a note that in some situations behavior may vary port-by-port.
 Actually, this should probably also be highlighted earlier in the
document.

Test 3: This would be better if it mandated using a random source port
and highlight that if any device had recently done test 2 on the same
port, this test will fail to get the correct result and it fails in a
way that suggests things will work that don't. It may sound odd to think
one might get the same port but often when an embedded system reboots,
it might run the same tests again at the same IP address and with ports
like this.


That's a good idea.  Will clarify that in 3.2 and 4.3.  Will also add
some text to point out the interaction between this and the previous issue.

Section 4.4 - given the rate limiting of NATs, I would give some advice
that was more implementable than "care must be given". I'd specifically
rate limit to something like no more than X stun packets per second. It
would be nice to discuss here how long these tests can take even when
they are done in parallel.


Do you have a suggestion for the value of X?  I don't think 4787
explicitly addresses this.
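
Whatever value of X is chosen, the client-side mechanics are simple.
The Python sketch below paces outgoing STUN packets to at most
max_per_second.  The class name and the clock/sleep injection are my
own, and no particular rate is implied, since X is exactly the open
question here.

```python
import time


class PacketPacer:
    """Space out sends so that no more than max_per_second go out."""

    def __init__(self, max_per_second: float,
                 clock=time.monotonic, sleep=time.sleep):
        if max_per_second <= 0:
            raise ValueError("max_per_second must be positive")
        self.interval = 1.0 / max_per_second
        self._clock = clock          # injectable so the pacer can be tested
        self._sleep = sleep
        self._next_send = None       # earliest time the next packet may go out

    def wait_turn(self) -> float:
        """Block until the next packet may be sent; return the delay applied."""
        now = self._clock()
        delay = 0.0
        if self._next_send is not None and now < self._next_send:
            delay = self._next_send - now
            self._sleep(delay)
            now = self._next_send
        self._next_send = now + self.interval
        return delay
```

This also gives a lower bound on test duration: n paced packets take at
least (n - 1) / X seconds regardless of how many tests run in parallel,
so 10 probes at a hypothetical 4 packets per second need at least 2.25
seconds.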

Section 4.5 - the XOR-RESPONSE-TARGET just sort of comes out of nowhere and
is a bit hard to understand when reading the draft from front to back


It was used earlier in 3.3, but I agree it's not well defined there,
either.  Will try to make the introduction clearer.

The whole XOR-RESPONSE-TARGET has all the same security problems and
issues as TURN. Instead of reinventing it all here, why not just use
TURN to be able to send the packets to where you want them?


I disagree that the same security issues are present (or at least in the
same magnitude).  In particular, XOR-RESPONSE-TARGET is even more
limited in applicability in order to prevent it from being used for any
significant type of attack.  In addition to the precautions already in
the current text, a previous revision required authentication for all
uses of XOR-RESPONSE-TARGET, but many people objected to this being too
strong compared to the potential threat this method offered, and instead
group consensus (almost unanimous as I recall) was for allowing the
current CACHE-TIMEOUT state/rate-limiting approach while allowing those
who desire to still require authentication.  While you're right that
there is still a risk of a state attack on the server, the state
required to store is very small, is stored only on transactions that
request it, and the CACHE-TIMEOUT attribute provides feedback to the
client whether the request can complete.  Furthermore, the consequences
of being unable to serve new requests due to a DoS attack on the server
are not nearly as dire for behavior discovery as for TURN.

Regarding TURN as a solution, that seems incredibly heavyweight for this
application, although I don't see a reason not to say that this test
could be implemented that way.  You'd have to be careful, however, to
make sure that neither end of the TURN connection is running any
keepalives, which might be difficult since TURN specifies both STUN
keepalives and TURN keepalives for its connections.

In section 6.1 where you have "the server must verify that it has
previously..." I think this "must" needs to be a MUST

yes

I will note the RESPONSE-TARGET design forces the server to remember for
some time some state about every binding request.


 From Section 5:

 If a client intends to utilize an XOR-RESPONSE-TARGET attribute in
   future transactions, as described in Section 4.5, then it MUST
   include a CACHE-TIMEOUT attribute in the Request with the value set
   greater than the longest time duration it intends to test.

so it only needs to store state for binding requests that included the
CACHE-TIMEOUT attribute.
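
To show how little state that is, here is a Python sketch of a
server-side cache.  It is an illustration only: the class and method
names are hypothetical, and the exact check a server performs before
honoring XOR-RESPONSE-TARGET is defined by the draft, not by this code.
The sketch assumes the check is "this address previously sent a request
carrying CACHE-TIMEOUT and that entry has not expired".

```python
import time


class ResponseTargetCache:
    """Remember only the sources whose requests carried CACHE-TIMEOUT."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._expiry = {}   # (ip, port) -> time at which the entry lapses

    def on_binding_request(self, source, cache_timeout=None) -> None:
        """Store state only when the request included CACHE-TIMEOUT (seconds)."""
        if cache_timeout is None:
            return          # ordinary binding request: nothing is stored
        self._expiry[source] = self._clock() + cache_timeout

    def may_send_to(self, target) -> bool:
        """Assumed check: was the target seen recently with CACHE-TIMEOUT?"""
        expires = self._expiry.get(target)
        if expires is None:
            return False
        if self._clock() >= expires:
            del self._expiry[target]   # lapsed entries are dropped lazily
            return False
        return True
```

Each entry is one address/port pair plus a timestamp, and ordinary
binding requests add nothing, which is why the state burden is small
compared with a TURN allocation.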

Section 5.1 - the SRV service name needs to be in the IANA registry


true


Cullen <as an individual contributor>
