
Re: Last Call: draft-ietf-behave-nat-behavior-discovery (NAT Behavior Discovery Using STUN) to Experimental RFC

2009-04-05 22:58:03
Bernard,

Thanks for the comments.  Let me see if I can describe a scenario in
which behavior-discovery is useful.

First, we don't want to "go back to 3489."  There were two problems
in particular (well, there were a lot more, but I just want to talk
about two right now) that we don't ever want to go back to:

- 3489 specified that an application would start up, characterize its
NAT, and work in that mode forever after
- 3489 specified that if you had a friendly NAT, you could query the
STUN server for your transport address and use that one address

At the same time, behavior-discovery targets applications for which
ICE doesn't necessarily make sense.  For example, applications that
don't want to fall back to TURN but have other options for
establishing a connection (whether that means indirect routing, not
needing the connection at all, or other reasons).

So let me try to go into more detail on a potential P2P application.
When P2P node A starts up, it evaluates its NAT(s) relative to other
nodes already in the overlay.  Let's say its testing indicates it's
behind a good NAT, with endpoint-independent mapping and filtering.
In this case, the peer joins the overlay and establishes connections
with the appropriate peers, but it also advertises that any node
wanting to reach it doesn't need to route through the overlay network
formed by the P2P nodes (the normal routing mode in a P2P overlay);
it can send directly to A's IP address.
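
To make that startup decision concrete, here's a rough Python sketch.
The Behavior enum and the idea of feeding it mapping/filtering test
results are my own illustrative assumptions, standing in for whatever
the behavior-discovery tests actually report; this isn't a real API:

    from enum import Enum

    class Behavior(Enum):
        # Illustrative categories matching the behavior-discovery tests
        ENDPOINT_INDEPENDENT = "endpoint-independent"
        ADDRESS_DEPENDENT = "address-dependent"
        ADDRESS_AND_PORT_DEPENDENT = "address-and-port-dependent"

    def should_advertise_direct(mapping: Behavior,
                                filtering: Behavior) -> bool:
        """Advertise a direct address only behind a "good" NAT: one
        with endpoint-independent mapping AND filtering."""
        return (mapping is Behavior.ENDPOINT_INDEPENDENT and
                filtering is Behavior.ENDPOINT_INDEPENDENT)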

So when node B wants to send a message to A, it sends the message
directly to A's IP address and starts a timer.  If it doesn't receive
a response within a certain amount of time, then it routes the message
to A across the overlay instead.  (Alternatively, B could send the
message to A's IP address and across the overlay simultaneously,
which guarantees minimal response latency but can waste bandwidth.)
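
In rough Python, B's direct-first logic might look like the sketch
below.  The overlay.route() call and the 2-second timeout are
hypothetical placeholders for whatever the overlay actually provides:

    import socket

    DIRECT_TIMEOUT = 2.0  # seconds; assumed value, tune per deployment

    def send_to_peer(msg: bytes, direct_addr, overlay):
        """Send directly to A's advertised address and start a timer;
        on timeout, route the message across the overlay instead."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.settimeout(DIRECT_TIMEOUT)
        try:
            sock.sendto(msg, direct_addr)
            reply, _ = sock.recvfrom(65535)  # wait for A's response
            return reply
        except socket.timeout:
            return overlay.route(msg)  # fall back to overlay routing
        finally:
            sock.close()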

Over time, A observes what percentage of its incoming messages arrive
directly compared to via the overlay.  If the percentage of direct
connections is below some threshold (say 66%, to pick an arbitrary
number), it may stop advertising for direct connections.  But if the
percentage is high enough, it continues to advertise, because the
advertisement may be helping performance.  If at some point the NAT
changes its behavior, A will notice a change in its direct-connection
percentage and may re-evaluate its decision to advertise a public
address.
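
As a sketch of that bookkeeping (again illustrative Python; the 0.66
threshold is just the arbitrary example number from above):

    class AdvertiseMonitor:
        """Track the fraction of inbound messages that arrive directly
        vs. via the overlay, and re-evaluate the advertising decision."""
        THRESHOLD = 0.66  # arbitrary example threshold from the text

        def __init__(self):
            self.direct = 0
            self.via_overlay = 0

        def record(self, arrived_directly: bool):
            if arrived_directly:
                self.direct += 1
            else:
                self.via_overlay += 1

        def keep_advertising(self) -> bool:
            total = self.direct + self.via_overlay
            if total == 0:
                return True  # no data yet; keep the initial decision
            return self.direct / total >= self.THRESHOLD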


(There are a lot of other details of how this might work, how it
would deal with multiple levels of NATs, and what the actual costs
and benefits are.  I don't want to get into all of those details
here.)

This is a good example because behavior-discovery is used for the
initial operating-mode selection, but the actual decision about
whether to continue advertising that public IP/port pair is made
based on actual operating data.  It also uses the result of the
behavior-discovery work as an optimization, not in a manner where the
application will fail if some percentage of the nodes in the overlay
are unable to make a direct connection.

Bruce


On Sat, Apr 4, 2009 at 2:39 AM, Bernard Aboba
<bernard_aboba@hotmail.com> wrote:
Bruce Lowekamp said:

"Many of the questions you raise point to the same question of whether
tests or techniques that are known to fail on a certain percentage of
NATs under a certain percentage of operating conditions are
nevertheless valuable.  behavior-discovery has an applicability
statement
http://tools.ietf.org/html/draft-ietf-behave-nat-behavior-discovery-06#section-1
that discusses those issues in some detail.  I spent enough time
wording that statement and discussing it with various people that I
think it is best to refer to that statement.

You also repeatedly use phrases such as "basically won't work" and
"it might work."   That comes down to the value of "certain percentage"
as used above.  My experience with these techniques, and the
experience of those who have used such techniques recently, is that
they are far more reliable than that, into the 90% range, particularly
when used correctly.  That is not high enough that we could go back to
3489---all techniques require fallbacks because they fail, and 90% is
far, far too low of a success rate---but it is high enough that
applications can make useful decisions based on that information,
provided they have a fallback in cases where the information is wrong.
And those are the conditions of the experiment."

What I am failing to understand is the distinction between those
situations in which we "cannot go back to RFC 3489" and the scenarios
envisaged for the experiment.

Presumably, situations in which we "cannot go back to RFC 3489"
include Internet telephony, which may be used for life-critical
situations such as E911.  For those kind of scenarios, we need
traversal technologies that are as reliable as possible, and are
willing to live with the complexity of ICE to achieve this.

The draft mentions P2P applications as one potential situation in
which usage of imperfect techniques is acceptable, and yet the
IETF currently has the P2PSIP WG, which is involved in the
development of technology for usage of SIP over P2P networks.
In that kind of application, wouldn't the reliability requirements
be similar to those in which we "cannot go back to RFC 3489"?

This led me to think about the requirements for the diagnostic
scenarios that are also discussed in the document.  In existing
deployments it is often challenging to figure out the reasons
why traversal is unsuccessful, and what can be done to improve
the overall success rate.  Data suggests that there are even
common situations in which ICE will fail.  But in thinking
through how to approach diagnosis under those conditions,
I'd currently be more inclined to start from the addition of
diagnostics to an ICE implementation than to focus on the
use of the diagnostic mechanisms described in the draft.

So while I'm generally sympathetic to the idea that there
are situations in which "less than perfect" techniques can
be useful, in practice a number of common situations
where NAT traversal is used today (such as life-critical
Internet telephony) do not seem to fit into that bucket.

It could be that I didn't quite understand the examples
given in the applicability statement, or that I'm putting
too much emphasis on corner conditions, because that is
what customers tend to complain about.

However, overall the document left me unclear about the
rationale by which the material deprecated in RFC 3489
was being re-introduced.   While it does seem possible
to construct a rationale for this, the document doesn't
provide enough background to get me over that hump.