RE: [Idr] draft-ietf-grow-ops-reqs-for-bgp-error-handling-05

John G. Scudder wrote (on Fri 31-Aug-2012 at 17:24 +0100):

By the way, since this is now in *IETF* last call, if you want your
comments to be considered you should send them to ietf(_at_)ietf(_dot_)org.
Feel free to cc IDR and GROW if you like.


OK.  Message previously posted to idr(_at_)ietf(_dot_)org follows:
------------------------------------------------------------------------------

OK.  Clearly I am late to the party, but here goes...

In section 2 the draft talks of three "sub-classes" of error:

  1) "...UPDATE message is considered invalid...due to an error
     within a path attribute...[which]...is syntactically valid..."

  2) "...where a valid NLRI attribute can be extracted..." 

  3) "...where an attribute is not able to be parsed..."

and of the "key requirement to maximize the number of cases in which
it is possible to extract NLRI from a BGP message".

The draft is not specific on what is "syntactically valid", or what is
a "completely parsed" set of attributes.  However, section 2.1.2
enumerates some "Semantic BGP Errors", which imply a desire to accept
a wide range of very badly broken sets of attributes.

In that context, the proposal to place any MP_REACH_NLRI and/or
MP_UNREACH_NLRI at the start of the attributes is insufficient.

If it is possible to parse both MP_REACH_NLRI and MP_UNREACH_NLRI at
the start of the attributes, and at least the first attribute which
follows is completely valid, then it seems highly likely that all NLRI
can be extracted, no matter how badly broken the rest of the attribute
set is.

But what does it mean if that is not the case ?  It is not known
whether the sender will send attributes in the required order.  It is
not known what combination of 'Withdrawn Routes', 'Network Layer
Reachability Information', MP_REACH_NLRI and MP_UNREACH_NLRI may have
been sent.  So, does the fact that MP attributes are not visible mean
they are not there, or that they are obscured by some other broken
attribute ?

The requirements appear to say that almost any random sequence of
octets should be an acceptable set of attributes, provided that the
NLRI can be identified.  But relaxing the rules on what is an
individually valid attribute means that any (apparent) absence of MP
attribute(s) is more than a little ambiguous.  Accepting (say) an
ATOMIC_AGGREGATE attribute with a length of (say) 421 makes me feel
more than a little queasy.  It does not contain any NLRI, but I would
not care to guess what earlier misbegotten attribute may have led us
to this odd looking attribute, or what nonsense may follow, or what
the sender really meant to say, or whether an MP attribute is lost in
the noise.
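
To make the "lost in the noise" concern concrete, here is a minimal
Python sketch of a deliberately lenient attribute walker.  It assumes
the RFC 4271 attribute encoding (flags, type code, 1 or 2 octets of
length, value); the function name and the sample octets are
hypothetical, chosen only for illustration.

```python
EXTENDED_LENGTH = 0x10  # attribute flag: length field is 2 octets

def walk_type_codes(attrs):
    """Leniently walk an attribute block and list the type codes seen.

    No per-attribute length checks are made: whatever length is
    claimed is simply skipped over.
    """
    codes = []
    pos = 0
    while pos + 3 <= len(attrs):
        flags, code = attrs[pos], attrs[pos + 1]
        hdr = 4 if flags & EXTENDED_LENGTH else 3
        if pos + hdr > len(attrs):
            break
        length = int.from_bytes(attrs[pos + 2:pos + hdr], "big")
        codes.append(code)
        pos += hdr + length
    return codes

# A well-formed looking MP_REACH_NLRI (type code 14), value is filler.
mp_reach = bytes([0x80, 14, 5, 0, 2, 1, 0, 0])

# Type code 6 (ATOMIC_AGGREGATE in RFC 4271) claiming a length of 421,
# with the MP_REACH_NLRI octets sitting inside that claimed value.
swallowed = mp_reach + bytes(421 - len(mp_reach))
bogus = bytes([0x50, 6]) + (421).to_bytes(2, "big") + swallowed

print(walk_type_codes(mp_reach))  # the MP attribute is visible
print(walk_type_codes(bogus))     # only type code 6 is reported
```

In the second case the walker never sees type code 14, so "no MP
attribute visible" and "no MP attribute sent" cannot be told apart.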

Reliable identification of the NLRI might be achieved:

  1) as suggested in the draft, by sending MP_UNREACH_NLRI
     and MP_REACH_NLRI as the first attributes.

     But I would add the ORIGIN to that, so that the receiver
     sees:

       * no attributes at all -- hence no MP NLRI,

       * or, just an exactly correct ORIGIN -- signalling no
         MP NLRI,

       * or, MP_UNREACH_NLRI all on its own,

       * or, MP_REACH_NLRI followed by an exactly correct
         4 octets of ORIGIN attribute (as a sort of
         terminator),

       * or, all three.

     *PLUS*

       * a Capability that promises to do so, so that there
         is no ambiguity.

  2) or, more directly, by:

       * adding an UPDATEv2 message type

       * and Capability,

     to completely separate NLRI from Attributes.
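
A receiver-side check for option 1 might look roughly like the Python
sketch below.  It is only an illustration under stated assumptions:
the RFC 4271 attribute encoding, type codes 14/15/1 for
MP_REACH_NLRI/MP_UNREACH_NLRI/ORIGIN, and an "exactly correct" ORIGIN
taken to mean flags 0x40, length 1 and value 0..2.  The function
names are hypothetical.  MP_UNREACH_NLRI followed directly by ORIGIN
is also accepted, which goes slightly beyond the cases listed above.

```python
MP_REACH_NLRI, MP_UNREACH_NLRI, ORIGIN = 14, 15, 1
EXTENDED_LENGTH = 0x10

def read_attr(buf, pos):
    """Read one attribute; return (type_code, flags, value, next_pos)."""
    hdr = 4 if pos < len(buf) and buf[pos] & EXTENDED_LENGTH else 3
    if pos + hdr > len(buf):
        raise ValueError("truncated attribute header")
    flags, code = buf[pos], buf[pos + 1]
    length = int.from_bytes(buf[pos + 2:pos + hdr], "big")
    end = pos + hdr + length
    if end > len(buf):
        raise ValueError("attribute overruns the attribute block")
    return code, flags, buf[pos + hdr:end], end

def is_exact_origin(code, flags, value):
    # Assumption: "exactly correct" means these precise octets.
    return (code == ORIGIN and flags == 0x40
            and len(value) == 1 and value[0] <= 2)

def leading_mp_nlri(attrs):
    """Return (mp_unreach_value, mp_reach_value); either may be None.

    Only the leading attributes are examined; anything after the
    ORIGIN terminator is left for (possibly lenient) later parsing.
    """
    if not attrs:
        return None, None                  # no attributes at all
    mp_unreach = mp_reach = None
    code, flags, value, pos = read_attr(attrs, 0)
    if code == MP_UNREACH_NLRI:
        mp_unreach = value
        if pos == len(attrs):
            return mp_unreach, None        # MP_UNREACH_NLRI on its own
        code, flags, value, pos = read_attr(attrs, pos)
    if code == MP_REACH_NLRI:
        mp_reach = value
        code, flags, value, pos = read_attr(attrs, pos)
    if not is_exact_origin(code, flags, value):
        raise ValueError("expected an exact ORIGIN as terminator")
    return mp_unreach, mp_reach

# Sample octets (values are filler, for illustration only).
origin = bytes([0x40, 1, 1, 0])
unreach = bytes([0x80, 15, 3, 0, 2, 1])
reach = bytes([0x80, 14, 5, 0, 2, 1, 0, 0])
print(leading_mp_nlri(unreach + reach + origin))
```

Anything that does not match one of the expected prefixes raises
ValueError, so the receiver knows the NLRI cannot be trusted rather
than guessing.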

Now, it is possible that no known BGP implementation out there sends
more than one form of NLRI in an UPDATE message.  So, the presence of
any one implies the absence of all others, and the presence of any
attribute other than MP_UNREACH_NLRI implies the presence of
MP_REACH_NLRI if there are no other NLRI.  If that is the assumption,
then it should be stated.

However, if the intent is to keep going despite (inter alia) problems
caused by malfunctioning software, then completely separating NLRI
from attributes seems to me the cleanest approach.

If the intent is for a "smart new" BGP implementation to be able to
avoid bringing down sessions when talking to a current BGP
implementation, then some more detailed analysis is required.  For
example, in 2.1.2 the draft defines this as a semantic error: 

    o Zero or invalid length errors in path attributes excluding those
      containing NLRI, or where the length of all path attributes
      contained within the UPDATE does not correspond to the total path
      attributes length.  In this case, the NLRI can be correctly
      extracted, and hence acted upon.

but in the face of such broken-ness, how, exactly, can NLRI be
correctly extracted ?  (Without some guarantee of how NLRI are sent --
a guarantee which may be relied upon when the sender's attribute
handling is busted ?)
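
As an illustration of the question, the Python sketch below splits an
UPDATE body (the part after the 19-octet BGP header) as laid out in
RFC 4271 and then reads IPv4 prefixes from the trailing NLRI field.
The function names and sample octets are hypothetical.  It shows one
way a wrong Total Path Attribute Length moves the attribute/NLRI
boundary, so that attribute octets are read as NLRI.

```python
def split_update(body):
    """Split an UPDATE body into (withdrawn, attributes, nlri)."""
    wlen = int.from_bytes(body[0:2], "big")
    withdrawn = body[2:2 + wlen]
    p = 2 + wlen
    alen = int.from_bytes(body[p:p + 2], "big")
    attrs = body[p + 2:p + 2 + alen]
    nlri = body[p + 2 + alen:]          # whatever is left over
    return withdrawn, attrs, nlri

def parse_ipv4_prefixes(nlri):
    """Decode <length, prefix> pairs into dotted 'a.b.c.d/len' strings."""
    out = []
    pos = 0
    while pos < len(nlri):
        bits = nlri[pos]
        nbytes = (bits + 7) // 8
        if bits > 32 or pos + 1 + nbytes > len(nlri):
            raise ValueError("malformed NLRI")
        raw = nlri[pos + 1:pos + 1 + nbytes] + bytes(4 - nbytes)
        out.append(".".join(str(b) for b in raw) + "/" + str(bits))
        pos += 1 + nbytes
    return out

origin = bytes([0x40, 1, 1, 0])   # ORIGIN attribute
nlri = bytes([8, 10])             # 10.0.0.0/8

def body_with_attr_len(alen):
    # No withdrawn routes; claimed attribute length is 'alen'.
    return (0).to_bytes(2, "big") + alen.to_bytes(2, "big") + origin + nlri

good = parse_ipv4_prefixes(split_update(body_with_attr_len(4))[2])
bad = parse_ipv4_prefixes(split_update(body_with_attr_len(2))[2])
print(good)   # ['10.0.0.0/8']
print(bad)    # ['0.0.0.0/1', '10.0.0.0/8']
```

With the length understated by two octets, the tail of the ORIGIN
attribute parses cleanly as an extra prefix, and nothing in the NLRI
field itself reveals that this has happened.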

But, if it is expected that both ends of a BGP session must be "smart
new" implementations to get (full) improved error handling, why not
make a clean and complete separation of NLRI from attributes ?  One
can then stop struggling quite so hard to extract sense from nonsense
attributes.

Chris