Friday, July 11, 2014

BGP Connection Collision Detection

I ran into an interesting error that I thought I would write up for anyone interested. The issue arose from a simple case of not paying attention to what I was typing into the CLI. After entering the wrong configuration without realizing it, I started to get the following error messages on the two routers between which I was attempting to establish an iBGP peering.

Error message

R9#
*Jul  8 05:58:25.107: %BGP-3-NOTIFICATION: received from neighbor 192.168.81.8 active 6/0 (CEASE: unknown subcode) 0 bytes
*Jul  8 05:58:25.111: %BGP-5-NBR_RESET: Neighbor 192.168.81.8 active reset (BGP Notification received)
*Jul  8 05:58:25.119: %BGP-5-ADJCHANGE: neighbor 192.168.81.8 active Down BGP Notification received
*Jul  8 05:58:25.123: %BGP_SESSION-5-ADJCHANGE: neighbor 192.168.81.8 IPv4 Unicast topology base removed from session  BGP Notification received

R8#
*Jul  8 05:59:27.635: %BGP-5-NBR_RESET: Neighbor 192.168.89.9 passive reset (BGP Notification sent)
*Jul  8 05:59:27.647: %BGP-5-ADJCHANGE: neighbor 192.168.89.9 passive Down Error during connection collision
*Jul  8 05:59:32.691: %BGP-3-NOTIFICATION: sent to neighbor 192.168.89.9 passive 6/0 (CEASE: unknown subcode) 0 bytes
*Jul  8 05:59:36.851: %BGP-5-NBR_RESET: Neighbor 192.168.89.9 passive reset (BGP Notification sent)
*Jul  8 05:59:36.863: %BGP-5-ADJCHANGE: neighbor 192.168.89.9 passive Down Error during connection collision
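
The 6/0 in these messages is the NOTIFICATION error code and subcode. Below is a minimal Python sketch for pulling them out of a log line, assuming the message format shown above. The error code names are taken from RFC 4271 and the Cease subcode names from RFC 4486; the function name decode_notification and the regular expression are illustrative choices, not anything provided by IOS.

```python
import re

# NOTIFICATION error codes (RFC 4271, section 4.5).
ERROR_CODES = {
    1: "Message Header Error",
    2: "OPEN Message Error",
    3: "UPDATE Message Error",
    4: "Hold Timer Expired",
    5: "Finite State Machine Error",
    6: "Cease",
}

# Cease subcodes (RFC 4486).
CEASE_SUBCODES = {
    1: "Maximum Number of Prefixes Reached",
    2: "Administrative Shutdown",
    3: "Peer De-configured",
    4: "Administrative Reset",
    5: "Connection Rejected",
    6: "Other Configuration Change",
    7: "Connection Collision Resolution",
    8: "Out of Resources",
}

# Assumed log layout: "... neighbor <ip> [active|passive] <code>/<subcode> ..."
NOTIFICATION_RE = re.compile(
    r"%BGP-3-NOTIFICATION: (?P<direction>sent to|received from) neighbor "
    r"(?P<neighbor>\S+) (?:\S+ )?(?P<code>\d+)/(?P<subcode>\d+)"
)


def decode_notification(line):
    """Return the decoded fields of a BGP NOTIFICATION log line, or None."""
    match = NOTIFICATION_RE.search(line)
    if match is None:
        return None
    code = int(match["code"])
    subcode = int(match["subcode"])
    if subcode == 0:
        # Subcode 0 means no specific subcode was supplied.
        detail = "Unspecific"
    elif code == 6:
        detail = CEASE_SUBCODES.get(subcode, "unknown Cease subcode")
    else:
        detail = f"subcode {subcode}"
    return {
        "direction": match["direction"],
        "neighbor": match["neighbor"],
        "code": code,
        "subcode": subcode,
        "error": ERROR_CODES.get(code, "unknown error code"),
        "detail": detail,
    }


if __name__ == "__main__":
    sample = ("*Jul  8 05:58:25.107: %BGP-3-NOTIFICATION: received from "
              "neighbor 192.168.81.8 active 6/0 (CEASE: unknown subcode) 0 bytes")
    print(decode_notification(sample))
```

Read this way, 6/0 is a Cease with no specific subcode, which is why IOS prints "unknown subcode" here.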


Topology




So I set out to figure out what was causing this annoying error. Since the errors were related to the neighborship, I began by looking at the neighbor states on both R8 and R9.

R8#sh ip bgp su
*Jul  8 06:12:07.503: %BGP-3-NOTIFICATION: sent to neighbor 192.168.89.9 passive 6/0 (CEASE: unknown subcode) 0 bytes
R8#sh ip bgp sum
BGP router identifier 8.8.8.8, local AS number 300
BGP table version is 1, main routing table version 1

Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
192.168.28.2    4          300      25      26        1    0    0 00:21:09        0
192.168.58.5    4          300      25      24        1    0    0 00:21:01        0
192.168.81.10   4          300       0       0        1    0    0 never    Idle
192.168.89.9    4          300      24     114        1    0    0 00:18:20        0

R9(config-router)#do sh ip bgp sum
BGP router identifier 9.9.9.9, local AS number 300
BGP table version is 1, main routing table version 1

Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
192.168.81.8    4          300       0       0        1    0    0 never    Idle
192.168.89.8    4          300       0       0        1    0    0 never    0

So the first thing I noticed here is that I'm trying to establish a peering with R8 using the wrong subnet. That must be the problem, but I wanted to understand the error message a little better, so I dug a little deeper into this issue.
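
This kind of mismatch can also be checked mechanically. The following is a rough Python sketch, not a real tool: it takes the neighbor statements from both routers and flags any statement that the other side has no matching statement for. The interface address used for R9 is read off the summary output above, the /24 prefix length is an assumption (the masks are not shown here), and the check only makes sense for peerings between directly connected interface addresses. The function name find_unreciprocated is made up for illustration.

```python
import ipaddress


def find_unreciprocated(local_neighbors, local_addresses, peer_neighbors,
                        prefix_len=24):
    """Return local neighbor statements that the peer does not mirror.

    A statement "neighbor X" counts as mirrored when we own an address on
    the same subnet as X and the peer has a neighbor statement for it.
    """
    local_ifaces = [ipaddress.ip_interface(f"{addr}/{prefix_len}")
                    for addr in local_addresses]
    peer_neighbors = set(peer_neighbors)
    problems = []
    for neighbor in local_neighbors:
        target = ipaddress.ip_address(neighbor)
        # Our own addresses that share a subnet with the neighbor address.
        shared = [str(iface.ip) for iface in local_ifaces
                  if target in iface.network]
        if not any(addr in peer_neighbors for addr in shared):
            problems.append(neighbor)
    return problems


if __name__ == "__main__":
    # Neighbor statements on R9 and R8, taken from the summaries above.
    r9_neighbors = ["192.168.81.8", "192.168.89.8"]
    r8_neighbors = ["192.168.28.2", "192.168.58.5",
                    "192.168.81.10", "192.168.89.9"]
    # R9's address toward R8 (assumed /24).
    r9_addresses = ["192.168.89.9"]
    print(find_unreciprocated(r9_neighbors, r9_addresses, r8_neighbors))
```

With these inputs the only statement flagged is the one for 192.168.81.8, since R9 has no address on that subnet for R8 to peer with.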

R9#sh ip bgp neighbors 192.168.81.8
BGP neighbor is 192.168.81.8,  remote AS 300, internal link
  BGP version 4, remote router ID 0.0.0.0
  BGP state = Idle
  Neighbor sessions:
    0 active, is not multisession capable (disabled)
    Stateful switchover support enabled: NO
  Default minimum time between advertisement runs is 0 seconds

 For address family: IPv4 Unicast
  BGP table version 1, neighbor version 1/0
  Output queue size : 0
  Index 0, Advertise bit 0
  Slow-peer detection is disabled
  Slow-peer split-update-group dynamic is disabled
                                 Sent       Rcvd
  Prefix activity:               ----       ----
    Prefixes Current:               0          0
    Prefixes Total:                 0          0
    Implicit Withdraw:              0          0
    Explicit Withdraw:              0          0
    Used as bestpath:             n/a          0
    Used as multipath:            n/a          0

                                   Outbound    Inbound
  Local Policy Denied Prefixes:    --------    -------
    Total:                                0          0
  Number of NLRIs in the update sent: max 0, min 0
  Last detected as dynamic slow peer: never
  Dynamic slow peer recovered: never
  Refresh Epoch: 1
  Last Sent Refresh Start-of-rib: never
  Last Sent Refresh End-of-rib: never
  Last Received Refresh Start-of-rib: never
  Last Received Refresh End-of-rib: never
                                       Sent       Rcvd
        Refresh activity:              ----       ----
          Refresh Start-of-RIB          0          0
          Refresh End-of-RIB            0          0

  Address tracking is enabled, the RIB does have a route to 192.168.81.8
  Connections established 0; dropped 0
  Last reset never
  Transport(tcp) path-mtu-discovery is enabled
  Graceful-Restart is disabled
  No active TCP connection

So R9 is attempting to peer with R8 on the wrong subnet; R8 sees that this is incorrect and notifies its neighbor with the "CEASE" error notification. This behavior is a function of the connection collision detection feature of BGP, which is defined in RFC 4271 and quoted below:

6.8. BGP Connection Collision Detection 

If a pair of BGP speakers try to establish a BGP connection with each 
other simultaneously, then two parallel connections will be formed. 
If the source IP address used by one of these connections is the same 
as the destination IP address used by the other, and the destination 
IP address used by the first connection is the same as the source IP 
address used by the other, connection collision has occurred. In the 
event of connection collision, one of the connections MUST be closed. 

[...] 

Closing the BGP connection (that results from the collision 
resolution procedure) is accomplished by sending the NOTIFICATION 
message with the Error Code Cease. 
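
The part of section 6.8 elided above describes how a speaker picks which connection to close. As a rough Python sketch of that rule as I understand it from RFC 4271: the two BGP Identifiers are compared as 4-octet unsigned integers, and an already Established session wins over a newly created connection unless configuration says otherwise. The function and argument names are illustrative only.

```python
import ipaddress


def bgp_id_to_int(bgp_id):
    """Convert a dotted-quad BGP Identifier to an unsigned 32-bit integer."""
    return int(ipaddress.IPv4Address(bgp_id))


def connection_to_close(local_id, remote_id, existing_state="OpenConfirm"):
    """Return "existing" or "new": the colliding connection to close.

    "existing" is the connection already in place toward the peer and
    "new" is the one on which the colliding OPEN was just received.
    """
    if existing_state == "Established":
        # Default behavior: an Established session is kept and the
        # newly created connection is closed.
        return "new"
    if bgp_id_to_int(local_id) < bgp_id_to_int(remote_id):
        # The lower identifier gives way and accepts the connection
        # initiated by the remote speaker.
        return "existing"
    return "new"


if __name__ == "__main__":
    # R8 (8.8.8.8) already has an Established session with R9 (9.9.9.9).
    print(connection_to_close("8.8.8.8", "9.9.9.9", "Established"))
```

Whichever connection is chosen, it is closed with a NOTIFICATION carrying the Cease error code, which is what the log messages show.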


So basically what is going on here is that R9 is configured to establish a neighborship with R8 using the address on R8's serial interface that connects to R10. Because R9 sees a route to 192.168.81.8 in its routing table, it attempts to establish a TCP session with R8; the debug below shows that session sourced from 192.168.89.9. R8 completes the TCP handshake, but the session does not match its configuration: on the 192.168.81.0 subnet R8 is configured to peer with 192.168.81.10, not with R9. So it sends a BGP notification message with the CEASE error code and begins tearing down the TCP session with R9, as seen in the debug below.

R9#debug ip tcp packet
TCP Packet debugging is on
*Jul  8 06:05:31.079: tcp0: I SYNSENT 192.168.81.8:179 192.168.89.9:46459 seq 3588362141   OPTS 4 ACK 3706581011 SYN  WIN 16384
*Jul  8 06:05:31.087: tcp0: O ESTAB 192.168.81.8:179 192.168.89.9:46459 seq 3706581011  ACK 3588362142  WIN 16384
*Jul  8 06:05:31.115: tcp0: O ESTAB 192.168.81.8:179 192.168.89.9:46459 seq 3706581011  DATA 57 ACK 3588362142 PSH  WIN 16384
*Jul  8 06:05:31.191: tcp0: I ESTAB 192.168.81.8:179 192.168.89.9:46459 seq 3588362142 DATA 21 ACK 3706581068 PSH  WIN 16327
*Jul  8 06:05:31.203: %BGP-3-NOTIFICATION: received from neighbor 192.168.81.8 active 6/0 (CEASE: unknown subcode) 0 bytes
*Jul  8 06:05:31.203: %BGP-5-NBR_RESET: Neighbor 192.168.81.8 active reset (BGP Notification received)
*Jul  8 06:05:31.207: tcp0: O FINWAIT1 192.168.81.8:179 192.168.89.9:46459 seq 3706581068   ACK 3588362163 FIN PSH  WIN 16363
*Jul  8 06:05:31.207: %BGP-5-ADJCHANGE: neighbor 192.168.81.8 active Down BGP Notification received
*Jul  8 06:05:31.211: %BGP_SESSION-5-ADJCHANGE: neighbor 192.168.81.8 IPv4 Unicast topology base removed from session  BGP Notification received
*Jul  8 06:05:31.235: tcp0: I FINWAIT1 192.168.81.8:179 192.168.89.9:46459 seq 3588362163  ACK 3706581069  WIN 16327
*Jul  8 06:05:35.319: tcp0: I FINWAIT2 192.168.81.8:179 192.168.89.9:46459 seq 3588362163  ACK 3706581069 FIN PSH  WIN 16327
*Jul  8 06:05:35.327: tcp0: O TIMEWAIT 192.168.81.8:179 192.168.89.9:46459 seq 3706581069  ACK 3588362164  WIN 16363
*Jul  8 06:05:44.335: tcp0: O CLOSED 192.168.81.8:179 192.168.89.9:38793 seq 1355371637
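
For longer captures it can help to pull the endpoints and flags out of each debug line. Here is a small Python sketch that does this, assuming the line layout shown above. Treating the first address:port pair as the remote end and the second as the local end is my reading of this output (R9 opened the session toward port 179), not something taken from Cisco documentation, and the function name parse_tcp_debug is made up.

```python
import re

# Assumed layout: "tcp0: <I|O> <STATE> <remote>:<port> <local>:<port> seq <n> ..."
TCP_DEBUG_RE = re.compile(
    r"tcp\d+: (?P<direction>[IO]) (?P<state>\w+) "
    r"(?P<remote_ip>\d+\.\d+\.\d+\.\d+):(?P<remote_port>\d+) "
    r"(?P<local_ip>\d+\.\d+\.\d+\.\d+):(?P<local_port>\d+) "
    r"seq (?P<seq>\d+)(?P<rest>.*)"
)
FLAG_RE = re.compile(r"\b(SYN|ACK|FIN|PSH|RST)\b")


def parse_tcp_debug(line):
    """Return the fields of one TCP debug line, or None for other lines."""
    match = TCP_DEBUG_RE.search(line)
    if match is None:
        return None
    return {
        "direction": match["direction"],  # I = received, O = sent
        "state": match["state"],
        "remote": (match["remote_ip"], int(match["remote_port"])),
        "local": (match["local_ip"], int(match["local_port"])),
        "seq": int(match["seq"]),
        "flags": FLAG_RE.findall(match["rest"]),
    }


if __name__ == "__main__":
    sample = ("*Jul  8 06:05:31.079: tcp0: I SYNSENT 192.168.81.8:179 "
              "192.168.89.9:46459 seq 3588362141   OPTS 4 ACK 3706581011 "
              "SYN  WIN 16384")
    print(parse_tcp_debug(sample))
```

Log lines that are not TCP debug output, such as the %BGP messages interleaved above, simply return None.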

The solution to this problem was relatively simple. I just had to remove the incorrect configuration:

R9(config-router)#
R9(config-router)#router bgp 300
R9(config-router)#bgp log-neighbor-changes
R9(config-router)#neighbor 192.168.81.8 remote-as 300 <== should have been configured on R10
R9(config-router)#neighbor 192.168.89.8 remote-as 300 

Solution:

R9(config-router)#no neighbor 192.168.81.8 remote-as 300

After this command was removed from R9, the error went away and everything was beautiful again. This might seem like a trivial issue to some, but sometimes little mistakes can be elusive if you are not paying attention.



4 comments:

  1. This one pointed me into the right direction during a troubleshooting today, thank you for posting! Cheers, Boris, CCIE #6373

  2. right now dealing with same issue.
    Jul 6 21:05:31.364: %BGP-5-NBR_RESET: Neighbor 172.20.12.38 passive reset (BGP Notification sent)
    Jul 6 21:05:31.364: %BGP-5-ADJCHANGE: neighbor 172.20.12.38 passive Down Error during connection collision
    Jul 6 21:59:34.967: %BGP-3-NOTIFICATION: sent to neighbor 172.20.12.38 passive 6/7 (Connection Collision Resolution) 0 bytes

    have a troubleshooting session late, will publish my findings

  3. Thank you for sharing this. I had the same problem and was able to solve it because of this.
