
Multi-Pod ACI Troubleshooting



Scenario 1: Multi-Pod Inter-Pod Network (IPN) broadcast, unknown unicast, and multicast (BUM) troubleshooting


Symptoms

Endpoint information not exchanged between pods

Leaf does not learn remote endpoint

COOP does not have remote EP information

BGP does not receive prefix

BGP session flapping/down


Expectation


Remote endpoint information should be populated in COOP


COOP will receive remote EP information from BGP EVPN


The BGP L2VPN EVPN session will establish and receive prefixes if there is reachability.



spine1# show coop internal info ip-db key 2457600 172.16.2.2   <-- local site

IP address : 172.16.2.2
Vrf : 2457600
Flags : 0
EP bd vnid : 16220082
EP mac :  00:50:56:B1:44:03
Publisher Id : 10.1.48.64
Record timestamp : 05 02 2018 02:29:12 339899902
Publish timestamp : 05 02 2018 02:29:12 340145880
Seq No: 0
Remote publish timestamp: 01 01 1970 00:00:00 0
URIB Tunnel Info
Num tunnels : 1


pod35-spine1# show coop internal info ip-db | egrep -A 15 -B 1 "172.16.2.2$"   <-- remote site

IP address : 172.16.2.2
Vrf : 3014656
Flags : 0x4
EP bd vnid : 15925206
EP mac :  00:50:56:B1:44:03
Publisher Id : 10.10.35.102
Record timestamp : 01 01 1970 00:00:00 0
Publish timestamp : 01 01 1970 00:00:00 0
Seq No: 0
Remote publish timestamp: 04 24 2018 05:05:34 611613733
URIB Tunnel Info
Num tunnels : 1
Tunnel address : 10.10.35.102
Tunnel ref count : 1
Tunnel address : 10.1.48.64
Tunnel ref count : 1
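The Publisher Id and Tunnel address fields above are fabric TEP addresses. Assuming the addresses from this output, they can be mapped back to fabric node IDs from an APIC, for example:

apic# acidiag fnvread | egrep "10.1.48.64|10.10.35.102"

acidiag fnvread lists every registered fabric node with its TEP address, which quickly identifies the leaf or spine behind each entry.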



Step 1: Check the BGP EVPN sessions between the spines of the two pods


spine# show bgp l2vpn evpn summ vrf overlay-1


172.16.22.3 4 65001 29670 56819 55574 0 0 00:01:17 Active --> session is down

172.16.22.4 4 65001 29671 56819 55574 0 0 00:01:20 Active --> session is down
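If the peers stay in Active/Idle, basic reachability towards the peer addresses listed in the summary can be checked from the spine in the overlay-1 VRF (peer IPs taken from the output above):

spine# iping -V overlay-1 172.16.22.3
spine# iping -V overlay-1 172.16.22.4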


[+] Check lldp neighbors


spine# show lldp neighbors | egrep "IPN|spine|Device ID"


IPN3 Eth1/15 120 BR Eth1/16

IPN4 Eth1/16 120 BR Eth1/20 


Step 2: The BGP sessions are built as an overlay on top of the OSPF neighbourship between the IPN and the spines, so check the OSPF sessions between the spine and the IPN.


spine# show ip ospf nei vrf overlay-1


4.4.4.4 1 EXSTART/ - 00:35:22 192.168.2.7 Eth1/16.16 —> Neighbour 4.4.4.4 is in Exstart


3.3.3.3 1 FULL/ - 26w1d 192.168.2.5 Eth1/15.15

[+] Check that the interface MTU is configured as 9150

An OSPF neighbour stuck in EXSTART is commonly caused by an MTU mismatch between the neighbours.


A) Check the faults associated with an MTU mismatch and with OSPF stuck in EXSTART

apic# moquery -c faultInst -f 'fault.Inst.code=="F3592"' --> MTU mismatch


apic# moquery -c faultInst -f 'fault.Inst.code=="F1385"' —> OSPF Exstart
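To see which objects these faults are raised against, the same queries can be filtered for the dn and descr fields of each fault record, for example:

apic# moquery -c faultInst -f 'fault.Inst.code=="F3592"' | egrep "dn|descr"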


B) Run tcpdump on the spine to check the MTU carried in the OSPF packets


spine# tcpdump -i kpm_inb proto ospf -vv -e


Check the MTU configured on the interfaces


spine# show int ethernet 1/16.16 | grep MTU

MTU 9150 bytes, BW 40000000 Kbit, DLY 1 usec


IPN# show int ethernet 1/20 | grep MTU



Resolution : Change MTU on IPN


IPN4(config)# interface Ethernet1/20.4

IPN4(config-subif)# mtu 9150 
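After the change, the MTU can be confirmed on the sub-interface and, optionally, tested across the link with a don't-fragment ping. The spine-side address 192.168.2.6, the VRF name IPN and the 9000-byte size are assumptions for illustration; header overhead keeps the usable payload below the interface MTU.

IPN4# show interface ethernet 1/20.4 | grep MTU
IPN4# ping 192.168.2.6 packet-size 9000 df-bit vrf IPN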


The Multi-Pod MTU can be found on the Multi-Pod L3Out in the infra tenant.



The default control-plane MTU value can be tuned by modifying the corresponding system settings on each APIC.

System --> System Settings --> Control Plane MTU --> 9000



------------------------------------------------------------------------------------------------------------------------------



Scenario 2: BiDir/Phantom RP


IPN Multicast Routing Features


Spine nodes act as multicast hosts (IGMP only). They do not run PIM.


If a BD is deployed in a Pod, then one spine from that pod will send an IGMP join on one of its IPN-facing interfaces. 


The IPNs receive these joins and send PIM joins towards the Bidirectional PIM RP.


All dataplane traffic sent to the GIPo goes through the RP.


Issue


BUM traffic not working as expected.

ARP not complete on either host (ARP Flood enabled in the BD)


Expectation


An ARP request from a host ingresses the leaf. The leaf should encapsulate this request in the BD GIPo to forward it to the remote pod.


Phantom RP is used in a PIM BiDir environment where RP redundancy is designed using loopback networks with different mask lengths in the primary and secondary routers. 


These loopback interfaces are in the same subnet as the RP address, but with different IP addresses from the RP address.



The subnet of the loopback is advertised in the Interior Gateway Protocol (IGP). To maintain RP reachability, it is only necessary to ensure that a route to the RP exists.


Unicast routing longest match algorithms are used to pick the primary over the secondary router.


The primary router announces a longest match route (say, a /30 route for the RP address) and is preferred over the less specific route announced by the secondary router (a /29 route for the same RP address).


The primary router advertises the /30 route of the RP, while the secondary router advertises the /29 route. The latter is only chosen when the primary router goes offline.


With Phantom RP, the RP address is a non-existent IP that is not configured on any device; it only needs to fall within the subnet of the loopback configured for it.


This should match on all the IPNs for a given group/group range.


Only one spine in each pod joins each group


To check which spine joined the group:


Spine # show ip igmp gipo joins | grep 225.1.67.68 (BD Gipo address)



Here we see that, on all IPNs, the incoming interface points to loopback1 of the local IPN. This means that each IPN believes it is the RP.

If each IPN believes it is the RP, it will not forward the joins it receives towards the real RP.


Any device that owns that actual RP address would see a local /32 route for it.


We need all the IPNs to agree on who the RP is for a given group or group range, and the incoming interface in the "show ip mroute" output should point towards that RP.
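A quick way to see which RP each IPN has elected for the group range is to check the PIM RP information in the IPN VRF (VRF name as used in the configuration below):

IPN# show ip pim rp vrf IPN

Each IPN should report the same RP address for the GIPo range, and the unicast route towards that RP address should then follow the longest prefix match rather than a local /32.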


Since the mroutes look like this, let's check the RP configuration as well as the loopback that is being used for it.


Step 1 :


IPN# show ip mroute 225.1.67.68 vrf IPN


If the output shows a 225.1.67.68/32 entry whose incoming interface is the local loopback1, check the RP and loopback configuration:


IPN # show run pim | egrep "rp-address|loop|vrf"

IPN # show run interface loopback1 | egrep "ip address|vrf"



Solution


A) Change loopback or pim rp-address.


We need to change one of the two addresses so that longest-prefix-match routing can provide failover.

With Phantom RP, the RP address must be a non-existent IP that is not configured on any switch. The IPNs should point to the RP candidate that advertises the longest prefix match, which provides better failover/redundancy.


Phantom RP address - 192.168.100.1
Loopback1 IP - 192.168.100.1 (incorrect: the RP address must not be assigned to any device)

Change the loopback1 IP on all IPNs to 192.168.100.2.


IPN1# show run pim | egrep "rp-address|loop|vrf"
vrf context IPN
ip pim rp-address 192.168.100.1 group-list 224.0.0.0/4 bidir
ip pim rp-address 192.168.100.1 group-list 239.0.0.0/8 bidir
interface loopback1

IPN1# show run interface loopback1 | egrep "ip address|vrf"
vrf member IPN
ip address 192.168.100.2/31

IPN2# show run pim | egrep "rp-address|loop|vrf"
vrf context IPN
ip pim rp-address 192.168.100.1 group-list 224.0.0.0/4 bidir
ip pim rp-address 192.168.100.1 group-list 239.0.0.0/8 bidir
interface loopback1

IPN2# show run interface loopback1 | egrep "ip address|vrf"
vrf member IPN
ip address 192.168.100.2/31

IPN3# show run pim | egrep "rp-address|loop|vrf"
vrf context IPN
ip pim rp-address 192.168.100.1 group-list 224.0.0.0/4 bidir
ip pim rp-address 192.168.100.1 group-list 239.0.0.0/8 bidir
interface loopback1

IPN3# show run interface loopback1 | egrep "ip address|vrf"
vrf member IPN
ip address 192.168.100.2/31

IPN4# show run pim | egrep "rp-address|loop|vrf"
vrf context IPN
ip pim rp-address 192.168.100.1 group-list 224.0.0.0/4 bidir
ip pim rp-address 192.168.100.1 group-list 239.0.0.0/8 bidir
interface loopback1

IPN4# show run interface loopback1 | egrep "ip address|vrf"
vrf member IPN
ip address 192.168.100.2/31


B) Change Mask Configuration


However, our IPNs still point to their local loopback as the RP and the BUM traffic is not successfully traversing the IPNs.


The highest subnet mask should not exceed a /30


The active RP for the group will be the candidate advertising the longest subnet mask.

Should that RP fail, traffic fails over to the node with the next-longest subnet mask.


The correct configuration uses a different mask on each IPN, with the most specific mask no greater than /30, for example:

IPN1 = /30

IPN2 = /29

IPN3 = /28

IPN4 = /27

(The exact values do not matter, as long as every IPN uses a different mask, each loopback subnet still contains the RP address, and the most specific mask does not exceed /30. The configuration below uses /29, /30, /27 and /28.)


IPN1# show run pim | egrep "rp-address|loop|vrf"
vrf context IPN
ip pim rp-address 192.168.100.1 group-list 224.0.0.0/4 bidir
ip pim rp-address 192.168.100.1 group-list 239.0.0.0/8 bidir
interface loopback1

IPN1# show run interface loopback1 | egrep "ip address|vrf"
vrf member IPN
ip address 192.168.100.2/29

IPN2# show run pim | egrep "rp-address|loop|vrf"
vrf context IPN
ip pim rp-address 192.168.100.1 group-list 224.0.0.0/4 bidir
ip pim rp-address 192.168.100.1 group-list 239.0.0.0/8 bidir
interface loopback1

IPN2# show run interface loopback1 | egrep "ip address|vrf"
vrf member IPN
ip address 192.168.100.2/30

IPN3# show run pim | egrep "rp-address|loop|vrf"
vrf context IPN
ip pim rp-address 192.168.100.1 group-list 224.0.0.0/4 bidir
ip pim rp-address 192.168.100.1 group-list 239.0.0.0/8 bidir
interface loopback1

IPN3# show run interface loopback1 | egrep "ip address|vrf"
vrf member IPN
ip address 192.168.100.2/27

IPN4# show run pim | egrep "rp-address|loop|vrf"
vrf context IPN
ip pim rp-address 192.168.100.1 group-list 224.0.0.0/4 bidir
ip pim rp-address 192.168.100.1 group-list 239.0.0.0/8 bidir
interface loopback1

IPN4# show run interface loopback1 | egrep "ip address|vrf"
vrf member IPN
ip address 192.168.100.2/28



C) OSPF P2P network Configuration




Solution - Add "ip ospf network point-to-point" under the loopback. By default, OSPF advertises a loopback as a /32 host route regardless of its configured mask; setting the network type to point-to-point makes OSPF advertise the real subnet, which the phantom RP longest-prefix-match design depends on.


interface loopback1
  ip ospf network point-to-point
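The change can be verified by checking the OSPF network type on the loopback and confirming that the RP subnet is learned with its configured mask on the other IPNs (VRF name as in the earlier outputs; exact output wording varies by NX-OS release):

IPN1# show ip ospf interface loopback1 vrf IPN
IPN2# show ip route 192.168.100.1 vrf IPN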



D) OSPF Interface Cost


Symptoms


BiDir/Phantom RP configuration is correct

BUM traffic not working

Route to RP points back to ACI Spine sub-interface


Solution - Adjust the OSPF interface cost; also enable the DSCP class-CoS translation policy

Because the spines do not run PIM, a route towards the RP whose next hop is an ACI spine sub-interface breaks BUM forwarding. Increase the OSPF cost of the spine-facing sub-interfaces on the IPNs so that the route towards the RP prefers the IPN-to-IPN links; a minimal sketch follows.
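A minimal sketch of the cost change, assuming the spine-facing sub-interface on this IPN is Ethernet1/16.4 and that a cost of 100 is higher than the cost of the IPN-to-IPN links:

IPN(config)# interface Ethernet1/16.4
IPN(config-subif)# ip ospf cost 100

After this, "show ip mroute 225.1.67.68 vrf IPN" should no longer show the spine-facing sub-interface as the incoming interface.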


In addition, configure the "DSCP class-cos translation policy for L3 traffic" in the infra tenant so that ACI QoS classes are preserved while traffic transits the IPN.

The steps for this policy are below. Apply them in a maintenance window, since enabling the QoS levels can be disruptive to traffic flows.


Step 1

Navigate to Tenants > infra.


Step 2

In the Navigation pane, expand Policies > Protocol > DSCP class-cos translation policy for L3 traffic.


Step 3

In the Properties panel, click Enabled to enable the DSCP policy.


Step 4

Map each traffic stream to one of the available levels.


Note

Each QoS Level must be mapped to a unique value.


On egress towards the IPN, the spine maps the CoS value into the outer DSCP field; on ingress from the IPN, it maps the DSCP value back to CoS.



Unsupported vPC design 


Case 1



Solution - Use Subinterfaces

Use subinterfaces on the links connecting the IPN devices to the spines, as sketched below.
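A minimal sketch of one such spine-facing sub-interface on an IPN follows. The interface number, addressing, VRF name and OSPF process tag are assumptions that reuse the earlier examples; the dot1q VLAN 4 encapsulation is what ACI Multi-Pod expects on spine-facing IPN links.

IPN3(config)# interface Ethernet1/16.4
IPN3(config-subif)# mtu 9150
IPN3(config-subif)# encapsulation dot1q 4
IPN3(config-subif)# vrf member IPN
IPN3(config-subif)# ip address 192.168.2.5/30
IPN3(config-subif)# ip ospf network point-to-point
IPN3(config-subif)# ip router ospf IPN area 0.0.0.0
IPN3(config-subif)# ip pim sparse-mode
IPN3(config-subif)# no shutdown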


Case 2




Solution


