
Multi-Pod ACI Troubleshooting



Scenario 1: Multi-Pod Inter-Pod Network (IPN) broadcast, unknown unicast, and multicast (BUM) troubleshooting


Symptoms

Endpoint information not exchanged between pods

Leaf does not learn remote endpoint

COOP does not have remote EP information

BGP does not receive prefix

BGP session flapping/down


Expectation


Remote endpoint information should be populated in COOP


COOP will receive remote EP information from BGP EVPN


The BGP L2VPN EVPN session will establish and receive prefixes if there is reachability.



spine1# show coop internal info ip-db key 2457600 172.16.2.2   <-- local site

IP address : 172.16.2.2
Vrf : 2457600
Flags : 0
EP bd vnid : 16220082
EP mac :  00:50:56:B1:44:03
Publisher Id : 10.1.48.64
Record timestamp : 05 02 2018 02:29:12 339899902
Publish timestamp : 05 02 2018 02:29:12 340145880
Seq No: 0
Remote publish timestamp: 01 01 1970 00:00:00 0
URIB Tunnel Info
Num tunnels : 1


pod35-spine1# show coop internal info ip-db | egrep -A 15 -B 1 "172.16.2.2$"   <-- remote site

IP address : 172.16.2.2
Vrf : 3014656
Flags : 0x4
EP bd vnid : 15925206
EP mac :  00:50:56:B1:44:03
Publisher Id : 10.10.35.102
Record timestamp : 01 01 1970 00:00:00 0
Publish timestamp : 01 01 1970 00:00:00 0
Seq No: 0
Remote publish timestamp: 04 24 2018 05:05:34 611613733
URIB Tunnel Info
Num tunnels : 1
Tunnel address : 10.10.35.102
Tunnel ref count : 1
Tunnel address : 10.1.48.64
Tunnel ref count : 1
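The Publisher Id and Tunnel address fields above are fabric TEP addresses. Assuming the addresses from this output, they can be mapped back to fabric node IDs from an APIC, for example:

apic# acidiag fnvread | egrep "10.1.48.64|10.10.35.102"

acidiag fnvread lists every registered fabric node with its TEP address, which quickly identifies the leaf or spine behind each entry.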



Step 1: Check the BGP EVPN sessions between the spines of the two pods


spine# show bgp l2vpn evpn summ vrf overlay-1


172.16.22.3 4 65001 29670 56819 55574 0 0 00:01:17 Active --> session is down

172.16.22.4 4 65001 29671 56819 55574 0 0 00:01:20 Active --> session is down
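If the peers stay in Active/Idle, basic reachability towards the peer addresses listed in the summary can be checked from the spine in the overlay-1 VRF (peer IPs taken from the output above):

spine# iping -V overlay-1 172.16.22.3
spine# iping -V overlay-1 172.16.22.4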


[+] Check lldp neighbors


spine# show lldp neighbors | egrep "IPN|spine|Device ID"


IPN3 Eth1/15 120 BR Eth1/16

IPN4 Eth1/16 120 BR Eth1/20 


Step 2: The BGP sessions are built as an overlay on top of the OSPF neighbourship between the IPN and the spines, so check the OSPF sessions between the spine and the IPN.


spine# show ip ospf nei vrf overlay-1


4.4.4.4 1 EXSTART/ - 00:35:22 192.168.2.7 Eth1/16.16 —> Neighbour 4.4.4.4 is in Exstart


3.3.3.3 1 FULL/ - 26w1d 192.168.2.5 Eth1/15.15

[+] Check that the interface MTU is configured as 9150

An OSPF neighbour stuck in EXSTART is commonly caused by an MTU mismatch between the neighbours.


A) Check the faults associated with an MTU mismatch and with OSPF stuck in EXSTART

apic# moquery -c faultInst -f 'fault.Inst.code=="F3592"' --> MTU mismatch


apic# moquery -c faultInst -f 'fault.Inst.code=="F1385"' —> OSPF Exstart
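To see which objects these faults are raised against, the same queries can be filtered for the dn and descr fields of each fault record, for example:

apic# moquery -c faultInst -f 'fault.Inst.code=="F3592"' | egrep "dn|descr"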


B) Run tcpdump on the spine to check the MTU carried in the OSPF packets


spine# tcpdump -i kpm_inb proto ospf -vv -e


Check the MTU configured on the interfaces


spine# show int ethernet 1/16.16 | grep MTU

MTU 9150 bytes, BW 40000000 Kbit, DLY 1 usec


IPN# show int ethernet 1/20 | grep MTU



Resolution : Change MTU on IPN


IPN4(config)# interface Ethernet1/20.4

IPN4(config-subif)# mtu 9150 
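After the change, the MTU can be confirmed on the sub-interface and, optionally, tested across the link with a don't-fragment ping. The spine-side address 192.168.2.6, the VRF name IPN and the 9000-byte size are assumptions for illustration; header overhead keeps the usable payload below the interface MTU.

IPN4# show interface ethernet 1/20.4 | grep MTU
IPN4# ping 192.168.2.6 packet-size 9000 df-bit vrf IPN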


The Multi-Pod MTU can be found on the Multi-Pod L3Out in the infra tenant.



The default control-plane MTU value can be tuned by modifying the corresponding system settings on each APIC.

System --> System Settings --> Control Plane MTU --> 9000



------------------------------------------------------------------------------------------------------------------------------



Scenario 2: BiDir/Phantom RP


IPN Multicast Routing Features


Spine nodes act as multicast hosts (IGMP only). They do not run PIM.


If a BD is deployed in a Pod, then one spine from that pod will send an IGMP join on one of its IPN-facing interfaces. 


The IPNs receive these joins and send PIM joins towards the Bidirectional PIM RP.


All dataplane traffic sent to the GIPo goes through the RP.


Issue


BUM traffic not working as expected.

ARP not complete on either host (ARP Flood enabled in the BD)


Expectation


An ARP request from a host ingresses the leaf. The leaf should encapsulate this request in the BD GIPo to forward it to the remote pod.


Phantom RP is used in a PIM BiDir environment where RP redundancy is designed using loopback networks with different mask lengths in the primary and secondary routers. 


These loopback interfaces are in the same subnet as the RP address, but with different IP addresses from the RP address.



The subnet of the loopback is advertised in the Interior Gateway Protocol (IGP). To maintain RP reachability, it is only necessary to ensure that a route to the RP exists.


Unicast routing longest match algorithms are used to pick the primary over the secondary router.


The primary router announces a longest match route (say, a /30 route for the RP address) and is preferred over the less specific route announced by the secondary router (a /29 route for the same RP address).


The primary router advertises the /30 route of the RP, while the secondary router advertises the /29 route. The latter is only chosen when the primary router goes offline.


With Phantom RP, the RP address is a non-existent IP that is not configured on any device; it only needs to fall within the subnet of the loopback configured for it.


This should match on all the IPNs for a given group/group range.


Only one spine in each pod joins each group


To check which spine joined the group:


Spine # show ip igmp gipo joins | grep 225.1.67.68 (BD Gipo address)



Here we see that, on all IPNs, the incoming interface points to loopback1 of the local IPN. This means that each IPN believes it is the RP.

If each IPN believes it is the RP, it will not forward the joins it receives towards the real RP.


Any device that owns that actual RP address would see a local /32 route for it.


We need all the IPNs to agree on who the RP is for a given group or group range, and the incoming interface in the "show ip mroute" output should point towards that RP.
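A quick way to see which RP each IPN has elected for the group range is to check the PIM RP information in the IPN VRF (VRF name as used in the configuration below):

IPN# show ip pim rp vrf IPN

Each IPN should report the same RP address for the GIPo range, and the unicast route towards that RP address should then follow the longest prefix match rather than a local /32.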


Since the mroutes look like this, let's check the RP configuration as well as the loopback that is being used for it.


Step 1 :


IPN# show ip mroute 225.1.67.68 vrf IPN


If the output shows a 225.1.67.68/32 entry whose incoming interface is the local loopback1, check the RP and loopback configuration:


IPN # show run pim | egrep "rp-address|loop|vrf"

IPN # show run interface loopback1 | egrep "ip address|vrf"



Solution


A) Change loopback or pim rp-address.


We need to change one of the two addresses so that longest-prefix-match routing can provide failover.

With Phantom RP, the RP address must be a non-existent IP that is not configured on any switch. The IPNs should point to the RP candidate that advertises the longest prefix match, which provides better failover/redundancy.


Phantom RP address - 192.168.100.1
Loopback1 IP - 192.168.100.1 (incorrect: the RP address must not be assigned to any device)

Change the loopback1 IP on all IPNs to 192.168.100.2.


IPN1# show run pim | egrep "rp-address|loop|vrf"
vrf context IPN
ip pim rp-address 192.168.100.1 group-list 224.0.0.0/4 bidir
ip pim rp-address 192.168.100.1 group-list 239.0.0.0/8 bidir
interface loopback1

IPN1# show run interface loopback1 | egrep "ip address|vrf"
vrf member IPN
ip address 192.168.100.2/31

IPN2# show run pim | egrep "rp-address|loop|vrf"
vrf context IPN
ip pim rp-address 192.168.100.1 group-list 224.0.0.0/4 bidir
ip pim rp-address 192.168.100.1 group-list 239.0.0.0/8 bidir
interface loopback1

IPN2# show run interface loopback1 | egrep "ip address|vrf"
vrf member IPN
ip address 192.168.100.2/31

IPN3# show run pim | egrep "rp-address|loop|vrf"
vrf context IPN
ip pim rp-address 192.168.100.1 group-list 224.0.0.0/4 bidir
ip pim rp-address 192.168.100.1 group-list 239.0.0.0/8 bidir
interface loopback1

IPN3# show run interface loopback1 | egrep "ip address|vrf"
vrf member IPN
ip address 192.168.100.2/31

IPN4# show run pim | egrep "rp-address|loop|vrf"
vrf context IPN
ip pim rp-address 192.168.100.1 group-list 224.0.0.0/4 bidir
ip pim rp-address 192.168.100.1 group-list 239.0.0.0/8 bidir
interface loopback1

IPN4# show run interface loopback1 | egrep "ip address|vrf"
vrf member IPN
ip address 192.168.100.2/31


B) Change Mask Configuration


However, our IPNs still point to their local loopback as the RP and the BUM traffic is not successfully traversing the IPNs.


The highest subnet mask should not exceed a /30


The active RP for the group will be the candidate advertising the longest subnet mask.

Should that RP fail, traffic fails over to the node with the next-longest subnet mask.


The correct configuration uses a different mask on each IPN, with the most specific mask no greater than /30, for example:

IPN1 = /30

IPN2 = /29

IPN3 = /28

IPN4 = /27

(The exact values do not matter, as long as every IPN uses a different mask, each loopback subnet still contains the RP address, and the most specific mask does not exceed /30. The configuration below uses /29, /30, /27 and /28.)


IPN1# show run pim | egrep "rp-address|loop|vrf"
vrf context IPN
ip pim rp-address 192.168.100.1 group-list 224.0.0.0/4 bidir
ip pim rp-address 192.168.100.1 group-list 239.0.0.0/8 bidir
interface loopback1

IPN1# show run interface loopback1 | egrep "ip address|vrf"
vrf member IPN
ip address 192.168.100.2/29

IPN2# show run pim | egrep "rp-address|loop|vrf"
vrf context IPN
ip pim rp-address 192.168.100.1 group-list 224.0.0.0/4 bidir
ip pim rp-address 192.168.100.1 group-list 239.0.0.0/8 bidir
interface loopback1

IPN2# show run interface loopback1 | egrep "ip address|vrf"
vrf member IPN
ip address 192.168.100.2/30

IPN3# show run pim | egrep "rp-address|loop|vrf"
vrf context IPN
ip pim rp-address 192.168.100.1 group-list 224.0.0.0/4 bidir
ip pim rp-address 192.168.100.1 group-list 239.0.0.0/8 bidir
interface loopback1

IPN3# show run interface loopback1 | egrep "ip address|vrf"
vrf member IPN
ip address 192.168.100.2/27

IPN4# show run pim | egrep "rp-address|loop|vrf"
vrf context IPN
ip pim rp-address 192.168.100.1 group-list 224.0.0.0/4 bidir
ip pim rp-address 192.168.100.1 group-list 239.0.0.0/8 bidir
interface loopback1

IPN4# show run interface loopback1 | egrep "ip address|vrf"
vrf member IPN
ip address 192.168.100.2/28



C) OSPF P2P network Configuration




Solution - Add "ip ospf network point-to-point" under the loopback. By default, OSPF advertises a loopback as a /32 host route regardless of its configured mask; setting the network type to point-to-point makes OSPF advertise the real subnet, which the phantom RP longest-prefix-match design depends on.


interface loopback1
  ip ospf network point-to-point
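The change can be verified by checking the OSPF network type on the loopback and confirming that the RP subnet is learned with its configured mask on the other IPNs (VRF name as in the earlier outputs; exact output wording varies by NX-OS release):

IPN1# show ip ospf interface loopback1 vrf IPN
IPN2# show ip route 192.168.100.1 vrf IPN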



D) OSPF Interface Cost


Symptoms


BiDir/Phantom RP configuration is correct

BUM traffic not working

Route to RP points back to ACI Spine sub-interface


Solution - Adjust the OSPF interface cost; also enable the DSCP class-CoS translation policy

Because the spines do not run PIM, a route towards the RP whose next hop is an ACI spine sub-interface breaks BUM forwarding. Increase the OSPF cost of the spine-facing sub-interfaces on the IPNs so that the route towards the RP prefers the IPN-to-IPN links; a minimal sketch follows.
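A minimal sketch of the cost change, assuming the spine-facing sub-interface on this IPN is Ethernet1/16.4 and that a cost of 100 is higher than the cost of the IPN-to-IPN links:

IPN(config)# interface Ethernet1/16.4
IPN(config-subif)# ip ospf cost 100

After this, "show ip mroute 225.1.67.68 vrf IPN" should no longer show the spine-facing sub-interface as the incoming interface.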


In addition, configure the "DSCP class-cos translation policy for L3 traffic" in the infra tenant so that ACI QoS classes are preserved while traffic transits the IPN.

The steps for this policy are below. Apply them in a maintenance window, since enabling the QoS levels can be disruptive to traffic flows.


Step 1

Navigate to Tenants > infra.


Step 2

In the Navigation pane, expand Policies > Protocol > DSCP class-cos translation policy for L3 traffic.


Step 3

In the Properties panel, click Enabled to enable the DSCP policy.


Step 4

Map each traffic stream to one of the available levels.


Note

Each QoS Level must be mapped to a unique value.


On egress towards the IPN, the spine maps the CoS value into the outer DSCP field; on ingress from the IPN, it maps the DSCP value back to CoS.



Unsupported vPC design 


Case 1



Solution - Use Subinterfaces

Use subinterfaces on the links connecting the IPN devices to the spines, as sketched below.
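A minimal sketch of one such spine-facing sub-interface on an IPN follows. The interface number, addressing, VRF name and OSPF process tag are assumptions that reuse the earlier examples; the dot1q VLAN 4 encapsulation is what ACI Multi-Pod expects on spine-facing IPN links.

IPN3(config)# interface Ethernet1/16.4
IPN3(config-subif)# mtu 9150
IPN3(config-subif)# encapsulation dot1q 4
IPN3(config-subif)# vrf member IPN
IPN3(config-subif)# ip address 192.168.2.5/30
IPN3(config-subif)# ip ospf network point-to-point
IPN3(config-subif)# ip router ospf IPN area 0.0.0.0
IPN3(config-subif)# ip pim sparse-mode
IPN3(config-subif)# no shutdown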


Case 2




Solution


