Mukesh Chanderia

ACI Basics

Updated: Jan 7


You can wipe the Cisco APIC using the following commands:


apic# acidiag touch setup

This command will reset the device configuration, Proceed? [y/N] y


apic# acidiag touch clean

This command will wipe out this device.


apic# acidiag reboot

This command will restart this device, Proceed? [y/N] y


You can wipe a switch with either setup-clean-config.sh or acidiag touch clean, then reload it:


switch# acidiag touch clean

This command will wipe out this device, Proceed? [y/N] y

switch# reload


APIC initial config


Press Enter at anytime to assume the default values. Use ctrl-d at anytime to restart from the beginning.


Cluster configuration ...

Enter the fabric name [ACI Fabric1]: Fabric

Enter the fabric ID (1-128) [1]: 1

Enter the number of active controllers in the fabric (1-9) [3]: 3

Is this a standby controller? [NO]: NO

Is this an APIC-X? [NO]: NO

Enter the controller ID (1-3) [1]: 2

Standalone APIC Cluster ? yes/no [no] no

Enter the POD ID (1-254) [1]: 1

Enter the controller name [apic1]: apic2

Enter address pool for TEP addresses [10.0.0.0/16]: 10.0.0.0/16

Note: The infra VLAN ID should not be used elsewhere in your environment

and should not overlap with any other reserved VLANs on other platforms.

Enter the VLAN ID for infra network (1-4094): 3967


Out-of-band management configuration ...

Enable IPv6 for Out of Band Mgmt Interface? [N]: N

Enter the IPv4 address [192.168.10.1/24]: 192.168.11.2/24

Enter the IPv4 address of the default gateway [None]: 192.168.11.254

Enter the interface speed/duplex mode [auto]: auto


Cluster configuration ...

Fabric name: Fabric

Fabric ID: 1

Number of controllers: 3

Controller name: apic2

POD ID: 1

Controller ID: 2

TEP address pool: 10.0.0.0/16

Infra VLAN ID: 3967


Out-of-band management configuration ...

Management IP address: 192.168.11.2/24

Default gateway: 192.168.11.254

Interface speed/duplex mode: auto


admin user configuration ...

The admin user configuration will be syncronized

from the first controller after this controller joins the cluster.


The above configuration will be applied ...


Warning: TEP address pool and Infra VLAN ID cannot be changed later, these are permanent until the fabric is wiped.


Would you like to edit the configuration? (y/n) [n]: n


apic1# acidiag fnvread

ID   Pod ID  Name    Serial Number  IP Address     Role   State   LastUpdMsgId
-------------------------------------------------------------------------------
101  1       leaf1   S/N            10.0.2.64/32   leaf   active  0
102  1       leaf2   S/N            10.0.3.65/32   leaf   active  0
201  1       spine1  S/N            10.0.32.66/32  spine  active  0
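When the fabric has many nodes, it can help to check the fnvread rows programmatically. The following is a minimal Python sketch, assuming the output has been captured as text and that every data row has the eight whitespace-separated columns shown above. The names FabricNode, parse_fnvread and inactive_nodes are hypothetical helpers, not part of any Cisco tool.

```python
from typing import NamedTuple


class FabricNode(NamedTuple):
    node_id: int
    pod_id: int
    name: str
    serial: str
    ip: str
    role: str
    state: str


def parse_fnvread(text):
    """Return one FabricNode per data row of captured `acidiag fnvread` text."""
    nodes = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric node ID and have exactly 8 columns;
        # the header and the dashed separator do not match and are skipped.
        if len(fields) == 8 and fields[0].isdigit():
            nodes.append(FabricNode(int(fields[0]), int(fields[1]), *fields[2:7]))
    return nodes


def inactive_nodes(nodes):
    """Names of nodes whose State column is anything other than 'active'."""
    return [n.name for n in nodes if n.state != "active"]


# Sample rows copied from the output above (serial numbers are placeholders).
SAMPLE = """\
ID Pod ID Name Serial Number IP Address Role State LastUpdMsgId
------------------------------------------------------
101 1 leaf1 S/N 10.0.2.64/32 leaf active 0
102 1 leaf2 S/N 10.0.3.65/32 leaf active 0
201 1 spine1 S/N 10.0.32.66/32 spine active 0
"""

nodes = parse_fnvread(SAMPLE)
```

An empty list from inactive_nodes(nodes) means every registered switch reports the active state.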



On Cisco APIC, verify the LLDP neighbors on the fabric-facing interfaces eth2-1 and eth2-2 using the acidiag run lldptool command.


apic1# acidiag run lldptool in eth2-1

Chassis ID TLV

MAC: 00:3a:9c:7e:58:c2

Port ID TLV

Local: Eth1/2

Time to Live TLV

120

Port Description TLV

topology/pod-1/paths-101/pathep-[eth1/2]

System Name TLV

leaf-a

System Description TLV

topology/pod-1/node-101

System Capabilities TLV

System capabilities: Bridge, Router

Enabled capabilities: Bridge, Router

Management Address TLV

IPv4: 192.168.10.211

Ifindex: 83886080

Cisco 4-wire Power-via-MDI TLV

4-Pair PoE supported

Spare pair Detection/Classification not required

PD Spare pair Desired State: Disabled

PSE Spare pair Operational State: Disabled

Cisco Port Role TLV

4

Cisco Port Mode TLV

0

Cisco Port State TLV

1

Cisco Model TLV

N9K-C93180YC-FX

Cisco Serial Number TLV

FDO23161CZ0

Cisco Firmware Version TLV

n9000-15.2(1g)

Cisco Node Role TLV

1

Cisco Infra VLAN TLV

369

Cisco Name TLV

leaf-a

Cisco Fabric Name TLV

Fabric

Cisco Node IP TLV

IPv4:10.0.32.64

Cisco Node ID TLV

101

Cisco POD ID TLV

1

Cisco Appliance Vector TLV

Id: 1

IPv4: 10.0.0.1

UUID: 9df7d5a0-ca14-33eb-beda-e526c6a0aa53

LLDP-MED Capabilities TLV

Device Type: netcon

Capabilities: LLDP-MED, Network Policy, Extended Power via MDI-PSE

LLDP-MED Network Policy TLV

01400000

End of LLDPDU TLV



From the APIC, cross-check its chassis ID against the Cisco APIC UUID that the leaf switches report.


On the leaf, the following commands show the LLDP neighbors and the LLDP counters:

leaf# show lldp neighbors detail

leaf# show lldp traffic
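The cross-check can be scripted once the lldptool output is saved as text. This is a minimal Python sketch, assuming each TLV name sits on a line ending in "TLV" and its values follow on the next lines, as in the output above. The function names are hypothetical, and the sample is a shortened excerpt of that output.

```python
def parse_lldp_tlvs(text):
    """Group captured lldptool output into {TLV name: [value lines]}."""
    tlvs = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(" TLV"):
            # A line such as "Cisco Node ID TLV" starts a new TLV.
            current = line[: -len(" TLV")]
            tlvs[current] = []
        elif current is not None:
            tlvs[current].append(line)
    return tlvs


def appliance_uuid(tlvs):
    """UUID advertised in the Cisco Appliance Vector TLV, or None if absent."""
    for value in tlvs.get("Cisco Appliance Vector", []):
        if value.startswith("UUID:"):
            return value.split(":", 1)[1].strip()
    return None


def chassis_matches(tlvs, apic_chassis_id):
    """True when the advertised UUID equals the APIC chassis ID (case-insensitive)."""
    uuid = appliance_uuid(tlvs)
    return uuid is not None and uuid.lower() == apic_chassis_id.lower()


# Shortened excerpt of the lldptool output shown earlier.
SAMPLE = """\
Cisco Infra VLAN TLV
369
Cisco Node ID TLV
101
Cisco Appliance Vector TLV
Id: 1
IPv4: 10.0.0.1
UUID: 9df7d5a0-ca14-33eb-beda-e526c6a0aa53
End of LLDPDU TLV
"""

tlvs = parse_lldp_tlvs(SAMPLE)
```

The second argument of chassis_matches is the CHASSIS_ID value taken from acidiag avread on the APIC.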


A (none)# prompt means the switch has not been discovered yet.


(none)# moquery -c faultInfo

This query lists all faults on the switch.


If TPM is disabled in the BIOS, enable it.


If LLDP is enabled in the CIMC/VIC, disable it.



Run show cli list to view all available CLI commands.


APIC Logs

-------------

/var/log/dme/log

/var/log/dme/oldlog


Switch Logs

-------------

/var/log/dme/log

/var/log/dme/oldlog

/var/sysmgr/tmp_logs


APIC# show epg BLUE detail


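Leaf1# iping -V tenant:vrf01 -S 172.16.1.1 172.16.1.22

Here -V names the VRF as tenant:vrf, -S is the source address (the bridge domain gateway IP), and the last argument is the destination.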


apic1# acidiag avread

Local appliance ID=1 ADDRESS=10.0.0.1 TEP ADDRESS=10.0.0.0/16 ROUTABLE IP ADDRESS=0.0.0.0 CHASSIS_ID=9df7d5a0-ca14-11eb-beda-e526c7a0aa53

Cluster of 1 lm(t):1(zeroTime) appliances (out of targeted 1 lm(t):1(2021-06-11T09:39:44.787+00:00)) with FABRIC_DOMAIN name=Fabric set to version=5.2(1g) lm(t):1(2021-06-11T09:40:01.215+00:00); discoveryMode=PERMISSIVE lm(t):0(1970-01-01T00:00:00.001+00:00); drrMode=OFF lm(t):0(1970-01-01T00:00:00.001+00:00); kafkaMode=OFF lm(t):0(1970-01-01T00:00:00.001+00:00)

appliance id=1 address=10.0.0.1 lm(t):1(2021-06-10T19:44:55.051+00:00) tep address=10.0.0.0/16 lm(t):1(2021-06-10T19:44:55.051+00:00) routable address=0.0.0.0 lm(t):1(zeroTime) oob address=192.168.11.1/24 lm(t):1(2021-06-10T19:45:00.131+00:00) version=5.2(1g) lm(t):1(2021-06-10T19:45:00.188+00:00) chassisId=9df7d5a0-ca14-11eb-beda-e526c7a0aa53 lm(t):1(2021-06-10T19:45:00.188+00:00) capabilities=0X7EEFFFFFFFFF--0X2020--0X1 lm(t):1(2021-06-11T09:44:04.539+00:00) rK=(stable,present,0X206173722D687373) lm(t):1(2021-06-10T19:45:00.134+00:00) aK=(stable,present,0X206173722D687373) lm(t):1(2021-06-10T19:45:00.134+00:00) oobrK=(stable,present,0X206173722D687373) lm(t):1(2021-06-10T19:45:00.134+00:00) oobaK=(stable,present,0X206173722D687373) lm(t):1(2021-06-10T19:45:00.134+00:00) cntrlSbst=(APPROVED, FCH2128V0F0) lm(t):1(2021-06-10T19:45:00.188+00:00) (targetMbSn= lm(t):0(zeroTime), failoverStatus=0 lm(t):0(zeroTime)) podId=1 lm(t):1(2021-06-10T19:44:55.051+00:00) commissioned=YES lm(t):1(zeroTime) registered=YES lm(t):1(2021-06-10T19:44:55.051+00:00) standby=NO lm(t):1(2021-06-10T19:44:55.051+00:00) DRR=NO lm(t):0(zeroTime) apicX=NO lm(t):1(2021-06-10T19:44:55.051+00:00) virtual=NO lm(t):1(2021-06-10T19:44:55.051+00:00) active=YES(2021-06-10T19:44:55.051+00:00) health=(applnc:255 lm(t):1(2021-06-10T19:47:00.737+00:00) svc's)

---------------------------------------------

clusterTime=<diff=-7610 common=2021-06-11T18:30:33.430+00:00 local=2021-06-11T18:30:41.040+00:00 pF=<displForm=0 offsSt=0 offsVlu=0 lm(t):1(2021-06-11T09:39:41.180+00:00)>>

---------------------------------------------
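The avread line for each appliance is long, so pulling out a few fields makes it easier to read. Below is a minimal Python sketch that extracts the appliance ID, address, commissioned flag and health value with regular expressions. It assumes the field spellings shown in the output above and treats health 255 as fully fit, which matches the FULLY_FIT value in the data states list later in this post. The function name and the shortened sample line are illustrative only.

```python
import re

APPLIANCE_RE = re.compile(r"appliance id=(\d+)\s+address=(\S+)")
HEALTH_RE = re.compile(r"health=\(applnc:(\d+)")
COMMISSIONED_RE = re.compile(r"commissioned=(\w+)")

FULLY_FIT = 255  # health value reported for a healthy appliance


def summarize_avread(text):
    """Return one summary dict per 'appliance id=...' line in captured avread text."""
    summaries = []
    for line in text.splitlines():
        appliance = APPLIANCE_RE.search(line)
        if appliance is None:
            continue  # header, cluster and clusterTime lines are ignored
        health = HEALTH_RE.search(line)
        commissioned = COMMISSIONED_RE.search(line)
        health_code = int(health.group(1)) if health else None
        summaries.append({
            "id": int(appliance.group(1)),
            "address": appliance.group(2),
            "commissioned": bool(commissioned) and commissioned.group(1) == "YES",
            "health": health_code,
            "fully_fit": health_code == FULLY_FIT,
        })
    return summaries


# Shortened copy of the appliance line above; most fields are left out for brevity.
SAMPLE = (
    "appliance id=1 address=10.0.0.1 lm(t):1(2021-06-10T19:44:55.051+00:00) "
    "tep address=10.0.0.0/16 podId=1 commissioned=YES lm(t):1(zeroTime) "
    "registered=YES active=YES(2021-06-10T19:44:55.051+00:00) "
    "health=(applnc:255 lm(t):1(2021-06-10T19:47:00.737+00:00) svc's)"
)

summary = summarize_avread(SAMPLE)
```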


Interfaces in APIC (ifconfig)


bond0: A logical bond that bundles the physical interfaces attached to the fabric (eth2-1 and eth2-2).


bond1: A logical bond that provides OOB connectivity.


bond0.369: Subinterface of the bond0 interface that carries Infra traffic, such as packets encapsulated with Infra VLAN (369) 802.1Q header. The IP address of this subinterface is 10.0.0.1/32. It belongs to the TEP address pool (10.0.0.0/16) that was configured in the setup utility.


oobmgmt: Logical interface for OOB management configured during the initial setup.
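The relationship between the subinterface address and the TEP pool can be verified with Python's standard ipaddress module. This is a small sketch using the values from this post. The helper names in_tep_pool and vlan_of are hypothetical.

```python
import ipaddress


def in_tep_pool(address, pool):
    """True when an interface address such as '10.0.0.1/32' lies inside the TEP pool."""
    ip = ipaddress.ip_interface(address).ip
    return ip in ipaddress.ip_network(pool)


def vlan_of(subinterface):
    """VLAN ID encoded in a Linux subinterface name such as 'bond0.369'."""
    return int(subinterface.split(".", 1)[1])


infra_ok = in_tep_pool("10.0.0.1/32", "10.0.0.0/16")     # bond0.369 address
oob_ok = in_tep_pool("192.168.11.2/24", "10.0.0.0/16")   # OOB address, not a TEP
infra_vlan = vlan_of("bond0.369")
```

Here infra_ok is True, oob_ok is False, and infra_vlan is 369, which should equal the value in the Cisco Infra VLAN TLV seen in the LLDP output.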



The bonding mode is set to fault-tolerance (active-backup). In the example below, eth2-2, facing leaf-b, is active.


To identify the active link on the Cisco APIC, read /proc/net/bonding/bond0.


Because eth2-2 is the active link, the leaf it faces (leaf2) must have been discovered first.


The APIC's bond0 is an active/standby bond.


apic1# cat /proc/net/bonding/bond0

Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)


Bonding Mode: fault-tolerance (active-backup)

Primary Slave: None

Currently Active Slave: eth2-2

MII Status: up

MII Polling Interval (ms): 60

Up Delay (ms): 0

Down Delay (ms): 0


Slave Interface: eth2-1

MII Status: up

Speed: 10000 Mbps

Duplex: full

Link Failure Count: 1

Permanent HW addr: 38:90:a5:40:76:ea

Slave queue ID: 0


Slave Interface: eth2-2

MII Status: up

Speed: 10000 Mbps

Duplex: full

Link Failure Count: 1

Permanent HW addr: 38:90:a5:40:76:eb

Slave queue ID: 0
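The bonding file is plain "key: value" text, so the active link can be picked out with a short script. Here is a minimal Python sketch, assuming the layout shown above. The function names are hypothetical, and the sample is a shortened copy of that output.

```python
def parse_bond(text):
    """Parse /proc/net/bonding/<bond> text into the active slave and per-slave fields."""
    info = {"active": None, "slaves": {}}
    current = None
    for raw in text.splitlines():
        if ":" not in raw:
            continue
        # Split on the first colon only, so MAC addresses stay intact.
        key, value = (part.strip() for part in raw.split(":", 1))
        if key == "Currently Active Slave":
            info["active"] = value
        elif key == "Slave Interface":
            current = value
            info["slaves"][current] = {}
        elif current is not None:
            info["slaves"][current][key] = value
    return info


def standby_links(info):
    """Slave interfaces that are up but not currently active."""
    return [
        name
        for name, fields in info["slaves"].items()
        if name != info["active"] and fields.get("MII Status") == "up"
    ]


# Shortened copy of the bond0 output above.
SAMPLE = """\
Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: eth2-2
MII Status: up

Slave Interface: eth2-1
MII Status: up
Link Failure Count: 1
Permanent HW addr: 38:90:a5:40:76:ea

Slave Interface: eth2-2
MII Status: up
Link Failure Count: 1
Permanent HW addr: 38:90:a5:40:76:eb
"""

bond = parse_bond(SAMPLE)
```

On an APIC the input would come from reading /proc/net/bonding/bond0 instead of the SAMPLE string.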


Packet Drop


Leaf

SSH to the leaf and run these commands. This example is for ethernet 1/31.

ACI-LEAF# vsh_lc

module-1# show platform internal counters port 31



Spine

A fixed spine (N9K-C9332C and N9K-C9364C) can be checked using the same method as the leaf switches.

For a modular spine (N9K-C9504 etc.), you must attach to the line card before you can view the platform counters. SSH to the spine and run these commands. This example is for ethernet 2/1.

ACI-SPINE# vsh

ACI-SPINE# attach module 2

module-2# show platform internal counters port 1



Queuing stats counters are shown using 'show queuing interface'.


ACI-LEAF# show queuing interface ethernet 1/5



Viewing statistics in GUI

The location is Fabric > Inventory > Leaf/Spine > Physical interface > Stats / Error Counters / QoS Stats.






leaf-a# show vrf

VRF-Name    VRF-ID  State  Reason
black-hole  3       Up     --
overlay-1   4       Up     --


Cisco ACI uses a dedicated VRF as an infrastructure to carry VXLAN traffic. The transport infrastructure for VXLAN traffic is known as overlay-1, which exists as part of the tenant “infra.” Leaf nodes are known as PTEPs (physical tunnel endpoints).



VRF


A VRF also offers a setting called "Policy Control Enforcement" that lets you disable the allow-list security model enforced through EPGs and contracts.


By default, this security model is active, preventing communication between EPGs unless specified in a contract rule. However, when Policy Control Enforcement is turned off, no contract rules will be applied, and endpoints can freely communicate with each other as long as there is Layer 2 or Layer 3 reachability.
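The effect of this setting can be pictured as a simple decision function. The Python sketch below is a deliberately simplified model, not how the fabric implements it. It treats a contract as a plain (consumer EPG, provider EPG) pair, ignores filters and direction, and assumes traffic within one EPG is not subject to contracts (the default when intra-EPG isolation is not configured). All names are hypothetical.

```python
def traffic_permitted(src_epg, dst_epg, contracts, enforced=True):
    """Simplified allow-list check between two EPGs in the same VRF.

    contracts: set of (consumer_epg, provider_epg) pairs (hypothetical model).
    enforced:  the VRF's Policy Control Enforcement setting.
    """
    if not enforced:
        # Enforcement turned off: no contract rules are applied.
        return True
    if src_epg == dst_epg:
        # Assumption: endpoints in the same EPG may talk without a contract.
        return True
    # Allow-list model: traffic passes only if a contract links the two EPGs.
    return (src_epg, dst_epg) in contracts or (dst_epg, src_epg) in contracts


contracts = {("WEB", "APP")}  # hypothetical contract between two EPGs

web_to_app = traffic_permitted("WEB", "APP", contracts)
web_to_db = traffic_permitted("WEB", "DB", contracts)
web_to_db_unenforced = traffic_permitted("WEB", "DB", contracts, enforced=False)
```

With enforcement on, WEB can reach APP but not DB. With enforcement off, WEB can reach DB as well, provided there is Layer 2 or Layer 3 reachability.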




Bridge Domain


Bridge domains possess the following attributes:

  1. They serve as Layer 2 forwarding domains.

  2. They offer a default gateway and subnet configuration for endpoints.

  3. Each bridge domain is associated with a single VRF.

  4. Tenants can have one or more bridge domains.

  5. VRFs can have one or more bridge domains.

  6. Bridge domains can encompass multiple subnets.


EPG


In ACI, multiple Endpoint Groups (EPGs) are defined within a Layer 2 domain (Bridge Domain or BD) to achieve security isolation in addition to Layer 2 network separation.


In traditional network devices, the VLAN ID is the smallest unit of segmentation for Layer 2 network separation. In ACI, however, the Layer 2 domain (BD) is not directly tied to a VLAN ID. Instead, ACI introduces the EPG as an extra layer of segmentation that is finer than the Layer 2 domain, and the VLAN ID is associated with the EPG.


Consequently, in ACI, the EPG serves as a finer security segmentation compared to the Layer 2 domain, and the VLAN ID becomes a parameter for security separation rather than being tied solely to Layer 2 network separation.


An endpoint consists of a MAC address and can have one or more IP addresses. It represents a single networking device.

In traditional networks, three tables are utilized to manage the network addresses of external devices:

  1. A MAC address table for Layer 2 forwarding.

  2. A Routing Information Base (RIB) for Layer 3 forwarding.

  3. An ARP table for the correlation between IP addresses and MAC addresses.

However, Cisco ACI introduces a consolidation of the MAC address table and ARP table into a single table called the endpoint table. This alteration implies that Cisco ACI acquires such information through a different method compared to traditional networks.


In Cisco ACI, MAC and IP addresses are learned in hardware by inspecting the packet source MAC address and source IP address in the data plane, instead of relying on ARP to obtain the MAC address of the next hop for IP addresses.


This approach reduces the resources required to process and generate ARP traffic. It also enables the detection of IP and MAC address movements without waiting for GARP, as long as some traffic is sent from the new host.


Although Cisco ACI employs the endpoint table instead of separate MAC address and ARP tables, it still utilizes the RIB and ARP table for L3Out functionality.


Forwarding table lookup order:

  • Endpoint table (show endpoint)

  • RIB (show ip route)
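The lookup order can be sketched as a two-stage function: an exact match in the endpoint table first, then a longest-prefix match in the RIB. This Python sketch is only a conceptual model with hypothetical table contents. The real lookup happens in leaf hardware.

```python
import ipaddress


def lookup(dst_ip, endpoint_table, rib):
    """Return (table_used, next_hop) for a destination IP.

    endpoint_table: {ip_string: location}      exact /32 host entries
    rib:            {prefix_string: next_hop}  routed prefixes
    """
    # 1. Endpoint table: exact match on the host address.
    if dst_ip in endpoint_table:
        return ("endpoint", endpoint_table[dst_ip])

    # 2. RIB: longest-prefix match among the prefixes that contain the address.
    addr = ipaddress.ip_address(dst_ip)
    candidates = []
    for prefix, next_hop in rib.items():
        network = ipaddress.ip_network(prefix)
        if addr in network:
            candidates.append((network.prefixlen, next_hop))
    if not candidates:
        return ("miss", None)
    _, best_next_hop = max(candidates, key=lambda c: c[0])
    return ("rib", best_next_hop)


# Hypothetical tables for illustration.
endpoint_table = {"172.16.1.22": "leaf 102, eth1/5"}
rib = {"172.16.1.0/24": "bridge domain subnet", "0.0.0.0/0": "L3Out next hop"}

known_host = lookup("172.16.1.22", endpoint_table, rib)
same_subnet = lookup("172.16.1.99", endpoint_table, rib)
external = lookup("8.8.8.8", endpoint_table, rib)
```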




Basic Bridge Domain Configuration


Hardware Proxy or flooding mode for Layer 2 Unknown Unicast packets.


Hardware proxy is the default option for Layer 2 unknown unicast traffic. If the leaf does not know the destination MAC address, it sends the packet to the spine proxy.


With Layer 2 unknown unicast flooding (hardware proxy not selected), forwarding does not use the COOP database on the spine switches. Layer 2 unknown unicast packets are flooded within the bridge domain.


Note: The leaf endpoint table and spine COOP database are still populated with the MAC-to-VTEP information.


Enable or disable Address Resolution Protocol (ARP) flooding.


When ARP flooding is enabled, the bridge domain operates in a manner consistent with traditional networks, where ARP traffic is flooded throughout the domain.


However, if ARP flooding is disabled, the ingress leaf sends ARP traffic as unicast either to the destination leaf or to the spine proxy.


It's important to note that these options are applicable only when unicast routing is enabled for the bridge domain. In cases where unicast routing is disabled, ARP traffic will always be flooded within the bridge domain.
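The bridge domain options described above combine into a small decision table. The following Python sketch encodes only the rules stated in this section. The target_known parameter is an assumption added for illustration: it stands for whether the ingress leaf already knows where the ARP target IP lives.

```python
def l2_unknown_unicast_handling(hardware_proxy):
    """How a leaf treats a frame whose destination MAC it does not know."""
    return "send to spine proxy" if hardware_proxy else "flood in bridge domain"


def arp_handling(unicast_routing, arp_flooding, target_known):
    """How the ingress leaf forwards an ARP request in a bridge domain."""
    # Without unicast routing, ARP is always flooded regardless of the ARP setting.
    if not unicast_routing or arp_flooding:
        return "flood in bridge domain"
    # ARP flooding disabled: unicast toward the target, or ask the spine proxy.
    if target_known:
        return "unicast to destination leaf"
    return "send to spine proxy"


default_l2 = l2_unknown_unicast_handling(hardware_proxy=True)
routed_no_flood = arp_handling(unicast_routing=True, arp_flooding=False, target_known=False)
l2_only_bd = arp_handling(unicast_routing=False, arp_flooding=False, target_known=True)
```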





The Layer 3 Configurations tab provides options to configure the following essential parameters:


1) Unicast Routing: Enabling this setting, along with configuring a subnet address, allows the fabric to function as the default gateway within the bridge domain and route traffic accordingly. Additionally, when unicast routing is enabled, the endpoint table on the leaf switches learns the mapping of IP addresses to Tunnel Endpoint (TEP) for this specific bridge domain. It's worth noting that IP learning does not depend on having a subnet configured under the bridge domain.


2) Subnet Address: This option allows you to configure the IP addresses of the SVIs (Switched Virtual Interfaces), which act as the default gateways for the bridge domain. The available options for configuring a subnet under a bridge domain are as follows:


a. Private to VRF: This subnet is limited to its respective Virtual Routing and Forwarding (VRF) within the tenant. It does not extend beyond that VRF.


b. Advertised externally: This subnet can be advertised to a routed connection, enabling it to be accessible from external networks.


c. Shared between VRFs: This subnet can be shared with and exported to multiple VRFs within the same tenant or across tenants as part of a shared service.


An example of a shared service is a routed connection to an Endpoint Group (EPG) present in a different VRF within a different tenant. This configuration allows bidirectional traffic flow across VRFs.


It's important to note that for an EPG providing a shared service, the subnet must be configured under that EPG (not under a bridge domain), and its scope must be set to "advertised externally" and "shared between VRFs."


Unicast routing is enabled by default, and is required when you configure a default gateway for a bridge domain inside Cisco ACI fabric. If you configure the default gateway outside the fabric (for example, on a firewall), you should disable unicast routing and enable ARP flooding.


In that case, disabling unicast routing avoids unnecessary IP learning that may cause unexpected IP forwarding.


-------------------------------------------------------------------------------------------------------------------------------


General Troubleshooting


avread --> Displays APICs within the cluster.


fnvread --> Displays the address and state of switch nodes registered with the fabric.


fnvreadex --> Displays additional information for switch nodes registered with the fabric.


rvread service --> Summarizes the data layer state. The output shows a summary of the data layer state for each service. The shard view shows replicas in ascending order.


rvread service shard --> Displays the data layer state for a service on a specific shard across all replicas.


rvread service shard replica --> Displays the data layer state for a service on a specific shard and replica.


crashsuspecttracker --> Tracks states of a service or data subset that indicate a crash.


dbgtoken --> Generates a token to permit remote SSH access.


version --> Displays the APIC ISO software version.


APIC# man acidiag


Service IDs:

1 - cliD

2 - controller

3 - eventmgr

4 - extXMLApi

5 - policyelem

6 - policymgr

7 - reader

8 - ae

9 - topomgr

10 - observer

11 - dbgr

12 - observerelem

13 - dbgrelem

14 - vmmmgr

15 - nxosmock

16 - bootmgr

17 - appliancedirector

18 - adrelay

19 - ospaagent

20 - vleafelem

21 - dhcpd

22 - scripthandler

23 - idmgr

24 - ospaelem

25 - osh

26 - opflexagent

27 - opflexelem

28 - confelem

29 - vtap

30 - snmpd

31 - opflexp

32 - analytics

33 - policydist

34 - plgnhandler

35 - domainmgr

36 - licensemgr

37 - no service

38 - platformmgr

39 - edmgr



Data States

COMATOSE: 0

NEWLY_BORN: 1

UNKNOWN: 2

DATA_LAYER_DIVERGED: 11

DATA_LAYER_DEGRADED_LEADERSHIP: 12

DATA_LAYER_ENTIRELY_DIVERGED: 111

DATA_LAYER_PARTIALLY_DIVERGED: 112

DATA_LAYER_ENTIRELY_DEGRADED_LEADERSHIP: 121

DATA_LAYER_PARTIALLY_DEGRADED_LEADERSHIP: 122

FULLY_FIT: 255
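When reading health values in scripts, a lookup table keeps the numeric codes readable. This short Python sketch simply encodes the list above. The helper names are hypothetical.

```python
# Data state codes exactly as listed above.
DATA_STATES = {
    0: "COMATOSE",
    1: "NEWLY_BORN",
    2: "UNKNOWN",
    11: "DATA_LAYER_DIVERGED",
    12: "DATA_LAYER_DEGRADED_LEADERSHIP",
    111: "DATA_LAYER_ENTIRELY_DIVERGED",
    112: "DATA_LAYER_PARTIALLY_DIVERGED",
    121: "DATA_LAYER_ENTIRELY_DEGRADED_LEADERSHIP",
    122: "DATA_LAYER_PARTIALLY_DEGRADED_LEADERSHIP",
    255: "FULLY_FIT",
}


def describe_state(code):
    """Name for a data state code, or a marker for codes not in the list."""
    return DATA_STATES.get(code, "unlisted state {}".format(code))


def is_fully_fit(code):
    return code == 255


healthy = describe_state(255)
diverged = describe_state(112)
```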


APIC# acidiag rvread 9 15

(9,15,1) st:6 lm(t):3(2024-01-06T12:29:47.065+00:00) le: reSt:LEADER voGr:0 cuTerm:0x50 lCoTe:0x4f lCoIn:0x78000000001d9864 veFiSt:0x13 veFiEn:0x13 lm(t):3(2024-01-06T12:29:47.053+00:00) stMmt:1 lm(t):0(zeroTime) ReTx:0 lm(t):0(zeroTime) lastUpdt 2024-01-07T04:44:20.873+00:00


APIC# acidiag rvread 9 11

(9,11,1) st:6 lm(t):2(2024-01-06T12:29:24.547+00:00) le: reSt:LEADER voGr:0 cuTerm:0x52 lCoTe:0x51 lCoIn:0x58000000001e1304 veFiSt:0x29 veFiEn:0x29 lm(t):2(2024-01-06T12:29:24.507+00:00) stMmt:1 lm(t):0(zeroTime) lp: clSt:2 lm(t):2(2024-01-06T12:04:38.6
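The leading tuple in each rvread line can be split into its parts and mapped to a service name. This is a minimal Python sketch, assuming the tuple reads as (service, shard, replica) in the same order as the rvread service shard replica arguments, and that reSt carries the replica's role. SERVICE_NAMES holds only a few entries from the Service IDs list above.

```python
import re

# Subset of the Service IDs list above.
SERVICE_NAMES = {5: "policyelem", 6: "policymgr", 9: "topomgr", 10: "observer", 11: "dbgr"}

REPLICA_RE = re.compile(r"^\((\d+),(\d+),(\d+)\)\s+st:(\d+)")
ROLE_RE = re.compile(r"reSt:(\w+)")


def parse_rvread_line(line):
    """Extract service, shard, replica, st value and role from one rvread line."""
    match = REPLICA_RE.match(line.strip())
    if match is None:
        return None
    service, shard, replica, st = (int(g) for g in match.groups())
    role = ROLE_RE.search(line)
    return {
        "service": service,
        "service_name": SERVICE_NAMES.get(service, "unknown"),
        "shard": shard,
        "replica": replica,
        "st": st,
        "role": role.group(1) if role else None,
    }


# Beginning of the first output line above.
SAMPLE = "(9,15,1) st:6 lm(t):3(2024-01-06T12:29:47.065+00:00) le: reSt:LEADER voGr:0 cuTerm:0x50"

replica_info = parse_rvread_line(SAMPLE)
```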


Log in as root.


Service ID 9 is topomgr, so the service can be controlled with systemctl:


systemctl start topomgr

systemctl stop topomgr

systemctl restart topomgr

systemctl status topomgr


Example: APIC1 is in a partially diverged state


APIC# acidiag rvread


\- unexpected state;    /-unexpected mutator;


s->  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32lcl


r->123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123lcl


  1


  2


  3


  4


  5


  6


  7


  8


  9


 10


 11             \                           \                         \


 12


 13


 14


 15


Non optimal leader for shards : 11:1,11:16,11:19,11:25,11:28,11:31


Service 11 is dbgr, and the leader for the flagged shards is APIC3.
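The "Non optimal leader" line lists service:shard pairs, which can be grouped by service to see which service needs attention. Here is a minimal Python sketch that assumes the line format shown above. The function name is hypothetical.

```python
from collections import defaultdict


def parse_non_optimal_leaders(line):
    """Turn 'Non optimal leader for shards : 11:1,11:16' into {11: [1, 16]}."""
    # Everything after the first colon is the comma-separated pair list.
    _, _, tail = line.partition(":")
    shards = defaultdict(list)
    for item in tail.split(","):
        item = item.strip()
        if not item:
            continue
        service, shard = item.split(":")
        shards[int(service)].append(int(shard))
    return dict(shards)


SAMPLE = "Non optimal leader for shards : 11:1,11:16,11:19,11:25,11:28,11:31"

affected = parse_non_optimal_leaders(SAMPLE)
```

For the line above, the only key is 11, which the Service IDs list maps to dbgr.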


Action Plan:


Stop the dbgr service and then start it again on all three APICs. APIC1 then returns to the fully-fit state.


acidiag stop dbgr

acidiag start dbgr




APIC SSD REPLACEMENT PROCEDURE



CIMCServer# scope sol 

  

Server /sol # set enabled yes 

  

Server /sol *# set baud-rate 115200 

  

Server /sol *# commit 

  

Server /sol # connect host




APIC CPU and Memory


apic# ps aux --sort -%mem

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND

1000     22836  1.3  4.9 11636484 4790212 ?    Ssl  Jan06  14:06 /etc/alternatives/jre_openjdk/bin/java -Xms1g -Xmx1g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Des.networkaddress.cache.ttl=60 -Des.ne

ifc       5775  1.6  2.1 2716716 2121416 ?     Ssl  Jan06  17:49 /mgmt//bin/svc_ifc_reader.bin --x

root      7380  1.8  1.2 1980428 1226688 ?     Ssl  Jan06  19:28 /mgmt//bin/nginx.bin -p /data//nginx/

ifc       5766  2.1  1.0 1695524 1006004 ?     Ssl  Jan06  23:04 /mgmt//bin/svc_ifc_policymgr.bin --x

ifc       5765  1.7  1.0 1642268 995828 ?      Ssl  Jan06  19:02 /mgmt//bin/svc_ifc_observer.bin --x




apic# top -o %MEM

top - 05:39:56 up 17:46,  1 user,  load average: 2.70, 2.54, 2.42

Tasks: 681 total,   1 running, 304 sleeping,   0 stopped,   0 zombie

%Cpu(s):  3.2 us,  2.8 sy,  0.0 ni, 94.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st

KiB Mem : 97353248 total, 51438976 free, 19963508 used, 25950764 buff/cache

KiB Swap:        0 total,        0 free,        0 used. 76119576 avail Mem 


  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                                                                                                   

22836 1000      20   0   11.1g   4.6g  25616 S   0.0  4.9  14:06.19 java                                                                                                                                                                                      

 5775 ifc       20   0 2716716   2.0g 166900 S   0.0  2.2  17:50.15 svc_ifc_reader.                                                                                                                                                                           

 7380 root      20   0 1980428   1.2g 198468 S   5.9  1.3  19:28.69 nginx.bin                                                                                                                                                                                 

 5766 ifc       20   0 1695524 982.4m 224212 S   0.0  1.0  23:04.94 svc_ifc_policym                                                                                                                                                                           

                   



apic# ps -eo pid,ppid,cmd,%mem,%cpu --sort=-%mem | head -n30

  PID  PPID CMD                         %MEM %CPU

22836 22834 /etc/alternatives/jre_openj  4.9  1.3

 5775     1 /mgmt//bin/svc_ifc_reader.b  2.1  1.6

 7380     1 /mgmt//bin/nginx.bin -p /da  1.2  1.8

 5766     1 /mgmt//bin/svc_ifc_policymg  1.0  2.1

 5765     1 /mgmt//bin/svc_ifc_observer  1.0  1.7

 1811 32429 java -Xms1g -Xmx2g -XX:+Hea  0.9  0.9

30639 30450 java -Xmx4096m -Djava.secur  0.8  1.5

 5772     1 /mgmt//bin/svc_ifc_eventmgr  0.7  2.0

32227 32226 /etc/alternatives/jre_1.8.0  0.7 21.5

 1563 31801 java -XX:+UseG1GC -XX:MaxGC  0.6  1.1

 5780     1 /mgmt/opt/controller/decoy/  0.6  0.0
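To track memory consumers over time, the ps output can be parsed into records. This is a minimal Python sketch, assuming the pid,ppid,cmd,%mem,%cpu column order used in the command above, with the last two columns always numeric. The function name is hypothetical, and the sample rows are copied from the output.

```python
def top_memory(text, count=3):
    """Return the `count` rows with the highest %MEM from captured ps output."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Skip the header and anything that is not a full data row.
        if len(parts) < 5 or not parts[0].isdigit():
            continue
        rows.append({
            "pid": int(parts[0]),
            "ppid": int(parts[1]),
            # The command may contain spaces; it is everything between PPID and %MEM.
            "cmd": " ".join(parts[2:-2]),
            "mem": float(parts[-2]),
            "cpu": float(parts[-1]),
        })
    return sorted(rows, key=lambda row: row["mem"], reverse=True)[:count]


# First rows of the ps output above.
SAMPLE = """\
  PID  PPID CMD                         %MEM %CPU
22836 22834 /etc/alternatives/jre_openj  4.9  1.3
 5775     1 /mgmt//bin/svc_ifc_reader.b  2.1  1.6
 7380     1 /mgmt//bin/nginx.bin -p /da  1.2  1.8
 5766     1 /mgmt//bin/svc_ifc_policymg  1.0  2.1
"""

heaviest = top_memory(SAMPLE, count=2)
```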







