You can wipe the Cisco APIC using the following commands:
apic# acidiag touch setup
This command will reset the device configuration, Proceed? [y/N] y
apic# acidiag touch clean
This command will wipe out this device.
apic# acidiag reboot
This command will restart this device, Proceed? [y/N] y
You can wipe the switches using the following commands:
switch# setup-clean-config.sh (alternatively: acidiag touch clean)
This command will wipe out this device, Proceed? [y/N] y
switch# reload
APIC initial config
Press Enter at anytime to assume the default values. Use ctrl-d at anytime to restart from the beginning.
Cluster configuration ...
Enter the fabric name [ACI Fabric1]: Fabric
Enter the fabric ID (1-128) [1]: 1
Enter the number of active controllers in the fabric (1-9) [3]: 3
Is this a standby controller? [NO]: NO
Is this an APIC-X? [NO]: NO
Enter the controller ID (1-3) [1]: 2
Standalone APIC Cluster ? yes/no [no] no
Enter the POD ID (1-254) [1]: 1
Enter the controller name [apic1]: apic2
Enter address pool for TEP addresses [10.0.0.0/16]: 10.0.0.0/16
Note: The infra VLAN ID should not be used elsewhere in your environment
and should not overlap with any other reserved VLANs on other platforms.
Enter the VLAN ID for infra network (1-4094): 3967
Out-of-band management configuration ...
Enable IPv6 for Out of Band Mgmt Interface? [N]: N
Enter the IPv4 address [192.168.10.1/24]: 192.168.11.2/24
Enter the IPv4 address of the default gateway [None]: 192.168.11.254
Enter the interface speed/duplex mode [auto]: auto
Cluster configuration ...
Fabric name: Fabric
Fabric ID: 1
Number of controllers: 3
Controller name: apic2
POD ID: 1
Controller ID: 2
TEP address pool: 10.0.0.0/16
Infra VLAN ID: 3967
Out-of-band management configuration ...
Management IP address: 192.168.11.2/24
Default gateway: 192.168.11.254
Interface speed/duplex mode: auto
admin user configuration ...
The admin user configuration will be syncronized
from the first controller after this controller joins the cluster.
The above configuration will be applied ...
Warning: TEP address pool and Infra VLAN ID cannot be changed later, these are permanent until the fabric is wiped.
Would you like to edit the configuration? (y/n) [n]: n
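Because the TEP pool and infra VLAN cannot be changed later, it can help to sanity-check the planned answers before typing them into the setup utility. The following is a minimal sketch, not a Cisco tool: the function name check_setup_answers and the specific checks are assumptions chosen for illustration, and the VLAN range is taken from the prompt shown above.

```python
import ipaddress

def check_setup_answers(tep_pool, infra_vlan, oob_ip, oob_gateway):
    """Return a list of problems found in the planned answers (empty if none)."""
    problems = []
    try:
        # strict=True rejects pools with host bits set, e.g. 10.0.0.1/16
        ipaddress.ip_network(tep_pool, strict=True)
    except ValueError as exc:
        problems.append(f"TEP pool: {exc}")
    # Range as printed by the setup prompt in the transcript above
    if not 1 <= infra_vlan <= 4094:
        problems.append(f"infra VLAN {infra_vlan} is outside 1-4094")
    mgmt = ipaddress.ip_interface(oob_ip)
    if ipaddress.ip_address(oob_gateway) not in mgmt.network:
        problems.append(f"gateway {oob_gateway} is not in {mgmt.network}")
    return problems

# Values from the transcript above
print(check_setup_answers("10.0.0.0/16", 3967, "192.168.11.2/24", "192.168.11.254"))
```

An empty list means none of these basic checks failed; it does not confirm that the values are free of overlaps elsewhere in your environment.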
apic1# acidiag fnvread
ID   Pod ID  Name    Serial Number  IP Address     Role   State   LastUpdMsgId
--------------------------------------------------------------------------------
101  1       leaf1   S/N            10.0.2.64/32   leaf   active  0
102  1       leaf2   S/N            10.0.3.65/32   leaf   active  0
201  1       spine1  S/N            10.0.32.66/32  spine  active  0
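If you capture the fnvread output to a file, a few lines of Python can flag nodes that are not active. This is only a sketch that assumes the eight whitespace-separated columns shown above; parse_fnvread is a hypothetical helper name, and other software versions may print different columns.

```python
def parse_fnvread(text):
    """Parse 'acidiag fnvread' rows into dicts; header and separator lines are skipped."""
    nodes = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 8 columns and start with a numeric node ID
        if len(fields) != 8 or not fields[0].isdigit():
            continue
        node_id, pod, name, serial, ip, role, state, _last_msg = fields
        nodes.append({"id": int(node_id), "pod": int(pod), "name": name,
                      "serial": serial, "ip": ip, "role": role, "state": state})
    return nodes

sample = """\
ID Pod ID Name Serial Number IP Address Role State LastUpdMsgId
------------------------------------------------------
101 1 leaf1 S/N 10.0.2.64/32 leaf active 0
102 1 leaf2 S/N 10.0.3.65/32 leaf active 0
201 1 spine1 S/N 10.0.32.66/32 spine active 0
"""

nodes = parse_fnvread(sample)
not_active = [n["name"] for n in nodes if n["state"] != "active"]
print(len(nodes), "nodes parsed, not active:", not_active)
```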
On Cisco APIC, verify the LLDP neighbors on the fabric-facing interfaces eth2-1 and eth2-2 using the acidiag run lldptool command.
apic1# acidiag run lldptool in eth2-1
Chassis ID TLV
MAC: 00:3a:9c:7e:58:c2
Port ID TLV
Local: Eth1/2
Time to Live TLV
120
Port Description TLV
topology/pod-1/paths-101/pathep-[eth1/2]
System Name TLV
leaf-a
System Description TLV
topology/pod-1/node-101
System Capabilities TLV
System capabilities: Bridge, Router
Enabled capabilities: Bridge, Router
Management Address TLV
IPv4: 192.168.10.211
Ifindex: 83886080
Cisco 4-wire Power-via-MDI TLV
4-Pair PoE supported
Spare pair Detection/Classification not required
PD Spare pair Desired State: Disabled
PSE Spare pair Operational State: Disabled
Cisco Port Role TLV
4
Cisco Port Mode TLV
0
Cisco Port State TLV
1
Cisco Model TLV
N9K-C93180YC-FX
Cisco Serial Number TLV
FDO23161CZ0
Cisco Firmware Version TLV
n9000-15.2(1g)
Cisco Node Role TLV
1
Cisco Infra VLAN TLV
369
Cisco Name TLV
leaf-a
Cisco Fabric Name TLV
Fabric
Cisco Node IP TLV
IPv4:10.0.32.64
Cisco Node ID TLV
101
Cisco POD ID TLV
1
Cisco Appliance Vector TLV
Id: 1
IPv4: 10.0.0.1
UUID: 9df7d5a0-ca14-11eb-beda-e526c7a0aa53
LLDP-MED Capabilities TLV
Device Type: netcon
Capabilities: LLDP-MED, Network Policy, Extended Power via MDI-PSE
LLDP-MED Network Policy TLV
01400000
End of LLDPDU TLV
From the APIC, cross-check the chassis ID with the Cisco APIC UUID learned on the leaf switches.
Leaf: show lldp neighbors detail
Leaf: show lldp traffic
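The lldptool output above is a flat list of "name TLV" headings, each followed by its value lines, so it is easy to turn into a dictionary for quick checks (for example, comparing the infra VLAN or node ID against what you expect). The sketch below assumes exactly that layout; parse_lldp_tlvs is a hypothetical helper, not part of lldptool.

```python
def parse_lldp_tlvs(text):
    """Map each TLV name to the list of value lines printed under it."""
    tlvs = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(" TLV"):
            # A heading such as "Cisco Infra VLAN TLV" starts a new entry
            current = line[:-len(" TLV")]
            tlvs[current] = []
        elif current is not None:
            tlvs[current].append(line)
    return tlvs

# Excerpt of the output shown above
sample = """\
Chassis ID TLV
MAC: 00:3a:9c:7e:58:c2
Cisco Infra VLAN TLV
369
Cisco Fabric Name TLV
Fabric
Cisco Node ID TLV
101
End of LLDPDU TLV
"""

tlvs = parse_lldp_tlvs(sample)
print(tlvs["Cisco Infra VLAN"], tlvs["Cisco Node ID"])
```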
(none)# prompt: the switch has not been discovered by the fabric yet
(none)# moquery -c faultInfo (lists all faults)
TPM disabled in BIOS → enable it
LLDP enabled in CIMC/VIC → disable it
show cli list → lists all available CLI commands
APIC Logs
-------------
/var/log/dme/log
/var/log/dme/oldlog
Switch Logs
-------------
/var/log/dme/log
/var/log/dme/oldlog
/var/sysmgr/tmp_logs
APIC# show epg BLUE detail
Leaf1# iping -V tenant:vrf01 -S 172.16.1.1 172.16.1.22
(-S 172.16.1.1 is the BD gateway IP used as the source; 172.16.1.22 is the destination)
apic1# acidiag avread
Local appliance ID=1 ADDRESS=10.0.0.1 TEP ADDRESS=10.0.0.0/16 ROUTABLE IP ADDRESS=0.0.0.0 CHASSIS_ID=9df7d5a0-ca14-11eb-beda-e526c7a0aa53
Cluster of 1 lm(t):1(zeroTime) appliances (out of targeted 1 lm(t):1(2021-06-11T09:39:44.787+00:00)) with FABRIC_DOMAIN name=Fabric set to version=5.2(1g) lm(t):1(2021-06-11T09:40:01.215+00:00); discoveryMode=PERMISSIVE lm(t):0(1970-01-01T00:00:00.001+00:00); drrMode=OFF lm(t):0(1970-01-01T00:00:00.001+00:00); kafkaMode=OFF lm(t):0(1970-01-01T00:00:00.001+00:00)
appliance id=1 address=10.0.0.1 lm(t):1(2021-06-10T19:44:55.051+00:00) tep address=10.0.0.0/16 lm(t):1(2021-06-10T19:44:55.051+00:00) routable address=0.0.0.0 lm(t):1(zeroTime) oob address=192.168.11.1/24 lm(t):1(2021-06-10T19:45:00.131+00:00) version=5.2(1g) lm(t):1(2021-06-10T19:45:00.188+00:00) chassisId=9df7d5a0-ca14-11eb-beda-e526c7a0aa53 lm(t):1(2021-06-10T19:45:00.188+00:00) capabilities=0X7EEFFFFFFFFF--0X2020--0X1 lm(t):1(2021-06-11T09:44:04.539+00:00) rK=(stable,present,0X206173722D687373) lm(t):1(2021-06-10T19:45:00.134+00:00) aK=(stable,present,0X206173722D687373) lm(t):1(2021-06-10T19:45:00.134+00:00) oobrK=(stable,present,0X206173722D687373) lm(t):1(2021-06-10T19:45:00.134+00:00) oobaK=(stable,present,0X206173722D687373) lm(t):1(2021-06-10T19:45:00.134+00:00) cntrlSbst=(APPROVED, FCH2128V0F0) lm(t):1(2021-06-10T19:45:00.188+00:00) (targetMbSn= lm(t):0(zeroTime), failoverStatus=0 lm(t):0(zeroTime)) podId=1 lm(t):1(2021-06-10T19:44:55.051+00:00) commissioned=YES lm(t):1(zeroTime) registered=YES lm(t):1(2021-06-10T19:44:55.051+00:00) standby=NO lm(t):1(2021-06-10T19:44:55.051+00:00) DRR=NO lm(t):0(zeroTime) apicX=NO lm(t):1(2021-06-10T19:44:55.051+00:00) virtual=NO lm(t):1(2021-06-10T19:44:55.051+00:00) active=YES(2021-06-10T19:44:55.051+00:00) health=(applnc:255 lm(t):1(2021-06-10T19:47:00.737+00:00) svc's)
---------------------------------------------
clusterTime=<diff=-7610 common=2021-06-11T18:30:33.430+00:00 local=2021-06-11T18:30:41.040+00:00 pF=<displForm=0 offsSt=0 offsVlu=0 lm(t):1(2021-06-11T09:39:41.180+00:00)>>
---------------------------------------------
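The appliance line in the avread output is long, and the interesting fields (ID, address, version, chassis ID, health) are buried between timestamps. A rough way to pull them out is a handful of regular expressions, sketched below. The helper name parse_appliance and the choice of fields are assumptions for illustration; the sample line is an abbreviated copy of the output above. A health value of 255 corresponds to FULLY_FIT in the data states listed later in these notes.

```python
import re

def parse_appliance(line):
    """Extract a few key=value fields from one 'appliance ...' line of avread."""
    wanted = ("id", "address", "version", "chassisId",
              "commissioned", "registered", "standby")
    info = {}
    for key in wanted:
        # First occurrence only: 'address=' matches the appliance address,
        # which is printed before 'tep address=' and 'oob address='
        match = re.search(rf"\b{key}=(\S+)", line)
        if match:
            info[key] = match.group(1)
    health = re.search(r"health=\(applnc:(\d+)", line)
    if health:
        info["health"] = int(health.group(1))
    return info

# Abbreviated from the avread output above
sample = ("appliance id=1 address=10.0.0.1 lm(t):1(2021-06-10T19:44:55.051+00:00) "
          "tep address=10.0.0.0/16 version=5.2(1g) "
          "chassisId=9df7d5a0-ca14-11eb-beda-e526c7a0aa53 "
          "commissioned=YES registered=YES standby=NO "
          "health=(applnc:255 lm(t):1(2021-06-10T19:47:00.737+00:00) svc's)")

info = parse_appliance(sample)
print(info)
```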
Interfaces in APIC (ifconfig)
bond0: A logical bond that bundles the physical interfaces attached to the fabric (eth2-1 and eth2-2).
bond1: A logical bond that provides OOB connectivity.
bond0.369: Subinterface of bond0 that carries infra traffic, that is, packets tagged with the infra VLAN (369) in the 802.1Q header. The IP address of this subinterface is 10.0.0.1/32. It belongs to the TEP address pool (10.0.0.0/16) that was configured in the setup utility.
oobmgmt: Logical interface for OOB management configured during the initial setup.
The bonding mode is fault-tolerance (active-backup), so APIC's bond0 is an active/standby pair rather than a port channel. In the example below, eth2-2, facing leaf-b, is active, which means the second leaf (leaf2) must have been discovered first.
Identify the active link on Cisco APIC by reading /proc/net/bonding/bond0:
apic1# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth2-2
MII Status: up
MII Polling Interval (ms): 60
Up Delay (ms): 0
Down Delay (ms): 0
Slave Interface: eth2-1
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: 38:90:a5:40:76:ea
Slave queue ID: 0
Slave Interface: eth2-2
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: 38:90:a5:40:76:eb
Slave queue ID: 0
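To check the active link on several APICs at once, the bonding status file can be parsed with a short script. The sketch below assumes the "key: value" layout shown above; parse_bonding is a hypothetical helper name.

```python
def parse_bonding(text):
    """Return the active slave and per-slave attributes from /proc/net/bonding/bondX text."""
    bond = {"active": None, "slaves": {}}
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Currently Active Slave":
            bond["active"] = value
        elif key == "Slave Interface":
            # Every following key belongs to this slave until the next one starts
            current = value
            bond["slaves"][current] = {}
        elif current is not None:
            bond["slaves"][current][key] = value
    return bond

# Excerpt of the output shown above
sample = """\
Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: eth2-2
MII Status: up
Slave Interface: eth2-1
MII Status: up
Link Failure Count: 1
Permanent HW addr: 38:90:a5:40:76:ea
Slave Interface: eth2-2
MII Status: up
Link Failure Count: 1
Permanent HW addr: 38:90:a5:40:76:eb
"""

bond = parse_bonding(sample)
print("active:", bond["active"])
for name, attrs in bond["slaves"].items():
    print(name, attrs["MII Status"], "failures:", attrs["Link Failure Count"])
```

On a live APIC the same function could be fed the contents of /proc/net/bonding/bond0 read from the file.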
Packet Drop
Leaf
SSH to the leaf and run these commands. This example is for ethernet 1/31.
ACI-LEAF# vsh_lc
module-1# show platform internal counters port 31
Spine
A fixed spine (N9K-C9332C and N9K-C9364C) can be checked using the same method as the leaf switches.
For a modular spine (N9K-C9504 etc.), the linecard must be attached to before the platform counters can be viewed. SSH to the spine and run these commands. This example is for ethernet 2/1.
ACI-SPINE# vsh
ACI-SPINE# attach module 2
module-2# show platform internal counters port 1
Queuing stats counters are shown using 'show queuing interface'.
ACI-LEAF# show queuing interface ethernet 1/5
Viewing statistics in GUI
The location is 'Fabric > Inventory > Leaf/Spine > Physical interface > Stats / Error Counters / QoS Stats'.
leaf-a# show vrf
VRF-Name VRF-ID State Reason
black-hole 3 Up --
overlay-1 4 Up --
Cisco ACI uses a dedicated VRF as an infrastructure to carry VXLAN traffic. The transport infrastructure for VXLAN traffic is known as overlay-1, which exists as part of the tenant “infra.” Leaf nodes are known as PTEPs (physical tunnel endpoints).
VRF
A VRF offers an additional setting called "Policy Control Enforcement", which lets you disable the allow-list security model that is enforced through EPGs and contracts.
By default, this security model is active and prevents communication between EPGs unless a contract rule permits it. When Policy Control Enforcement is turned off, no contract rules are applied, and endpoints can communicate freely as long as there is Layer 2 or Layer 3 reachability.
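The effect of this setting can be pictured as a simple allow-list check. The toy model below is only an illustration and not how the fabric implements contracts: real contracts have consumer/provider roles and filters, which are reduced here to (source EPG, destination EPG) pairs. The function name is_allowed and the EPG names are made up, and the rule that traffic inside one EPG is permitted reflects the default behaviour without intra-EPG isolation.

```python
def is_allowed(src_epg, dst_epg, contracts, enforced=True):
    """Toy model of the VRF policy check; contracts is a set of (src, dst) EPG pairs."""
    if not enforced:
        # Policy Control Enforcement off: no contract rules are applied
        return True
    if src_epg == dst_epg:
        # Assumption: intra-EPG traffic is permitted by default
        return True
    return (src_epg, dst_epg) in contracts

contracts = {("WEB", "APP")}  # hypothetical EPG names
print(is_allowed("WEB", "APP", contracts))                  # True
print(is_allowed("WEB", "DB", contracts))                   # False
print(is_allowed("WEB", "DB", contracts, enforced=False))   # True
```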
Bridge Domain
Bridge domains possess the following attributes:
They serve as Layer 2 forwarding domains.
They offer a default gateway and subnet configuration for endpoints.
Each bridge domain is associated with a single VRF.
Tenants can have one or more bridge domains.
VRFs can have one or more bridge domains.
Bridge domains can encompass multiple subnets.
EPG
In ACI, multiple Endpoint Groups (EPGs) are defined within a Layer 2 domain (Bridge Domain or BD) to achieve security isolation in addition to Layer 2 network separation.
In traditional network devices, the VLAN ID is the smallest unit of segmentation for Layer 2 network separation. In ACI, however, the Layer 2 domain (BD) is not directly associated with a VLAN ID. Instead, ACI introduces the EPG as an extra layer of segmentation that is finer-grained than the Layer 2 domain, and the VLAN ID is associated with the EPG.
Consequently, in ACI, the EPG provides finer security segmentation than the Layer 2 domain, and the VLAN ID becomes a parameter for security separation rather than being tied solely to Layer 2 network separation.
An endpoint consists of one MAC address and zero or more IP addresses, and represents a single networking device.
In traditional networks, three tables are utilized to manage the network addresses of external devices:
A MAC address table for Layer 2 forwarding.
A Routing Information Base (RIB) for Layer 3 forwarding.
An ARP table for the correlation between IP addresses and MAC addresses.
However, Cisco ACI introduces a consolidation of the MAC address table and ARP table into a single table called the endpoint table. This alteration implies that Cisco ACI acquires such information through a different method compared to traditional networks.
In Cisco ACI, MAC and IP addresses are learned in hardware by inspecting the packet source MAC address and source IP address in the data plane, instead of relying on ARP to obtain the MAC address of the next hop for IP addresses.
This approach reduces the resources required to process and generate ARP traffic. It also enables the detection of IP and MAC address movements without waiting for GARP, as long as some traffic is sent from the new host.
Although Cisco ACI employs the endpoint table instead of separate MAC address and ARP tables, it still utilizes the RIB and ARP table for L3Out functionality.
Forwarding table lookup order:
Endpoint table (show endpoint)
RIB (show ip route)
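The lookup order can be sketched in a few lines: an exact match in the endpoint table wins, and only if there is none does the RIB get consulted with a longest-prefix match. This is a simplified model for illustration, not the hardware pipeline; the function name lookup, the table contents, and the next-hop labels are all hypothetical.

```python
import ipaddress

def lookup(dst_ip, endpoint_table, rib):
    """Endpoint table first (exact match), then RIB (longest-prefix match)."""
    if dst_ip in endpoint_table:
        return ("endpoint", endpoint_table[dst_ip])
    addr = ipaddress.ip_address(dst_ip)
    matches = []
    for prefix, next_hop in rib.items():
        network = ipaddress.ip_network(prefix)
        if addr in network:
            matches.append((network.prefixlen, next_hop))
    if not matches:
        return ("no match", None)
    # The most specific (longest) prefix wins
    return ("rib", max(matches)[1])

endpoint_table = {"172.16.1.22": "tunnel to leaf 102"}          # hypothetical entries
rib = {"172.16.1.0/24": "BD subnet", "0.0.0.0/0": "L3Out next hop"}

print(lookup("172.16.1.22", endpoint_table, rib))
print(lookup("172.16.1.50", endpoint_table, rib))
print(lookup("8.8.8.8", endpoint_table, rib))
```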
APIC# show epg BLUE detail
Basic Bridge Domain Configuration
Hardware Proxy or flooding mode for Layer 2 Unknown Unicast packets.
Hardware proxy is the default option for Layer 2 unknown unicast traffic. If the leaf does not know the destination MAC address, it sends the packet to the spine proxy.
With Layer 2 unknown unicast flooding (hardware proxy not selected), forwarding does not use the COOP database on the spine switches; Layer 2 unknown unicast packets are flooded within the bridge domain.
Note: The leaf endpoint table and spine COOP database are still populated with the MAC-to-VTEP information.
Enable or disable Address Resolution Protocol (ARP) flooding.
When ARP flooding is enabled, the bridge domain operates in a manner consistent with traditional networks, where ARP traffic is flooded throughout the domain.
However, if ARP flooding is disabled, the ingress leaf sends ARP traffic as unicast, either to the destination leaf or to the spine proxy.
It's important to note that these options are applicable only when unicast routing is enabled for the bridge domain. In cases where unicast routing is disabled, ARP traffic will always be flooded within the bridge domain.
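The two bridge domain knobs described above boil down to a small decision table, which the sketch below spells out. The function and parameter names are invented for illustration. The parameter target_known_on_leaf is an assumption about how the ingress leaf chooses between the destination leaf and the spine proxy (whether it already has the target IP in its endpoint table).

```python
def l2_unknown_unicast_action(hardware_proxy):
    """What happens to a frame whose destination MAC is unknown on the ingress leaf."""
    return "send to spine proxy" if hardware_proxy else "flood in bridge domain"

def arp_action(unicast_routing, arp_flooding, target_known_on_leaf):
    """How the ingress leaf handles an ARP request in this simplified model."""
    # Without unicast routing the ARP flooding option does not apply: always flood
    if not unicast_routing or arp_flooding:
        return "flood in bridge domain"
    if target_known_on_leaf:
        return "unicast to destination leaf"
    return "unicast to spine proxy"

print(l2_unknown_unicast_action(hardware_proxy=True))
print(arp_action(unicast_routing=True, arp_flooding=False, target_known_on_leaf=False))
print(arp_action(unicast_routing=False, arp_flooding=False, target_known_on_leaf=True))
```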
The Layer 3 Configurations tab provides options to configure the following essential parameters:
1) Unicast Routing: Enabling this setting, along with configuring a subnet address, allows the fabric to function as the default gateway within the bridge domain and route traffic accordingly. Additionally, when unicast routing is enabled, the endpoint table on the leaf switches learns the mapping of IP addresses to Tunnel Endpoint (TEP) for this specific bridge domain. It's worth noting that IP learning does not depend on having a subnet configured under the bridge domain.
2) Subnet Address: This option allows you to configure the IP addresses of the SVIs (Switched Virtual Interfaces), which act as the default gateways for the bridge domain. The available options for configuring a subnet under a bridge domain are as follows:
a. Private to VRF: This subnet is limited to its respective Virtual Routing and Forwarding (VRF) within the tenant. It does not extend beyond that VRF.
b. Advertised externally: This subnet can be advertised to a routed connection, enabling it to be accessible from external networks.
c. Shared between VRFs: This subnet can be shared with and exported to multiple VRFs within the same tenant or across tenants as part of a shared service.
An example of a shared service is a routed connection to an Endpoint Group (EPG) present in a different VRF within a different tenant. This configuration allows bidirectional traffic flow across VRFs.
It's important to note that for an EPG providing a shared service, the subnet must be configured under that EPG (not under a bridge domain), and its scope must be set to "advertised externally" and "shared between VRFs."
Unicast routing is enabled by default and is required when the default gateway for a bridge domain is configured inside the Cisco ACI fabric. If the default gateway is outside the fabric (for example, on a firewall), disable unicast routing and enable ARP flooding.
In that case, disabling unicast routing avoids unnecessary IP learning that may cause unexpected IP forwarding.
-------------------------------------------------------------------------------------------------------------------------------
General Troubleshooting
avread --> Displays APICs within the cluster.
fnvread --> Displays the address and state of switch nodes registered with the fabric.
fnvreadex --> Displays additional information for switch nodes registered with the fabric.
rvread service --> Summarizes the data layer state. The output shows a summary of the data layer state for each service. The shard view shows replicas in ascending order.
rvread service shard --> Displays the data layer state for a service on a specific shard across all replicas.
rvread service shard replica --> Displays the data layer state for a service on a specific shard and replica.
crashsuspecttracker --> Tracks states of a service or data subset that indicate a crash.
dbgtoken --> Generates a token to permit remote SSH access.
version --> Displays the APIC ISO software version.
APIC# man acidiag
Service IDs:
1 - cliD
2 - controller
3 - eventmgr
4 - extXMLApi
5 - policyelem
6 - policymgr
7 - reader
8 - ae
9 - topomgr
10 - observer
11 - dbgr
12 - observerelem
13 - dbgrelem
14 - vmmmgr
15 - nxosmock
16 - bootmgr
17 - appliancedirector
18 - adrelay
19 - ospaagent
20 - vleafelem
21 - dhcpd
22 - scripthandler
23 - idmgr
24 - ospaelem
25 - osh
26 - opflexagent
27 - opflexelem
28 - confelem
29 - vtap
30 - snmpd
31 - opflexp
32 - analytics
33 - policydist
34 - plgnhandler
35 - domainmgr
36 - licensemgr
37 - no service
38 - platformmgr
39 - edmgr
Data States
COMATOSE: 0
NEWLY_BORN: 1
UNKNOWN: 2
DATA_LAYER_DIVERGED: 11
DATA_LAYER_DEGRADED_LEADERSHIP: 12
DATA_LAYER_ENTIRELY_DIVERGED: 111
DATA_LAYER_PARTIALLY_DIVERGED: 112
DATA_LAYER_ENTIRELY_DEGRADED_LEADERSHIP: 121
DATA_LAYER_PARTIALLY_DEGRADED_LEADERSHIP: 122
FULLY_FIT: 255
APIC# acidiag rvread 9 15
(9,15,1) st:6 lm(t):3(2024-01-06T12:29:47.065+00:00) le: reSt:LEADER voGr:0 cuTerm:0x50 lCoTe:0x4f lCoIn:0x78000000001d9864 veFiSt:0x13 veFiEn:0x13 lm(t):3(2024-01-06T12:29:47.053+00:00) stMmt:1 lm(t):0(zeroTime) ReTx:0 lm(t):0(zeroTime) lastUpdt 2024-01-07T04:44:20.873+00:00
APIC# acidiag rvread 9 11
(9,11,1) st:6 lm(t):2(2024-01-06T12:29:24.547+00:00) le: reSt:LEADER voGr:0 cuTerm:0x52 lCoTe:0x51 lCoIn:0x58000000001e1304 veFiSt:0x29 veFiEn:0x29 lm(t):2(2024-01-06T12:29:24.507+00:00) stMmt:1 lm(t):0(zeroTime) lp: clSt:2 lm(t):2(2024-01-06T12:04:38.6
Log in as root.
Service ID 9 is topomgr, so the service can be controlled with:
systemctl start topomgr
systemctl stop topomgr
systemctl restart topomgr
systemctl status topomgr
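When reading many rvread lines, it helps to split out the (service, shard, replica) triple and the replica role, and to translate the service ID into its name. The sketch below assumes the line layout shown in the rvread examples above; parse_rvread_line is a hypothetical helper, and SERVICE_NAMES contains only a subset of the service ID table from these notes.

```python
import re

# Subset of the service ID table above
SERVICE_NAMES = {6: "policymgr", 9: "topomgr", 10: "observer", 11: "dbgr"}

def parse_rvread_line(line):
    """Extract service, shard, replica, st and role from one rvread replica line."""
    head = re.match(r"\((\d+),(\d+),(\d+)\)\s+st:(\d+)", line)
    if not head:
        return None
    service, shard, replica, st = map(int, head.groups())
    role = re.search(r"reSt:(\w+)", line)
    return {
        "service": SERVICE_NAMES.get(service, str(service)),
        "shard": shard,
        "replica": replica,
        "st": st,
        "role": role.group(1) if role else None,
    }

# Start of the 'acidiag rvread 9 15' line shown above
sample = "(9,15,1) st:6 lm(t):3(2024-01-06T12:29:47.065+00:00) le: reSt:LEADER voGr:0 cuTerm:0x50"
parsed = parse_rvread_line(sample)
print(parsed)
```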
Example: APIC1 is in partial diverge state
APIC# acidiag rvread
\- unexpected state; /-unexpected mutator;
s-> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32lcl
r->123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123lcl
1
2
3
4
5
6
7
8
9
10
11 \ \ \
12
13
14
15
Non optimal leader for shards : 11:1,11:16,11:19,11:25,11:28,11:31
Service 11 is dbgr, and the leader for its shards is APIC3.
Action Plan:
Stop the dbgr service and start it again on all 3 APICs; APIC1 should then return to the fully-fit state.
acidiag stop dbgr
acidiag start dbgr
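The "Non optimal leader for shards" line lists service:shard pairs, so grouping them by service shows at a glance which service needs attention. A small sketch, assuming the comma-separated layout shown above (parse_non_optimal is a hypothetical helper name):

```python
def parse_non_optimal(line):
    """Group the service:shard pairs of a 'Non optimal leader' line by service ID."""
    _label, _sep, pairs = line.partition(":")
    by_service = {}
    for pair in pairs.split(","):
        service, shard = pair.strip().split(":")
        by_service.setdefault(int(service), []).append(int(shard))
    return by_service

line = "Non optimal leader for shards : 11:1,11:16,11:19,11:25,11:28,11:31"
by_service = parse_non_optimal(line)
print(by_service)
```

Here every pair belongs to service 11, which the service ID table maps to dbgr.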
APIC SSD REPLACEMENT PROCEDURE
CIMCServer# scope sol
Server /sol # set enabled yes
Server /sol *# set baud-rate 115200
Server /sol *# commit
Server /sol *# connect host
APIC CPU and Memory
apic# ps aux --sort -%mem
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
1000 22836 1.3 4.9 11636484 4790212 ? Ssl Jan06 14:06 /etc/alternatives/jre_openjdk/bin/java -Xms1g -Xmx1g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Des.networkaddress.cache.ttl=60 -Des.ne
ifc 5775 1.6 2.1 2716716 2121416 ? Ssl Jan06 17:49 /mgmt//bin/svc_ifc_reader.bin --x
root 7380 1.8 1.2 1980428 1226688 ? Ssl Jan06 19:28 /mgmt//bin/nginx.bin -p /data//nginx/
ifc 5766 2.1 1.0 1695524 1006004 ? Ssl Jan06 23:04 /mgmt//bin/svc_ifc_policymgr.bin --x
ifc 5765 1.7 1.0 1642268 995828 ? Ssl Jan06 19:02 /mgmt//bin/svc_ifc_observer.bin --x
apic# top -o %MEM
top - 05:39:56 up 17:46, 1 user, load average: 2.70, 2.54, 2.42
Tasks: 681 total, 1 running, 304 sleeping, 0 stopped, 0 zombie
%Cpu(s): 3.2 us, 2.8 sy, 0.0 ni, 94.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 97353248 total, 51438976 free, 19963508 used, 25950764 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 76119576 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
22836 1000 20 0 11.1g 4.6g 25616 S 0.0 4.9 14:06.19 java
5775 ifc 20 0 2716716 2.0g 166900 S 0.0 2.2 17:50.15 svc_ifc_reader.
7380 root 20 0 1980428 1.2g 198468 S 5.9 1.3 19:28.69 nginx.bin
5766 ifc 20 0 1695524 982.4m 224212 S 0.0 1.0 23:04.94 svc_ifc_policym
apic# ps -eo pid,ppid,cmd,%mem,%cpu --sort=-%mem | head -n30
PID PPID CMD %MEM %CPU
22836 22834 /etc/alternatives/jre_openj 4.9 1.3
5775 1 /mgmt//bin/svc_ifc_reader.b 2.1 1.6
7380 1 /mgmt//bin/nginx.bin -p /da 1.2 1.8
5766 1 /mgmt//bin/svc_ifc_policymg 1.0 2.1
5765 1 /mgmt//bin/svc_ifc_observer 1.0 1.7
1811 32429 java -Xms1g -Xmx2g -XX:+Hea 0.9 0.9
30639 30450 java -Xmx4096m -Djava.secur 0.8 1.5
5772 1 /mgmt//bin/svc_ifc_eventmgr 0.7 2.0
32227 32226 /etc/alternatives/jre_1.8.0 0.7 21.5
1563 31801 java -XX:+UseG1GC -XX:MaxGC 0.6 1.1
5780 1 /mgmt/opt/controller/decoy/ 0.6 0.0
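The %MEM column is the resident set size divided by total memory, which can be checked against the numbers above (RSS in KiB from ps, total from the "KiB Mem" line of top). The small differences between ps and top, such as 2.1 versus 2.2 for svc_ifc_reader, are consistent with ps truncating to one decimal while top rounds; treat that explanation as an assumption, since the two commands were also run at slightly different times.

```python
def mem_percent(rss_kib, total_kib):
    """Share of physical memory used by a process, in percent."""
    return 100.0 * rss_kib / total_kib

def truncate_1dp(value):
    """Cut off (rather than round) after one decimal place."""
    return int(value * 10) / 10

TOTAL_KIB = 97353248  # 'KiB Mem : ... total' from the top output above

for name, rss in (("java", 4790212), ("svc_ifc_reader", 2121416), ("nginx", 1226688)):
    pct = mem_percent(rss, TOTAL_KIB)
    print(name, "truncated:", truncate_1dp(pct), "rounded:", round(pct, 1))
```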