
Understanding “Output Errors” and “Stomped CRC” on Cisco ACI Leaf–Spine Links

  • Writer: Mukesh Chanderia
  • Jan 4
  • 4 min read

Introduction

One of the most misunderstood interface statistics in Cisco ACI and Nexus-based fabrics is the presence of “output errors” on leaf–spine links, often accompanied by “stomped CRC” or input errors on the peer interface.

At first glance, this can appear alarming—especially when:

  • Host-facing ports show zero CRC or input errors

  • Fabric links are stable for years

  • Traffic volumes are extremely high

  • Errors occur only occasionally, not continuously


Common Scenario Observed in the Field


Typical observations engineers encounter:

  • Leaf–spine interfaces show:

    • output error counters incrementing slowly over time

  • The corresponding peer interface shows:

    • stomped CRC and input error counters incrementing

  • Host-facing ports:

    • Clean counters (0 CRC, 0 input errors)

  • No link flaps, no reliability degradation

  • Counters have never been cleared and span several years

This often triggers questions such as:

  • “If hosts are clean, where did these errors come from?”

  • “Why does the switch corrupt frames instead of simply dropping them?”

  • “Is this a hardware or cabling issue?”

Let’s answer these step by step.


Internal Packet Flow in a Cisco ACI Leaf

Before interpreting counters, it’s critical to understand where drops can occur.




Simplified data path inside a leaf:

Host Port
   ↓
Ingress ASIC (frame accepted cleanly)
   ↓
Internal fabric / queues / buffers
   ↓
Egress ASIC (leaf–spine port)
   ↓
Spine

Key takeaway:

Errors seen on leaf–spine ports may originate after the packet has already been accepted from the host.


Why Do Host-Facing Ports Stay Clean?

Host-facing interfaces showing:

  • 0 CRC errors

  • 0 input errors

mean that:

  • Frames arrived correctly from servers

  • No physical or L2 integrity issues on host links

  • No drops occurred at ingress

If a packet is later dropped inside the leaf’s egress pipeline, it will:

  • Never return to the host

  • Never increment host-facing counters

  • Be invisible to server NIC statistics

This is expected behavior.


What Does “Stomped CRC” Actually Mean?

Normal CRC Error

  • Frame arrives corrupted on the wire

  • Usually caused by cable, optics, or physical issues

Stomped CRC (Very Different)

  • The ASIC intentionally corrupts the frame’s CRC

  • Done before transmission

  • The frame is sent with a bad FCS so the next hop discards it

This is not random corruption.

It is a controlled hardware mechanism used in specific scenarios.
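The distinction can be sketched in a few lines of Python. This is a simplified illustration, not a model of any specific ASIC: it uses `zlib.crc32` (the same CRC-32 polynomial Ethernet's FCS uses) and represents the "stomp" as the bitwise inverse of the correct FCS, a convention commonly described for cut-through platforms; the exact stomp encoding is platform-specific.

```python
import zlib

def compute_fcs(frame: bytes) -> int:
    """Ethernet's FCS is CRC-32; zlib.crc32 uses the same polynomial."""
    return zlib.crc32(frame) & 0xFFFFFFFF

def stomp_fcs(frame: bytes) -> int:
    """Illustrative 'stomp': emit a deliberately wrong FCS (here, the
    bitwise inverse of the correct one) so the next hop must discard.
    Real ASICs use a platform-specific stomp encoding."""
    return compute_fcs(frame) ^ 0xFFFFFFFF

def receiver_check(frame: bytes, fcs: int) -> str:
    """What the receiving side does: accept good frames, and distinguish
    a recognizable stomp from genuine wire corruption."""
    good = compute_fcs(frame)
    if fcs == good:
        return "accept"
    if fcs == (good ^ 0xFFFFFFFF):
        return "discard (stomped CRC)"  # counted as stomped, not a wire error
    return "discard (CRC error)"        # genuine corruption

frame = b"\x00" * 60  # placeholder payload
print(receiver_check(frame, compute_fcs(frame)))  # accept
print(receiver_check(frame, stomp_fcs(frame)))    # discard (stomped CRC)
```

The key point the sketch captures: the receiver can tell a stomped frame apart from random corruption, which is why the two show up in different counters.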


Why Would an ASIC Do This?

In Cisco Nexus / ACI platforms, the ASIC may use CRC stomping instead of a silent drop in certain conditions:

1. Egress Congestion or Microbursts

  • Multiple ingress flows converge on a single 40G uplink

  • Short bursts exceed egress queue capacity

  • Some packets must be dropped

Instead of silently discarding:

  • ASIC marks the frame invalid

  • Sends it out with a stomped CRC

  • Peer detects and discards it

2. Specific Egress Pipeline Conditions

  • Certain internal error-handling paths

  • Some QoS or buffer-management scenarios

  • Platform-dependent behaviors

3. Accounting and Visibility

By stomping CRC:

  • Transmitting side increments output error

  • Receiving side increments input error / stomped CRC

  • Both sides “agree” a packet was lost

This provides end-to-end accounting, even though the drop decision was made upstream.

Why You Don’t Always See “Output Discards”?

A common misconception:


“If the switch drops traffic, it must increment discard counters.”

In reality:

  • Discard counters reflect internal silent drops

  • Output errors can reflect frames invalidated on egress

  • Stomped CRC reflects intentional downstream discard

These are different drop accounting mechanisms.

From an end-to-end perspective, the packet is still dropped—just recorded differently.


Correlation Pattern to Look For


Leaf TX output errors ≈ Spine RX stomped CRC

When you see:

  • Matching or near-matching values on both sides

  • No CRC/runts/giants

  • Stable links

This strongly indicates:

  • Intentional ASIC discard behavior

  • Not a physical-layer fault


Error Rate Matters More Than Absolute Numbers

Always normalize errors against traffic volume.

Typical real-world example:

  • Tens of trillions of packets transmitted

  • Tens of thousands of output errors

  • Spread over multiple years

This translates to:

  • Error ratios on the order of 10⁻⁹

  • Well within acceptable operational tolerance

  • Invisible to applications due to TCP retransmissions


Why Does This Happen “Sometimes, Not Every Day”?

This intermittent nature is a crucial clue:

  • Microbursts are traffic-pattern dependent

  • Rare congestion events occur during peaks

  • Control-plane events, maintenance windows, or transient bursts can contribute

  • No continuous increase = no persistent fault

If this were a physical issue:

  • CRC errors would be continuous

  • Error rate would scale with traffic

  • Reliability would degrade

  • Links would flap or reset

None of that is observed.


How To Validate?


Recommended steps:

1. Monitor Error Delta

  • Capture counters

  • Recheck after several hours/days

  • Confirm errors are not rapidly increasing

2. Check Queue / Congestion Indicators

  • Look for egress queue drops

  • Validate oversubscription ratios

3. Verify Optics Health

  • DOM values (Tx/Rx power, temperature)

  • Ensure within vendor specs

4. Software Context

  • Check platform and release notes

  • Identify known cosmetic or accounting behaviors
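Step 1, monitoring the error delta, can be automated along these lines. This is a minimal sketch: `read_output_errors` is a hypothetical callable standing in for however you actually poll the counter (for example SNMP's IF-MIB `ifOutErrors`, or the APIC REST API):

```python
import time

def error_delta(read_output_errors, interval_s: float, samples: int = 2):
    """Sample an interface's output-error counter over time and return
    the per-interval deltas; a flat delta means no active fault."""
    readings = []
    for _ in range(samples):
        readings.append(read_output_errors())
        if len(readings) < samples:
            time.sleep(interval_s)
    return [b - a for a, b in zip(readings, readings[1:])]

# Hypothetical reader: in practice this would poll the switch.
counter_values = iter([41210, 41210, 41211])
deltas = error_delta(lambda: next(counter_values),
                     interval_s=0.0, samples=3)
print(deltas)  # [0, 1] -> essentially flat, no persistent fault
```

In production you would sample over hours or days, not seconds; what matters is whether the deltas stay near zero or track traffic peaks.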


Conclusion

When all of the following are true:

  • No CRC/runts/giants

  • Stable links over long periods

  • Extremely low error rate

  • Symmetric output error ↔ stomped CRC pattern

  • Clean host-facing interfaces


Then:


The observed output errors result from rare, intentional packet discards performed by the ASIC, most commonly during brief congestion events. They do not indicate a hardware defect, cabling problem, or misconfiguration. This behavior is normal, well documented, and safe in high-throughput Cisco ACI fabrics.

