
PBR Concepts

  • Writer: Mukesh Chanderia
  • Feb 16

What is a Health Group?


A Health Group is a configuration object used to group specific PBR destination interfaces—typically the consumer and provider interfaces of the same service node (such as a firewall or load balancer)—into a single logical unit for health tracking.


How is it Useful?

The primary purpose of a Health Group is to prevent traffic black-holing when a service node experiences a partial failure.


1. Prevents Traffic Black-Holing

In a typical PBR deployment, a service device (like a firewall) has two interfaces connected to the fabric: a consumer connector and a provider connector. If only one of these interfaces goes down (for example, the consumer-side link fails), the other interface might remain "up" from the fabric's perspective.

  • Without a Health Group: The fabric might continue redirecting return traffic to the still-active provider interface. However, because the consumer interface is down, the service device cannot forward the traffic properly, causing packets to be dropped (black-holed).

  • With a Health Group: If any interface in the group goes down, the fabric automatically marks the entire node (both interfaces) as down. This ensures that PBR stops redirecting traffic to this node in both directions.


2. Enables Failover for Devices Lacking Link Redundancy

Some Layer 4-7 devices have an internal feature to automatically bring down one interface if the other fails (link state tracking). For devices that do not have this capability, configuring a Health Group is essential. It forces the fabric to disable the specific node immediately upon partial failure, allowing the hashing algorithm to redirect traffic to a different, healthy service node.


Summary of Behaviour

  • Configuration: You assign the consumer IP and the provider IP of a single service node to the same Health Group name (e.g., Health-group1).

  • Action: If the tracking mechanism (such as IP SLA) detects that the consumer IP is unreachable, the Health Group triggers a status change that disables the provider IP as well.

  • Result: Traffic is seamlessly re-hashed to a backup or alternative active node, maintaining service availability.



Steps to Configure a Health Group (specifically an L4-L7 Redirect Health Group) in the Cisco APIC GUI


Step 1: Create the Redirect Health Group

This step defines the group object itself.

  1. On the menu bar, navigate to Tenants > [Your Tenant Name].

  2. In the Navigation pane, expand Policies > Protocol > L4-L7 Redirect Health Groups.

  3. Right-click L4-L7 Redirect Health Groups and choose Create L4-L7 Redirect Health Group.

  4. In the dialog box:

    • Name: Enter a unique name for the health group.

    • Description: (Optional) Enter a description.

  5. Click Submit.


Step 2: Associate the Health Group with PBR Destinations

A Health Group is only effective when applied to specific destinations within a Policy-Based Redirect (PBR) policy. This groups the interfaces (typically consumer and provider) so they track together.

  1. Navigate to Tenants > [Your Tenant Name] > Policies > Protocol > L4-L7 Policy Based Redirect.

  2. Right-click L4-L7 Policy Based Redirect and choose Create L4-L7 Policy Based Redirect (or double-click an existing policy to edit it).

  3. In the policy dialog box, locate the Destinations section (L3 Destinations or L1/L2 Destinations) and click the + (plus) icon to add a destination.

  4. In the Create Destination of Redirected Traffic dialog:

    • Enter the IP and MAC address of the service node interface.

    • In the Redirect Health Group dropdown menu, select the health group you created in Step 1.

  5. Click OK and then Submit.


Note: To ensure proper tracking functionality, you must assign the same Redirect Health Group to both the consumer and provider interfaces of the same service node when configuring their respective PBR policies.
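The two GUI steps above translate to a single REST POST against the APIC. Below is a minimal Python sketch of the JSON body; the class and DN names (vnsRedirectHealthGroup under the tenant's svcCont container) are assumptions drawn from the published ACI object model, so verify them against your APIC release before use.

```python
import json

def health_group_payload(tenant: str, name: str, descr: str = "") -> dict:
    """Build the JSON body that the 'Create L4-L7 Redirect Health Group'
    dialog posts. Class/DN names (vnsRedirectHealthGroup, svcCont) are
    assumed from the ACI object model -- verify on your APIC release."""
    return {
        "vnsRedirectHealthGroup": {
            "attributes": {
                "dn": f"uni/tn-{tenant}/svcCont/redirectHealthGroup-{name}",
                "name": name,
                "descr": descr,
            }
        }
    }

# The POST target would be https://<apic>/api/mo/uni.json with this body.
body = json.dumps(health_group_payload("Prod", "Health-group1"))
```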


How is IP SLA Useful?


IP SLA (Service Level Agreement) monitoring is the mechanism Cisco ACI uses to track the health and availability of service nodes (such as firewalls or load balancers) configured as PBR destinations.


Its primary benefits include:

  • Preventing Traffic Black-holing: Without tracking, the ACI fabric blindly redirects traffic to the configured service node IP. If that node goes offline or its interface fails, traffic is dropped (black-holed). IP SLA probes the node; if the node fails to respond, the fabric marks it as "down" and stops redirecting traffic to it.

  • Enabling Failover and Resilient Hashing: When a node is marked down by IP SLA, the fabric can re-hash existing traffic flows to other available service nodes in the cluster or to a configured backup node. This is essential for High Availability (HA) designs (Active/Standby or Active/Active).

  • Mandatory for Specific Deployments: IP SLA tracking is mandatory if you are configuring a PBR destination that resides in an L3Out (external routed network). It is also required if you are using Dynamic MAC Address Detection, where the fabric learns the service node's MAC via ARP rather than static configuration.

  • Granular Health Checks: Unlike simple link status, IP SLA allows for protocol-specific health checks. You can check if a specific TCP port (e.g., port 80) is open or even perform an HTTP GET request to ensure the application layer is responsive, rather than just pinging the interface.


How to Configure IP SLA

Configuring IP SLA involves two main steps:


1. Create the IP SLA Monitoring Policy

You perform this configuration under the specific Tenant where your service graph is deployed.

  • GUI Path: Navigate to Tenants > [Tenant Name] > Policies > Protocol > IP SLA > IP SLA Monitoring Policies.

  • Action: Right-click and select Create IP SLA Monitoring Policy.

  • Key Parameters:

    • Name: Give the policy a descriptive name (e.g., HTTP-probe).

    • SLA Type: Choose the protocol used to probe the device. Options include:

      • ICMP: Standard ping.

      • TCP: Checks connectivity to a specific destination port.

      • L2Ping: Required for L1/L2 PBR deployments (uses EtherType 0x0721).

      • HTTP: Performs an HTTP GET request. Requires specifying the URI (must start with /), version (1.0 or 1.1), and method.

    • Frequency: How often the probe is sent (default is often 60 seconds; minimum is 5 seconds for HTTP and 1 second for others).

    • Detect Multiplier: The number of missed probes before the node is marked down (default is 3).
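As a rough sketch, the parameters above can be expressed as the JSON object the GUI posts, together with the arithmetic behind failure detection (a node is declared down only after Detect Multiplier consecutive missed probes). The fvIPSLAMonitoringPol class and the sla* attribute names are assumptions based on the ACI object model; verify them on your release.

```python
def ipsla_policy(tenant: str, name: str, sla_type: str = "icmp",
                 frequency: int = 60, multiplier: int = 3, port: int = 0) -> dict:
    """JSON body for an IP SLA monitoring policy (class/attribute names
    assumed from the ACI object model)."""
    if sla_type == "tcp" and port == 0:
        raise ValueError("TCP probes require a destination port")
    return {
        "fvIPSLAMonitoringPol": {
            "attributes": {
                "dn": f"uni/tn-{tenant}/ipslaMonitoringPol-{name}",
                "name": name,
                "slaType": sla_type,
                "slaFrequency": str(frequency),
                "slaDetectMultiplier": str(multiplier),
                "slaPort": str(port),
            }
        }
    }

def worst_case_detection_seconds(frequency: int, multiplier: int) -> int:
    # The node is marked down only after 'multiplier' consecutive misses.
    return frequency * multiplier
```

With the defaults (60-second frequency, multiplier 3), a dead node can take up to 180 seconds to be marked down, which is why shorter probe frequencies are common in production.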


2. Associate with the PBR Policy

Once the monitoring policy is created, you must apply it to the PBR destination configuration.

  • GUI Path: Navigate to Tenants > [Tenant Name] > Policies > Protocol > L4-L7 Policy Based Redirect.

  • Action: Select your existing PBR policy or create a new one.

  • Configuration: In the PBR policy dialog/window, locate the IP SLA Monitoring Policy dropdown menu and select the policy you created in step 1.

  • Redirect Health Group: For robust tracking, it is highly recommended (and sometimes required) to assign the PBR destinations (both consumer and provider interfaces) to a Redirect Health Group. This ensures that if the IP SLA probe fails on one side (e.g., the consumer interface), the fabric marks the entire node as down and stops redirecting traffic in both directions.
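Under the hood, the association is a relation object nested under the PBR policy. A hedged sketch (vnsSvcRedirectPol and vnsRsIPSLAMonitoringPol are class names assumed from the ACI object model, not verified here):

```python
def attach_ipsla_to_pbr(tenant: str, pbr_name: str, ipsla_name: str) -> dict:
    """Link a PBR policy to an IP SLA monitoring policy via a relation
    child object (class names assumed from the ACI object model)."""
    return {
        "vnsSvcRedirectPol": {
            "attributes": {
                "dn": f"uni/tn-{tenant}/svcCont/svcRedirectPol-{pbr_name}",
                "name": pbr_name,
            },
            "children": [
                {"vnsRsIPSLAMonitoringPol": {"attributes": {
                    "tDn": f"uni/tn-{tenant}/ipslaMonitoringPol-{ipsla_name}",
                }}}
            ],
        }
    }
```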





Deployment modes for connecting a firewall to the Cisco ACI fabric.


There are four primary scenarios (deployment modes) for connecting a firewall to the Cisco ACI fabric. These scenarios dictate how traffic is forwarded through the firewall and how the network is logically constructed.


1. Transparent Mode (Go-Through)

In this scenario, the firewall acts as a Layer 2 bridge ("bump in the wire"). It connects two different Bridge Domains (BDs) but does not route traffic between them.

  • Topology: The firewall bridges a Consumer BD (Client side) and a Provider BD (Server side).

  • Routing: The firewall is not the default gateway for the servers. The default gateway is typically the ACI fabric (SVI on the Consumer BD) or an external router connected to the Consumer BD.

  • ACI Configuration:

    • Requires two Bridge Domains.

    • ACI automatically configures the Bridge Domains to enable Unknown Unicast Flooding and ARP Flooding to ensure Layer 2 connectivity works across the firewall.

    • Routing must be disabled on at least one of the two Bridge Domains (usually the provider side) to prevent loops, as the firewall bridges them.


2. Routed Mode (Go-To)

In this scenario, the firewall acts as a Layer 3 hop. It routes traffic between the consumer and provider networks and often serves as the default gateway for the servers.

  • Topology: The firewall connects a Consumer BD and a Provider BD.

  • Routing: The firewall is the default gateway for the backend servers. Traffic is routed from the ACI fabric to the firewall, and then from the firewall to the servers.

  • Sub-scenarios:

    • With NAT: The firewall translates the server IP addresses. The ACI fabric routes traffic to the firewall's external interface (L3Out or BD), and the firewall routes it internally.

    • Without NAT: The firewall routes traffic without translation. You must configure static routes or dynamic routing (OSPF/BGP) between the ACI fabric (via L3Out) and the firewall so the fabric knows how to reach the servers behind the firewall.


3. Policy-Based Redirect (PBR)

This is the most flexible and modern method. The ACI fabric "redirects" specific traffic flows to the firewall based on policy (Service Graph), regardless of the destination IP or routing table.

  • Topology: The firewall does not need to be the default gateway. It can be placed in a dedicated "Service Bridge Domain" or sit in the same BD as the endpoints.

  • Advantages:

    • You do not need to span VLANs or configure complex VRF stitching.

    • The firewall is inserted transparently; removing it does not require changing the server's default gateway.

    • Supports Symmetric PBR, allowing you to scale out with a pool of active firewalls.

  • Supported Device Types:

    • L3 PBR: The firewall operates in routed mode.

    • L1/L2 PBR: As of newer releases, ACI can redirect traffic to a transparent firewall (L1/L2 mode) without requiring it to be in the data path for all traffic.


4. One-Arm Mode

The firewall connects to the fabric using a single interface (or Port Channel) that handles both ingress and egress traffic.

  • Topology: The firewall sits on a dedicated Bridge Domain.

  • Requirement: This mode typically requires Source NAT (SNAT) on the firewall to ensure return traffic goes back to the firewall instead of bypassing it to the default gateway.

  • Use Case: More common for Load Balancers, but applicable to firewalls using PBR for specific east-west traffic inspection.


Summary of Connection Methods (Physical & Logical)


Regardless of the scenario above, you will use one of the following Operational Models to configure the connection:

  • Network Policy Mode (Unmanaged): ACI manages the network (VLANs, switches), but you configure the firewall manually.

    • ACI Admin: Configures the Service Graph and BDs.

    • Security Admin: Configures IP, ACLs, and routes on the firewall.

  • Service Policy Mode (Managed): ACI pushes network and firewall configuration (ACLs, IPs) to the device using a Device Package.

    • ACI Admin: Configures everything via APIC (network + firewall policy).

  • Service Manager Mode: ACI integrates with a firewall manager (e.g., FMC, Panorama). ACI handles the network; the manager handles security policy.

    • ACI Admin: Links the Service Graph to the policy.

    • Security Admin: Defines policy in the firewall manager.


High Availability (HA) Considerations


  • Active/Standby: You connect both firewalls to the fabric. ACI views them as a single "Logical Device." You typically use a separate interface for failover (heartbeat) traffic, which can be connected directly between firewalls or through the ACI fabric.

  • Active/Active (Clustering): Multiple firewalls act as one logical unit. They must connect to the ACI fabric via a vPC (Virtual Port Channel) to prevent MAC flapping, as they share the same MAC/IP addresses.




Two-Arm Mode


"Two-Arm Mode" is not a fifth independent scenario; rather, it is a physical and logical topology description that applies to the Routed (Go-To), Transparent (Go-Through), and Policy-Based Redirect (PBR) scenarios.


  • Standard Deployment: Two-Arm is the "default" or traditional way to deploy Layer 4-7 services. It physically and logically separates the "Consumer" (Ingress/Outside) traffic from the "Provider" (Egress/Inside) traffic.

  • Relationship to Scenarios:

    • Transparent Mode (Go-Through): This is inherently Two-Arm. The device bridges two different Bridge Domains (BDs). Traffic enters one interface (arm) and exits another.

    • Routed Mode (Go-To): This is typically Two-Arm. The device routes between an outside BD/subnet and an inside BD/subnet. However, Routed Mode can be deployed as One-Arm (router-on-a-stick) if it uses a single interface to route traffic in and out.

    • Policy-Based Redirect (PBR): PBR supports both Two-Arm and One-Arm topologies. In PBR Two-Arm, the consumer connector and provider connector are on different interfaces and typically different subnets/BDs. In PBR One-Arm, traffic is redirected to a single interface, processed, and sent back out the same interface.


How to Identify if a Configuration is One-Arm or Two-Arm?


You can identify the mode by inspecting the Device Selection Policy, the L4-L7 Device configuration, and the Network Topology in the Cisco APIC GUI.


1. Check the Device Selection Policy (Logical Interface Mapping)

This is the most definitive check.

  • One-Arm Mode: The Consumer and Provider logical connectors are both mapped to the same concrete interface on the device (e.g., both mapped to GigabitEthernet0/0 or 1_1).

    • Source: For F5 One-Arm, the documentation states you must "Configure the logical interfaces external and internal to point to the same 'internal' interface".

    • Source: In One-Arm PBR, the Device Selection Policy shows both Consumer and Provider connectors using the same cluster interface (e.g., one-arm).

  • Two-Arm Mode: The Consumer connector maps to one interface (e.g., external), and the Provider connector maps to a different interface (e.g., internal).


2. Check the "Function Type" or "Node Type"

When configuring the L4-L7 Device parameters in the APIC GUI, there are explicit buttons for this setting.

  • Navigate to Tenants > Services > L4-L7 > L4-L7 Devices.

  • Select your device and look for the Node Type or Function Type buttons.

    • You will see options explicitly labeled One-Arm or Two-Arm (or sometimes configured via radio buttons for ADC configurations).


3. Check for Source NAT (SNAT)

While not a configuration setting in APIC itself, the presence of SNAT on the service device is a strong indicator of One-Arm mode.

  • One-Arm: Requires Source NAT (SNAT) on the load balancer or firewall. This changes the source IP to the device's own IP to ensure return traffic comes back to the single interface instead of bypassing it.

  • Two-Arm: Typically preserves the client IP (no SNAT required for routing symmetry) because the device is in the direct path of the return traffic.


4. Check Interface Security Configuration (Firewalls)

If you are inspecting the configuration of a firewall (like a Cisco ASA):

  • One-Arm: You will see the command same-security-traffic permit intra-interface. This allows traffic to enter and exit the same physical interface, which is blocked by default on many firewalls.

  • Two-Arm: This command is not required because traffic enters one interface (Outside) and exits another (Inside).


5. Check Bridge Domain (BD) Usage

  • One-Arm: Often involves a single dedicated Service BD where the device interface resides. Both consumer and provider EPGs redirect traffic to this single subnet/interface.

  • Two-Arm: Typically involves two separate Service BDs (one for the consumer side, one for the provider side) or connects directly to the Consumer BD and Provider BD separately.




Configuration Example


1. One-Arm Mode 

Scenario: An F5 Load Balancer deployed in GoTo (Routed) mode. The load balancer uses a single interface to receive traffic from clients and send it to servers. It typically performs Source NAT (SNAT) so traffic returns to it.

Step 1: Create the L4-L7 Device

  • Navigate to Tenants > Services > L4-L7 > L4-L7 Devices.

  • Mode: Single Node or HA Cluster.

  • Function Type: Select GoTo.

  • Concrete Interfaces: Add one interface (e.g., 1_1 or GigabitEthernet0/0).

  • Cluster Interfaces: Create a single cluster interface (e.g., name it One-Arm).

    • Map the concrete interface (1_1) to this One-Arm cluster interface.

Step 2: Configure the Service Graph Template

  • Create a template with a single ADC node.

  • The node will have a Consumer connector and a Provider connector.

Step 3: Configure the Device Selection Policy (The Critical Step)

This determines the "One-Arm" behavior by mapping both logical sides to the same physical port.

  • Navigate to L4-L7 Services > Device Selection Policies.

  • Consumer Connector:

    • Cluster Interface: Select One-Arm (the interface created in Step 1).

    • Bridge Domain: Select the dedicated Service BD (e.g., BD-LB-OneArm).

  • Provider Connector:

    • Cluster Interface: Select One-Arm (Select the same interface as above).

    • Bridge Domain: Select the same Service BD (BD-LB-OneArm).

Step 4: Network Configuration

  • Ensure the Service BD (BD-LB-OneArm) has a subnet configured. This subnet acts as the default gateway for the Load Balancer.


2. Two-Arm Mode 

Scenario: A Cisco ASA Firewall deployed in GoTo (Routed) mode or PBR. The firewall has distinct "External" and "Internal" interfaces.

Step 1: Create the L4-L7 Device

  • Navigate to Tenants > Services > L4-L7 > L4-L7 Devices.

  • Function Type: Select GoTo (or L3 for PBR).

  • Concrete Interfaces: Add two interfaces:

    • Interface 1: GigabitEthernet0/0 (External)

    • Interface 2: GigabitEthernet0/1 (Internal)

  • Cluster Interfaces: Create two distinct cluster interfaces:

    • Name: consumer (or external) -> Map to GigabitEthernet0/0.

    • Name: provider (or internal) -> Map to GigabitEthernet0/1.

Step 2: Configure the Service Graph Template

  • Create a template with a Firewall node.

  • The node has a Consumer connector and a Provider connector.

Step 3: Configure the Device Selection Policy

This separates the traffic flows onto different physical ports.

  • Navigate to L4-L7 Services > Device Selection Policies.

  • Consumer Connector:

    • Cluster Interface: Select external (the consumer-side interface).

    • Bridge Domain: Select the Consumer BD (e.g., BD-External) or the Service Consumer BD.

  • Provider Connector:

    • Cluster Interface: Select internal (the provider-side interface).

    • Bridge Domain: Select the Provider BD (e.g., BD-Internal) or the Service Provider BD.


Summary of Differences

  • Physical Cabling:

    • One-Arm: Single link (or port channel) to the leaf switch.

    • Two-Arm: Two separate links (or separate VLANs on a trunk) to the leaf switch.

  • L4-L7 Device Config:

    • One-Arm: One cluster interface configured (e.g., 1_1).

    • Two-Arm: Two cluster interfaces configured (e.g., ext, int).

  • Device Selection Policy:

    • One-Arm: Consumer and Provider connectors map to the SAME interface.

    • Two-Arm: Consumer and Provider connectors map to DIFFERENT interfaces.

  • Routing Behavior:

    • One-Arm: Often requires Source NAT (SNAT) or PBR to ensure return-traffic symmetry.

    • Two-Arm: Uses standard routing; traffic enters one side and exits the other.



Symmetric PBR [Pool of service nodes (cluster of firewalls /load balancers)]


Symmetric Policy-Based Redirect (PBR) is a feature in Cisco ACI that allows you to provision a pool of service nodes (such as a cluster of firewalls or load balancers) and load-balance traffic across them. Crucially, it ensures that both the incoming traffic (Consumer to Provider) and the return traffic (Provider to Consumer) for a specific flow are redirected to the same specific service node within that pool.


It relies on hashing algorithms (typically using Source IP, Destination IP, and Protocol) to deterministically select which node in the pool handles a specific conversation.


Why is it used?


Symmetric PBR is primarily used to achieve horizontal scaling and High Availability (HA) for Layer 4-7 services while maintaining the integrity of stateful connections.

1. Horizontal Scaling (Scale-Out)

Instead of relying on a single, large active/standby pair of firewalls, Symmetric PBR allows you to deploy multiple independent active nodes. Traffic is distributed across these nodes, increasing the total throughput and processing capacity of the service chain.


2. Supporting Stateful Inspection

Stateful devices like firewalls must see both directions of a TCP/UDP session to properly track the connection state. If traffic were load-balanced randomly (e.g., request goes to Firewall A, reply goes to Firewall B), Firewall B would drop the packet because it has no record of the initial connection. Symmetric PBR guarantees that if the request goes to Node A, the reply also goes to Node A, ensuring stateful inspection works correctly.


3. High Availability and Resiliency

Symmetric PBR integrates with Resilient Hashing and PBR Node Tracking.

  • Tracking: If a node fails (detected via IP SLA probes), the fabric automatically removes it from the pool.

  • Resilient Hashing: When a node fails, only the traffic flows associated with that specific failed node are moved to a backup node. Traffic handled by healthy nodes remains undisturbed, preventing widespread connection resets that would occur if the entire traffic pool were re-hashed.

4. Optimised Resource Utilisation (Weight-Based)

Starting with ACI Release 6.0(1), Symmetric PBR supports weights. This allows administrators to assign higher traffic loads to more powerful appliances and lower loads to smaller ones within the same cluster, rather than assuming all nodes have equal capacity.
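One way to picture weight-based selection is a bucket table in which each node appears in proportion to its weight; a flow's hash modulo the table length then picks a node. This is only an illustrative model (the node names and bucket count are made up), not the switch's actual implementation:

```python
def weighted_buckets(nodes: list[tuple[str, int]], total_buckets: int = 16) -> list[str]:
    """Expand (name, weight) pairs into a bucket table; heavier nodes
    occupy proportionally more buckets and thus attract more flows."""
    total_weight = sum(w for _, w in nodes)
    table: list[str] = []
    for name, weight in nodes:
        table += [name] * round(total_buckets * weight / total_weight)
    return table

# A 2:1 weight split over 9 buckets gives the bigger appliance 6 of them.
table = weighted_buckets([("BigFW", 2), ("SmallFW", 1)], total_buckets=9)
```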


Key Requirements

  • Hardware: Symmetric PBR generally requires Cisco Nexus 9300-EX or 9300-FX platform leaf switches or later.

  • Configuration: You must configure the PBR policy with multiple destination IPs (and MACs) and ensure the same weights/hashing parameters are applied to both directions of the traffic flow to maintain symmetry.



How to Configure Symmetric PBR?


To configure Symmetric PBR, you must create a PBR Policy that contains multiple destination IP addresses (representing your cluster of service nodes) rather than just one. You then associate this policy with your Service Graph.

The Cisco APIC uses a hashing algorithm (typically based on Source IP, Destination IP, and Protocol) to distribute traffic across these destinations. To ensure symmetry (that the return traffic goes to the same node as the incoming traffic), you generally configure a complementary PBR policy for the return traffic direction.
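The symmetry requirement can be modelled in a few lines. Cisco's actual leaf-switch hash is internal; the sketch below simply makes the flow key direction-independent (by sorting the IP pair) so that a flow and its return traffic always land on the same index of an identically ordered node list. The IPs are hypothetical:

```python
import hashlib

def node_index(sip: str, dip: str, proto: int, n_nodes: int) -> int:
    """Illustrative symmetric hash: sorting the IP pair makes the key
    identical for both directions of the same flow."""
    a, b = sorted((sip, dip))
    digest = hashlib.sha256(f"{a}|{b}|{proto}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % n_nodes

fwd = node_index("10.0.0.5", "10.0.1.9", 6, 3)   # consumer -> provider
ret = node_index("10.0.1.9", "10.0.0.5", 6, 3)   # provider -> consumer
# fwd == ret: both directions select the same service-node slot
```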

Configuration Steps (GUI)

  1. Navigate to the PBR Policy Menu: Go to Tenants > [Your Tenant] > Policies > Protocol > L4-L7 Policy Based Redirect.

  2. Create a New Policy: Right-click and select Create L4-L7 Policy Based Redirect.

  3. Configure Basic Settings:

    • Name: Give it a name (e.g., FW-Cluster-Consumer-Side).

    • Hashing Algorithm: Select the algorithm. The default is usually Source IP, Destination IP, and Protocol number, which works well for most symmetric deployments.

    • Resilient Hashing: (Recommended) Check this box to minimise flow disruption if a node fails.

  4. Add Multiple Destinations: In the L3 Destinations table, click the + icon to add each service node in your cluster.

    • IP: Enter the interface IP of the service node (e.g., Firewall 1).

    • MAC: Enter the MAC address (or 00:00:00:00:00:00 if using IP SLA tracking to dynamically resolve it).

    • Destination Name (Crucial): Starting from newer APIC releases, you can assign a name (e.g., Node1). This helps ensure the sorting order matches for both incoming and outgoing policies.

    • Repeat this step for every node in the cluster (Node2, Node3, etc.).

  5. Submit: Click Submit to save the policy.

  6. Create the Return Policy (For Two-Arm setups): Repeat the steps above to create a second policy (e.g., FW-Cluster-Provider-Side) for the provider-side interfaces of the same nodes. Ensure the order of the nodes matches the first policy to maintain symmetry.

  7. Apply to Service Graph: Go to your Device Selection Policy (under L4-L7 Services) and map these PBR policies to the consumer and provider connectors of your service graph.
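Steps 2-5 above amount to one PBR policy object carrying several destination children. A hedged Python sketch of the REST body (the class and attribute names, including destName and resilientHashEnabled, are assumptions from the ACI object model; verify on your release):

```python
def symmetric_pbr_policy(tenant: str, name: str, destinations, resilient: bool = True) -> dict:
    """Build a PBR policy with multiple destinations.
    destinations: list of (ip, mac, dest_name) tuples, one per node."""
    children = [
        {"vnsRedirectDest": {"attributes": {
            "ip": ip, "mac": mac, "destName": dest_name}}}
        for ip, mac, dest_name in destinations
    ]
    return {
        "vnsSvcRedirectPol": {
            "attributes": {
                "dn": f"uni/tn-{tenant}/svcCont/svcRedirectPol-{name}",
                "name": name,
                "resilientHashEnabled": "yes" if resilient else "no",
            },
            "children": children,
        }
    }

pol = symmetric_pbr_policy("Prod", "FW-Cluster-Consumer-Side", [
    ("192.168.1.1", "00:50:56:AA:BB:C1", "Node1"),
    ("192.168.1.2", "00:50:56:AA:BB:C2", "Node2"),
    ("192.168.1.3", "00:50:56:AA:BB:C3", "Node3"),
])
```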


Configuration Example


Scenario: You have a cluster of 3 Firewalls (Active/Active) inserted between a Client EPG and a Web EPG. You want traffic to be load-balanced across them.


1. Define the PBR Policy for Incoming Traffic (Consumer -> Provider)

  • Policy Name: PBR-Incoming

  • Hashing: Source IP, Destination IP, and Protocol

  • Destinations:

    • Row 1: IP 192.168.1.1 (FW1-External) | MAC 00:50:56:AA:BB:C1 | Name FW1

    • Row 2: IP 192.168.1.2 (FW2-External) | MAC 00:50:56:AA:BB:C2 | Name FW2

    • Row 3: IP 192.168.1.3 (FW3-External) | MAC 00:50:56:AA:BB:C3 | Name FW3


2. Define the PBR Policy for Return Traffic (Provider -> Consumer)

  • Policy Name: PBR-Return

  • Hashing: Source IP, Destination IP, and Protocol

  • Destinations:

    • Note: You must list the internal interfaces of the same firewalls in the same order (or use Destination Name sorting) so the hash selects the same node.

    • Row 1: IP 192.168.2.1 (FW1-Internal) | MAC 00:50:56:DD:EE:F1 | Name FW1

    • Row 2: IP 192.168.2.2 (FW2-Internal) | MAC 00:50:56:DD:EE:F2 | Name FW2

    • Row 3: IP 192.168.2.3 (FW3-Internal) | MAC 00:50:56:DD:EE:F3 | Name FW3


3. CLI Equivalent (NX-OS Style)

If you were configuring the incoming policy via the CLI, it would look like this:

apic1(config-tenant)# svcredir-pol PBR-Incoming
apic1(svcredir-pol)# hashing-algorithm sip-dip-prototype
apic1(svcredir-pol)# redir-dest 192.168.1.1 00:50:56:AA:BB:C1
apic1(svcredir-pol)# redir-dest 192.168.1.2 00:50:56:AA:BB:C2
apic1(svcredir-pol)# redir-dest 192.168.1.3 00:50:56:AA:BB:C3
apic1(svcredir-pol)# exit


Why Order Matters (Symmetry)

If the hashing algorithm sends "Flow A" to the first destination in the list for the incoming traffic, it must also send the return of "Flow A" to the first destination in the return list.

  • Without Destination Names: The APIC sorts destinations by IP address. You must ensure that FW1-External (lowest IP) corresponds to FW1-Internal (lowest IP) on the other side.

  • With Destination Names: The APIC sorts by the name you provided (e.g., FW1, FW2). This is safer if your IP addressing scheme doesn't naturally align with your node pairing.
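The ordering rule can be checked programmatically. The sketch below (hypothetical IPs and names) sorts each side the way the text describes, by IP address or by destination name when names are set, so that index N on the consumer side and index N on the provider side refer to the same physical firewall:

```python
import ipaddress

def pbr_order(dests: list[dict], by_name: bool = False) -> list[dict]:
    """Sort destinations the way APIC does: by destination name when
    names are configured, otherwise by IP address."""
    if by_name:
        return sorted(dests, key=lambda d: d["name"])
    return sorted(dests, key=lambda d: ipaddress.ip_address(d["ip"]))

external = [{"ip": "192.168.1.2", "name": "FW2"}, {"ip": "192.168.1.1", "name": "FW1"}]
internal = [{"ip": "192.168.2.1", "name": "FW1"}, {"ip": "192.168.2.2", "name": "FW2"}]

# After sorting, position i on both sides is the same firewall.
pairing = list(zip(pbr_order(external), pbr_order(internal)))
```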



How Resilient Hashing Minimises Traffic Impact During a Failover


Resilient Hashing minimises traffic impact during a failover by ensuring that only the traffic flows associated with the failed node are moved, while all other traffic remains on its original path.


Here is a detailed breakdown of how it works compared to the default behaviour:


1. The Problem: Default Hashing Behaviour

Without Resilient Hashing, Cisco ACI uses standard ECMP (Equal-Cost Multi-Path) hashing to distribute traffic across the pool of service nodes. If a node fails (or is added), the hashing algorithm recalculates the distribution for the entire pool.

  • Consequence: Traffic flows that were being handled by healthy nodes might be "rehashed" and moved to different nodes.

  • Impact: If the service devices are stateful (like firewalls), the new node will not have the connection state information for these existing flows. This results in the new node dropping the packets, causing connection resets for users who were not even using the failed device.


2. The Solution: Resilient Hashing

Resilient Hashing changes this behaviour by "pinning" or mapping traffic flows to specific physical nodes more statically.

  • During Failover: When a node fails, the fabric detects the failure (via IP SLA tracking). Resilient Hashing ensures that only the traffic flows destined for the failed node are remapped.

  • Stability: Existing traffic flows currently being processed by the remaining healthy nodes are not moved. They continue to flow through their original service nodes without interruption.

  • Result: This minimises the "blast radius" of a failure, ensuring that a single device failure does not disrupt the entire network's active sessions.
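The difference is easy to see in a small simulation. CRC32 below is just a deterministic stand-in for the leaf's ECMP hash, and the node and flow names are made up:

```python
import zlib

def ecmp_assign(flows: list[str], nodes: list[str]) -> dict:
    """Deterministic stand-in for the leaf ECMP hash."""
    return {f: nodes[zlib.crc32(f.encode()) % len(nodes)] for f in flows}

flows = [f"flow{i}" for i in range(100)]
before = ecmp_assign(flows, ["FW1", "FW2", "FW3"])

# Default behaviour: FW2 fails and every flow is rehashed over the survivors.
rehashed = ecmp_assign(flows, ["FW1", "FW3"])
moved_default = sum(1 for f in flows if before[f] != rehashed[f])

# Resilient hashing: only FW2's flows are remapped (here, all to FW1);
# flows pinned to FW1 and FW3 keep their original node.
resilient = {f: ("FW1" if n == "FW2" else n) for f, n in before.items()}
moved_resilient = sum(1 for f in flows if before[f] != resilient[f])
# moved_resilient is exactly FW2's share; moved_default is at least that large
```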


3. Integration with Backup Policies

Resilient Hashing is often combined with a PBR Backup Policy (available on EX/FX switches and later).

  • Instead of redistributing the failed node's traffic to the remaining active nodes (which could cause congestion or overloading), the system can redirect the specific traffic from the failed node to a designated backup node.

  • This ensures N+M high availability, where the backup node absorbs the load of the failed node exclusively.


Hardware Requirement: It is important to note that Resilient Hashing requires Cisco Nexus 9300-EX or 9300-FX platform leaf switches or later to function.



What is a PBR Backup Policy?


A PBR Backup Policy is a high-availability feature introduced in Cisco APIC Release 4.2(1) that enables N+M redundancy for service nodes. It is designed to work in conjunction with Resilient Hashing to handle node failures without overloading the remaining active nodes.

  • The Problem it Solves: By default, if a service node fails, Resilient Hashing redistributes that node's traffic to the remaining active nodes. This can cause the remaining nodes to become overloaded (e.g., doubling their traffic load).

  • The Solution: Instead of shifting the failed node's traffic to other active nodes, the PBR Backup Policy redirects that specific traffic to a designated standby (backup) node.

  • Behavior:

    • Normal Operation: Backup nodes sit idle and do not process traffic.

    • Failure Scenario: If "Primary Node A" fails, its traffic is moved to "Backup Node 1." Traffic flowing through "Primary Node B" remains unaffected.

    • Selection Logic: If multiple backup nodes are available, the system selects one based on the order of IP addresses (lowest to highest) or Destination Name (if configured). If all backup nodes are exhausted, traffic will then fall back to the remaining primary nodes.
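The selection logic described above can be sketched as follows. This is a simplified model, not Cisco's exact algorithm, and the node data in the test is hypothetical:

```python
import ipaddress

def select_replacement(backups_up: list[dict], primaries_up: list[dict],
                       by_name: bool = False):
    """Pick the node that absorbs a failed primary's flows: the first
    healthy backup in sort order (by IP, or by destination name when
    configured), falling back to the healthy primaries only when the
    backup pool is exhausted."""
    key = (lambda d: d["name"]) if by_name else (lambda d: ipaddress.ip_address(d["ip"]))
    pool = sorted(backups_up, key=key) or sorted(primaries_up, key=key)
    return pool[0] if pool else None
```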


Prerequisites and Limitations

  • Hardware: Supported only on Cisco Nexus 9300-EX, -FX, -FX2, and later platform leaf switches.

  • Resilient Hashing: Must be enabled on the main PBR policy for the backup policy to function.

  • Exclusivity: A specific destination IP cannot be used as both a primary PBR destination and a backup PBR destination.

  • L1/L2 Support: Supported for Layer 1/Layer 2 PBR starting from Cisco APIC Release 5.0(1).


Configuration Steps:


Configuring a PBR Backup Policy involves two main stages: defining the backup destinations in a specific backup policy, and then associating that backup policy with your main PBR policy.


Step 1: Create the PBR Backup Policy

  1. Navigate to Tenants > [Your Tenant] > Policies > Protocol > L4-L7 Policy Based Redirect Backup.

  2. Right-click and select Create L4-L7 Policy-Based Redirect Backup.

  3. In the dialog box:

    • Name: Enter a name for the backup policy (e.g., Firewall-Backup-Pool).

    • L3 Destinations: Click the + icon to add your backup service nodes.

    • IP & MAC: Enter the IP address and MAC address of the backup node.

    • Redirect Health Group: Select or create a Health Group to monitor the backup node's status.

  4. Click Submit.


Step 2: Associate with the Main PBR Policy

  1. Navigate to Tenants > [Your Tenant] > Policies > Protocol > L4-L7 Policy Based Redirect.

  2. Select the primary PBR policy (the one containing your active service nodes).

  3. In the main work pane, configure the following:

    • Resilient Hashing Enabled: Check this box (Required).

    • Backup Policy: From the dropdown menu, select the backup policy you created in Step 1 (e.g., Firewall-Backup-Pool).

  4. Ensure your primary destinations are listed in the L3 Destinations table and have Redirect Health Groups assigned.

  5. Click Submit.


Once configured, if a primary node in the main PBR policy fails (detected via the Health Group/IP SLA), the switch will redirect the affected flows to the node defined in the Backup Policy.




How does L3Out PBR handle both North-South (N-S) and East-West (E-W) traffic with the same service device (e.g., a firewall)?


Starting with Cisco ACI Release 5.2, the ability to configure a PBR destination in an L3Out enables this consolidated design.

How it Works

In this design, the firewall is typically connected to the fabric via an L3Out (internal leg) and to the external network (external leg) outside of the ACI fabric.

  1. North-South Traffic (Standard Routing):

    • Traffic flowing between internal EPGs and the external network (WAN/Internet) follows standard routing paths.

    • You configure a standard contract (e.g., Contract2 with "Permit") between your internal EPGs and the L3Out EPG.

    • The firewall acts as the natural gateway or next hop for this traffic without needing PBR.

  2. East-West Traffic (PBR Insertion):

    • Traffic flowing between two internal EPGs (e.g., EPG1 and EPG2) is redirected to the same firewall interface used for N-S traffic.

    • You configure a PBR Service Graph on the contract between EPG1 and EPG2 (e.g., Contract1).

    • The PBR policy redirects this internal traffic to the firewall's IP address defined in the L3Out.

Benefits

  • Consolidation: You can reuse your perimeter (North-South) firewall for internal (East-West) inspection, eliminating the need for separate dedicated firewalls for internal segmentation.

  • Simplified Topology: The firewall does not need to be the default gateway for the internal servers to inspect East-West traffic; PBR handles the redirection logic.

Key Requirements

  • Software Version: Requires Cisco APIC Release 5.2(1) or later.

  • Mandatory Tracking: IP SLA Tracking is mandatory for PBR destinations configured in an L3Out to ensure traffic is not black-holed if the external reachability fails.

  • Subnet Configuration: You cannot use 0.0.0.0/0 (default route) for the L3Out EPG used as the PBR destination. You must use specific subnets (e.g., 0.0.0.0/1 and 128.0.0.0/1) to ensure proper EPG classification for the redirected traffic.



How Do L3Out PBR Subnets Affect EPG Classification in ACI?


L3Out subnets play a critical role in how the ACI fabric classifies traffic when Policy-Based Redirect (PBR) is used. The subnet configuration determines whether traffic returning from a service node is treated as:

  • Service return traffic (part of a PBR/service chain), or

  • Regular external traffic (normal L3Out traffic)

Incorrect subnet configuration can break service insertion and cause traffic drops.


Service EPG vs. L3Out EPG Classification

When an L3Out is used as a PBR destination (commonly in two-arm service designs), ACI internally creates a hidden Service EPG associated with the service connector.

This hidden Service EPG is used to:

  • Recognize redirected traffic returning from the service node

  • Apply correct zoning rules

  • Permit traffic back toward the provider EPG

  • Maintain service chain consistency


Correct Classification (Using Specific Subnets)

If the returning traffic matches the specific subnet configured in the L3Out EPG for the PBR destination:

  • The fabric correctly classifies the traffic into the hidden Service EPG.

  • Zoning rules allow it back to the provider side.

  • The PBR/service chain operates as expected.

This is the intended behavior.


Incorrect Classification (Using 0.0.0.0/0)

If the L3Out EPG is configured with:

0.0.0.0/0

then:

  • All returning traffic matches the generic external L3Out EPG.

  • The traffic is not classified into the hidden Service EPG.

  • Service graph zoning rules do not apply.

  • The service chain breaks.

  • Traffic may be dropped or forwarded incorrectly.

This happens because classification inside ACI is based on:

  • VRF

  • Longest prefix match

  • L3Out External EPG subnet definitions

Using 0.0.0.0/0 causes the fabric to treat all traffic as generic external traffic rather than service-return traffic.
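The longest-prefix-match behavior can be illustrated with a short Python sketch using the standard `ipaddress` module. The EPG names and subnets here are hypothetical examples, not real fabric objects, but they show why a broad or overlapping subnet can "steal" traffic from the EPG you intended, regardless of which interface the packet arrived on.

```python
# Conceptual sketch: external EPG classification by longest prefix match.
# EPG names and subnets are hypothetical.
import ipaddress

def classify(dst_ip, epg_subnets):
    """Return the EPG whose configured subnet is the longest match for dst_ip."""
    ip = ipaddress.ip_address(dst_ip)
    matches = [(net, epg) for epg, nets in epg_subnets.items()
               for net in map(ipaddress.ip_network, nets) if ip in net]
    if not matches:
        return None
    # Longest prefix wins, independent of ingress interface
    return max(matches, key=lambda m: m[0].prefixlen)[1]

epg_subnets = {
    "L3Out1-EPG": ["10.0.0.0/8"],    # broad subnet on one L3Out
    "L3Out2-EPG": ["10.1.0.0/16"],   # overlapping, more specific subnet
}

# 10.1.2.3 is classified into L3Out2-EPG even if the packet physically
# arrives on L3Out1's interface -- the /16 beats the /8.
assert classify("10.1.2.3", epg_subnets) == "L3Out2-EPG"
assert classify("10.9.9.9", epg_subnets) == "L3Out1-EPG"
```

The same mechanism explains the 0.0.0.0/0 restriction: a default-route entry matches every destination, so it cannot coexist with the precise classification the hidden Service EPG requires.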


The 0.0.0.0/0 Restriction

For L3Out EPGs used in PBR/service insertion, 0.0.0.0/0 cannot be used.

This restriction exists because:

  • The default route matches all prefixes.

  • It overrides proper service-return classification.

  • It prevents traffic from being associated with the hidden Service EPG.

  • PBR zoning rules are bypassed.

This restriction also applies to IPv6:

::/0


The Workaround: Splitting the Default Route

To cover all IP space without using the restricted default route, split it into two non-overlapping subnets.

For IPv4:

0.0.0.0/1

128.0.0.0/1

For IPv6:

::/1

8000::/1

Why This Works

  • These prefixes together cover the entire address space.

  • They avoid using the exact default route object.

  • They preserve proper Service EPG classification.

  • PBR/service graph logic functions correctly.

This is the recommended production workaround.
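The properties that make the split work can be verified with Python's standard `ipaddress` module: the two /1 halves are distinct from the restricted default-route object, do not overlap, and together cover the entire address space.

```python
# Verify the split-default-route workaround with the stdlib.
import ipaddress

v4_halves = [ipaddress.ip_network("0.0.0.0/1"),
             ipaddress.ip_network("128.0.0.0/1")]
v6_halves = [ipaddress.ip_network("::/1"),
             ipaddress.ip_network("8000::/1")]

# Neither half is the restricted default-route object...
assert ipaddress.ip_network("0.0.0.0/0") not in v4_halves
assert ipaddress.ip_network("::/0") not in v6_halves

# ...the halves do not overlap each other...
assert not v4_halves[0].overlaps(v4_halves[1])

# ...and together they span the whole address space.
assert v4_halves[0].num_addresses + v4_halves[1].num_addresses == 2 ** 32
assert v6_halves[0].num_addresses + v6_halves[1].num_addresses == 2 ** 128
```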


Overlapping L3Out Subnets and Misclassification Risks

When multiple L3Outs exist in the same VRF:

  • Traffic classification is based on longest prefix match.

  • It is not solely based on ingress interface.

If:

  • A PBR L3Out EPG subnet is overly broad, or

  • It overlaps with another L3Out EPG subnet

Traffic may be classified into the wrong External EPG.

This can cause:

  • Unexpected policy drops

  • Incorrect PBR behavior

  • Asymmetric routing

Careful subnet design is required to avoid overlap.


Two-Arm Mode and Default Route Learning

In two-arm service designs:

If the firewall learns 0.0.0.0/0 dynamically (via OSPF or BGP):

  • Return path symmetry must be preserved.

  • Both L3Out connectors should ideally be on the same leaf pair (vPC).

  • Or routing must be designed to ensure consistent classification.

If connectors are spread across unrelated leaf switches and default route resolution differs, the fabric may not resolve classification correctly, leading to service chain failures.

This is a design best practice, not a hard architectural limitation.


Intra-External EPG Contracts

If you attempt to configure an Intra-External EPG contract using 0.0.0.0/0:

  • APIC will raise a fault.

  • The same split-prefix workaround applies.


Summary


L3Out PBR subnets determine how returning service traffic is classified inside the VRF. If 0.0.0.0/0 (or ::/0) is configured in the L3Out EPG used for a PBR destination, returning East-West traffic is classified into the generic external EPG instead of the hidden Service EPG. This breaks service chain zoning logic and may result in traffic drops.

To avoid this, the default route must be split into two specific prefixes (0.0.0.0/1 and 128.0.0.0/1 for IPv4; ::/1 and 8000::/1 for IPv6).

Careful subnet design is required to avoid overlap or misclassification when multiple L3Outs exist in the same VRF.
