VxLAN fabric. Part 1

Hello, Habr. I am currently the course leader for "Network Engineer" at OTUS.
In anticipation of a new enrollment for the "Network Engineer" course, I have prepared a series of articles on VxLAN EVPN technology.

There is already a huge amount of material on how VxLAN EVPN works, so in this series I want to collect practical tasks and techniques for solving problems in a modern data center.


In the first part of this series on VxLAN EVPN, I want to look at a way to organize L2 connectivity between hosts on top of a network fabric.

All examples are performed on Cisco Nexus 9000v switches assembled into a Spine-Leaf topology. We will not dwell on setting up the Underlay network in this article.

  1. Underlay network
  2. BGP peering for address-family l2vpn evpn
  3. NVE setup
  4. Suppress-ARP

Underlay network

The topology used is as follows:

[Diagram: lab Spine-Leaf topology]

Let's assign addresses on all devices:

Spine-1 - 10.255.1.101
Spine-2 - 10.255.1.102

Leaf-11 - 10.255.1.11
Leaf-12 - 10.255.1.12
Leaf-21 - 10.255.1.21

Host-1 - 192.168.10.10
Host-2 - 192.168.10.20
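
We will not go through the Underlay configuration step by step, but for reference, a minimal OSPF underlay on a Leaf might look like the sketch below. This is an assumption for illustration: the process name UNDERLAY matches the routing-table output that follows, while the interface number and the p2p subnet are made up:

feature ospf

router ospf UNDERLAY

interface loopback0
  ip address 10.255.1.21/32
  ip router ospf UNDERLAY area 0.0.0.0

interface Ethernet1/3
  no switchport
  ip address 10.255.2.1/30                ! illustrative p2p subnet towards a Spine
  ip ospf network point-to-point
  ip router ospf UNDERLAY area 0.0.0.0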

Let's check that there is IP connectivity between all devices:

Leaf21# sh ip route
<........>
10.255.1.11/32, ubest/mbest: 2/0                      ! Leaf-11 is reachable via two Spines
    *via 10.255.1.101, Eth1/4, [110/81], 00:00:03, ospf-UNDERLAY, intra
    *via 10.255.1.102, Eth1/3, [110/81], 00:00:03, ospf-UNDERLAY, intra
10.255.1.12/32, ubest/mbest: 2/0                      ! Leaf-12 is reachable via two Spines
    *via 10.255.1.101, Eth1/4, [110/81], 00:00:03, ospf-UNDERLAY, intra
    *via 10.255.1.102, Eth1/3, [110/81], 00:00:03, ospf-UNDERLAY, intra
10.255.1.21/32, ubest/mbest: 2/0, attached
    *via 10.255.1.22, Lo0, [0/0], 00:02:20, local
    *via 10.255.1.22, Lo0, [0/0], 00:02:20, direct
10.255.1.101/32, ubest/mbest: 1/0
    *via 10.255.1.101, Eth1/4, [110/41], 00:00:06, ospf-UNDERLAY, intra
10.255.1.102/32, ubest/mbest: 1/0
    *via 10.255.1.102, Eth1/3, [110/41], 00:00:03, ospf-UNDERLAY, intra

Let's check that the vPC domain has been created, that both switches have passed the consistency check, and that the settings on both nodes are identical:

Leaf11# show vpc 

vPC domain id                     : 1
Peer status                       : peer adjacency formed ok
vPC keep-alive status             : peer is alive
Configuration consistency status  : success
Per-vlan consistency status       : success
Type-2 consistency status         : success
vPC role                          : primary
Number of vPCs configured         : 0
Peer Gateway                      : Disabled
Dual-active excluded VLANs        : -
Graceful Consistency Check        : Enabled
Auto-recovery status              : Disabled
Delay-restore status              : Timer is off.(timeout = 30s)
Delay-restore SVI status          : Timer is off.(timeout = 10s)
Operational Layer3 Peer-router    : Disabled

vPC status
----------------------------------------------------------------------------
Id    Port          Status Consistency Reason                Active vlans
--    ------------  ------ ----------- ------                ---------------
5     Po5           up     success     success               1
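
For reference, the vPC domain between Leaf-11 and Leaf-12 can be built along the following lines. This is only a sketch: the keepalive addresses and the peer-link port-channel number are assumptions, while Po5 is the host-facing port-channel from the output above:

feature vpc
feature lacp

vpc domain 1
  peer-keepalive destination 192.168.0.12 source 192.168.0.11   ! out-of-band keepalive, addresses are illustrative

interface port-channel1
  switchport mode trunk
  vpc peer-link                 ! peer-link between Leaf-11 and Leaf-12

interface port-channel5
  switchport mode trunk
  vpc 5                         ! vPC towards the host (Po5 in the output above)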

BGP peering

Finally, we can move on to configuring the Overlay network.

Within this article, we need to build a network between the hosts as shown in the diagram below:

[Diagram: target L2 network between Host-1 and Host-2 over the fabric]

To configure the Overlay network, you need to enable BGP on the Spine and Leaf switches with support for the l2vpn evpn address family:

feature bgp
nv overlay evpn

Next, we need to configure BGP peering between the Leaf and Spine switches. To simplify the configuration and optimize the distribution of routing information, we configure each Spine as a Route-Reflector server, and we define all the Leaf neighbors through a template to keep the configuration compact.

So the configuration on the Spines looks like this:

router bgp 65001
  template peer LEAF 
    remote-as 65001
    update-source loopback0
    address-family l2vpn evpn
      send-community
      send-community extended
      route-reflector-client
  neighbor 10.255.1.11
    inherit peer LEAF
  neighbor 10.255.1.12
    inherit peer LEAF
  neighbor 10.255.1.21
    inherit peer LEAF

The setup on the Leaf switch looks similar:

router bgp 65001
  template peer SPINE
    remote-as 65001
    update-source loopback0
    address-family l2vpn evpn
      send-community
      send-community extended
  neighbor 10.255.1.101
    inherit peer SPINE
  neighbor 10.255.1.102
    inherit peer SPINE

On Spine, check peering with all Leaf switches:

Spine1# sh bgp l2vpn evpn summary
<.....>
Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.255.1.11     4 65001       7       8        6    0    0 00:01:45 0
10.255.1.12     4 65001       7       7        6    0    0 00:01:16 0
10.255.1.21     4 65001       7       7        6    0    0 00:01:01 0

As you can see, BGP came up without any problems. Let's move on to setting up VxLAN. From here on, configuration is performed only on the Leaf switches: the Spines act purely as the network core and only forward traffic, while all encapsulation and path selection happens on the Leaves.

NVE setup

NVE - Network Virtualization Edge interface

Before starting the setup, let's introduce some terminology:

VTEP - Virtual Tunnel End Point: the device where a VxLAN tunnel begins or ends. A VTEP is not necessarily a network device; a server that supports VxLAN can also play this role. In our topology, all Leaf switches are VTEPs.

VNI - Virtual Network Identifier: the network identifier within VxLAN. You can draw an analogy with a VLAN, although there are some differences. In a fabric, VLANs are unique only within a single Leaf switch and are not carried across the network; instead, each VLAN can be associated with a VNI number, and it is the VNI that is carried across the fabric. What this looks like and how it can be used is discussed below.

Enable the features required for VxLAN to operate and for associating VLAN numbers with VNI numbers:

feature nv overlay
feature vn-segment-vlan-based

Let's configure the NVE interface, which is responsible for VxLAN operation: it encapsulates frames into VxLAN headers. You can draw an analogy with the Tunnel interface used for GRE:

interface nve1
  no shutdown
  host-reachability protocol bgp ! use BGP to distribute host reachability information
  source-interface loopback0     ! source VxLAN packets from loopback0

On the Leaf-21 switch everything comes up without problems. However, if we check the output of the show nve peers command, it will be empty. Here we need to return to the vPC setup: Leaf-11 and Leaf-12 work as a pair joined in a vPC domain, which leads to the following situation:

Host-2 sends a frame to Leaf-21 to be forwarded across the network to Host-1. However, Leaf-21 sees Host-1's MAC address as reachable via two VTEPs at once. What should Leaf-21 do in this case? After all, this means a loop could appear in the network.

To resolve this, Leaf-11 and Leaf-12 must also act as a single device within the fabric. The solution is quite simple: on the loopback interface from which we build the tunnel, add a secondary address that is identical on both VTEPs.

interface loopback0
 ip address 10.255.1.10/32 secondary

Thus, from the point of view of the other VTEPs, we get the following topology:

[Diagram: Leaf-11 and Leaf-12 appear to other VTEPs as a single VTEP with the virtual address 10.255.1.10]

That is, the tunnel is now built between the IP address of Leaf-21 and the virtual IP shared by Leaf-11 and Leaf-12. There is no longer any problem with learning one MAC address from two devices, and traffic can pass from one VTEP to the other. Which of the two VTEPs processes the traffic is decided by the routing table on the Spine:

Spine1# sh ip route
<.....>
10.255.1.10/32, ubest/mbest: 2/0
    *via 10.255.1.11, Eth1/1, [110/41], 1d01h, ospf-UNDERLAY, intra
    *via 10.255.1.12, Eth1/2, [110/41], 1d01h, ospf-UNDERLAY, intra
10.255.1.11/32, ubest/mbest: 1/0
    *via 10.255.1.11, Eth1/1, [110/41], 1d22h, ospf-UNDERLAY, intra
10.255.1.12/32, ubest/mbest: 1/0
    *via 10.255.1.12, Eth1/2, [110/41], 1d01h, ospf-UNDERLAY, intra

As shown above, the address 10.255.1.10 is reachable via two next-hops at once.

At this point we have sorted out basic connectivity, so let's continue configuring the NVE interface.
On each Leaf that connects hosts, we enable VLAN 10 and associate it with VNI 10000, which builds the L2 tunnel between the hosts:

vlan 10                 ! Enable the VLAN on all VTEPs connected to the relevant hosts
  vn-segment 10000      ! Associate the VLAN with VNI 10000

interface nve1
  member vni 10000      ! Add VNI 10000 to the NVE interface for VxLAN encapsulation
    ingress-replication protocol bgp    ! use BGP to distribute host reachability information

Now let's check the NVE peers and the BGP EVPN table:

Leaf21# sh nve peers
Interface Peer-IP          State LearnType Uptime   Router-Mac
--------- ---------------  ----- --------- -------- -----------------
nve1      10.255.1.10      Up    CP        00:00:41 n/a                 ! the peer is reachable via the secondary address

Leaf11# sh bgp l2vpn evpn

   Network            Next Hop            Metric     LocPrf     Weight Path
Route Distinguisher: 10.255.1.11:32777    (L2VNI 10000)        ! which device this L2VNI was received from
*>l[3]:[0]:[32]:[10.255.1.10]/88                               ! EVPN route-type 3 - a neighbor that also knows about L2VNI 10000
                      10.255.1.10                       100      32768 i
*>i[3]:[0]:[32]:[10.255.1.20]/88
                      10.255.1.20                       100          0 i
* i                   10.255.1.20                       100          0 i

Route Distinguisher: 10.255.1.21:32777
* i[3]:[0]:[32]:[10.255.1.20]/88
                      10.255.1.20                       100          0 i
*>i                   10.255.1.20                       100          0 i

Above we see only EVPN route-type 3 routes. This route type describes the peers (Leaves), but where are our hosts?
The point is that information about host MACs is carried in EVPN route-type 2.

To see our hosts, we need to configure the EVPN instance so that route-type 2 routes are generated:

evpn
  vni 10000 l2
    route-target import auto   ! within this article we use automatically derived route-targets
    route-target export auto

Let's ping Host-1 from Host-2:

Firewall2# ping 192.168.10.1
PING 192.168.10.1 (192.168.10.1): 56 data bytes
36 bytes from 192.168.10.2: Destination Host Unreachable
Request 0 timed out
64 bytes from 192.168.10.1: icmp_seq=1 ttl=254 time=215.555 ms
64 bytes from 192.168.10.1: icmp_seq=2 ttl=254 time=38.756 ms
64 bytes from 192.168.10.1: icmp_seq=3 ttl=254 time=42.484 ms
64 bytes from 192.168.10.1: icmp_seq=4 ttl=254 time=40.983 ms
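
Before returning to the BGP table, you can also confirm that the MAC addresses were learned through the EVPN control plane. Only the command is shown here, since the exact output varies between NX-OS releases:

Leaf21# show l2route evpn mac all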

Below we can see that route-type 2 entries have appeared in the BGP table carrying the MAC addresses of the hosts: 5001.0007.0007 and 5001.0008.0007.

Leaf11# sh bgp l2vpn evpn
<......>

   Network            Next Hop            Metric     LocPrf     Weight Path
Route Distinguisher: 10.255.1.11:32777    (L2VNI 10000)
*>l[2]:[0]:[0]:[48]:[5001.0007.0007]:[0]:[0.0.0.0]/216                      ! EVPN route-type 2 with the MAC address of Host-1
                      10.255.1.10                       100      32768 i
*>i[2]:[0]:[0]:[48]:[5001.0008.0007]:[0]:[0.0.0.0]/216                      ! EVPN route-type 2 with the MAC address of Host-2
* i                   10.255.1.20                       100          0 i
*>l[3]:[0]:[32]:[10.255.1.10]/88
                      10.255.1.10                       100      32768 i
Route Distinguisher: 10.255.1.21:32777
* i[2]:[0]:[0]:[48]:[5001.0008.0007]:[0]:[0.0.0.0]/216
                      10.255.1.20                       100          0 i
*>i                   10.255.1.20                       100          0 i

Next, let's look at the detailed information for the Update in which the host MAC information was received. Below is only part of the command output:

Leaf21# sh bgp l2vpn evpn 5001.0007.0007

BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 10.255.1.11:32777        ! who sent the Update with the host MAC: the Leaf's own address, not the virtual vPC address
BGP routing table entry for [2]:[0]:[0]:[48]:[5001.0007.0007]:[0]:[0.0.0.0]/216, version 1507
Paths: (2 available, best #2)
Flags: (0x000202) (high32 00000000) on xmit-list, is not in l2rib/evpn, is not in HW

  Path type: internal, path is valid, not best reason: Neighbor Address, no labeled nexthop
  AS-Path: NONE, path sourced internal to AS
    10.255.1.10 (metric 81) from 10.255.1.102 (10.255.1.102)    ! the VTEP address we build the VxLAN tunnel to
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10000         ! the VNI associated with the VLAN where the host resides
      Extcommunity: RT:65001:10000 SOO:10.255.1.10:0 ENCAP:8        ! the RT was derived automatically from the AS and VNI numbers
      Originator: 10.255.1.11 Cluster list: 10.255.1.102
<........>

Let's see what the frames look like as they pass through the fabric:

[Diagram: VxLAN-encapsulated frames as they cross the fabric]

Suppress-ARP

Great, we now have L2 connectivity between the hosts, and this could be the end of the story. However, it is not quite that simple. As long as we have few hosts, there will be no problems. But imagine that we have hundreds or thousands of hosts. What problem might we face?

That problem is BUM (Broadcast, Unknown Unicast, Multicast) traffic. Within this article we will look at one way of fighting broadcast traffic.
The main generator of broadcast in Ethernet networks is the hosts themselves, via the ARP protocol.

Nexus implements a mechanism for dealing with ARP requests called suppress-arp.
The feature works like this:

  1. Host-1 sends an ARP request to the broadcast address of its network.
  2. The request reaches the Leaf switch and, instead of forwarding it into the fabric towards Host-2, the Leaf answers it itself, supplying the requested IP and MAC.

Thus, the broadcast request never enters the fabric. But how can this work if the Leaf only knows MAC addresses?

It is quite simple: EVPN route-type 2 can carry a MAC/IP pair in addition to the MAC address alone. For that, the Leaf must have an IP address configured in the VLAN. The question is, which IP should it be? On Nexus it is possible to create a distributed (identical) gateway address on all switches:

feature interface-vlan

fabric forwarding anycast-gateway-mac 0001.0001.0001    ! set the virtual MAC for the distributed gateway shared by all switches

interface Vlan10
  no shutdown
  ip address 192.168.10.254/24          ! configure the same IP on all Leaves
  fabric forwarding mode anycast-gateway    ! use the virtual MAC

Thus, from the point of view of hosts, the network will look like this:

[Diagram: the hosts see a single distributed gateway 192.168.10.254]

Let's check the BGP l2vpn evpn table:

Leaf11# sh bgp l2vpn evpn
<......>

   Network            Next Hop            Metric     LocPrf     Weight Path
Route Distinguisher: 10.255.1.11:32777    (L2VNI 10000)
*>l[2]:[0]:[0]:[48]:[5001.0007.0007]:[0]:[0.0.0.0]/216
                      10.255.1.21                       100      32768 i
*>i[2]:[0]:[0]:[48]:[5001.0008.0007]:[0]:[0.0.0.0]/216
                      10.255.1.10                       100          0 i
* i                   10.255.1.10                       100          0 i
* i[2]:[0]:[0]:[48]:[5001.0008.0007]:[32]:[192.168.10.20]/248
                      10.255.1.10                       100          0 i
*>i                   10.255.1.10                       100          0 i

<......>

Route Distinguisher: 10.255.1.21:32777
* i[2]:[0]:[0]:[48]:[5001.0008.0007]:[0]:[0.0.0.0]/216
                      10.255.1.20                       100          0 i
*>i                   10.255.1.20                       100          0 i
* i[2]:[0]:[0]:[48]:[5001.0008.0007]:[32]:[192.168.10.20]/248
*>i                   10.255.1.20                       100          0 i

<......>

From the command output, we can see that EVPN route-type 2 now carries the host's IP address in addition to its MAC.

Let's return to the suppress-arp setting. This setting is enabled for each VNI separately:

interface nve1
  member vni 10000   
    suppress-arp
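
To make sure the feature is active for the VNI, you can inspect the ARP suppression cache. Again, only the command is shown, as the output depends on the NX-OS release:

Leaf11# show ip arp suppression-cache detail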

However, there are some caveats:

  • The feature requires space in TCAM memory. Here is an example of a suppress-arp setting:

hardware access-list tcam region arp-ether 256

This region is allocated double-wide, meaning that if you request 256 entries, 512 entries must be freed in TCAM. TCAM carving as such is beyond the scope of this article, since it depends entirely on the tasks of your network and may differ from one deployment to another (a carving sketch is shown after this list).

  • suppress-arp must be implemented on all Leaf switches. However, complications can arise on Leaf pairs in a vPC domain: while the TCAM is being changed, consistency between the pair is broken and one node may be taken out of service. In addition, a device reload may be required for the TCAM change to take effect.
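
A carving sketch for the first point might look like the lines below. This is an illustration only: the racl region and both sizes are assumptions, and on a number of Nexus 9000 releases the arp-ether region is carved with an explicit double-wide keyword, so check the configuration guide for your platform:

hardware access-list tcam region racl 256                      ! shrink another region to free 256 entries (illustrative)
hardware access-list tcam region arp-ether 256 double-wide     ! carve arp-ether as a double-wide region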

As a result, you should carefully weigh whether it is worth enabling this feature on a production fabric in your situation.

This concludes the first part of the series. In the next part, we will look at routing over the VxLAN fabric with network separation into different VRFs.

And now I invite everyone to a free webinar, in which I will talk about the course in detail. The first 20 participants to register for the webinar will receive a discount certificate by email within 1-2 days after the broadcast.
