Hello, Habr. I am currently the leader of the "Network Engineer" course at OTUS.
In anticipation of a new course enrollment, I decided to start this series: there is a huge amount of material on how VxLAN EVPN works, so I want to collect in one place the various tasks and practices for solving problems in a modern data center.
In the first part of the series on VxLAN EVPN technology, I want to look at a way to organize L2 connectivity between hosts on top of a network fabric.
All examples will be performed on Cisco Nexus 9000v, assembled in the Spine-Leaf topology. We will not dwell on setting up the Underlay network in this article.
- Underlay network
- BGP peering for address-family l2vpn evpn
- NVE setup
- Suppress-ARP

Underlay network
The topology used is as follows:
Let's set addressing on all devices:
Spine-1 - 10.255.1.101
Spine-2 - 10.255.1.102
Leaf-11 - 10.255.1.11
Leaf-12 - 10.255.1.12
Leaf-21 - 10.255.1.21
Host-1 - 192.168.10.10
Host-2 - 192.168.10.20
Let's check that there is IP connectivity between all devices:
Leaf21# sh ip route
<........>
10.255.1.11/32, ubest/mbest: 2/0 ! Leaf-11 is reachable via both Spines
*via 10.255.1.101, Eth1/4, [110/81], 00:00:03, ospf-UNDERLAY, intra
*via 10.255.1.102, Eth1/3, [110/81], 00:00:03, ospf-UNDERLAY, intra
10.255.1.12/32, ubest/mbest: 2/0 ! Leaf-12 is reachable via both Spines
*via 10.255.1.101, Eth1/4, [110/81], 00:00:03, ospf-UNDERLAY, intra
*via 10.255.1.102, Eth1/3, [110/81], 00:00:03, ospf-UNDERLAY, intra
10.255.1.21/32, ubest/mbest: 2/0, attached
*via 10.255.1.22, Lo0, [0/0], 00:02:20, local
*via 10.255.1.22, Lo0, [0/0], 00:02:20, direct
10.255.1.101/32, ubest/mbest: 1/0
*via 10.255.1.101, Eth1/4, [110/41], 00:00:06, ospf-UNDERLAY, intra
10.255.1.102/32, ubest/mbest: 1/0
*via 10.255.1.102, Eth1/3, [110/41], 00:00:03, ospf-UNDERLAY, intra
Let's check that the vPC domain has been created, that both switches have passed the consistency check, and that the settings on both nodes are identical:
Leaf11# show vpc
vPC domain id : 1
Peer status : peer adjacency formed ok
vPC keep-alive status : peer is alive
Configuration consistency status : success
Per-vlan consistency status : success
Type-2 consistency status : success
vPC role : primary
Number of vPCs configured : 0
Peer Gateway : Disabled
Dual-active excluded VLANs : -
Graceful Consistency Check : Enabled
Auto-recovery status : Disabled
Delay-restore status : Timer is off.(timeout = 30s)
Delay-restore SVI status : Timer is off.(timeout = 10s)
Operational Layer3 Peer-router : Disabled
vPC status
----------------------------------------------------------------------------
Id Port Status Consistency Reason Active vlans
-- ------------ ------ ----------- ------ ---------------
5 Po5 up success success 1
BGP peering
Finally, we can move on to configuring the Overlay network.
As part of the article, it is necessary to organize a network between hosts, as shown in the diagram below:
To configure an Overlay network, you need to enable BGP on the Spine and Leaf switches with support for the l2vpn evpn family:
feature bgp
nv overlay evpn
Next, you need to configure BGP peering between the Leaf and Spine switches. To simplify the configuration and optimize the distribution of routing information, we configure the Spines as Route-Reflector servers and define all Leaf neighbors through a peer template to keep the configuration compact.
So the Spine configuration looks like this:
router bgp 65001
template peer LEAF
remote-as 65001
update-source loopback0
address-family l2vpn evpn
send-community
send-community extended
route-reflector-client
neighbor 10.255.1.11
inherit peer LEAF
neighbor 10.255.1.12
inherit peer LEAF
neighbor 10.255.1.21
inherit peer LEAF
The setup on the Leaf switch looks similar:
router bgp 65001
template peer SPINE
remote-as 65001
update-source loopback0
address-family l2vpn evpn
send-community
send-community extended
neighbor 10.255.1.101
inherit peer SPINE
neighbor 10.255.1.102
inherit peer SPINE
On Spine, check peering with all Leaf switches:
Spine1# sh bgp l2vpn evpn summary
<.....>
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
10.255.1.11 4 65001 7 8 6 0 0 00:01:45 0
10.255.1.12 4 65001 7 7 6 0 0 00:01:16 0
10.255.1.21 4 65001 7 7 6 0 0 00:01:01 0
As you can see, there were no problems with BGP. Let's move on to configuring VxLAN. Further configuration will be performed only on the Leaf switches: the Spines act only as the network core and merely forward traffic. All encapsulation and path-selection work happens only on the Leaf switches.
NVE setup
NVE - Network Virtualization Edge, the interface that terminates VxLAN tunnels
Before starting the setup, let's introduce some terminology:
VTEP - Virtual Tunnel End Point: the device on which a VxLAN tunnel begins or ends. A VTEP is not necessarily a network device; a server that supports VxLAN can also act as one. In our topology, all the Leaf switches are VTEPs.
VNI - Virtual Network Identifier: the network identifier within VxLAN. You can draw an analogy with a VLAN, with some differences. In a fabric, VLANs are unique only within one Leaf switch and are not carried across the network; instead, each VLAN can be associated with a VNI number, which is what is transmitted across the fabric. What this looks like and how it can be used is discussed below.
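To make the VLAN/VNI difference concrete, here is a minimal Python sketch (illustrative only, not part of the switch configuration) of how the 24-bit VNI is carried in the VxLAN header, compared with the 12-bit VLAN ID:

```python
import struct

VLAN_ID_BITS = 12   # 802.1Q tag: at most 4094 usable VLANs per switch
VNI_BITS = 24       # VxLAN header: ~16 million network identifiers per fabric

def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VxLAN header (RFC 7348): flags word + 24-bit VNI."""
    if not 0 <= vni < 2 ** VNI_BITS:
        raise ValueError("VNI must fit in 24 bits")
    flags = 0x08000000                          # 'I' bit set: the VNI field is valid
    return struct.pack("!II", flags, vni << 8)  # VNI occupies the top 24 bits of word 2

def unpack_vni(header: bytes) -> int:
    """Extract the VNI back out of a VxLAN header."""
    _, word2 = struct.unpack("!II", header)
    return word2 >> 8

# VLAN 10 is only locally significant on a Leaf; VNI 10000 is fabric-wide.
hdr = pack_vxlan_header(10000)
print(unpack_vni(hdr))  # 10000
```

This is why the fabric is not limited by the 4094-VLAN ceiling: the VLAN stays local to the Leaf, while the VNI travels in the packet.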
Enable the feature for VxLAN technology to work and the ability to associate VLAN numbers with a VNI number:
feature nv overlay
feature vn-segment-vlan-based
Let's configure the NVE interface, which is responsible for the operation of VxLAN: it encapsulates frames in VxLAN headers. You can draw an analogy with the Tunnel interface used for GRE:
interface nve1
no shutdown
host-reachability protocol bgp ! use BGP to distribute host reachability information
source-interface loopback0 ! source encapsulated packets from loopback0
On the Leaf-21 switch, everything comes up without problems. However, if we check the output of the show nve peers command, it will be empty. Here we need to return to the vPC setup. Leaf-11 and Leaf-12 work in a pair and are united by a vPC domain, which leads to the following situation:
Host-2 sends a frame to Leaf-21 to be transmitted across the network to Host-1. However, Leaf-21 sees that Host-1's MAC address is reachable via two VTEPs at once. What should Leaf-21 do in this case? After all, this means a loop could appear in the network.
To solve this, Leaf-11 and Leaf-12 must also act as a single device within the fabric. The solution is quite simple: on the loopback interface from which we build the tunnel, we add a secondary address, which must be identical on both VTEPs.
interface loopback0
ip add 10.255.1.10/32 secondary
Thus, from the point of view of other VTEPs, we get the following topology:
That is, the tunnel will now be built between the IP address of Leaf-21 and the virtual IP shared by Leaf-11 and Leaf-12. There is now no problem with learning the same MAC address from two devices, and traffic can be transferred from one VTEP to another. Which of the two VTEPs processes the traffic is decided by the routing table on the Spine:
Spine1# sh ip route
<.....>
10.255.1.10/32, ubest/mbest: 2/0
*via 10.255.1.11, Eth1/1, [110/41], 1d01h, ospf-UNDERLAY, intra
*via 10.255.1.12, Eth1/2, [110/41], 1d01h, ospf-UNDERLAY, intra
10.255.1.11/32, ubest/mbest: 1/0
*via 10.255.1.11, Eth1/1, [110/41], 1d22h, ospf-UNDERLAY, intra
10.255.1.12/32, ubest/mbest: 1/0
*via 10.255.1.12, Eth1/2, [110/41], 1d01h, ospf-UNDERLAY, intra
As you can see above, the address 10.255.1.10 is available immediately through two Next-hops.
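The Spine's choice between the two equal-cost next-hops is typically made per flow by hashing packet header fields, so all packets of one flow take the same path. A minimal Python sketch of the idea (the hash function and the exact fields are hypothetical; real ASICs use their own):

```python
import zlib

def ecmp_next_hop(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                  next_hops: list) -> str:
    """Pick one next-hop per flow: hash a flow key, take it modulo the path count."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    return next_hops[zlib.crc32(key) % len(next_hops)]

# Two equal-cost paths toward the anycast VTEP 10.255.1.10, as in the Spine RIB above
paths = ["10.255.1.11", "10.255.1.12"]

# Every packet of the same flow hashes to the same Leaf of the vPC pair
a = ecmp_next_hop("10.255.1.20", "10.255.1.10", 54213, 4789, paths)
b = ecmp_next_hop("10.255.1.20", "10.255.1.10", 54213, 4789, paths)
print(a == b)  # True: the choice is deterministic per flow
```

Because the choice is per flow, frames of one conversation never get reordered between Leaf-11 and Leaf-12, even though both advertise the same anycast VTEP address.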
At this stage, we have basic connectivity sorted out. Let's move on to configuring the NVE interface: we enable VLAN 10 on each Leaf, associate it with VNI 10000, and thereby set up an L2 tunnel between the hosts.
vlan 10 ! enable the VLAN on all VTEPs connected to the relevant hosts
vn-segment 10000 ! associate the VLAN with a VNI number
interface nve1
member vni 10000 ! add VNI 10000 to the NVE interface for VxLAN encapsulation
ingress-replication protocol bgp ! use BGP to distribute host information
Now let's check nve peers and table for BGP EVPN:
Leaf21# sh nve peers
Interface Peer-IP State LearnType Uptime Router-Mac
--------- --------------- ----- --------- -------- -----------------
nve1 10.255.1.10 Up CP 00:00:41 n/a ! the peer is reachable via the secondary address
Leaf11# sh bgp l2vpn evpn
Network Next Hop Metric LocPrf Weight Path
Route Distinguisher: 10.255.1.11:32777 (L2VNI 10000) ! who exactly this L2VNI came from
*>l[3]:[0]:[32]:[10.255.1.10]/88 ! EVPN route-type 3 - shows our neighbor, which also knows about L2VNI 10000
10.255.1.10 100 32768 i
*>i[3]:[0]:[32]:[10.255.1.20]/88
10.255.1.20 100 0 i
* i 10.255.1.20 100 0 i
Route Distinguisher: 10.255.1.21:32777
* i[3]:[0]:[32]:[10.255.1.20]/88
10.255.1.20 100 0 i
*>i 10.255.1.20 100 0 i
Above we see only EVPN route-type 3 routes. This route type advertises peers (Leafs), but where are our hosts?
The thing is that information about host MAC addresses is carried in EVPN route-type 2.
To see our hosts, we need to enable the generation of route-type 2 routes:
evpn
vni 10000 l2
route-target import auto ! within this article we use automatically derived route-target values
route-target export auto
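With `route-target auto`, NX-OS derives the route-target from the BGP AS number and the VNI, which is why RT:65001:10000 appears in the BGP output later in this article. A trivial sketch of the derivation:

```python
def auto_route_target(asn: int, vni: int) -> str:
    """Auto-derived EVPN route-target: <AS number>:<VNI>."""
    return f"{asn}:{vni}"

# Our fabric runs AS 65001 and carries VNI 10000,
# so every Leaf imports and exports RT 65001:10000 for this L2VNI.
print(auto_route_target(65001, 10000))  # 65001:10000
```

The convenience is that all Leafs with the same AS and VNI automatically agree on the RT, with no per-switch coordination.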
Let's run a ping from Host-2 to Host-1:
Firewall2# ping 192.168.10.1
PING 192.168.10.1 (192.168.10.1): 56 data bytes
36 bytes from 192.168.10.2: Destination Host Unreachable
Request 0 timed out
64 bytes from 192.168.10.1: icmp_seq=1 ttl=254 time=215.555 ms
64 bytes from 192.168.10.1: icmp_seq=2 ttl=254 time=38.756 ms
64 bytes from 192.168.10.1: icmp_seq=3 ttl=254 time=42.484 ms
64 bytes from 192.168.10.1: icmp_seq=4 ttl=254 time=40.983 ms
Below we can see that route-type 2 routes have appeared in the BGP table with the MAC addresses of the hosts: 5001.0007.0007 and 5001.0008.0007.
Leaf11# sh bgp l2vpn evpn
<......>
Network Next Hop Metric LocPrf Weight Path
Route Distinguisher: 10.255.1.11:32777 (L2VNI 10000)
*>l[2]:[0]:[0]:[48]:[5001.0007.0007]:[0]:[0.0.0.0]/216 ! EVPN route-type 2 with the MAC address of Host-1
10.255.1.10 100 32768 i
*>i[2]:[0]:[0]:[48]:[5001.0008.0007]:[0]:[0.0.0.0]/216 ! EVPN route-type 2 with the MAC address of Host-2
* i 10.255.1.20 100 0 i
*>l[3]:[0]:[32]:[10.255.1.10]/88
10.255.1.10 100 32768 i
Route Distinguisher: 10.255.1.21:32777
* i[2]:[0]:[0]:[48]:[5001.0008.0007]:[0]:[0.0.0.0]/216
10.255.1.20 100 0 i
*>i 10.255.1.20 100 0 i
Next, we can look at the details of the Update in which the information about the host MAC was received. Below is not the entire output of the command:
Leaf21# sh bgp l2vpn evpn 5001.0007.0007
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 10.255.1.11:32777 ! who sent the Update with the host MAC: the Leaf address, not the virtual vPC address
BGP routing table entry for [2]:[0]:[0]:[48]:[5001.0007.0007]:[0]:[0.0.0.0]/216,
version 1507
Paths: (2 available, best #2)
Flags: (0x000202) (high32 00000000) on xmit-list, is not in l2rib/evpn, is not in HW
Path type: internal, path is valid, not best reason: Neighbor Address, no labeled nexthop
AS-Path: NONE, path sourced internal to AS
10.255.1.10 (metric 81) from 10.255.1.102 (10.255.1.102) ! whom exactly we build the VxLAN tunnel with
Origin IGP, MED not set, localpref 100, weight 0
Received label 10000 ! the VNI associated with the VLAN the host resides in
Extcommunity: RT:65001:10000 SOO:10.255.1.10:0 ENCAP:8 ! the RT was derived automatically from the AS and VNI numbers
Originator: 10.255.1.11 Cluster list: 10.255.1.102
<........>
Let's see what the frames look like as they are passed through the fabric:
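On the wire, a host frame entering the fabric is wrapped in outer Ethernet, IP, UDP (destination port 4789) and VxLAN headers. A small Python sketch of the resulting overhead, assuming standard header sizes with no options or extra VLAN tags:

```python
# Header sizes in bytes for a VxLAN-encapsulated frame (RFC 7348)
OUTER_ETHERNET = 14   # outer MAC header between VTEP and Spine
OUTER_IP = 20         # outer IPv4 header: local VTEP loopback -> remote VTEP
OUTER_UDP = 8         # outer UDP header, destination port 4789
VXLAN = 8             # VxLAN header carrying the 24-bit VNI

overhead = OUTER_ETHERNET + OUTER_IP + OUTER_UDP + VXLAN
print(overhead)  # 50 bytes of encapsulation on top of the original frame

# Hence the common advice to raise the underlay MTU by at least 50 bytes
inner_frame = 1514             # full-size host frame: 1500-byte payload + L2 header
print(inner_frame + overhead)  # 1564: what the underlay links must carry
```

This is the practical reason underlay links in a VxLAN fabric are usually configured with a larger MTU, so that full-size host frames are not fragmented or dropped.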
Suppress-ARP
Great, we now have L2 connectivity between hosts, and this could be the end of it. However, it is not that simple. As long as we have few hosts, there will be no problems. But let's imagine a situation where we have hundreds or thousands of hosts. What problem might we face?
This problem is BUM (Broadcast, Unknown Unicast, Multicast) traffic. Within this article, we will consider the option of combating broadcast traffic.
The main broadcast generator in Ethernet networks is the hosts themselves, via the ARP protocol.
Nexus implements the following mechanism for dealing with ARP requests: suppress-arp.
This feature works like this:
- Host-1 sends an ARP request to the broadcast address of its network.
- The request reaches the Leaf switch, and instead of forwarding it onward into the fabric toward Host-2, the Leaf answers it itself, supplying the requested IP-to-MAC binding.
Thus, the broadcast request never enters the fabric. But how can this work if the Leaf knows only the MAC address?
Everything is quite simple: EVPN route-type 2, in addition to the MAC address, can carry a MAC/IP pair. For this, the Leaf must be configured with an IP address in the VLAN. The question arises: which IP address to assign? On the Nexus, it is possible to create a distributed (identical) address on all switches:
feature interface-vlan
fabric forwarding anycast-gateway-mac 0001.0001.0001 ! задаем virtual mac для создания распределенного шлюза между всеми коммутаторами
interface Vlan10
no shutdown
ip address 192.168.10.254/24 ! на всех Leaf задаем одинаковый IP
fabric forwarding mode anycast-gateway ! говорим использовать Virtual mac
Thus, from the point of view of hosts, the network will look like this:
Let's check the BGP l2vpn evpn table:
Leaf11# sh bgp l2vpn evpn
<......>
Network Next Hop Metric LocPrf Weight Path
Route Distinguisher: 10.255.1.11:32777 (L2VNI 10000)
*>l[2]:[0]:[0]:[48]:[5001.0007.0007]:[0]:[0.0.0.0]/216
10.255.1.21 100 32768 i
*>i[2]:[0]:[0]:[48]:[5001.0008.0007]:[0]:[0.0.0.0]/216
10.255.1.10 100 0 i
* i 10.255.1.10 100 0 i
* i[2]:[0]:[0]:[48]:[5001.0008.0007]:[32]:[192.168.10.20]/248
10.255.1.10 100 0 i
*>i 10.255.1.10 100 0 i
<......>
Route Distinguisher: 10.255.1.21:32777
* i[2]:[0]:[0]:[48]:[5001.0008.0007]:[0]:[0.0.0.0]/216
10.255.1.20 100 0 i
*>i 10.255.1.20 100 0 i
* i[2]:[0]:[0]:[48]:[5001.0008.0007]:[32]:[192.168.10.20]/248
*>i 10.255.1.20 100 0 i
<......>
From the output of the command, it can be seen that in EVPN route-type 2, in addition to the MAC, we now also see the IP address of the host.
Let's return to the suppress-arp setting. This setting is enabled for each VNI separately:
interface nve1
member vni 10000
suppress-arp
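Conceptually, with suppress-arp the Leaf intercepts the ARP request and answers from the MAC/IP bindings it learned via EVPN route-type 2, flooding into the fabric only when it has no binding. A minimal Python sketch of that decision (the table contents and function names are hypothetical, for illustration only):

```python
# MAC/IP bindings learned from EVPN route-type 2 advertisements (hypothetical data)
evpn_arp_table = {
    "192.168.10.10": "5001.0007.0007",  # Host-1, learned from the vPC pair
    "192.168.10.20": "5001.0008.0007",  # Host-2, learned from Leaf-21
}

def handle_arp_request(target_ip: str) -> str:
    """Leaf behavior with suppress-arp: reply locally if the binding is known,
    otherwise flood the request into the fabric as ordinary BUM traffic."""
    mac = evpn_arp_table.get(target_ip)
    if mac is not None:
        return f"local ARP reply: {target_ip} is-at {mac}"
    return "flood into fabric"

print(handle_arp_request("192.168.10.20"))  # answered locally, no broadcast sent
print(handle_arp_request("192.168.10.99"))  # unknown host: the request is flooded
```

The key point is that the broadcast is eliminated only for hosts the fabric has already learned; unknown destinations still generate BUM traffic.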
However, there are some difficulties:
- This feature requires space in TCAM. An example of carving TCAM for suppress-arp:
hardware access-list tcam region arp-ether 256
This region is double-wide: if you allocate 256 entries, 512 must be freed in TCAM. Configuring TCAM is beyond the scope of this article, since TCAM sizing depends entirely on the task at hand and may differ from one network to another.
- suppress-arp should be implemented on all Leaf switches. However, complications can arise when configuring Leaf pairs in a vPC domain: while the TCAM is being changed, consistency between the pair is broken, and one node may be taken out of service. Additionally, a reboot may be required for the TCAM change to take effect.
As a result, you should carefully weigh whether it is worth deploying this feature on a production fabric in your situation.
This concludes the first part of the series. In the next part, we will look at routing over the VxLAN fabric with networks separated into different VRFs.
Source: habr.com