Automation for the smallest. Part one (which is after zero). Network virtualization

In the previous issue I described the network automation framework. According to some readers, even this first pass at the problem has already helped them sort a few questions out. That makes me very happy, because our goal in this series is not to paper over problems with Ansible and Python scripts, but to build a system.

The same framework also sets the order in which we will tackle the subject.
Network virtualization, to which this issue is dedicated, does not quite fit the ADSM theme, where we deal with automation.

But let's look at it from a different angle.

Many services have been using the same network for a long time. In the case of a telecom operator, these are 2G, 3G, LTE, broadband and B2B, for example. In the case of a DC: connectivity for different clients, the Internet, block storage, object storage.

And all services require isolation from each other. This is how overlay networks appeared.

And all services do not want to wait for a person to manually configure them. This is how orchestrators and SDN appeared.

The first approach to the systematic automation of the network, or rather of a part of it, was taken long ago and has been implemented in many places: VMWare, OpenStack, Google Compute Cloud, AWS, Facebook.

Let's take a look at it today.


Content

  • Reasons
  • Vocabulary
  • Underlay - physical network
  • Overlay - virtual network
    • Overlay with ToR
    • Overlay from host
    • Tungsten Fabric as an example
      • Communication within a single physical machine
      • Communication between VMs located on different physical machines
      • Exit to the outside world

  • FAQ
  • Conclusion
  • Useful links

Reasons

And since we are talking about this, it is worth mentioning the prerequisites for network virtualization. In fact, this process did not begin yesterday.

You have probably heard more than once that the network has always been the most inert part of any system. And this is true in every sense. The network is the foundation that everything else relies on, and making changes to it is rather difficult - services do not tolerate the network being down. Often the failure of a single node can take down a large share of applications and affect many customers. This is partly why the network team resists any change - because right now it somehow works (we may not even know exactly how), and here you need to configure something new, and it is not known how it will affect the network.

In order not to wait for networkers to provision VLANs, and not to have to register every new service on every network node, people came up with the idea of using overlays - overlay networks - of which there is a great variety: GRE, IPinIP, MPLS, MPLS L2/L3VPN, VXLAN, GENEVE, MPLSoverUDP, MPLSoverGRE, etc.

Their attraction lies in two simple things:

  • Only the end nodes are configured - the transit nodes do not have to be touched. This greatly speeds up the process, and sometimes even allows the network infrastructure department to be excluded from the process of introducing new services.
  • The payload is hidden deep inside the headers - the transit nodes do not need to know anything about it, nor about host addressing or the routes of the overlay network. This means less information has to be stored in tables, which means a simpler / cheaper device will do. A byte-level sketch of such an encapsulation follows below.
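
To make both points concrete, here is a minimal Python sketch of how an overlay encapsulation such as VXLAN wraps a tenant's frame: the underlay only ever parses the outer headers, while the 24-bit VNI in the VXLAN header keeps tenants apart. The outer headers are left as placeholders and the VNI value is invented.

import struct

def vxlan_header(vni: int) -> bytes:
    # 8-byte VXLAN header (RFC 7348): flags byte 0x08 ("VNI present"),
    # 3 reserved bytes, 24-bit VNI, 1 reserved byte.
    return struct.pack("!B3xI", 0x08, vni << 8)

# A tenant's original Ethernet frame is opaque payload for the underlay.
inner_frame = b"<tenant Ethernet frame>"

# The tunnel endpoint (host or ToR) prepends outer headers; transit switches
# only ever look at these outer headers, never at the tenant's frame.
encapsulated = (
    b"<outer Ethernet>"         # placeholder: underlay MAC addresses
    + b"<outer IPv4>"           # placeholder: host-to-host addresses
    + b"<outer UDP, dst 4789>"  # placeholder: standard VXLAN port
    + vxlan_header(vni=5001)    # invented tenant ID, invisible to the underlay
    + inner_frame
)
print(len(encapsulated), "bytes after encapsulation")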

In this not-quite-full-fledged issue, I do not plan to analyze all possible technologies, but rather to describe the framework in which overlay networks operate in DCs.

The whole series will describe the data center, consisting of rows of the same type of racks in which the same server hardware is installed.

This equipment runs virtual machines/containers/serverless that implement services.


Vocabulary

In this series, by server I will mean a program that implements the server side of client-server communication.

We will not call the physical machines in the racks servers.

Physical machine - an x86 computer installed in a rack. The most common term for it is host. So we will call it a "machine" or a host.

Hypervisor - an application running on a physical machine that emulates the physical resources on which Virtual Machines run. Sometimes in the literature and on the web, the word "hypervisor" is used as a synonym for "host".

Virtual machine - an operating system running on a physical machine on top of a hypervisor. For this series it is not important whether it is actually a virtual machine or just a container. We will call it a "VM".

Tenant - a broad concept, which in this article I will define as a separate service or a separate client.

Multi-tenancy - the use of the same application instance by different clients / services. Isolation of clients from each other is achieved by the application's architecture rather than by separately launched instances.

ToR - Top of the Rack switch - a switch installed in a rack to which all physical machines are connected.

Besides ToR, different providers also practice End of Row (EoR) or Middle of Row placement (the latter is vanishingly rare, and I have not seen the abbreviation MoR used).

underlay network or underlying network or underlay - physical network infrastructure: switches, routers, cables.

overlay network or overlay network or overlay - a virtual network of tunnels that runs on top of the physical one.

L3 fabric or IP fabric - an amazing invention of mankind that lets you avoid repeating STP and learning TRILL for interviews. A concept in which the entire network, right down to the access layer, is purely L3, without VLANs and, accordingly, without huge stretched broadcast domains. Where the word "fabric" comes from, we will figure out in the next part.

SDN - Software Defined Network. Hardly needs an introduction. An approach to network management in which changes to the network are made not by a person but by a program. Usually it implies moving the Control Plane off the end network devices onto a controller.

NFV - Network Function Virtualization - virtualization of network devices, implying that some network functions can run as virtual machines or containers to speed up the introduction of new services, to organize Service Chaining, and to make horizontal scaling easier.

VNF - Virtual Network Function. A specific virtual device: a router, switch, firewall, NAT, IPS/IDS, etc.


I am deliberately simplifying the description down to a specific implementation, so as not to confuse the reader too much. For more thoughtful reading, I refer you to the Useful links section. In addition, Roman Gorge, who criticized this article for inaccuracies, promises to write a separate issue about server and network virtualization technologies, more in-depth and attentive to detail.

Most networks today can be explicitly broken down into two parts:

Underlay - a physical network with a stable configuration.
Overlay - an abstraction on top of Underlay for isolating tenants.

This is true both for the DC case (which we will analyze in this article) and for the ISP (which we will not analyze, because it was already in SDSM). With enterprise networks, of course, the situation is somewhat different.


Underlay

Underlay is a physical network: hardware switches and cables. The devices in the underlay know how to get to the physical machines.


It relies on standard protocols and technologies. Not least because hardware devices to this day run proprietary software that allows neither programming the chips nor implementing your own protocols; therefore, compatibility with other vendors and standardization are essential.

Someone like Google can afford to develop its own switches and abandon generally accepted protocols, but LAN_DC is not Google.

Underlay changes relatively infrequently, because its job is basic IP connectivity between physical machines. Underlay knows nothing about the services running on top of it, about clients or tenants - it only needs to deliver a packet from one machine to another.
Underlay could be, for example:

  • IPv4+OSPF
  • IPv6+ISIS+BGP+L3VPN
  • L2+TRILL
  • L2+STP

The Underlay network is configured in the classical way: CLI/GUI/NETCONF.

Manually, scripts, proprietary utilities.
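
To make the NETCONF option concrete, here is a minimal sketch of pushing a piece of underlay configuration with the ncclient library. The switch address, credentials and the interface description are invented, and the exact YANG model will differ from vendor to vendor.

# A minimal sketch, assuming a NETCONF-capable ToR and the ncclient library
# (pip install ncclient). Host, credentials and the interface description
# are invented; real YANG models differ between vendors.
from ncclient import manager

CONFIG = """
<config>
  <interfaces xmlns="urn:ietf:params:xml:ns:yang:ietf-interfaces">
    <interface>
      <name>Ethernet1</name>
      <description>link to spine-1 (underlay)</description>
      <enabled>true</enabled>
    </interface>
  </interfaces>
</config>
"""

with manager.connect(
    host="tor1.lan-dc.example",   # hypothetical ToR switch
    port=830,
    username="automation",
    password="secret",
    hostkey_verify=False,
) as m:
    # Devices without a candidate datastore would use target="running".
    m.edit_config(target="candidate", config=CONFIG)
    m.commit()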

The next article in the series will cover the Underlay in more detail.

Overlay

Overlay is a virtual network of tunnels stretched on top of the Underlay; it allows the VMs of one client to communicate with each other while providing isolation from other clients.

The client's data is encapsulated in some kind of tunneling header for transmission across the shared network.


So VMs of one client (one service) can communicate with each other through Overlay, without even knowing what path the packet actually takes.

As I mentioned above, the Overlay could be, for example:

  • GRE tunnels
  • VXLAN
  • EVPN
  • L3VPN
  • GENEVE

The overlay network is usually configured and maintained through a central controller. From it, the configuration, Control Plane and Data Plane are delivered to the devices that route and encapsulate client traffic. A little further down we will look at this with examples.

Yes, this is pure SDN.

There are two fundamentally different approaches to organizing an Overlay network:

  1. Overlay with ToR
  2. Overlay from host

Overlay with ToR

The Overlay can start on the access switch (ToR) in the rack, as happens, for example, in a VXLAN fabric.

This is a time-tested mechanism in ISP networks, and all network equipment vendors support it.

However, in this case the ToR switch must be able to separate the various services, and the network administrator has to cooperate to some extent with the virtual machine administrators and make changes (albeit automatically) to the device configuration.


Here I will refer the reader to an article about VXLAN on Habr by our old friend @bormoglotx.
This presentation from ENOG describes in detail the approaches to building a DC network with an EVPN VXLAN fabric.

And for a fuller immersion into reality, you can read Cisco's book A Modern, Open, and Scalable Fabric: VXLAN EVPN.

Note that VXLAN is only an encapsulation method, and tunnel termination can happen not on the ToR but on the host, as it does in OpenStack, for example.

However, a VXLAN fabric, where the overlay starts on the ToR, is one of the well-established overlay network designs.

Overlay from host

Another approach is to start and terminate tunnels on end hosts.
In this case, the network (Underlay) remains as simple and static as possible.
And the host itself does all the necessary encapsulation.


To do this, of course, you will need to run a special application on the hosts, but it's worth it.

Firstly, on a Linux machine this is easier to do - or, let's say, possible at all - whereas on a switch, for the time being, you will most likely have to turn to proprietary SDN solutions, which kills the idea of multi-vendor.

Secondly, the ToR switch can in this case be left as simple as possible, both from the Control Plane and the Data Plane point of view. Indeed, it then does not need to talk to the SDN controller, nor does it need to store the networks / ARP entries of all connected clients - it is enough for it to know the IP addresses of the physical machines, which greatly lightens the switching / routing tables.

In the ADSM series I choose the overlay-from-the-host approach - from here on we will only talk about it and will not return to the VXLAN fabric.

It's easiest to look at examples. And as a test subject, we will take the OpenSource SDN platform OpenContrail, now known as Tungsten Fabric.

At the end of the article, I will give some thoughts on the analogy with OpenFlow and OpenvSwitch.

Tungsten Fabric as an example

Every physical machine has a vRouter - a virtual router that knows about the networks connected to it and which clients they belong to - in effect, a PE router. For each client it maintains an isolated routing table (think VRF). And it is the vRouter that actually does the Overlay tunneling.

A little more about vRouter is at the end of the article.

Each VM located on a hypervisor connects to that machine's vRouter via a TAP interface.

TAP - Terminal Access Point - a virtual interface in the Linux kernel that provides network interaction.
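
As an aside, a TAP interface can be created straight from user space, without any special tooling. A minimal Python sketch for Linux follows (requires root; the interface name tap-vm0 is arbitrary).

# A minimal sketch of creating a TAP interface on Linux - the same kind of
# interface a hypervisor plugs a VM's eth0 into. Requires root; the name
# "tap-vm0" is arbitrary.
import fcntl
import struct

TUNSETIFF = 0x400454CA   # ioctl that configures the tun/tap device
IFF_TAP   = 0x0002       # L2 (Ethernet) mode, as opposed to IFF_TUN (L3)
IFF_NO_PI = 0x1000       # do not prepend the packet-information header

with open("/dev/net/tun", "r+b", buffering=0) as tap:
    ifr = struct.pack("16sH", b"tap-vm0", IFF_TAP | IFF_NO_PI)
    fcntl.ioctl(tap, TUNSETIFF, ifr)
    # From here on, read() returns Ethernet frames the kernel sends into
    # tap-vm0 and write() injects frames back - exactly the hook a
    # vRouter-like process uses to see a VM's traffic.
    frame = tap.read(2048)   # blocks until the interface is up and carries traffic
    print(len(frame), "bytes received")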


If several networks sit behind the vRouter, then for each of them a virtual interface is created and assigned an IP address - it will be the default gateway address.
All networks of one client are placed in one VRF (one table), different clients' networks in different ones.
Let me note here that things are not quite that simple, and I will refer the inquisitive reader to the end of the article.

For vRouters - and hence the VMs behind them - to communicate with each other, they exchange routing information through the SDN controller.


To get out into the outside world there is an exit point from the matrix - the virtual network gateway, VNGW - Virtual Network Gateway (my term).


Now let's look at examples of communications - and there will be clarity.

Communication within a single physical machine

VM0 wants to send a packet to VM2. For now, let's assume these are VMs of the same client.

Data plane

  1. VM-0 has a default route out of its eth0 interface. The packet is sent there.
    This eth0 interface is actually connected virtually to the vRouter via the TAP interface tap0.
  2. The vRouter checks which interface the packet arrived on - that is, which client (VRF) it belongs to - and looks up the destination address in that client's routing table.
  3. Having found that the destination is on the same machine behind a different port, the vRouter simply sends the packet there without any additional headers - the vRouter already has an ARP entry for this case.


The packet in this case does not enter the physical network - it is routed inside the vRouter.
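
The logic of steps 2-3 can be squeezed into a few lines. This is only a toy model of the per-VRF lookup, not real vRouter code; the VRF names, TAP interfaces and addresses are invented.

# A toy model of steps 2-3: the incoming TAP interface selects the client's
# VRF, and the destination is looked up only in that table. All names and
# addresses are invented.
import ipaddress

VRFS = {
    "client-A": {   # one isolated table per tenant
        ipaddress.ip_network("10.10.1.0/24"): ("local", "tap2"),
        ipaddress.ip_network("10.10.2.0/24"): ("tunnel", "Tunnel0"),
    },
    "client-B": {
        ipaddress.ip_network("10.10.1.0/24"): ("local", "tap5"),   # may overlap with client-A!
    },
}

TAP_TO_VRF = {"tap0": "client-A", "tap1": "client-A", "tap5": "client-B"}

def forward(in_tap: str, dst_ip: str):
    vrf = TAP_TO_VRF[in_tap]            # which client sent this packet?
    table = VRFS[vrf]
    dst = ipaddress.ip_address(dst_ip)
    matches = [net for net in table if dst in net]   # longest-prefix match, within this VRF only
    if not matches:
        return ("drop", None)
    return table[max(matches, key=lambda n: n.prefixlen)]

print(forward("tap0", "10.10.1.5"))   # ('local', 'tap2') - same machine, no encapsulation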

Control plane

When the virtual machine starts up, the hypervisor tells it:

  • Its own IP address.
  • The default route is via the vRouter's IP address on this network.

The hypervisor reports to the vRouter via a special API:

  • That a virtual interface needs to be created.
  • Which Virtual Network needs to be created for it (the VM).
  • Which VRF it (the VN) should be bound to.
  • A static ARP entry for this VM - which interface its IP address sits behind and which MAC address it is bound to.

And again, the actual interaction procedure is simplified for the sake of understanding the concept.
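
To make the list above tangible, here is roughly the kind of payload such a notification might carry. The field names and values are hypothetical - the real Tungsten Fabric port-add API looks different - but the information content is the same.

# A hypothetical payload for the "VM started" notification from the hypervisor
# to the vRouter agent. Field names are invented; the real Tungsten Fabric API
# differs, but it carries the same pieces of information.
vm_port_add = {
    "vm_name": "VM-0",
    "tap_interface": "tap0",              # the virtual interface to create
    "virtual_network": "client-A-net1",   # the Virtual Network the VM joins
    "vrf": "client-A",                    # the VRF the VN is bound to
    "static_arp": {                       # where the VM's IP and MAC live
        "ip": "10.10.1.2",
        "mac": "52:54:00:12:34:56",
        "interface": "tap0",
    },
}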


Thus, vRouter sees all VMs of one client on this machine as directly connected networks and can route itself between them.

But VM0 and VM1 belong to different clients and are therefore in different vRouter tables.

Whether they can communicate with each other directly depends on the vRouter settings and the network design.
For example, if the VMs of both clients use public addresses, or NAT happens on the vRouter itself, then direct routing on the vRouter is also possible.

Otherwise, the address spaces may overlap, and to reach each other they would have to go through a NAT server to get a public address - this is similar to accessing external networks, described below.

Communication between VMs located on different physical machines

Data plane

  1. The start is exactly the same: VM-0 sends a packet destined for VM-7 (172.17.3.2) via its default route.
  2. The vRouter receives it and this time sees that the destination is on another machine, reachable through the tunnel Tunnel0.
  3. First, it prepends an MPLS label identifying the remote interface, so that the vRouter on the far side can determine where to deliver this packet without any additional lookups (a byte-level sketch of this encapsulation follows the list).


  4. Tunnel0 has source 10.0.0.2 and destination 10.0.1.2.
    The vRouter adds GRE (or UDP) headers and a new IP header to the original packet.
  5. The vRouter's routing table has a default route via the ToR1 address 10.0.0.1. That is where it sends the packet.


  6. ToR1, as a member of the Underlay network, knows (for example, via OSPF) how to get to 10.0.1.2 and sends the packet along the route. Note that ECMP is in play here. There are two next-hops in the illustration, and different flows will be distributed between them by hash. In a real fabric there will more likely be four next-hops.

    At the same time, it does not need to know what is under the outer IP header. That is, under that IP there can be a whole sandwich of IPv6 over MPLS over Ethernet over MPLS over GRE over whatever you like.

  7. Accordingly, on the receiving side the vRouter strips the GRE header, uses the MPLS label to understand which interface this packet should be delivered to, strips the label and sends the packet in its original form to the recipient.
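
To make steps 3-5 more tangible at the byte level, here is a minimal Python sketch of the MPLSoGRE encapsulation, assuming the simplest GRE header with no optional fields. The outer IP header is left as a placeholder and the label value is arbitrary.

import struct

def mpls_label(label: int, ttl: int = 64) -> bytes:
    # One MPLS label stack entry: label(20 bits) | TC(3) | S(1) | TTL(8).
    # S=1 because this is the only (bottom-of-stack) label.
    return struct.pack("!I", (label << 12) | (1 << 8) | ttl)

def gre_header() -> bytes:
    # Minimal GRE header: no checksum/key/sequence flags, version 0,
    # protocol type 0x8847 (MPLS unicast), so the receiver expects a label.
    return struct.pack("!HH", 0x0000, 0x8847)

inner_packet = b"<original IP packet VM-0 -> VM-7>"          # placeholder

encapsulated = (
    b"<outer IPv4 header 10.0.0.2 -> 10.0.1.2>"              # placeholder: tunnel endpoints
    + gre_header()
    + mpls_label(210097)   # arbitrary label identifying the remote VM interface / VRF
    + inner_packet
)
print(len(encapsulated), "bytes (minus the real outer headers)")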

Control plane

When the machine is started, the same thing happens as described above.

And plus the following:

  • For each client, the vRouter allocates an MPLS label. This is the L3VPN service label by which clients will be separated within the same physical machine.

    In fact, the MPLS label is always allocated by the vRouter, because it is not known in advance that the VM will only ever interact with other machines behind the same vRouter - and that is most likely not even the case.

  • The vRouter establishes a session with the SDN controller over BGP (or a similar protocol - in the case of TF it is XMPP 0_o).
  • Through this session, the vRouter tells the SDN controller the routes to its connected networks:
    • Network address
    • Encapsulation method (MPLSoGRE, MPLSoUDP, VXLAN)
    • MPLS client label
    • Its own IP address as the next-hop

  • The SDN controller receives such routes from all connected vRouters and reflects them to the others. That is, it acts as a Route Reflector.

The same thing happens in reverse.
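
Condensed into a data structure, the information a vRouter announces to the controller for one of its networks might look like this. A sketch with invented values; the actual XMPP/BGP messages in Tungsten Fabric are structured differently but carry the same attributes.

# A sketch of the route one vRouter announces to the SDN controller for one of
# its tenant networks. Values are invented; the real XMPP/BGP messages of
# Tungsten Fabric are structured differently but carry the same attributes.
route_advertisement = {
    "prefix": "10.10.2.0/24",                             # tenant network behind this host
    "vrf": "client-A",
    "encapsulation": ["MPLSoUDP", "MPLSoGRE", "VXLAN"],   # supported methods, in preference order
    "mpls_label": 210097,                                 # service label separating clients
    "next_hop": "10.0.0.2",                               # the vRouter's own underlay address
}
# The controller collects such routes from every vRouter and reflects them to
# the others, acting as a Route Reflector.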


The Overlay can change as often as every minute. This is roughly what happens in public clouds, where customers regularly start and shut down their virtual machines.

The central controller takes care of all the complexity of maintaining the configuration and control of the switching / routing tables on the vRouter.

Roughly speaking, the controller peers with all the vRouters over BGP (or a similar protocol) and simply distributes routing information. BGP, for example, already has an Address Family for carrying the encapsulation method, MPLS-in-GRE or MPLS-in-UDP.

Meanwhile, the configuration of the Underlay network does not change at all - and that configuration, by the way, is an order of magnitude harder to automate and easier to break with one careless move.

Exit to the outside world

Somewhere the simulation has to end, and you have to get out of the virtual world into the real one. And, as in the Matrix, you need a payphone for that - a gateway.

Two approaches are practiced:

  1. A hardware router is installed.
  2. An appliance is launched that implements router functions (yes, after SDN we have run into VNF as well). Let's call it a virtual gateway.

The advantage of the second approach is cheap horizontal scalability: not enough capacity - launch another virtual machine with a gateway. On any physical machine, without having to look for free racks, units and power outlets, buy the hardware itself, transport it, install it, cable it, configure it, and then also replace faulty components in it.

The disadvantages of a virtual gateway are that a single physical router is still orders of magnitude more powerful than a multi-core virtual machine, and its software, tailored to its own hardware, works much more stably (not really). It is also hard to deny that an integrated hardware-software appliance simply works and only needs to be configured, whereas launching and maintaining a virtual gateway is a job for strong engineers.

With one foot, the gateway looks into the Overlay virtual network, like a normal Virtual Machine, and can interact with all other VMs. At the same time, it can terminate the networks of all clients on itself and, accordingly, carry out routing between them.

With the other foot, the gateway is already looking into the backbone network and knows how to get out to the Internet.


Data plane

So the process looks like this:

  1. VM-0, whose default route points to the same vRouter, sends a packet destined for the outside world (185.147.83.177) out its eth0 interface.
  2. The vRouter receives this packet and looks up the destination address in the routing table - it finds a default route via the VNGW1 gateway, through Tunnel 1.
    It also sees that this is a GRE tunnel with source IP 10.0.0.2 and destination IP 10.0.255.2, and that it first needs to prepend the MPLS label of this client, which VNGW1 expects.
  3. The vRouter wraps the original packet in the MPLS label, the GRE header and a new IP header and sends it via its default route to the ToR1 address 10.0.0.1.
  4. The underlay network delivers the packet to the VNGW1 gateway.
  5. The VNGW1 gateway strips the GRE and MPLS tunneling headers, sees the destination address, consults its routing table and understands that the packet is headed for the Internet - that is, via the Full View or a default route. It performs NAT translation if necessary (a toy sketch of this step follows the list).
  6. Between the VNGW and the border there may be a plain IP network (though that is unlikely), a classic MPLS network (IGP + LDP / RSVP-TE), a fabric with BGP LU, or a GRE tunnel from the VNGW to the border across an IP network.
    Be that as it may, VNGW1 performs the necessary encapsulations and sends the original packet towards the border.
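
And here is the promised toy sketch of step 5: on the VNGW it is the MPLS label, rather than the incoming interface, that selects the client's VRF, after which an ordinary lookup and, if needed, NAT follow. The labels, addresses and NAT pool are invented for illustration.

# A toy model of step 5: on the VNGW the MPLS label (rather than an incoming
# interface, as on the vRouter) selects the client's VRF, then a normal lookup
# and, if needed, NAT. Labels, addresses and the NAT pool are invented.
LABEL_TO_VRF = {210097: "client-A", 210342: "client-B"}
NAT_POOL = {"client-A": "185.147.83.177"}   # hypothetical public address

def vngw_handle(mpls_label: int, inner_src: str, inner_dst: str) -> str:
    vrf = LABEL_TO_VRF[mpls_label]               # whose traffic is this?
    public_src = NAT_POOL.get(vrf, inner_src)    # NAT only if the client needs it
    # the destination is then looked up in the Internet table (default / Full View)
    return f"{vrf}: {inner_src} NATed to {public_src}, {inner_dst} forwarded towards the border"

print(vngw_handle(210097, "10.10.1.2", "8.8.8.8"))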


Traffic in the opposite direction goes through the same steps in the opposite order.

  1. The border router sends the packet to VNGW1.
  2. VNGW1 strips the headers, looks at the destination address and sees that it is reachable through the tunnel Tunnel1 (MPLSoGRE or MPLSoUDP).
  3. Accordingly, it prepends the MPLS label, the GRE/UDP header and a new IP header and sends the packet to its ToR3, 10.0.255.1.
    The tunnel destination address is the IP address of the vRouter behind which the target VM sits - 10.0.0.2.
  4. The underlay network delivers the packet to the right vRouter.
  5. The target vRouter strips the GRE/UDP header, determines the interface from the MPLS label and sends the bare IP packet to the TAP interface associated with the VM's eth0.


Control plane

VNGW1 establishes a BGP neighborship with the SDN controller, from which it receives all the client routing information: which client sits behind which IP address (vRouter), and which MPLS label identifies it.

In turn, it announces a default route with each client's label to the SDN controller, indicating itself as the next-hop. This default then makes its way to the vRouters.

On a VNGW, route aggregation or NAT translation usually occurs.

And in the other direction, in its sessions with the border routers or Route Reflectors, it announces exactly this aggregated route. And from them it receives a default route, the Full View, or something else.

In terms of encapsulation and traffic exchange, VNGW is no different from vRouter.
If you expand the scope a bit, other network devices can be added alongside VNGWs and vRouters, such as firewalls, traffic scrubbing or enrichment farms, IPS, and so on.

And with consistent creation of VRFs and correct route announcements, you can make traffic flow the way you want, which is called Service Chaining.

That is, here the SDN controller acts as a Route-Reflector between VNGW, vRouters and other network devices.

In reality, though, the controller also pushes information about ACLs and PBR (Policy Based Routing), forcing individual traffic flows to take a different path from the one the routes dictate.


FAQ

Why do you keep adding the GRE/UDP remark?

Well, generally speaking, this is specific to Tungsten Fabric - you could ignore it entirely.

But if you do take it into account, then TF itself, back when it was still OpenContrail, supported both encapsulations: MPLS in GRE and MPLS in UDP.

UDP is good because it is very easy to encode a hash of the original IP + Proto + Port into the Source Port of its header, which lets you do load balancing.
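
As an illustration, here is a minimal sketch of how such an entropy source port could be derived from the inner flow. The particular hash and port range are my own choice for the example, not necessarily what TF does.

# A sketch of deriving the outer UDP source port from the inner flow, so that
# underlay ECMP can balance tunneled traffic per flow. The particular hash and
# port range are illustrative; implementations pick their own.
import zlib

def entropy_source_port(src_ip: str, dst_ip: str, proto: int,
                        src_port: int, dst_port: int) -> int:
    five_tuple = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    # keep the result in the ephemeral range so middleboxes are not confused
    return 49152 + (zlib.crc32(five_tuple) % 16384)

# Two different inner flows -> (very likely) two different outer source ports,
# hence different ECMP paths in the underlay.
print(entropy_source_port("10.10.1.2", "10.10.2.3", 6, 43512, 443))
print(entropy_source_port("10.10.1.2", "10.10.2.3", 6, 43513, 443))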

With GRE, alas, there are only the outer IP and GRE headers, which are identical for all encapsulated traffic, and balancing is out of the question - few devices can look that deep into the packet.

For a long time, routers that could do dynamic tunnels at all could only do MPLSoGRE, and only recently did they learn MPLSoUDP. Hence the constant remark about the two possible encapsulations.

In fairness, it is worth noting that TF fully supports L2 connectivity using VXLAN.

You promised to draw parallels with OpenFlow.
They really do beg to be drawn. vSwitch in the same OpenStack does very similar things using VXLAN, which, by the way, also has a UDP header.

In the Data Plane they work roughly the same; the Control Plane differs significantly. Tungsten Fabric uses XMPP to deliver route information to the vRouters, while OpenStack runs OpenFlow.

Can you tell me a little more about vRouter?
It is divided into two parts: vRouter Agent and vRouter Forwarder.

The first one runs in the User Space of the host OS and communicates with the SDN controller, exchanging information about routes, VRFs, and ACLs.

The second implements the Data Plane - usually in Kernel Space, but it can also run on SmartNICs - network cards with a CPU and a separate programmable switching chip, which allows offloading work from the host machine's CPU and makes the network faster and more predictable.

Another scenario is possible when vRouter is a DPDK application in User Space.

The vRouter Agent pushes settings down to the vRouter Forwarder.

What is Virtual Network?
At the beginning of the article I mentioned VRFs, saying that each tenant is tied to its own VRF. While that was enough for a superficial understanding of how the overlay network works, at the next iteration some clarifications are needed.

Usually, in virtualization mechanisms the Virtual Network entity (you can treat it as a proper name) is introduced separately from clients / tenants / virtual machines, as a completely independent thing. And this Virtual Network can then be connected through interfaces to one tenant, to another, to two - to anything at all. This is how, for example, Service Chaining is implemented, when traffic needs to pass through certain nodes in the right order: you simply create and connect Virtual Networks in the right sequence.

Therefore, as such, there is no direct one-to-one correspondence between a Virtual Network and a tenant.

Conclusion

This is a very superficial description of how a virtual network with an overlay from the host and an SDN controller works. But whichever virtualization platform you pick today, it will work in a similar way, be it VMWare, ACI, OpenStack, CloudStack, Tungsten Fabric or Juniper Contrail. They will differ in the types of encapsulation and headers and in the protocols used to deliver information to the end network devices, but the principle of a software-defined overlay network running on top of a relatively simple and static underlay network will remain the same.
We can say that in the field of private cloud building today, SDN based on an overlay network has won. However, this does not mean that OpenFlow has no place in the modern world - it is used in OpenStack and in the same VMWare NSX, and, as far as I know, Google uses it to set up the underlay network.

Below I have provided links to more detailed materials if you want to study the issue in more depth.

And what about our Underlay?

Essentially, nothing. It did not change the whole way through. All it has to do, in the case of an overlay from the host, is update routes and ARP entries as vRouters / VNGWs appear and disappear, and carry packets between them.

Let's formulate a list of requirements for the Underlay network.

  1. Use some routing protocol - in our case, BGP.
  2. Have ample bandwidth, preferably without oversubscription, so that packets are not lost due to congestion.
  3. Support ECMP - an integral part of the fabric.
  4. Be able to provide QoS, including tricky things like ECN.
  5. Support NETCONF - a reserve for the future.

I devoted very little time here to the work of the Underlay network itself. This is because I will focus on it later in the series, and we will only touch on Overlay in passing.

Obviously, I am severely limiting us all by using as an example a DC network built on a Clos fabric with pure IP routing and an overlay from the host.

However, I am sure that any network that has a design can be described in formal terms and automated. It is just that my goal here is to understand the approaches to automation, not to confuse everyone by solving the problem in its most general form.

As part of ADSM, Roman Gorge and I plan to publish a separate issue on the virtualization of computing power and its interaction with network virtualization. Stay tuned.

Useful links

Thanks

  • Roman Gorge - former host of the linkmeup podcast and now a cloud platform expert. For comments and edits. And we await his more in-depth article on virtualization in the near future.
  • Alexander Shalimov - my colleague and an expert in the field of virtual network development. For comments and edits.
  • Valentin Sinitsyn - my colleague and a Tungsten Fabric expert. For comments and edits.
  • Artyom Chernobay - linkmeup's illustrator. For the title picture.
  • Alexander Limonov. For the "automato" meme.

Source: habr.com
