Implementing network fabrics with EVPN VXLAN and Cisco ACI, and a brief comparison

[Diagram] Note the links in the middle part of the diagram; we will return to them below.

At some point you may find that large, complex L2-based networks are terminally ill. The first set of problems involves BUM traffic handling and the behavior of STP; the second is that the architecture itself is simply obsolete. The result is painful downtime and awkward day-to-day operation.

We had two parallel projects in which the customers soberly weighed the pros and cons of the options and chose two different overlay solutions, which we then implemented.

This gave us a chance to compare the implementations. Not operation, though; that conversation is worth having in two or three years.

So what is a network fabric with overlay networks and SDN?

What to do about the chronic problems of classical network architecture?

New technologies and ideas appear every year, but in practice there was no urgent need to rebuild networks for a long time: everything could still be done by hand, the good old-fashioned way. So what if it is the twenty-first century outside? In the end, the administrator is supposed to work, not sit idle in the office.

Then the boom in large-scale data center construction began, and it became clear that classical architecture had reached its limits in performance, fault tolerance, and scalability. One of the proposed ways out was the idea of building overlay networks on top of a routed underlay.

In addition, as networks grew, managing such fabrics became a serious problem, which is why software-defined networking solutions appeared, able to manage the entire network infrastructure as a whole. And when the network is managed from a single point, it is easier for other components of the IT infrastructure to interact with it, and those interactions are easier to automate.

Almost every major vendor, not only of network equipment but also of virtualization platforms, now has such solutions in its portfolio.

It only remains to figure out what suits which needs. For especially large companies with strong development and operations teams, vendors' out-of-the-box solutions do not always cover everything, and they resort to building their own software-defined (SD) solutions. Cloud providers are a good example: they constantly expand the range of services offered to customers, and boxed solutions simply cannot keep up with their needs.

For medium-sized companies, the functionality a vendor offers in a boxed solution is enough in 99 percent of cases.

What are overlay networks

The idea of overlay networks is simple: you take a classic routed network and build another network on top of it to gain extra capabilities. Most often this means distributing load effectively across equipment and links, raising the scalability ceiling significantly, improving reliability, and getting a pile of security goodies thanks to segmentation. SDN solutions add very, very convenient and flexible administration on top of that and make the network more transparent for its consumers.
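To make the "network on top of a network" idea concrete, here is a minimal sketch of the VXLAN header defined in RFC 7348: an 8-byte header carrying a 24-bit VNI (the overlay segment ID), which rides inside a UDP/IP packet of the routed underlay. This is illustrative code, not from any of the projects described here.

```python
import struct

VXLAN_UDP_PORT = 4789  # IANA-assigned UDP port for VXLAN

def build_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header (RFC 7348).

    Byte 0 carries the flags (the 0x08 "I" bit marks a valid VNI),
    bytes 1-3 are reserved, bytes 4-6 hold the 24-bit VNI,
    and byte 7 is reserved again.
    """
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    flags = 0x08  # I flag: VNI field is valid
    # Pack flags, 3 reserved bytes, then the VNI shifted into the
    # top 24 bits of the final 32-bit word.
    return struct.pack("!B3xI", flags, vni << 8)

def parse_vni(header: bytes) -> int:
    """Extract the VNI from a VXLAN header."""
    (word,) = struct.unpack("!I", header[4:8])
    return word >> 8

hdr = build_vxlan_header(10042)
print(len(hdr), parse_vni(hdr))  # 8 10042
```

The VNI plays the role that a VLAN ID plays in a classic L2 network, but with a 24-bit space (about 16 million segments) instead of 12 bits, which is a large part of the scalability win mentioned above.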

In general, if local networks had been invented in the 2010s, they would look nothing like what we inherited from the military in the 1970s.

As for technologies for building fabrics with overlay networks, there are currently many vendor implementations and IETF standards and drafts (EVPN+VXLAN, EVPN+MPLS, EVPN+MPLSoGRE, EVPN+Geneve and others). Yes, there are standards, but different vendors implement them differently, so completely escaping vendor lock-in when building such fabrics is possible only on paper.

With SD solutions things are even more complicated: each vendor has its own vision. There are completely open solutions that, in theory, you can extend yourself, and there are completely closed ones.

Cisco offers its own SDN for data centers: ACI. Naturally, it is a 100% vendor-locked solution when it comes to network equipment, but at the same time it is fully integrated with virtualization, containerization, security, orchestration, load balancers and so on. In practice it is still something of a black box, with no full access to its internal workings. Not every customer accepts that, since you depend entirely on the quality of the solution's code; on the other hand, Cisco has some of the best technical support in the world, including a dedicated team that works only on this product. Cisco ACI was chosen for the first project.

For the second project, a Juniper solution was chosen. Juniper also has its own data center SDN, but the customer decided against implementing SDN. An EVPN VXLAN fabric without centralized controllers was chosen as the technology for building the network.

What is it for

Building a fabric lets you create an easily scalable, fault-tolerant, reliable network. The leaf-spine architecture is designed around data center realities: traffic paths, minimal latency, and the avoidance of bottlenecks. SD solutions make managing such a fabric and integrating it into the data center ecosystem very convenient, fast, and flexible.
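A quick back-of-the-envelope sketch of why leaf-spine scales predictably: every leaf connects to every spine, so any two hosts are at most two hops apart, and the oversubscription ratio is just access bandwidth divided by uplink bandwidth. The port counts and speeds below are illustrative, not from either project.

```python
def leaf_spine_capacity(spines: int, uplink_gbps: int,
                        access_ports: int, access_gbps: int) -> dict:
    """Rough sizing of one leaf in a leaf-spine fabric.

    Each leaf has one uplink per spine; oversubscription is the ratio
    of total access-side bandwidth to total uplink bandwidth.
    """
    uplink_bw = spines * uplink_gbps          # total Gbps toward spines
    access_bw = access_ports * access_gbps    # total Gbps toward hosts
    return {
        "uplinks_per_leaf": spines,
        "uplink_gbps_total": uplink_bw,
        "access_gbps_total": access_bw,
        "oversubscription": access_bw / uplink_bw,
    }

# Example: 4 spines with 100G uplinks, 48 x 25G access ports per leaf.
print(leaf_spine_capacity(4, 100, 48, 25))
```

With these example numbers the leaf carries 1200 Gbps of access capacity over 400 Gbps of uplinks, a 3:1 oversubscription, and adding capacity is a matter of adding spines (more uplinks per leaf) or more leaves (more access ports), without redesigning the topology.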

Both customers needed to build redundant data centers for fault tolerance; in addition, traffic between the data centers had to be encrypted.

The first customer had initially considered non-fabric solutions as a possible standard for their networks, but in testing they ran into STP interoperability problems between several hardware vendors, with outages that dropped services. For the customer, that was critical.

Cisco was already the customer's corporate standard; they looked at ACI and the alternatives and decided this was the solution to take. They liked the one-button automation through a single controller: services are provisioned faster and managed faster. Traffic encryption was provided by running MACsec between the IPN and spine switches, which avoided a crypto-gateway bottleneck, saved money on those devices, and used the available bandwidth to the maximum.

The second customer chose Juniper's controllerless solution because their existing data center already had a small installation with an EVPN VXLAN fabric. It was not fault-tolerant, though (a single switch was used). We decided to expand the main data center's infrastructure and build a fabric in the backup data center. The existing EVPN was barely used: VXLAN encapsulation never actually kicked in, since all hosts were connected to the same switch, all MAC addresses and /32 host routes were local, that same switch was their gateway, and there were no other devices to build VXLAN tunnels to. Traffic encryption was provided with IPsec between firewalls (the firewalls' performance was sufficient).

They also tried ACI, but decided that the vendor lock would force them to buy too much hardware, including replacing recently purchased equipment, which simply made no economic sense. Yes, the Cisco fabric integrates with everything, but only Cisco devices can live inside the fabric itself.

On the other hand, as mentioned earlier, you cannot just mix an EVPN VXLAN fabric with any neighboring vendor, because the protocol implementations differ. It is like mixing Cisco and Huawei in one network: the standards seem common, but you end up jumping through hoops anyway. Since this was a bank, and compatibility testing would take a very long time, we decided it was better to buy from a single vendor for now and not get carried away with functionality beyond the basics.

Migration plan

Two data centers based on ACI:

[Diagram: two data centers based on ACI]

Inter-data-center connectivity: a Multi-Pod design was chosen, with each data center as a pod. The scaling requirements (number of switches) and the inter-pod latency requirement (RTT under 50 ms) were taken into account. A Multi-Site design was rejected for ease of management (Multi-Pod has a single management interface, while Multi-Site would mean two interfaces or would require a Multi-Site Orchestrator), and because no geographic redundancy of sites was required.

[Diagram: Multi-Pod topology between the two data centers]

From the point of view of migrating services off the legacy network, the most transparent option was chosen: gradually transferring the VLANs corresponding to particular services.
For the migration, a corresponding EPG (endpoint group) was created in the fabric for each VLAN. First the network was stretched at L2 between the old network and the fabric; then, once all hosts had migrated, the gateway was moved into the fabric, and the EPG communicated with the existing network through an L3Out, with the L3Out-to-EPG interaction described by contracts. Approximate scheme:

[Diagram: migration scheme]
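To give a feel for the per-VLAN step above, here is a sketch of the kind of JSON body one might POST to the APIC REST API to create an EPG bound to a bridge domain. The class names (fvAEPg, fvRsBd) follow ACI's object model, but the tenant, application profile, and object names are hypothetical, and exact attributes can vary by APIC version, so treat this purely as an illustration.

```python
import json

def epg_payload(tenant: str, app_profile: str,
                epg: str, bridge_domain: str) -> dict:
    """Sketch of an APIC REST payload: an EPG tied to a bridge domain.

    The EPG object (fvAEPg) lives under a tenant and application
    profile; the fvRsBd child relates it to its bridge domain.
    """
    return {
        "fvAEPg": {
            "attributes": {
                "name": epg,
                "dn": f"uni/tn-{tenant}/ap-{app_profile}/epg-{epg}",
            },
            "children": [
                {"fvRsBd": {"attributes": {"tnFvBDName": bridge_domain}}}
            ],
        }
    }

# Hypothetical names for one migrated VLAN.
body = epg_payload("PROD", "legacy-migration", "VLAN100-EPG", "BD-VLAN100")
print(json.dumps(body, indent=2))
```

One payload like this per migrated VLAN, plus contracts between the EPG and the L3Out, is roughly the repetitive unit of work the migration consisted of; this is also exactly the kind of call that is easy to script once the naming convention is fixed.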

The approximate structure of most ACI fabric policies is shown in the figure below. The entire configuration is built from policies nested inside other policies, and so on. At first it is very hard to make sense of, but in practice network administrators get used to the structure in about a month, and only then comes the realization of how convenient it is.

[Figure: structure of ACI fabric policies]

Comparison

With Cisco ACI you have to buy more equipment (separate switches for inter-pod connectivity, plus the APIC controllers), which made it more expensive. Juniper's solution required no controllers or extras, and part of the customer's existing equipment could be reused.

Here is the EVPN VXLAN fabric architecture for the two data centers of the second project:

[Diagrams: EVPN VXLAN fabric architecture in each of the two data centers]

With ACI you get a ready-made solution: nothing to tinker with, nothing to optimize. For a customer's first encounter with the fabric, no developers are needed, and no one has to maintain code and automation. Basic operation skills are enough, and many settings can be done through a wizard, which is not always a plus, especially for people used to the command line. Either way, it takes time to retrain your brain for configuration through policies and for juggling a multitude of nested policies. On top of that, a clear naming convention for policies and objects is highly desirable. And if a problem arises in the controller's logic, it can only be solved through technical support.

With EVPN, it's the console. Suffer or rejoice: a familiar interface for the old guard. Yes, there are reference configurations and guides, but you have to read the manuals. There are various designs, all clearly and thoroughly documented.

Naturally, in both cases it is better to migrate the least critical services first, test environments for example, and only move on to production after catching all the bugs. And don't make changes on a Friday night. Don't take the vendor's word that everything will be fine; it is always better to play it safe.

You pay more for ACI (although Cisco is currently pushing the solution hard and often gives good discounts on it), but you save on operations. Managing and automating a controllerless EVPN fabric requires investment and recurring costs: monitoring, automation, rolling out new services. On the other hand, the initial launch of ACI takes 30-40 percent longer, because it takes time to create the whole set of profiles and policies that will be used later. But as the network grows, the amount of configuration needed shrinks: you reuse the previously created policies, profiles, and objects. You can flexibly configure segmentation and security and centrally manage the contracts that permit specific interactions between EPGs; the amount of work drops sharply.

With EVPN you have to configure every device in the fabric, so the probability of error is higher.
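The usual way to contain that per-device error risk is templating: render every leaf's configuration from one shared template so that only the per-device variables differ. A minimal sketch, with entirely hypothetical hostnames, addresses, and ASNs (in practice these would come from an IPAM or other source of truth), using Junos-style `set` commands:

```python
# Hypothetical per-leaf parameters; real values would come from an
# inventory/IPAM system, not be hard-coded like this.
LEAVES = [
    {"hostname": "leaf1", "loopback": "10.0.0.1", "asn": 65001},
    {"hostname": "leaf2", "loopback": "10.0.0.2", "asn": 65002},
]

# Shared template: the structure is identical on every leaf,
# only the substituted variables change.
TEMPLATE = """\
set system host-name {hostname}
set interfaces lo0 unit 0 family inet address {loopback}/32
set routing-options autonomous-system {asn}
set protocols bgp group overlay type internal
"""

def render_configs(leaves):
    """Render one config snippet per leaf from the shared template."""
    return {leaf["hostname"]: TEMPLATE.format(**leaf) for leaf in leaves}

configs = render_configs(LEAVES)
print(configs["leaf1"].splitlines()[0])  # set system host-name leaf1
```

This is exactly the kind of tooling the "investment and regular costs" above refers to: without a controller, someone has to build and maintain the template, the source of truth, and the delivery mechanism (Ansible, PyEZ, or similar).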

If ACI is slower to roll out, EVPN took almost twice as long to debug. With Cisco you can always call a support engineer and ask about the network as a whole (because it is supported as a solution); from Juniper Networks you buy only hardware, and that is what is supported. Packets left the device? Fine, the rest is your problem. You can open a case about solution choice or network design, but then you will be advised to purchase professional services, for an extra fee.

ACI support is very good precisely because it is dedicated: a separate team handles just this product, including Russian-speaking specialists. The documentation is detailed and the designs are predefined; they observe and advise, and they validate designs quickly, which often matters. Juniper Networks does the same, but noticeably slower (that was our experience at the time; rumor has it things are better now), which forces you to do everything yourself where a solution engineer could have advised.

Cisco ACI supports integration with virtualization and containerization systems (VMware, Kubernetes, Hyper-V) and centralized management. There are network and security services: load balancing, firewalls, WAF, IPS and more, plus good micro-segmentation out of the box. In the second solution, integration with network services takes serious tinkering, and it is better to scour the forums in advance for people who have already done it.

Conclusion

For each specific case, the solution must be chosen based not only on the cost of equipment, but also on ongoing operating costs, the main problems the customer faces right now, and the plans for developing the IT infrastructure.

ACI came out more expensive because of the extra equipment, but the solution is ready to use with no additional tinkering; the second solution is cheaper, but more complex and costly to operate.

If you want to discuss what it might cost to build a network fabric on different vendors, and what architecture you need, let's meet and talk. Up to a rough architecture sketch (enough to estimate budgets), our advice is free; a detailed design, of course, is paid work.

Vladimir Klepche, corporate networks.

Source: habr.com
