How to take control of your network infrastructure. Chapter Four. Automation. Templates

This article is the sixth in the series "How to take control of your network infrastructure." The table of contents for the series, with links to all the articles, can be found here.

Although a few topics remain unfinished, I decided to start a new chapter anyway.

I'll get back to security later. Here I want to discuss one simple but effective approach that, I am sure, can be useful to many people in one form or another. It's more of a short story about how automation can change an engineer's life. It's about using templates. At the end there is a list of my projects where you can see how everything described here works.

DevOps for the network

Generating configuration with a script, using Git to control changes to the IT infrastructure, remote "upload" of configuration: these ideas come first when you think about the technical implementation of the DevOps approach. The advantages are obvious. But, unfortunately, there are also disadvantages.

When our developers came to us networkers with these proposals more than 5 years ago, we were not delighted.

I must say that we had inherited a rather motley network, made up of equipment from about 10 different vendors. Some things were convenient to configure through our favorite CLI, while for others we preferred the GUI. In addition, years of working on "live" equipment had accustomed us to real-time control. For example, when making changes, I feel much more comfortable working directly in the CLI. That way I can quickly see if something has gone wrong and roll back the changes. All of this was somewhat at odds with their ideas.

Other questions arise too. For example, the interface may change slightly from one software version to the next. Eventually this will cause your script to generate the wrong config. I would not want to use production for "breaking in" the scripts.

Or how do you know that the configuration commands were applied correctly, and what do you do in case of an error?

I do not want to say that all these issues are unsolvable. It's just that, having said "A", it's probably reasonable to say "B" as well: if you want to use the same change control processes as in development, then you need dev and staging environments in addition to production. Only then does this approach look complete. But how much will it cost?

But there is one situation where the drawbacks are practically cancelled out and only the advantages remain. I'm talking about project work.

Project

For the last two years I have been involved in a project to build a data center for one large provider. I am responsible for F5 and Palo Alto on this project. From Cisco's point of view, this is "3rd party equipment".

For me personally, there are two distinct stages in this project.

The first stage

For the first year I was endlessly busy, working nights and weekends. I couldn't lift my head. The pressure from management and the customer was strong and continuous. Stuck in constant routine, I could not even try to optimize the process. The work was not only, and not even mainly, the configuration of equipment, but the preparation of project documentation.

Then the first tests began, and I was amazed at how many small mistakes and inaccuracies had been made. Of course, everything worked, but a letter was missing in a name here, a line was missing in a command there... The tests went on and on, and I was locked in a constant, daily struggle with errors, tests and documentation.

This went on for a year. The project, as I understand it, was not easy for anyone, but gradually the client became more and more satisfied, and this made it possible to bring in additional engineers who took over part of the routine.

Now we could take a look around.
And that was the start of the second stage.

The second stage

I decided to automate the process.

What I had learned from talking to the developers at that time (and credit where it's due, we had a strong team) is that the text format, although at first glance it looks like something from the days of DOS, has a number of valuable properties.
For example, the text format is useful if you want to take full advantage of Git and everything built on it. And I did.
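
To illustrate why plain text pays off here, below is a minimal sketch using only Python's standard `difflib`. The two config snippets are made-up, vendor-neutral lines (not real F5 or Palo Alto syntax), and `config_diff` is a hypothetical helper name. It shows the kind of line-by-line change view that Git gives you for free once configuration lives in text files.

```python
import difflib

# Two versions of a generated config. The lines are invented for
# illustration and do not follow any real vendor syntax.
old_config = """\
pool app1_pool members 10.0.0.11:80
virtual app1_vs destination 192.0.2.10:80 pool app1_pool
"""
new_config = """\
pool app1_pool members 10.0.0.11:80 10.0.0.12:80
virtual app1_vs destination 192.0.2.10:80 pool app1_pool
"""


def config_diff(old: str, new: str) -> list[str]:
    """Return a unified diff of two config texts, one list entry per line."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="before",
            tofile="after",
            lineterm="",
        )
    )


diff_lines = config_diff(old_config, new_config)
print("\n".join(diff_lines))
```

A reviewer sees exactly which line changed and nothing else, which is hard to get from a GUI or a binary export.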

It would seem that you could just store the configuration or a list of commands, but making changes that way is rather inconvenient. In addition, there is another important task in project work: you need documentation describing your design in general (Low Level Design) and the specific implementation (Network Implementation Plan). Here templates look like a very suitable option.

So, when using YAML and Jinja2, a YAML file with configuration parameters such as IP addresses, BGP AS numbers, ... perfectly fulfills the role of the NIP, while the Jinja2 templates contain syntax that corresponds to the design; in effect, they are a reflection of the LLD.
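
The split between parameters and template can be sketched as follows. This is a minimal illustration and not the templates from the project: the parameter names, addresses and AS numbers are invented, `render_config` is a hypothetical helper, and the template uses Cisco IOS-style BGP lines only because they are widely familiar (the project's templates targeted Palo Alto and F5). It assumes the PyYAML and Jinja2 packages are installed.

```python
import yaml  # PyYAML
from jinja2 import Environment, StrictUndefined

# Parameters: this part plays the role of the NIP.
# All names and values are hypothetical.
PARAMS_YAML = """
bgp:
  local_as: 65001
  neighbors:
    - ip: 10.0.0.1
      remote_as: 65002
    - ip: 10.0.0.2
      remote_as: 65003
"""

# Template: this part reflects the design (the LLD role).
TEMPLATE = """\
router bgp {{ bgp.local_as }}
{% for n in bgp.neighbors %}
 neighbor {{ n.ip }} remote-as {{ n.remote_as }}
{% endfor %}
"""


def render_config(params_yaml: str, template_text: str) -> str:
    """Load YAML parameters and render them through a Jinja2 template."""
    params = yaml.safe_load(params_yaml)
    env = Environment(
        undefined=StrictUndefined,  # fail loudly on a missing parameter
        trim_blocks=True,           # drop the newline after {% ... %} tags
        lstrip_blocks=True,         # drop indentation before {% ... %} tags
    )
    return env.from_string(template_text).render(**params)


config = render_config(PARAMS_YAML, TEMPLATE)
print(config)
```

Adding a neighbor then means adding two lines to the YAML; the template stays untouched. `StrictUndefined` is a design choice for this sketch: a typo in a parameter name raises an error instead of silently producing an empty value in the config.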

It took two days to learn YAML and Jinja2. A few good examples are enough to understand how they work. Then it took about two weeks to create all the templates for our design: a week for Palo Alto and another week for F5. All of this was published on the corporate GitHub.

Now the change process looked like this:

  • changed the YAML file
  • created a config file using a template (Jinja2)
  • saved it to a remote repository
  • uploaded the generated configuration to the equipment
  • saw an error
  • changed the YAML file or the Jinja2 template
  • created a config file using a template (Jinja2)
  • ...

Of course, at first a lot of time went into these corrections, but after a week or two they became a rarity.

A good test, and an opportunity to debug everything, was the client's wish to change the naming convention. Anyone who has worked with F5 will appreciate how delicate that is. But for me it was fairly easy. I changed the names in the YAML file, deleted all the configuration from the equipment, generated a new one and uploaded it. Everything, including bug fixes, took 4 days: two days for each technology. After that I was ready for the next step, namely building the Dev and Staging data centers.

Dev and Staging

Staging is in effect a complete copy of production. Dev is a heavily stripped-down copy built mostly on virtual hardware. An ideal situation for the new approach. If I separate out my own time from the overall process, the work, I think, took no more than 2 weeks. Most of the time went into waiting for the other side and troubleshooting together. The 3rd party implementation went almost unnoticed by others. There was even time to learn something and write a couple of articles on Habr 🙂
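
The article does not say how the parameter files were organized per environment. One possible arrangement, sketched here purely as an assumption, is to keep the production parameters as the base and describe Dev only by what differs. The dictionaries stand in for loaded YAML files; every key, value and the `merge` helper are hypothetical.

```python
import copy

# Hypothetical production parameters; in practice these would be
# loaded from a YAML file.
PROD = {
    "site": "prod",
    "firewalls": ["fw1", "fw2"],
    "bgp": {"local_as": 65001, "neighbors": ["10.0.0.1", "10.0.0.2"]},
}

# Dev lists only the differences from production (a stripped-down copy).
DEV_OVERRIDES = {
    "site": "dev",
    "firewalls": ["fw1"],
    "bgp": {"neighbors": ["10.10.0.1"]},
}


def merge(base: dict, overrides: dict) -> dict:
    """Overlay overrides on base recursively.

    Nested dicts are merged key by key; lists and scalars are replaced whole.
    The inputs are not modified.
    """
    result = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


dev = merge(PROD, DEV_OVERRIDES)
print(dev)
```

The same templates can then be rendered against `dev` or the production parameters, so the environments differ only in data.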

To sum up

So, what is the bottom line?

  • all I need to do to change the configuration is edit a simple, clearly structured YAML file with configuration parameters. I never change the Python script, and I change the Jinja2 template very rarely (only if there is a bug)
  • from the documentation point of view, the situation is almost ideal. You change the documentation (the YAML files act as the NIP) and upload the resulting configuration to the equipment. So your documentation is always up to date

All this led to the following:

  • the error rate dropped to almost 0
  • 90 percent of the routine disappeared
  • the speed of implementation increased significantly

PAY, F5Y, ACY

I said that a few examples are enough to understand how it works.
Here is a short (and of course modified) version of what was created in the course of my work.

PAY = deployment Palo Alto from Yaml = Palo Alto from Yaml
F5Y = deployment F5 from Yaml = F5 from Yaml (coming soon)
ACY = deployment ACI from Yaml = ACI from Yaml

I will add a few words about ACY (not to be confused with ACI).

Those who have worked with ACI know that this miracle (in a good sense, too) was definitely not created by networkers :). Forget everything you knew about networking, you won't need it!
That is a slight exaggeration, but it roughly conveys the feeling I have had constantly for the past 3 years of working with ACI.

And in this case, ACY not only makes it possible to build a change control process (which is especially important for ACI, because it is supposed to be the central and most critical part of your data center), but also gives you a friendly interface for creating configuration.

The engineers on this project use Excel instead of YAML to configure ACI, for exactly the same purpose. Excel does, of course, have its advantages:

  • your NIP is in one file
  • nice-looking tables that the client enjoys looking at
  • you can use some of Excel's tools

But there is one downside, and in my opinion it outweighs the advantages: it becomes much more difficult to control changes and coordinate the team's work.

ACY is in effect the same approach I used for the 3rd party equipment, applied to configuring ACI.

Source: habr.com
