Infrastructure as Code: How to overcome problems with XP

Hi, Habr! Last time I complained about life in the Infrastructure as Code paradigm without offering any way out. Today I'm back to tell you which approaches and practices can help you escape the abyss of despair and steer things in the right direction.


In a previous article, "Infrastructure as code: first acquaintance", I shared my impressions of this area, tried to reflect on the current state of things, and even suggested that standard practices known to all developers could help. It may have seemed that there were plenty of complaints about life but no proposals for a way out.

Who we are, where we are and what problems we have

We are now in the SRE Onboarding Team, which consists of six programmers and three infrastructure engineers. We all try to write Infrastructure as Code (IaC). We do this because we basically know how to write code and, by background, we are “above average” developers.

  • We have a set of advantages: a certain background, knowledge of practices, the ability to write code, and a desire to learn new things.
  • And there is a weak spot: limited knowledge of infrastructure hardware.

The technology stack we use for IaC:

  • Terraform for creating resources.
  • Packer for building images (Windows and CentOS 7).
  • Jsonnet for writing powerful builds in drone.io, and for generating Packer JSON and our Terraform modules.
  • Azure.
  • Ansible for preparing images.
  • Python for auxiliary services, as well as provisioning scripts.
  • And all this in VSCode with plugins shared between team members.

My conclusion from the last article was this: I tried to inspire optimism (first of all in myself) and said that we would try the approaches and practices we know to deal with the difficulties that exist in this area.

We are currently struggling with the following IaC issues:

  • Imperfect tools for developing code.
  • Slow deployment. Infrastructure is part of the real world, and it can be slow.
  • Lack of approaches and practices.
  • We are new and don't know much.

Extreme Programming (XP) to the rescue

All developers are familiar with Extreme Programming (XP) and the practices behind it. Many of us have worked this way, and successfully. So why not use the principles and practices it lays down to overcome infrastructure difficulties? We decided to try this approach and see what happens.

Checking the Applicability of the XP Approach to Your Field

Here is a description of the environment XP is well suited for, and how it relates to us:

1. Dynamically changing software requirements. We knew what the end goal was, but the details could vary. We decide for ourselves where to steer, so the requirements change periodically (mostly because of us). Take an SRE team that does its own automation and sets its own requirements and scope of work: this item fits it well.

2. Risks caused by fixed-time projects using new technology. We may face risks when using things unfamiliar to us, and this is 100% our case: our whole project relies on technologies we were not fully familiar with. In general, this is a constant problem, because new technologies appear in infrastructure all the time.

3, 4. A small, co-located, extended development team; technology that allows automated unit and functional tests. These two points do not quite fit us. First, we are not a co-located team; second, there are nine of us, which can be considered a large team. Although by some definitions a "big" team is 14+ people.

Let's look at some practices from XP and how they affect the speed and quality of feedback.

Feedback loop principle in XP

In my understanding, feedback answers the questions: am I doing the right thing, are we heading the right way? XP has a wonderful diagram for this: feedback loops over time. The interesting thing is that the lower the level in the diagram, the faster we get feedback that answers these questions.

[Figure: XP feedback loops over time]

It's an interesting point for discussion that in the IT industry we can get feedback quickly. Imagine how painful it is to work on a project for six months and only then find out that a mistake was made at the very beginning. This happens in design and in any construction of complex systems.

In our case, IaC helps with feedback. I'll make a small adjustment to the diagram right away: the release plan is not a monthly cycle; releases happen several times a day. Some practices are tied to these feedback cycles, and we will look at them in more detail.

Important: feedback can be a solution to all the problems stated above. Combined with XP practices, it can pull you out of the abyss of despair.

How to pull yourself out of the abyss of despair: three practices

Tests

Tests are mentioned twice in the XP feedback loop. That is no accident: they are essential to all Extreme Programming techniques.

It is assumed that you have unit and acceptance tests. Unit tests give you feedback in a few minutes, acceptance tests in a few days, so the latter take longer to write and run less often.

There is the classic testing pyramid, which shows that there should be more low-level tests than high-level ones.

[Figure: the classic testing pyramid]

How does this schema apply to us in an IaC project? Actually… not at all.

  • Unit tests: although there should be many of them, here there cannot be. Or they test something very indirectly. In fact, we hardly write them at all. But here are a few places where we did manage to use such tests:
    1. Testing Jsonnet code. For example, our build pipeline in Drone, which is quite complicated. The Jsonnet code is well covered by tests.
      We use a unit testing framework for Jsonnet.
    2. Tests for scripts that run when a resource starts. The scripts are written in Python, so we can write tests for them.
  • It is potentially possible to check configuration in tests, but we don't do that. It is also possible to enforce resource configuration rules with tflint. However, its checks for Terraform itself are too basic, and many rule scripts are written for AWS. We're on Azure, so that doesn't fit either.
  • Component integration tests: it depends on how you classify them and where you place them, but they basically work.

    This is what integration tests look like.

    [Figure: integration tests for image builds in Drone CI]

    This example comes from building images in Drone CI. To get to them, you have to wait 30 minutes while Packer builds the image, then another 15 minutes for the tests to pass. But they exist!

    Image verification algorithm

    1. First, Packer must prepare the entire image.
    2. Next to the test lies a Terraform configuration with local state, which we use to deploy this image.
    3. The deployment uses a small module, also kept alongside, that makes working with the image easier.
    4. Once a VM is deployed from the image, the checks can start. They are mostly carried out on the machine itself: we check how the startup scripts worked and how the daemons are running. To do this, we connect to the newly created machine via ssh or WinRM and check the state of the configuration or whether the services are up.

  • The situation is similar with integration tests for Terraform modules. Here is a brief table explaining the features of such tests.

    [Table: features of integration tests for Terraform modules]

    Pipeline feedback takes around 40 minutes. Everything is very slow. This works for regression testing, but for new development it is mostly impractical. If you prepare very thoroughly (set up runners and scripts in advance), you can cut it to 10 minutes. But that is still nothing like unit tests, which run 5 in 100 seconds.
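Step 4 of the image verification algorithm (checking services on the freshly deployed VM) can be sketched in Python, which we already use for provisioning scripts. This is a minimal sketch, not our actual test code: it assumes a Linux image reachable over ssh with key-based authentication, systemd services, and hypothetical marker files that startup scripts leave behind when they finish. Commands go through a small `Runner` function, so a WinRM-based runner for Windows images could be plugged in the same way, and the check logic itself can be unit-tested with a fake runner.

```python
import subprocess
from typing import Callable, Dict, Iterable, List, Tuple

# A runner executes a command on the target VM and returns (exit_code, stdout).
Runner = Callable[[List[str]], Tuple[int, str]]


def ssh_runner(host: str, user: str) -> Runner:
    """Build a runner that executes commands on `host` over ssh (key-based auth assumed)."""
    def run(cmd: List[str]) -> Tuple[int, str]:
        proc = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", f"{user}@{host}", *cmd],
            capture_output=True, text=True, timeout=120,
        )
        return proc.returncode, proc.stdout
    return run


def check_services(run: Runner, services: Iterable[str]) -> Dict[str, bool]:
    """Return {service: is_active} using `systemctl is-active` on the VM."""
    results = {}
    for name in services:
        code, out = run(["systemctl", "is-active", name])
        # For a single unit, is-active prints "active" and exits 0 only when it is running
        results[name] = code == 0 and out.strip() == "active"
    return results


def check_marker_files(run: Runner, paths: Iterable[str]) -> Dict[str, bool]:
    """Hypothetical convention: startup scripts touch a marker file when they succeed.
    `test -f` exits 0 if the file exists."""
    return {path: run(["test", "-f", path])[0] == 0 for path in paths}
```

In a pipeline, the host address would come from the outputs of the test Terraform deployment, e.g. `check_services(ssh_runner(vm_ip, admin_user), ["sshd"])`, where `vm_ip`, `admin_user` and the service list are placeholders. Keeping the transport behind `Runner` is the design choice that makes the checks testable without a VM.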

The lack of unit tests for building images and Terraform modules pushes us to shift work into separate services that can simply be called via REST, or into Python scripts.

For example, we needed to make sure that when a virtual machine starts, it registers itself in ScaleFT, and when the virtual machine is destroyed, it removes itself.

Since we use ScaleFT as a service, we have to work with it through its API. So we wrote a wrapper that you can call and say: "Go in and remove this, that and the other." It stores all the necessary settings and credentials.

We can write normal tests for this, since it is no different from ordinary software: you mock the API, call the wrapper, and see what happens.
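A minimal sketch of what such a test can look like, assuming a hypothetical `AccessServiceClient` wrapper whose only job here is to deregister a destroyed VM. The URL layout and the `http.delete(url) -> status code` interface are assumptions for illustration, not the real ScaleFT API. Once the logic sits behind a plain Python object, `unittest.mock` replaces the network and the test runs in milliseconds.

```python
from unittest import mock


class AccessServiceClient:
    """Hypothetical wrapper around an access-management service such as ScaleFT.

    `http` is any object with a `delete(url) -> int` method returning an HTTP
    status code. The URL layout is an illustrative assumption.
    """

    def __init__(self, http, base_url: str, project: str):
        self.http = http
        self.base_url = base_url.rstrip("/")
        self.project = project

    def server_url(self, hostname: str) -> str:
        return f"{self.base_url}/projects/{self.project}/servers/{hostname}"

    def deregister(self, hostname: str) -> bool:
        """Remove a destroyed VM. Returns False if it was already gone."""
        status = self.http.delete(self.server_url(hostname))
        if status == 404:
            return False  # nothing to delete: safe to call twice
        if status not in (200, 204):
            raise RuntimeError(f"deregister {hostname} failed: HTTP {status}")
        return True


def test_deregister_sends_delete_for_host():
    http = mock.Mock()
    http.delete.return_value = 204
    client = AccessServiceClient(http, "https://access.example.com/v1/", "infra")
    assert client.deregister("vm-01") is True
    http.delete.assert_called_once_with(
        "https://access.example.com/v1/projects/infra/servers/vm-01")


def test_deregister_raises_on_server_error():
    http = mock.Mock()
    http.delete.return_value = 500
    client = AccessServiceClient(http, "https://access.example.com/v1", "infra")
    try:
        client.deregister("vm-01")
    except RuntimeError:
        pass
    else:
        raise AssertionError("expected RuntimeError")
```

With pytest installed, the `test_` functions are collected as they are; the real HTTP client is only needed in production code.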


Summary on tests: unit testing, which should give feedback within a minute, does not. The types of testing higher up the pyramid do have an effect, but they cover only part of the problems.

Pair programming

Tests are, of course, good. You can write a lot of them, of different types. They work at their levels and give us feedback. But the problem remains: unit tests, which should give the fastest feedback, are weak. And I still want fast feedback: it makes work easier and more pleasant, not to mention the quality of the resulting solution. Fortunately, there are techniques that give even faster feedback than unit tests. One of them is pair programming.

When writing code, you want feedback on its quality as quickly as possible. Yes, you can write everything in a feature branch (so as not to break anything for anyone), open a pull request on GitHub, assign it to someone whose opinion matters, and wait for a response.

But you can wait a long time. People are busy, and the answer, if it comes at all, may not be of the highest quality. Even if the answer came immediately and the reviewer instantly understood the whole idea, it still comes late, after the fact. I want it earlier. That is exactly what pair programming aims at: feedback right away, at the moment of writing.

The following are the styles of pair programming and their applicability in working on IaC:

1. Classic: experienced + experienced, roles switch on a timer. Two people in two roles, driver and navigator. They work on the same code and switch roles after a predetermined period of time.

Let's see how our problems fit this style:

  • Problem: imperfect code development tools.
    Negative impact: development takes longer, we slow down, and the pace and rhythm of work get lost.
    How we fight it: we use different tooling and a shared IDE, and we also learn shortcuts.
  • Problem: slow deployment.
    Negative impact: it increases the time to produce a working piece of code. We get bored while waiting and are tempted to do something else in the meantime.
    How we fight it: we haven't overcome it.
  • Problem: lack of approaches and practices.
    Negative impact: we don't know what is good and what is bad, and feedback takes longer.
    How we fight it: exchanging opinions and practices while pairing almost solves the problem.

The main problem with applying this style to IaC is the uneven pace of work. In traditional software development the pace is very even: five minutes to write N, 10 minutes to write 2N, 15 minutes to write 3N. Here you can spend five minutes writing N, then another 30 minutes writing a tenth of N. You hit a wall, don't understand anything, and get stuck. Figuring it out takes time and distracts from the actual programming.

Conclusion: in its pure form, it does not suit us.

2. Ping-pong. In this approach, one participant writes a test and the other writes the implementation for it. Since unit tests are hard here and you have to write an integration test that takes a long time to program, all the lightness of ping-pong is gone.

We did try splitting duties between designing a test scenario and implementing the code for it. One participant came up with the scenario, was responsible for that part of the work, and had the final say. The other was responsible for the implementation. It worked well: the quality of the scenario improves with this approach.

Conclusion: alas, the pace of work does not allow using ping-pong as a pair programming practice in IaC.

3. Strong Style. A difficult practice. One participant becomes the directing navigator and the other takes the role of the executing driver. All decisions belong to the navigator. The driver only types and can influence what happens only through words. Roles do not change for a long time.

Good for learning, but requires strong soft skills. This is where we stumbled. The technique was difficult, and not only because of infrastructure.

Conclusion: potentially applicable, we are not giving up on trying.

4. We don't consider mobbing, swarming, or other known styles not listed here, because we haven't tried them and can't say anything about them in the context of our work.

General results on the use of pair programming:

  • We have an uneven pace of work that throws us off.
  • We ran into insufficiently good soft skills, and the subject area does not help us overcome these shortcomings.
  • Long tests and tool problems make pair development sluggish.

5. Despite this, there have been successes. We came up with our own method "Convergence - divergence". I will briefly describe how it works.

We have permanent partners for a few days (less than a week) and do one task together. For a while we sit together: one writes, the other watches and acts as support. Then we split up for a while, each doing something independent, then converge again, synchronize very quickly, do something together, and split up again.

Planning and communication

The last group of practices for solving feedback problems is how we organize work with the tasks themselves. This also includes sharing experience outside of pair work. Let's consider three practices:

1. Tasks through the goal tree. We organized overall project management as a tree that stretches endlessly into the future. Technically, it lives in Miro. At the root is one task, an intermediate goal. From it branch either smaller goals or groups of tasks, and from those, the tasks themselves. All tasks are created and maintained on this board.

[Figure: the goal tree in Miro]

This scheme also gives feedback, once a day when we synchronize at stand-ups. Having a common plan in front of everyone, structured and completely open, lets everyone know what is happening and how far we have progressed.

Advantages of visualizing tasks:

  • Causality. Each task leads to some global goal, and tasks are grouped into smaller goals. The infrastructure domain is quite technical, and it is not always immediately clear what impact a task has on the business, for example, writing a runbook on migrating to another nginx. Having the goal map alongside makes this clearer.
    Causality is an important property of tasks. It directly answers the question: "Am I doing the right thing?"
  • Parallelism. There are nine of us, and it is physically impossible for everyone to pile onto one task. Tasks from one area may not always be enough either. So we have to parallelize work between small working groups. Groups stay on their task for some time and can be reinforced by someone else; people also sometimes drop out of a working group. Someone goes on vacation, someone gives a talk at DevOpsConf, someone writes an article on Habr. Knowing which goals and tasks can be done in parallel becomes very important.
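Both properties follow from the fact that the goal map is a tree. A minimal Python sketch, assuming a simple hypothetical `Node` class (the real tree lives in Miro, not in code): walking from a task up to the root gives the causal chain for that task, and grouping leaf tasks by top-level goal shows which work can be handed to separate working groups. Goal names other than the nginx runbook are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)  # identity comparison: nodes hold parent back-references
class Node:
    title: str
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list)

    def add(self, title: str) -> "Node":
        """Attach a sub-goal or task under this node and return it."""
        child = Node(title, parent=self)
        self.children.append(child)
        return child

    def why(self) -> List[str]:
        """Causality: the chain of goals from this task up to the root."""
        chain: List[str] = []
        node: Optional[Node] = self
        while node is not None:
            chain.append(node.title)
            node = node.parent
        return chain

    def leaves(self) -> List["Node"]:
        """Concrete tasks are the leaves of the tree."""
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


def tasks_by_goal(root: Node) -> Dict[str, List[str]]:
    """Parallelism: tasks under different top-level goals can go to different groups."""
    return {goal.title: [t.title for t in goal.leaves()] for goal in root.children}


# Hypothetical example tree
root = Node("Intermediate goal")
ingress = root.add("Reliable ingress")
runbook = ingress.add("Write a runbook on migrating to another nginx")
images = root.add("Reproducible VM images")
images.add("Integration tests for the CentOS 7 image")

print(" <- ".join(runbook.why()))
# prints: Write a runbook on migrating to another nginx <- Reliable ingress <- Intermediate goal
```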

2. Rotating hosts for morning stand-ups. At stand-ups we ran into a problem: people work on many tasks in parallel. Sometimes the tasks are loosely connected and nobody understands who is doing what. Yet the opinion of another team member is very important: it is extra information that can change how a problem gets solved. Of course, you usually have a pair partner, but extra advice and tips never hurt.

To improve this, we started rotating the stand-up host. Hosts now change according to a fixed list, and this has had an effect: when it's your turn, you have to dive in and understand what is going on in order to run a good stand-up.


3. Internal demo. Help from pair programming, the visualized task tree, and help at morning stand-ups are good, but not ideal. In a pair, you are limited by your own knowledge. The task tree helps you understand globally who is doing what. And the host and colleagues at the morning stand-up will not dive deep into your problems, so they may well miss something.

The solution was found in demonstrating the work done to each other and then discussing it. We meet once a week for an hour and show the details of the solutions to the tasks that we have done over the past week.

During the demo, you need to explain the details of the task and be sure to show it working.

The talk can follow this checklist:

1. Set the context. Where did the task come from, and why was it needed at all?

2. How was the problem solved before? For example, lots of mouse clicking was required, or it was impossible to do at all.

3. How we improved it. For example: "Look, now there is a script, and here is the readme."

4. Show how it works. Ideally, walk through a real user scenario: I want X, I do Y, I see Z. For example, deploy NGINX, smoke-test the URL, get 200 OK. If the action takes a long time, prepare it in advance so you can show the result. If something is especially fragile, try not to break it an hour before the demo.

5. Explain how well the problem is solved, what difficulties remain, what is not finished, and what improvements are possible in the future. For example: it's a CLI now, and later there will be full automation in CI.
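The "deploy NGINX, smoke-test the URL, get 200 OK" step from item 4 is easy to script so that it behaves the same in every demo. A minimal sketch using only the Python standard library; the retry count and delay are arbitrary assumptions, chosen because a freshly deployed server may need a few seconds before it answers.

```python
import time
import urllib.error
import urllib.request


def smoke_check(url: str, expected_status: int = 200, timeout: float = 5.0) -> bool:
    """Send one GET request and report whether the response status matches."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == expected_status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx; the status is in err.code
        return err.code == expected_status
    except OSError:
        # URLError (DNS failure, refused connection) and timeouts subclass OSError
        return False


def wait_for_status(url: str, expected_status: int = 200,
                    attempts: int = 10, delay: float = 3.0) -> bool:
    """Retry the smoke check, since a freshly deployed server may still be starting."""
    for attempt in range(attempts):
        if smoke_check(url, expected_status):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

For the demo itself, something like `wait_for_status("http://<vm-address>/")` after the deployment finishes is enough; the address is a placeholder.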

It is desirable for each speaker to keep within 5-10 minutes. If your presentation is obviously important and will take longer, please agree it in advance in the sre-takeover channel.

After the face-to-face part, there is always a discussion in the thread. This is where the feedback we need on our tasks appears.

Finally, we run a poll to gauge how useful the session was. This is feedback on the substance of the talks and the importance of the tasks.


Long conclusions and what's next

It may seem that the tone of the article is somewhat pessimistic. It is not. The two lower levels of feedback, tests and pair programming, work. Not as well as in traditional development, but they have a positive effect.

Tests in their current form provide only partial code coverage. Many configuration functions are not tested, and their impact on the day-to-day work of writing code is low. However, integration tests do have an effect: they are what allow you to refactor without fear. This is a big achievement. Also, as the focus shifts to development in high-level languages (we use Python and Go), the problem goes away. And the "glue" doesn't need many checks; one general integration test is enough.

Working in pairs depends more on specific people. It depends on the task and on our soft skills: some of us are very good at it, some less so. There is definitely a benefit. Clearly, even when we don't follow the rules of pair work strictly, the very fact of doing tasks together improves the quality of the result. Personally, I find working in pairs easier and more pleasant.

The higher-level ways of influencing feedback, planning and working with tasks, definitely pay off: high-quality knowledge sharing and better development quality.

Short conclusions in one line

  • XP practices work in IaC, but with less efficiency.
  • Strengthen what works.
  • Come up with your own compensatory mechanisms and practices.

Source: habr.com
