Once again about DevOps and SRE

Based on a discussion in the AWS Minsk Community chat

Recently, real battles have flared up over the definitions of DevOps and SRE.
Even though discussions on this topic have already set everyone's teeth on edge, mine included, I decided to submit my view to the judgment of the Habr community. If you are interested, read on below the cut. And let it all begin anew!

Prehistory

So, in ancient times, software developers and server administrators lived in separate teams. The former happily wrote code. The latter, using various warm and affectionate words about the former, set up the servers, periodically going to the developers and getting the all-purpose answer “it works on my machine.” The business waited for the software, everything stood idle, things broke periodically, and everyone was nervous, especially whoever was paying for the whole mess. A glorious, cozy old era. Well, you already know where DevOps comes from.

The birth of DevOps practices

Then serious people came and said: this is no way to run an industry, you can't work like this. And they brought in life cycle models. Take the V-model, for example.

[Figure: diagram of the V-model]
So what do we see? The business comes with a concept, architects design the solution, developers write the code, and then it all breaks down. Someone somehow tests the product, someone somehow delivers it to the end user, and somewhere at the output of this miracle model sits a lonely business customer, waiting by the sea for the promised good weather. We concluded that we needed methods to get this process running properly. And we decided to create practices that would implement them.

A lyrical digression on what a practice is
By a practice I mean a combination of a technology and a discipline. One example is the practice of describing infrastructure as Terraform code. The discipline is knowing how to describe infrastructure as code; it lives in the developer's head. The technology is Terraform itself.

And they decided to call them DevOps practices; I think they meant “from Development to Operations.” We came up with all sorts of clever things: CI/CD practices, practices based on the IaC principle, thousands of them. And off we went: developers write code, DevOps engineers turn the description of the system in the form of code into working systems (yes, code is, unfortunately, only a description of the system, not the system itself), delivery keeps flowing, and so on. Yesterday's administrators, having mastered the new practices, proudly retrained as DevOps engineers, and that is how it all started. And there was evening, and there was morning... sorry, wrong story.
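The remark that code is only a description of a system can be illustrated with a minimal Python sketch. It compares a declared desired state with the actual state and produces a list of actions, loosely the way IaC tools plan changes. The `plan` function and the resource names are hypothetical and do not reflect Terraform's actual API.

```python
def plan(desired: dict, actual: dict) -> list[tuple[str, str]]:
    """Return the actions needed to make `actual` match `desired`.

    Both arguments map a resource name to its specification.
    This is an illustration only, not how any real tool is implemented.
    """
    actions = []
    # Resources that are declared but missing or different in reality.
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    # Resources that exist in reality but are no longer declared.
    for name in actual:
        if name not in desired:
            actions.append(("delete", name))
    return actions
```

Calling `plan` with identical desired and actual states returns an empty list: the description and the running system agree, and there is nothing to do.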

Once again, something is not right

As soon as everything had calmed down, various cunning “methodologists” had begun writing thick books on DevOps practices, and quiet disputes had flared up about who this notorious DevOps engineer was and whether DevOps was really a production culture, discontent arose again. It suddenly turned out that software delivery is anything but a trivial task. Every development infrastructure has its own stack: in one place you need to build something, in another you need to deploy an environment, here you need Tomcat, there you need some cunning and complicated way to launch it. In short, it makes your head spin. And the problem, oddly enough, turned out to lie primarily in how the processes were organized: the delivery function became a bottleneck and began to block everything else.

Besides, nobody had canceled Operations. It is not visible in the V-model, but the rest of the life cycle is still there on the right. As a result, someone has to maintain the infrastructure, keep an eye on monitoring, resolve incidents, and handle delivery as well. That is, stand with one foot in development and the other in operations; and suddenly it turned out to be Development & Operations. Then came the general hype around microservices. With them, development began to move from local machines to the cloud: try debugging anything locally when there are dozens or hundreds of microservices. Continuous delivery becomes a means of survival. For a “small modest company” that was still manageable, but what about Google?

SRE by Google

Google came along, chewed through the biggest cacti, and decided: we don't need this, we need reliability. And reliability has to be managed. So it decided that it needed specialists who would manage reliability. It called them site reliability (SR) engineers and said: there you go, make it good, as usual. Here is the SLI (service level indicator), here is the SLO (service level objective), here is monitoring. And it rubbed their noses in operations. And it called its “reliable DevOps” SRE. Everything seems fine, but there is one dirty hack that Google could afford: for SR engineer positions, it hired people who were qualified developers and who had also done some homework and understood how running systems work. Even Google has trouble hiring such people, mainly because it competes with itself here: someone also has to write the business logic. Delivery was assigned to release engineers, while SR engineers manage reliability (not directly, of course, but by influencing the infrastructure, changing the architecture, tracking changes and indicators, and dealing with incidents). Lovely, you can write books about it. But what if you are not Google, and reliability still worries you?
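To make SLI and SLO a little more concrete, here is a minimal Python sketch. It assumes an availability SLI defined as the share of successful requests and an SLO expressed as a target fraction such as 0.999. The function names and the numbers are illustrative assumptions, not Google's tooling.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """SLI: the fraction of requests that were served successfully."""
    if total_requests == 0:
        # Assumption: with no traffic, treat the service as fully available.
        return 1.0
    return good_requests / total_requests


def error_budget_remaining(sli: float, slo: float) -> float:
    """Share of the error budget still unspent.

    1.0 means the budget is untouched; 0 or less means it is exhausted.
    """
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    allowed_failure = 1.0 - slo   # what the SLO permits
    actual_failure = 1.0 - sli    # what actually happened
    return 1.0 - actual_failure / allowed_failure
```

For example, with 999,500 good requests out of 1,000,000 and an SLO of 0.999, about half of the error budget is left. The point of the sketch is that the number being managed is a user-facing outcome, not a machine-level one.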

Development of DevOps ideas

Just then Docker arrived, having grown out of LXC, followed by orchestration systems such as Docker Swarm and Kubernetes, and DevOps engineers breathed out: the unification of practices simplified delivery. It simplified it so much that delivery could even be handed over to developers; what is a deployment.yaml, after all? Containerization solves the problem. And CI/CD systems have matured to the point where you write one file and off you go; the developers can handle it themselves. And this is where we start talking about how to build our own SRE, with... well, at least with someone.

SRE outside Google

Well, OK, we have handed off delivery, so it seems we can breathe out and return to the good old days, when admins watched the processor load, tuned systems, and quietly sipped something mysterious from their mugs in peace... Stop. That is not why we started all this (which is a pity!). It suddenly turns out that Google's approach has excellent practices we can easily adopt: what matters is not the processor load, nor how often we swap disks or how we optimize cloud costs, but business metrics, those same notorious SLx. And nobody has taken infrastructure management off the admins' plate either: they still have to resolve incidents, take periodic on-call shifts, and generally stay on top of business processes. And, guys, start programming little by little at a decent level; Google is already waiting for you.

To summarize, since you are probably tired of reading by now and can't wait to spit and write the author a comment under the article: DevOps as a delivery practice has always existed and always will. It is not going anywhere. SRE, as a set of operational practices, is what makes that delivery successful.

Source: habr.com
