Programmers, Devops and Schrödinger's Cats

The reality of a network engineer (with noodles and… salt?)

Recently, when discussing various incidents with engineers, I noticed an interesting pattern.

In these discussions, the question of “root cause” invariably arises. Loyal readers will surely know that I have some thoughts about this. In many organizations, incident analysis is based entirely on this concept. They use various techniques to identify cause-and-effect relationships, such as the "Five Whys". These methods treat the so-called "linearity of events" as an unquestionable dogma.

When you question this idea and point out that linearity is reassuringly deceptive in complex systems, a fascinating discussion is born. The debaters passionately insist that only knowledge of the “first cause” allows us to understand what is happening.

That is where the pattern shows up: developers and devops react differently to this idea. In my experience, developers are more likely to insist that root causes matter and that events can always be traced back through a chain of cause and effect. Devops, on the other hand, are more likely to agree that a complex world is not always linear.

I've always wondered why that is. What makes programmers so critical of the idea that "the root cause is a myth"? They react like an immune system recognizing a foreign agent. Why do they respond that way, while devops are more inclined to consider the idea?

I'm not entirely sure, but I have an idea about this. It is related to the different contexts in which these professionals perform their daily work.

Developers often work with deterministic tools. Of course, compilers, linkers and operating systems are all complex systems, but we are used to them giving deterministic results, and we model them as deterministic: if we provide the same input, we usually expect the same output. And if there is a problem with the output (a "bug"), developers solve it by analyzing the input (either from the user or from the tools in the development process). They look for the "error" and then change the input. This fixes the "bug".

Basic assumption of software development: the same inputs reliably and deterministically produce the same output

In fact, a non-deterministic result is itself considered an error: if an unexpected or erroneous output cannot be reproduced, developers tend to widen the investigation to other parts of the stack (operating system, network, etc.), which also behave more or less deterministically, producing the same result for the same input... and if they don't, it's still considered a bug. It's just that now it's an operating system or network bug.

In any case, determinism is a basic, almost self-evident assumption for much of the work that programmers do.

But for any devops who's spent the day building racks of hardware or tinkering with cloud APIs, the idea of a fully deterministic world (assuming it's even possible to map all the inputs!) is a fleeting concept at best. Even if we put aside BOFH jokes about sunspots, experienced engineers have seen the strangest things in this world. They know that even human screaming can slow down a server, not to mention the millions of other factors in the environment.
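To make the contrast concrete, here is a minimal Python sketch of the two mental models. It is only an illustration, not a model of any real system: the names `build_artifact`, `serve_request`, `disk_vibration` and `noisy_neighbor`, along with all the numbers, are made up for this example.

```python
import hashlib


def build_artifact(source: str) -> str:
    """Developer's view: the output depends only on the explicit input."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def serve_request(payload: str, environment: dict) -> float:
    """Ops view: latency (in ms) depends on the payload *and* on the
    surrounding conditions, which nobody passes in on purpose."""
    latency_ms = 1.0 + 0.01 * len(payload)
    # Hypothetical environmental factors; the multipliers are invented.
    if environment.get("disk_vibration", False):
        latency_ms *= 5
    if environment.get("noisy_neighbor", False):
        latency_ms += 20
    return latency_ms


# Same input, same output, every time.
assert build_artifact("print('hi')") == build_artifact("print('hi')")

# Same payload, different result, because the hidden inputs changed.
quiet = serve_request("GET /", {})
loud = serve_request("GET /", {"disk_vibration": True})
print(f"quiet room: {quiet:.2f} ms, someone shouting: {loud:.2f} ms")
```

In the sketch the environment is passed in explicitly so that the difference is visible. In a real data center nobody hands you that dictionary, which is exactly why the same request can look non-deterministic from the outside.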

So it’s easier for experienced engineers to doubt that every incident has a single root cause, and that techniques like “Five Whys” will correctly (and repeatably!) lead to it. The claim contradicts their own experience, where the pieces of the puzzle do not fit together so neatly in practice. That is why they accept this idea more readily.

Of course, I'm not saying that developers are naive, stupid, or unable to understand how linearity can be deceiving. Experienced programmers have probably also seen a lot of non-determinism in their lifetime.

But it seems to me that the usual reaction from developers in these debates has a lot to do with the fact that the concept of determinism generally serves them well in their daily work. They don't run into nondeterminism nearly as often as ops engineers, who have to chase Schrödinger's cats around their infrastructure.

This may not fully explain the observed developer reactions, but it is a powerful reminder that our reactions are a complex mixture of many factors.

It's important to keep this complexity in mind, whether we're dealing with a single incident, collaborating on a software delivery pipeline, or trying to make sense of the wider world.

Source: habr.com
