Developers are from Mars, Admins are from Venus

Any resemblance to real events is coincidental, and besides, it all happened on another planet...

I want to share three stories of success and failure about how a backend developer works in a team with admins.

The first story.
A web studio whose employees can be counted on the fingers of one hand. Today you are a markup coder, tomorrow a backend developer, the day after tomorrow an admin. On the one hand, you gain tremendous experience. On the other hand, you lack depth in every area. I still remember my first working day: I am still green, the boss says "Open PuTTY", and I have no idea what that is. Communication with admins is not an issue, because you are the admin. Let's consider the pros and cons of this position.

+ All power is in your hands.
+ No need to beg anyone for access to the server.
+ Fast reaction time across the board.
+ Your skills level up quickly.
+ There is a complete understanding of the product architecture.

- High responsibility.
- The risk of breaking production.
- It is difficult to be a good specialist in all areas.

Nothing interesting here, so let's move on.

The second story.
Big company, big project. There is an administration department with 5-7 employees and several development teams. When you come to work in such a company, every admin assumes that you came here not to work on the product but to break something. Neither the signed NDA nor the fact that you passed the interviews convinces them otherwise. No, this person came here with his dirty hands to ruin our cherished production. So communication with such a person should be kept to a minimum; at most, you can toss him a sticker in reply. Do not answer questions about the architecture of the project. Preferably, do not grant access until the team lead asks for it, and when he does, grant fewer privileges than were requested. Almost all communication with such admins is swallowed by a black hole between the development department and the administration department. Issues cannot be resolved quickly. And you can't walk over in person either: the admins are too busy, 24/7. (What are you doing all that time?) Some performance characteristics:

  • Average deployment time to production 4-5h
  • Maximum deployment time to production 9h
  • For a developer, the application in production is a black box, just like the production server itself. How many servers are there, anyway?
  • Poor release quality, frequent errors
  • The developer is not involved in the release process

Well, what did I expect? Of course newcomers are not allowed into production. Fine: with patience, we start earning the trust of the people around us. But for some reason it is not so simple with the admins.

Act 1. The invisible admin.
Release day. The developer and the admin do not communicate. The admin has no questions; why that is, we will understand later. The admin is a man of principle: he uses no instant messengers, gives his phone number to no one, and has no profile on social networks. There is not even a photo of him anywhere. What do you look like, dude? The responsible manager and I sit in bewilderment for about 15 minutes, trying to establish contact with this Voyager 1, and then a message lands in the corporate mail saying that he has finished. So we are going to correspond by email? Why not? Convenient, isn't it? Okay, let's cool down. The process is already underway and there is no turning back. We read the message again: "I finished". Finished what? Where? Where do we look for you? This is where you understand why 4 hours per release is considered normal. The developers are in shock, but we complete the release. Any desire to release again is gone.

Act 2. Wrong version.
Next release. Having learned from experience, we start compiling lists of the software and libraries the server needs and hand them to the admins, specifying versions for some of them. As always, we receive a weak radio signal that the admin has finished something over there. The regression test begins, which by itself takes about an hour. Everything seems to work, but there is one critical bug: important functionality is broken. The next few hours are spent on shamanic dances with a tambourine, reading coffee grounds, and a detailed review of every piece of code. The admin says he has done everything. It is the application written by the ham-fisted developers that does not work; the server works. So what questions could there be for him? Toward the end of yet another hour, we finally get the admin to post the version of the library installed on the production server into the chat, and bingo: it is not the one we need. We ask the admin to install the required version and are told that he cannot, because that version is not available in the OS package manager. At this point the manager digs out of the depths of his memory that another admin once solved this problem by simply building the required version by hand. But no, we won't do that. The regulations forbid it. Carl, we have been sitting here for several hours already, what regulations?! We get another shock, and the release is somehow completed.
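A check like the one that took hours in this act can be automated. The sketch below is only an illustration, not something the project had: the function name, the library names and the version numbers are all made up, and how the installed versions are collected from the server is left out entirely. It compares a list of required versions with what is actually installed and reports every mismatch.

```python
from typing import Dict, List, Optional, Tuple


def find_version_mismatches(
    required: Dict[str, Optional[str]],
    installed: Dict[str, str],
) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Return (name, required_version, installed_version) for each problem.

    A required version of None means "any version will do".
    An installed version of None in the result means "not installed".
    """
    problems = []
    for name, wanted in sorted(required.items()):
        found = installed.get(name)
        if found is None:
            # The package is missing from the server altogether.
            problems.append((name, wanted, None))
        elif wanted is not None and found != wanted:
            # The package is present, but in a different version.
            problems.append((name, wanted, found))
    return problems


# Hypothetical data: names and versions are invented for the example.
required = {"libfoo": "2.4.1", "libbar": None}
installed = {"libfoo": "2.2.0", "libbar": "1.0.3"}

for name, wanted, found in find_version_mismatches(required, installed):
    print(f"{name}: need {wanted or 'any'}, found {found or 'nothing'}")
```

Run before the regression test, a report like this would have pointed at the wrong library version immediately.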

Act 3, a short one.
An urgent ticket: key functionality does not work for one of the users in production. We poke around for a couple of hours and check everything. In the development environment, everything works. It becomes clear that it would be good to look into the php-fpm logs. At that time the project had no logging system like ELK or Prometheus. We open a ticket to the administration department asking for access to the php-fpm logs on the server. You have to understand that we are not asking for access on a whim: remember the black hole and the admins who are busy 24/7? If you ask them to look at the logs themselves, that task gets the priority "not in this lifetime". The ticket is created, and we get an instant response from the head of the administration department: “You should not need access to the logs in production; write code without bugs.” Curtain.

Act 4 and beyond.
We collect a dozen more problems in production, caused by mismatched library versions, unconfigured software, servers unprepared for the load, and so on. Bugs in the code happen too, of course; we will not blame the admins for every sin, and will mention just one more operation typical of that project. We had a lot of background workers launched through supervisor, and some scripts had to be added to cron. Sometimes these workers stopped working. The load on the queue server grew at lightning speed, and sad users stared at a spinning loader. As a quick fix, it was enough to restart such workers, but again, only the admin could do that. A whole day could pass before this elementary operation was performed. It is worth noting, of course, that the ham-fisted programmers should write workers that do not crash. But when they do crash, it would be nice to understand why, and that is sometimes impossible without access to production and, as a result, without logs for the developer.
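The last point, workers that explain why they fell, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's code: `run_worker` and `demo_handler` are hypothetical names, and a real worker would read jobs from the queue server instead of a list. The idea is simply that a failed job is logged together with its traceback and the worker keeps going.

```python
import logging
from typing import Callable, Iterable


def run_worker(
    jobs: Iterable[str],
    handle: Callable[[str], None],
    log: logging.Logger,
) -> int:
    """Process every job; log failures with a traceback instead of dying.

    Returns the number of failed jobs.
    """
    failed = 0
    for job in jobs:
        try:
            handle(job)
        except Exception:
            # log.exception records the message plus the full traceback,
            # so the reason for the failure ends up in the log.
            log.exception("job %r failed", job)
            failed += 1
    return failed


def demo_handler(job: str) -> None:
    # Hypothetical handler: rejects one specific job to simulate a bug.
    if job == "bad":
        raise ValueError("cannot process job")


logging.basicConfig(level=logging.INFO)
failed_count = run_worker(["a", "bad", "b"], demo_handler, logging.getLogger("worker"))
print(f"failed jobs: {failed_count}")
```

Of course, such a log only helps if the developers are actually allowed to read it.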

Transformations.
Having put up with all this for quite a long time, the team and I began to steer things in a direction that worked better for us. To summarize, what problems had we faced?

  • Lack of high-quality communication between developers and the administration department
  • Admins, it turns out (!), have no idea how the application works or what dependencies it has.
  • Developers do not understand how the production environment works and, as a result, cannot effectively respond to problems.
  • The deployment process takes too long.
  • Unstable releases.

What have we done?
For each release we compiled Release Notes: a list of the work that had to be done on the server for the next release to work. The list had several sections: work for the administrator responsible for the release and work for the developer. Developers got access (not root) to all production servers, which sped up development in general and problem solving in particular. The developers also gained an understanding of how production is organized, which services it is split into, and where the replicas are and how many of them there are. In part, production loads became easier to understand, which undoubtedly affects the quality of the code.

Communication during the release moved to a chat in one of the messengers. First, we got a log of all actions; second, communication became closer. More than once, having a history of actions allowed new employees to solve problems faster. Paradoxically, it often helped the admins themselves. I will not claim it for certain, but it seems to me that the admins began to understand better how the project works and how it is written. Sometimes we even shared details with each other.

The average release time dropped to an hour, and sometimes we managed in 30-40 minutes. The number of bugs decreased several times over, if not by an order of magnitude. Of course, other factors, such as autotests, also contributed to the shorter release time. After each release we started holding retrospectives, so that the whole team knew what was new, what had changed, and what had been removed. Unfortunately, the admins did not always attend; well, the admins are busy... As a developer, my job satisfaction undoubtedly increased. When you can quickly solve almost any problem within your area of competence, you feel on top of the world.

Later I would realize that we had, to some extent, introduced a DevOps culture. Not completely, of course, but even that beginning of the transformation was impressive.
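For illustration only, the Release Notes with their per-role sections could be modeled roughly like this. The class names, fields and sample tasks are assumptions made for the sketch; the original list was presumably just a document.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    description: str
    owner: str  # "admin" or "developer"
    done: bool = False


@dataclass
class ReleaseNotes:
    version: str
    tasks: List[Task] = field(default_factory=list)

    def pending(self, owner: str) -> List[Task]:
        """Tasks of the given owner that are not finished yet."""
        return [t for t in self.tasks if t.owner == owner and not t.done]

    def ready(self) -> bool:
        """The release can go ahead only when every task is done."""
        return all(t.done for t in self.tasks)


# Hypothetical release: version and tasks are invented for the example.
notes = ReleaseNotes(
    version="1.2.0",
    tasks=[
        Task("Install the required library version", owner="admin"),
        Task("Add the new script to cron", owner="admin", done=True),
        Task("Run database migrations", owner="developer", done=True),
    ],
)

for task in notes.pending("admin"):
    print(f"waiting on admin: {task.description}")
print(f"ready to release: {notes.ready()}")
```

The point of such a structure is that nobody has to decode an "I finished" email: what is done and what is still pending is visible to everyone.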

The third story.
A startup. One admin, a small development department. On arrival I am a complete zero, because I have no access to anything except email. We write to the admin and ask him to grant access. Moreover, we know that he has been told about the new employee and the need to issue logins and passwords. We are given access to the repository and the VPN. Why give access to the wiki, TeamCity, Rundeck? Useless things for someone who was hired to write the entire backend. Only with time do we gain access to some of the tools. My arrival, of course, was met with distrust. I try to figure out, little by little, how the project infrastructure works, through chats and leading questions. Basically, I learn nothing. Production is the same black box as before. Worse than that, even the stage servers used for testing are a black box. Apart from deploying a branch from git there, we can do nothing. We cannot even configure our own application, for example its .env files. Access for such operations is not granted. You have to beg for a line to be changed in the config of your own application on a test server. (There is a theory that it is vital for admins to feel important on the project: if nobody asks them to change lines in configs, they will simply not be needed.) Well, as always, isn't it convenient? This quickly gets tiresome, and after a direct conversation with the admin we find out that developers were born to write bad code, that they are incompetent by nature, and that it is better to keep them away from production. And here, just in case, away from the test servers too. The conflict escalates rapidly. There is no communication with the admin. The situation is aggravated by the fact that he is the only one.

Here is a typical picture. Release. Some functionality does not work. We spend a long time figuring out what is going on, and the developers throw various ideas into the chat, but in such a situation the admin usually assumes that the developers are to blame. Then he writes in the chat: wait, I fixed it. When asked to leave a record of what the problem was, we get toxic excuses. Don't stick your nose where it doesn't belong. Developers must write code. A situation in which a lot of activity on the project goes through a single person, and only he has the access to perform operations that everyone needs, is extremely sad. Such a person is a terrible bottleneck. If DevOps ideas aim to reduce time-to-market, then such people are the worst enemy of DevOps ideas. Unfortunately, the curtain falls here.

P.S. After talking a little about developers vs admins in chats, I met people who shared my pain. But there were also those who said they had never encountered anything like it. At one DevOps conference I asked Anton Isanin (Alfa-Bank) how they had dealt with the problem of admins as a bottleneck, and he replied: “We replaced them with buttons.” By the way, there is a podcast featuring him. I would like to believe that there are far more good admins than enemies. And yes, the picture at the beginning is a real conversation.

Source: www.habr.com