Beginning system administrator: how to make order out of chaos

I am a system administrator at FirstVDS, and this is the text of the first, introductory lecture from my short course for novice colleagues. Specialists who are new to system administration run into many of the same problems. To offer solutions, I decided to write this series of lectures. Some of the material is specific to hosting technical support, but on the whole it can be useful, if not to everyone, then to many. So I have adapted the text of the lecture to share it here.

It does not matter what your position is called - what matters is that you are, in fact, doing administration. So let's start with what a system administrator is supposed to do. Their main task is to bring order, maintain order, and prepare for order on a larger scale in the future. Without a system administrator, the server descends into a mess: logs are not written, or the wrong things are written to them; resources are distributed suboptimally; the disk fills up with all sorts of garbage; and the system slowly starts to buckle under so much chaos. Stay calm! The system administrator, in your person, is about to start solving problems and cleaning up the mess!

Pillars of system administration

However, before you start solving problems, it is worth getting acquainted with the four main pillars of administration:

  1. Documentation
  2. Templating
  3. Optimization
  4. Automation

These are the basics of the basics. If you do not build your workflow on these principles, it will be inefficient, unproductive, and generally bear little resemblance to real administration. Let's deal with each separately.

Documentation

Documentation here means not only reading documentation (although you will get nowhere without that), but also keeping your own.

How to keep documentation:

  • Faced with a new problem that you have never seen before? Write down the main symptoms, methods of diagnosis and principles of elimination.
  • Have you come up with a new elegant solution to a typical problem? Write it down so you don't have to reinvent it in a month.
  • Did someone help you with a question you did not understand at all? Write down the main points and concepts, and draw yourself a diagram.

The main idea: you should not completely trust your own memory when mastering and applying new things.

The format is up to you: a note-taking system, a personal blog, a text file, a physical notepad. The main thing is that your records meet the following requirements:

  1. Keep them reasonably short. Highlight the main ideas, methods, and tools. If understanding a problem requires diving into the low-level mechanics of memory allocation in Linux, don't rewrite the article you learned it from - link to it.
  2. Make them understandable to yourself. If the line race cond.lockup does not immediately tell you what you meant by it, expand it. Good documentation does not take half an hour to decipher.
  3. Make them searchable. If you keep a blog, add tags; if you use a physical notebook, stick on small Post-it notes with descriptions. Documentation is of little use if finding an answer in it takes as long as solving the problem from scratch.
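As one possible illustration of the search requirement: if notes are kept as plain text files with a tags: line at the top, grep alone gives a workable search. The ~/notes directory, the tags: convention, and the find_notes name are assumptions made for this sketch, not something the lecture prescribes.

```shell
# find_notes TAG [DIR]: list note files whose "tags:" line mentions TAG.
# Assumes each note contains a line such as "tags: mysql, oom".
# DIR defaults to ~/notes (a hypothetical location).
# TAG is used as a regular expression, case-insensitively.
find_notes() {
    grep -r -l -i -- "^tags:.*$1" "${2:-$HOME/notes}"
}
```

Calling find_notes oom would then print the paths of all notes tagged with oom, which keeps the search much cheaper than solving the problem again.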

Documentation can take many forms: from primitive notes in a notebook to a full-fledged multi-user knowledge base with tags, search, and every possible convenience.

Not only will you avoid looking for the same answers twice: documentation is also a great help in learning new topics (notes, yes!), it sharpens your spider sense (the ability to diagnose a complex problem at a single superficial glance), and it adds organization to your actions. If the documentation is available to your colleagues, it will let them figure out what you set up, and how, when you are not around.

Templating

Templating is the creation and use of templates. For most typical tasks, it is worth creating a specific action template. Most problems should be diagnosed by following a standardized sequence of steps. When you have fixed, installed, or optimized something, check that it works against a standardized checklist.

Templates are the best way to organize your workflow. Using standard procedures for the most common problems gives you a lot. For example, checklists let you verify every function that matters for operation and skip diagnostics of functionality that does not. And standardized procedures minimize unnecessary flailing and reduce the chance of error.

The first important point is that procedures and checklists also need to be documented. If you rely on memory alone, you can skip a really important check or operation and ruin everything. The second important point is that every template can and should be modified when the situation requires it. There are no ideal, absolutely universal templates. If there is a problem but the template check did not reveal it, that does not mean there is no problem. Still, before testing some unlikely hypothesis, it is always a good idea to run the quick template check first.
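A documented checklist can also be made executable. The following is a minimal sketch, assuming each check can be expressed as a shell command that succeeds or fails; the run_checklist name and the "description|command" item format are invented for this illustration.

```shell
# run_checklist ITEM...: each ITEM is "description|command".
# Prints OK or FAIL per item and returns non-zero if any check failed.
run_checklist() {
    local failed=0 item desc cmd
    for item in "$@"; do
        desc=${item%%|*}      # text before the first "|"
        cmd=${item#*|}        # everything after the first "|"
        if sh -c "$cmd" >/dev/null 2>&1; then
            printf 'OK    %s\n' "$desc"
        else
            printf 'FAIL  %s\n' "$desc"
            failed=1
        fi
    done
    return "$failed"
}

# Example with hypothetical checks for a web server:
# run_checklist \
#     "config syntax is valid|nginx -t" \
#     "service is running|systemctl is-active --quiet nginx" \
#     "site answers over HTTP|curl -fsS -o /dev/null http://localhost/"
```

Because the list of checks is passed in as arguments, the same function serves any template, and a check that turns out to be missing in practice is added in one line.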

Optimization

Optimization speaks for itself. The workflow should be optimized as much as possible in terms of time and effort. There are countless options here: learn hotkeys, abbreviations, regular expressions, and the tools available to you. Look for more practical ways to use these tools. If you run a command 100 times a day, bind it to a keyboard shortcut. If you regularly connect to the same servers, write a one-word alias that connects you there.
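A one-word alias of the kind described above might look like this. The host name, user, and port are placeholders invented for the example.

```shell
# Put this line into ~/.bashrc (or the rc file of your shell).
# Host, user, and port are hypothetical.
alias web1='ssh -p 2222 admin@web1.example.com'
```

After that, typing web1 in an interactive shell opens the connection. A Host entry in ~/.ssh/config achieves a similar result and also works for scp and other tools that use ssh.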

Familiarize yourself with the alternatives among available tools - perhaps there is a more convenient terminal client, desktop environment, clipboard manager, browser, email client, or operating system. Find out what tools your colleagues and acquaintances use - maybe they chose them for a reason. Once you have picked your tools, learn how to use them: learn the keys, abbreviations, tips and tricks.

Make the most of standard tools - coreutils, vim, regular expressions, bash. For the last three there is a huge number of wonderful manuals and documentation. With their help you can quickly go from "I feel like a monkey cracking nuts with a laptop" to "I'm a monkey who uses a laptop to order a nutcracker."

Automation

Automation moves heavy operations from our tired hands into the tireless hands of automation. If a standard procedure comes down to five commands of the same kind, why not wrap them all in one file and call a single command that downloads and executes that file?

Automation itself is 80% writing and optimizing your own tools (and the other 20% is trying to make them work as they should). A tool can be just an advanced one-liner or a huge, all-powerful program with a web interface and an API. The main criterion is that creating the tool should take no more time and effort than the tool will save you. If you spend five hours writing a script that you will never need again, for a task that would take an hour or two to solve by hand, that is very poor workflow optimization. You can afford five hours on a tool only if the number and type of tasks, and the time available, allow it, which is rare.

Automation does not necessarily mean writing full-fledged scripts. For example, to create a bunch of objects of the same type from a list, a clever one-liner is enough: it does automatically what you would otherwise do by hand, switching between windows with heaps of copy-paste.
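As a rough illustration of such a one-liner, the sketch below creates a pair of directories for every name in a list. The names, the sites/ layout, and the scratch directory are all made up for the demonstration.

```shell
workdir=$(mktemp -d)                          # scratch directory for the demo
printf '%s\n' alpha beta gamma > "$workdir/names.txt"

# The one-liner itself: one directory pair per name in the list.
while IFS= read -r name; do mkdir -p "$workdir/sites/$name/htdocs" "$workdir/sites/$name/logs"; done < "$workdir/names.txt"

ls "$workdir/sites"
```

The same loop shape works for any "do X for every line of a list" task; only the command inside the loop changes.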

In fact, if you build your administration process on these four pillars, you can quickly increase your efficiency, productivity, and skill. However, this list needs one more item, without which work in IT is almost impossible - self-education.

Self-education of a system administrator

To be even slightly competent in this field, you need to keep studying and learning new things. If you have not the slightest desire to face the unknown and figure it out, you will fall behind very quickly. New solutions, technologies, and methods appear in IT all the time, and if you do not study them at least superficially, you are on the road to defeat. Many areas of information technology rest on a very complex and extensive foundation. Take networking: networks and the Internet are everywhere, and you come across them every day, but once you dig into the technologies behind them, you find a huge and very complex discipline whose study is never a walk in the park.

I did not include this item in the list because it is key for IT in general, not just for system administration. Naturally, you won't be able to learn absolutely everything at once - you physically do not have enough time. Therefore, when educating yourself, keep the necessary levels of abstraction in mind.

You don't have to learn right away how the internal memory management of each individual utility works and how it interacts with Linux memory management, but it is good to know, schematically, what RAM is and why it is needed. You don't need to know how the TCP and UDP headers differ structurally, but it would be nice to understand the basic differences in how the two protocols work. You don't need to learn what optical attenuation is, but it would be nice to know why real losses are always inherited by the nodes further along the route. There is nothing wrong with knowing how things work at a certain level of abstraction without taking apart every level down to the point where no abstraction is left (you would simply go crazy).

However, in your own field, reasoning at the level of abstraction of "well, this is the thing that lets you show websites" is not very good. The following lectures will be devoted to an overview of the main areas a system administrator has to deal with when working at lower levels of abstraction. I will try to limit the material covered to the minimum necessary level of abstraction.

10 commandments of system administration

So, we have learned the four main pillars and the foundation. Can we start solving problems? Not yet. Before that, it is advisable to familiarize yourself with the so-called "best practices" and good manners. Without them, there is a chance that you will do more harm than good. So, let's begin:

  1. Some of my colleagues believe that the very first rule is "do no harm". I tend to disagree. When you try to do no harm, you cannot do anything at all - too many actions are potentially destructive. I think the most important rule is "make a backup". Even if you do cause harm, you can always roll back, and things will not be so bad.

    Make a backup whenever time and disk space allow. Back up what you are going to change and what you risk losing during a potentially destructive action. It is advisable to check the backup for integrity and for the presence of all the necessary data. Do not delete the backup right after you have checked everything, unless you need to free up disk space. If space is tight, back up to your personal server and delete the copy after a week.

  2. The second most important rule (which I often break myself) is "do not hide". If you made a backup, write down where, so that your colleagues do not have to look for it. If you did something non-obvious or complex, write it down: you will go home, the problem may recur or come up for someone else, and your solution will be found by keywords. Even if you are doing something you know well, your colleagues may not know it.
  3. The third rule needs no explanation: "never do something whose consequences you do not know, cannot imagine, or do not understand". Do not copy commands from the internet if you do not know what they do; call man and work through them first. Do not use ready-made solutions if you cannot understand what they do. Keep the execution of obfuscated code to an absolute minimum. If there is no time to figure things out, you are doing something wrong and should read the next item.
  4. "Test". New scripts, tools, one-liners, and commands should be tested in a controlled environment, not on the client's machine, if there is even a minimal potential for destructive actions. Even if you backed everything up (and you did), downtime is no fun. Set up a separate server, virtual machine, or chroot for this and test there. Nothing broke? Then you can run it in production.

  5. "Control". Minimize all operations that you do not control. One crooked package dependency can drag half the system down with it, and the -y flag for yum remove gives you a chance to practice recovering a system from scratch. If an action has no controllable alternative, see the next item and keep a backup ready.
  6. "Check". Check the consequences of your actions and whether you need to roll back to a backup. Check whether the problem is really solved. Check whether the error can be reproduced and under what conditions. Check what you might break with your actions. In our work, trusting is superfluous, but verifying never is.
  7. "Communicate". If you cannot solve a problem, ask your colleagues whether they have run into it. If you want to apply a controversial solution, find out what your colleagues think. Perhaps they will suggest a better one. If you are not confident in your actions, discuss them with colleagues. Even if this is your area of expertise, a fresh look at the situation can clear up a lot. Do not be ashamed of your own ignorance. It is better to ask a stupid question, look like a fool, and get an answer than not to ask, not to get an answer, and remain a fool.
  8. "Do not refuse help without good reason". This point is the flip side of the previous one. If you are asked a stupid question, clarify and explain. If you are asked for the impossible, explain that it is impossible and why, and offer alternatives. If you have no time (really no time, not no desire), say that you have an urgent issue or a large amount of work, but will look into it later. If other colleagues have no urgent tasks, suggest turning to them and delegate the question.
  9. "Give feedback". Has a colleague started using a new technique or a new script, and you are facing the negative consequences of that decision? Say so. Perhaps the problem can be solved with three lines of code or five minutes of refining the technique. Found a bug in the software? Report it. If it can be reproduced, or does not need to be reproduced, it will most likely get fixed. Voice wishes, suggestions, and constructive criticism, and bring questions up for discussion if they seem relevant.
  10. "Ask for feedback". We are all imperfect, and so are our solutions, and the best way to check whether your solution is right is to bring it up for discussion. If you optimized something for a client, ask them to keep an eye on how it runs; maybe the system's "bottleneck" is not where you were looking. If you wrote a helper script, show it to your colleagues; maybe they will find a way to improve it.
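The first two commandments (make a backup, check it, and do not hide where it is) can be sketched for a single file as follows. The backup_file name and the .bak.<timestamp> naming scheme are assumptions made for this example; a real backup of a database or a whole site needs its own tooling.

```shell
# backup_file FILE: copy FILE to FILE.bak.<timestamp>, verify the copy,
# and print where the backup lives.
backup_file() {
    local src dst
    src=$1
    dst="$src.bak.$(date +%Y%m%d-%H%M%S)"
    cp -p -- "$src" "$dst" || return 1
    # Integrity check: the copy must be byte-identical to the original.
    if ! cmp -s -- "$src" "$dst"; then
        echo "backup of $src does not match the original" >&2
        return 1
    fi
    # "Do not hide": tell the operator (and the log) where the backup is.
    echo "$dst"
}
```

Printing the path makes it easy to paste the backup location into a ticket or a note, so colleagues do not have to hunt for it.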

If you apply these practices consistently, most problems stop being problems: you will not only reduce the number of your own mistakes and screw-ups to a minimum, but also gain the means to correct mistakes (in the form of backups and colleagues who will advise you to make a backup). Beyond that there are only technical details, which, as you know, is where the devil lies.

The main tools you will work with more than 50% of the time are grep and vim. What could be simpler? Text search and text editing. However, both grep and vim are powerful multi-tools that let you search and edit text efficiently. While something like Windows Notepad lets you simply write or delete a line, in vim you can do almost anything with text. If you don't believe me, run the vimtutor command from the terminal and start learning. As for grep, its main strength is regular expressions. Yes, the tool itself lets you set search conditions and output format quite flexibly, but without regular expressions it does not make much sense. And you do need to know regular expressions! At least at a basic level. For starters, I would suggest watching this video; it covers the basics of regular expressions and their use together with grep. Oh yes, and when you combine them with vim, you get the ULTIMATE POWER to do things to text that deserve an 18+ label.
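To give a feel for what regular expressions add to grep, here is a small sketch that picks failed requests out of a web-server-style log. The log lines are made up, and the pattern assumes the status code is the last field on the line.

```shell
# Sample log lines (invented for illustration).
log='192.0.2.10 - - "GET / HTTP/1.1" 200
198.51.100.7 - - "GET /admin HTTP/1.1" 404
192.0.2.10 - - "POST /login HTTP/1.1" 500'

# Lines whose status code is 4xx or 5xx:
# a space, then 4 or 5, then exactly two digits, at the end of the line.
printf '%s\n' "$log" | grep -E ' [45][0-9]{2}$'

# The same pattern with -c counts matching lines instead of printing them.
printf '%s\n' "$log" | grep -E -c ' [45][0-9]{2}$'
```

A plain substring search for 404 would also match an address or path that happens to contain 404; anchoring the pattern to the end of the line avoids that.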

Of the remaining 50%, 40% falls to the coreutils toolkit. For coreutils, you can look at the list on Wikipedia, and the manual for the entire set is on the GNU website. What this set does not cover is found among the POSIX utilities. You do not have to memorize all of it with every option, but it is useful to know at least roughly what the main tools can do. That way you will not have to reinvent the wheel out of crutches. I once had to replace line breaks with spaces in the output of some utility, and my sick brain gave birth to a construction like sed ':a;N;$!ba;s/\n/ /g'. A colleague who came over chased me away from the console with a broom and then solved the problem by writing tr '\n' ' '.
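The two approaches from that story can be compared side by side on sample input. This sketch assumes GNU sed, since the semicolon-separated label syntax is not portable to every sed.

```shell
# Hard way (GNU sed): loop until all lines are collected in the pattern
# space, then replace every embedded newline with a space.
printf 'one\ntwo\nthree\n' | sed ':a;N;$!ba;s/\n/ /g'

# Simple way: translate each newline character into a space.
printf 'one\ntwo\nthree\n' | tr '\n' ' '
```

One difference worth knowing: tr also turns the final newline into a space, so its output ends with a trailing space and no newline, while the sed version keeps the last newline.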

I would advise remembering what each individual tool does and the options of the most frequently used commands; for everything else there is man. Feel free to call man whenever you are in doubt. And be sure to read man for man itself - it contains important information about what you will find there.

Knowing these tools, you will be able to solve a significant part of the problems you encounter in practice effectively. In the following lectures, we will look at when to apply these tools and at how the main services and applications they are used with are built.

This was Kirill Tsvetkov, system administrator at FirstVDS.

Source: habr.com
