Ansible basics, without which your playbooks are a lump of sticky pasta

I review a lot of other people's Ansible code and write a lot of my own. While analyzing the mistakes (both others' and mine), and after conducting a fair number of interviews, I realized the main mistake Ansible users make: they dive into the complex stuff without mastering the basics.

To correct this universal injustice, I decided to write an introduction to Ansible for those who already know it. Be warned: this is not a retelling of the man pages, it is a long read with a lot of text and no pictures.

The expected level of the reader: several thousand lines of YAML already written, something already in production, but "somehow everything is crooked."

Names

The main mistake of an Ansible user is not knowing what things are called. If you do not know the names, you cannot understand what the documentation says. A real-life example: at an interview, a person who claimed to have written a lot of Ansible could not answer the question "what elements does a playbook consist of?". And when I hinted that "the expected answer was that a playbook consists of plays", the killer comment followed: "we don't use that". People write Ansible for money and don't use plays. They actually do use them, they just don't know what they are.

So let's start with something simple: what things are called. Maybe you know this, or maybe you don't, because you didn't pay attention when you read the documentation.

ansible-playbook executes a playbook. A playbook is a yml/yaml file with something like this inside:

---
- hosts: group1
  roles:
    - role1

- hosts: group2,group3
  tasks:
    - debug:

We already understand that this entire file is a playbook. We can show where the roles are and where the tasks are. But where is the play? And how is a play different from a role or a playbook?

It's all in the documentation. And everyone skips it. Beginners, because there is too much and you can't remember everything at once. Experienced users, because these are "trivial things". If you are experienced, re-read these pages at least once every six months, and your code will get a notch better.

So remember: a playbook is a list of plays and import_playbook statements.
This is one play:

- hosts: group1
  roles:
    - role1

and this is another play:

- hosts: group2,group3
  tasks:
    - debug:
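
Since a playbook is a list of plays and import_playbook statements, the two kinds of element can sit side by side in one file. A minimal sketch, assuming a second playbook file exists next to this one (the name other.yml is hypothetical):

```yaml
---
# Pull in every play from another playbook file (other.yml is a made-up name).
- import_playbook: other.yml

# An ordinary play: binds role1 to the hosts in group1.
- hosts: group1
  roles:
    - role1
```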

What is a play? What is it for?

A play is the key element of a playbook, because a play, and only a play, binds a list of roles and/or tasks to the list of hosts on which to execute them. Deep in the bowels of the documentation you can find mentions of delegate_to, local lookup plugins, network-cli-specific settings, jump hosts, and so on. They let you slightly change where tasks are executed. But forget about them. Each of these tricky options has very specific uses, and they are definitely not universal. We are talking about basic things that everyone should know and use.

If you want to do "something" "somewhere", you write a play. Not a role. Not a role with modules and delegates. You go and write a play, in which the hosts field lists where to execute and roles/tasks list what to execute.

Simple, right? How could it be otherwise?

One of the typical moments when people are tempted to do it some other way than through a play is "the role that sets everything up". You would like a role that configures both servers of the first type and servers of the second type.

The archetypal example is monitoring. You would like a monitoring role that sets up monitoring. The monitoring role is assigned to the monitoring hosts (by the play). But it turns out that to monitor anything, we need to install packages on the hosts being monitored. Why not use delegate? You also need to configure iptables. delegate? And you also have to write or fix the config of the DBMS so that it lets monitoring in. delegate! And if creativity kicks in, you can do a delegated include_role in a nested loop over a tricky filter on the list of groups, and inside that include_role you can do more delegate_to. And off we go...

A good wish, to have a single monitoring role that "does everything", leads us into a pitch-black hell from which there is usually only one way out: rewrite everything from scratch.

Where did the error happen? The moment you discovered that to perform task "x" on host X you had to go to host Y and do "y" there, you should have done a simple exercise: go and write a play that does y on host Y. Do not add something to "x", but write from scratch. Even with hardcoded variables.
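
As a rough sketch of that exercise for the monitoring case: instead of one role that delegates everywhere, each group of hosts gets its own play. All group, role, and package names here are invented for illustration.

```yaml
---
# Play 1: what has to happen on the monitoring servers themselves.
- hosts: monitoring
  roles:
    - monitoring_server   # hypothetical role name

# Play 2: what has to happen on the hosts being monitored.
- hosts: monitored
  become: true
  tasks:
    - name: Install the monitoring agent (package name is hypothetical)
      ansible.builtin.package:
        name: monitoring-agent
        state: present
```

Each play answers exactly one "what, where" question, and neither needs delegate_to.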

Everything seems to be correct in the paragraphs above. But this is not your case! Because you want to write reusable code that is DRY and library-like, and you need to find a way to do it.

Here is another big mistake. A mistake that has turned many projects from tolerably written (could be better, but everything works and is easy to extend) into a complete horror that even the author cannot figure out. It works, but God forbid you change anything.

This mistake goes like this: a role is a library function. This analogy has ruined so many good beginnings that it's just sad to watch. A role is not a library function. It can't do calculations and it can't make play-level decisions. Remind me, what decisions does a play make?

Thank you, you are right. A play makes the decision (more precisely, contains the information) about which tasks and roles to perform on which hosts.

If you delegate this decision to a role, and with calculations on top, you doom yourself (and whoever tries to parse your code) to a miserable existence. The role does not decide where it runs. That decision is made by the play. The role does what it was told, where it was told.

Why programming in Ansible is dangerous and how COBOL is better than Ansible will be discussed in the chapter on variables and jinja. For now, let's say one thing: each of your calculations leaves an indelible trail of changed global variables, and you can't do anything about it. As soon as two such "trails" cross, everything is lost.

Note for the pedantic: a role can, of course, affect the control flow. There is delegate_to, and it has reasonable uses. There is meta: end_host/end_play. But! Remember, we are teaching the basics? Forget about delegate_to. We are talking about the simplest and most beautiful Ansible code, which is easy to read, easy to write, easy to debug, and easy to test. So, one more time:

a play, and only a play, decides what is executed on which hosts.

In this section we dealt with play versus role. Now let's talk about the relationship between tasks and roles.

Tasks and Roles

Consider a play:

- hosts: somegroup
  pre_tasks:
    - some_task1:
  roles:
    - role1
    - role2
  post_tasks:
    - some_task2:
    - some_task3:

Let's say you need to do foo. And it looks like foo: name=foobar state=present. Where do you write it? In pre? In post? Create a role?

… And where did the tasks go?

We start again with the basics: the structure of a play. If you are shaky on this, you can't use the play as the foundation for everything else, and your result will be "wobbly".

The structure of a play: the hosts directive, settings for the play itself, and the pre_tasks, tasks, roles, and post_tasks sections. The remaining play parameters are not important to us right now.

The execution order of the sections with tasks and roles is: pre_tasks, roles, tasks, post_tasks. Since the order of execution between tasks and roles is not semantically obvious, best practice says to add a tasks section only if there are no roles. If there are roles, all accompanying tasks go into the pre_tasks/post_tasks sections.

All that remains is semantically clear: first pre_tasks, then roles, then post_tasks.

But we still have not answered the question: where do we write the call to module foo? Do we need to write an entire role for each module? Or is it better to have one fat role for everything? And if not a role, then where to write it, in pre or in post?

If there is no reasoned answer to these questions, that is a sign of a lack of intuition, that is, those very "shaky foundations". Let's figure it out. First a check question: if a play has pre_tasks and post_tasks (and neither tasks nor roles), can something break if I move the first task from post_tasks to the end of pre_tasks?

Of course, the wording of the question hints that it will break. But what exactly?

… Handlers. Reading the basics reveals an important fact: all handlers are automatically flushed after each section. That is, all tasks from pre_tasks are executed, then all handlers that were notified. Then all roles and all handlers that were notified in the roles are executed. Then post_tasks and their handlers.

Thus, if you drag a task from post_tasks to pre_tasks, you potentially execute it before the handler runs. For example, if pre_tasks installs and configures a web server, and post_tasks sends something to it, then moving that task into pre_tasks means that at the moment of "sending" the server will not have been started yet, and everything will break.
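
The web server scenario might look roughly like this. It is only a sketch: the group name web, the choice of nginx, and the URL are assumptions made for illustration, not something the setup above prescribes.

```yaml
- hosts: web
  become: true
  pre_tasks:
    - name: Install the web server
      ansible.builtin.package:
        name: nginx
        state: present
      notify: start nginx
  # Handlers notified in pre_tasks are flushed at this point,
  # before anything in post_tasks runs.
  post_tasks:
    - name: Send a request to the server
      ansible.builtin.uri:
        url: http://localhost/
  handlers:
    - name: start nginx
      ansible.builtin.service:
        name: nginx
        state: started
```

If the uri task were moved to the end of pre_tasks, it would run before the start nginx handler, which is exactly the breakage described.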

Now let's think again: why do we need pre_tasks and post_tasks? For example, to complete everything necessary (including handlers) before the role is executed. And post_tasks lets us work with the results of executing the roles (including handlers).

An astute Ansible connoisseur will tell us that there is meta: flush_handlers, but why do we need flush_handlers when we can rely on the execution order of the sections in a play? Moreover, using meta: flush_handlers can surprise us with repeated handlers, give us strange warnings when used with when on a block, and so on. The better you know Ansible, the more nuances you can name for a "tricky" solution. A simple solution, using the natural separation between pre / roles / post, has no such nuances.

And back to our 'foo'. Where do we put it? In pre, post, or roles? Obviously, that depends on whether we need the results of foo's handler. If there is no handler, then foo does not need to go into either pre or post: these sections have a special meaning, performing tasks before and after the main body of code.

Now the answer to the question "role or task" comes down to what is already in the play: if there are tasks, add to tasks. If there are roles, make a role (even one with a single task). Remember that tasks and roles are not used at the same time.

Understanding the basics of Ansible gives reasoned answers to what seem to be questions of taste.

Tasks and roles (part two)

Now let's discuss the situation when you are just starting to write a playbook. You need to do foo, bar, and baz. Is that three tasks, one role, or three roles? To generalize the question: at what point should you start writing roles? What is the point of writing roles when you can write tasks? ... And what is a role, anyway?

One of the biggest mistakes (I have already talked about this) is to think that a role is like a function in a program's library. What does a generic description of a function look like? It takes input arguments, interacts with side causes, produces side effects, and returns a value.

Now, attention. Which of these can a role do? Cause side effects: you are always welcome, producing side effects is the whole essence of Ansible. Have side causes? Elementary. But "take a value and return it" is where it stops. First, you cannot pass a value to a role. You can set a global variable with the lifetime of the play in the vars section for the role. You can set a global variable with the lifetime of the play inside the role. Or even one with the lifetime of the playbook (set_fact/register). But you cannot have "local variables". You cannot "take a value" and "return it".

The main point follows from this: you can't write anything in Ansible without causing side effects. Changing global variables is always a side effect for a function. In Rust, for example, changing a global variable is unsafe. In Ansible it is the only way to influence the values a role uses. Pay attention to the wording: not "pass a value to a role", but "change the values that the role uses". There is no isolation between roles. There is no isolation between tasks and roles.
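
A small sketch of what "global" means in practice, reusing group1 from the earlier example (the variable name db_user is made up). A fact set in one play is still visible for the same host in a later play of the same run:

```yaml
---
- hosts: group1
  tasks:
    - name: Set a value in the first play
      ansible.builtin.set_fact:
        db_user: app   # hypothetical variable

- hosts: group1
  tasks:
    - name: The value set earlier is still here
      ansible.builtin.debug:
        var: db_user
```

Nothing was "passed" to the second play; it simply reads state that an earlier step left behind.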

In summary: a role is not a function.

What's good about a role? First, a role has default values (defaults/main.yaml); second, a role has additional directories for storing files.

Why are default values good? Because in Maslow's pyramid, that is, Ansible's rather perverse variable precedence table, role defaults have the lowest priority (apart from command-line parameters). This means that if you need to provide default values and not worry about them overriding inventory or group variables, role defaults are the only right place. (I'm lying a little: there is also |d(your_default_here), but if we are talking about fixed places, then only role defaults.)
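
To make the precedence point concrete, here is a hedged sketch of three separate files shown as three YAML documents. The variable foo_port and its values are invented; role1 and group1 come from the earlier example.

```yaml
---
# File 1: roles/role1/defaults/main.yaml, the role's fallback value.
foo_port: 8080
---
# File 2: group_vars/group1.yaml, which overrides the role default for group1 hosts.
foo_port: 9090
---
# File 3: a task list, showing the inline alternative:
# d() supplies a default only at the point of use.
- name: Show the effective port
  ansible.builtin.debug:
    msg: "{{ foo_port | d(8080) }}"
```

For a host in group1 the role sees 9090; for any other host it falls back to 8080.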

What else is good about roles? They have their own directories. These are directories for variables, both constant (i.e. computed for the role) and dynamic (there is a pattern, or an anti-pattern: include_vars with {{ ansible_distribution }}-{{ ansible_distribution_major_version }}.yml). These are the files/ and templates/ directories. A role can also carry its own modules and plugins (library/). But compared to tasks in a playbook (which can have all of this too), the only benefit is that the files are dumped not into one heap but into several separate heaps.
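
The include_vars pattern mentioned above typically looks something like the following inside a role's task list. This is a sketch: it assumes facts have been gathered and that a matching file (for example Debian-12.yml) exists in the role's vars/ directory.

```yaml
# roles/role1/tasks/main.yaml (role1 is the role from the earlier example)
- name: Load variables for this particular distribution
  ansible.builtin.include_vars:
    file: "{{ ansible_distribution }}-{{ ansible_distribution_major_version }}.yml"
```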

One more detail: you can try to make roles available for reuse (via galaxy). Since the advent of collections, distributing roles can be considered almost forgotten.

Thus, roles have two important features: they have defaults (a unique feature) and they allow you to structure your code.

Returning to the original question: when do you write tasks and when do you write roles? Tasks in a playbook are most often used either as "glue" before/after roles, or as an independent building block (in which case there should be no roles in the code). A pile of ordinary tasks mixed with roles is plain sloppiness. Stick to one style: either tasks or roles. Roles give you separation of entities and defaults; tasks let you read the code faster. Usually the more "permanent" (important and complex) code goes into roles, and auxiliary scripts are written in the tasks style.

It is possible to use import_role as a task, but if you write this, be prepared to explain to your own sense of beauty why you want to do it.
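
For reference, this is roughly what import_role as a task looks like, shown only so the construct is recognizable when you meet it (role1 and group1 are the names from the first example):

```yaml
- hosts: group1
  tasks:
    - name: A plain task
      ansible.builtin.debug:
        msg: before the role

    - name: Run role1 in the middle of the task list
      ansible.builtin.import_role:
        name: role1
```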

The astute reader might say that roles can import roles, that roles can have dependencies via galaxy.yml, and that there is also the scary and terrible include_role. Let me remind you that we are improving our skills in basic Ansible, not in acrobatics.

Handlers and tasks

Let's discuss one more obvious thing: handlers. Knowing how to use them correctly is almost an art. What is the difference between a handler and a task?

Since we remember the basics, here is an example:

- hosts: group1
  tasks:
    - foo:
      notify: handler1
  handlers:
    - name: handler1
      bar:

A role's handlers live in rolename/handlers/main.yaml. Handlers are shared among all participants in the play: pre/post_tasks can trigger role handlers, and a role can trigger handlers from the play. However, "cross-role" handler calls cause far more WTF than repeating a trivial handler. (Another element of best practice is to avoid repeating handler names.)

The main difference is that a task is always executed (idempotently), give or take tags and when, while a handler is executed on a state change (notify fires only if the task reported changed). What is the risk? For example, on a rerun, if nothing changed, the handler will not run. And why might we need the handler to run when the parent task has not changed? For example, because something broke after the task reported changed, and execution never reached the handler. Say, the network was temporarily down. The config has changed, but the service has not been restarted. On the next run the config no longer changes, and the service stays on the old version of the config.

The situation with the config cannot be solved (more precisely, you can invent a special restart protocol for yourself with flag files and so on, but that is no longer 'basic Ansible' in any form). But there is another common story: we installed an application, wrote its .service file, and now we want daemon_reload and state=started for it. The natural place for that seems to be a handler. But if you make it a task at the end of the task list or role rather than a handler, it will be executed idempotently every time. Even if the playbook broke in the middle. This does not solve the restarted problem at all (you can't make a task with state=restarted, because idempotency is lost), but state=started is definitely worth doing this way: the overall stability of the playbook increases, because the amount of coupling and dynamic state decreases.
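
A minimal sketch of the "task instead of handler" variant for the .service story. The unit name myapp and the file paths are hypothetical, and the tasks are assumed to run with sufficient privileges.

```yaml
- name: Install the unit file
  ansible.builtin.copy:
    src: myapp.service
    dest: /etc/systemd/system/myapp.service
    mode: "0644"

# A plain task at the end of the list rather than a handler:
# it runs on every pass, whether or not the copy above changed.
- name: Reload systemd and make sure the service is running
  ansible.builtin.systemd:
    name: myapp
    daemon_reload: true
    state: started
```

Because the second task does not depend on a notification, a run that failed halfway last time still brings the service up on the next attempt.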

Another positive property of a handler is that it does not clutter the output. No changes means no extra skipped or ok lines in the output, so it is easier to read. It is also a negative property: a typo in a linearly executed task shows up on the first run, whereas handlers are executed only on changed, i.e. under certain conditions, very rarely. For example, for the first time ever, five years later. And, of course, there will be a typo in the name and everything will break. And the second time it won't run, because nothing is changed anymore.

Separately, we need to talk about the availability of variables. For example, if you notify from a task with a loop, what will be in the variables? You can work it out analytically, but it's not always trivial, especially if the variables come from different places.

… So handlers are a lot less useful and a lot more problematic than they seem. If you can write something beautifully (without frills) without handlers, it's better to do it without them. If it doesn’t work out beautifully, it’s better with them.

The astute reader will rightly note that we have not discussed listen, that a handler can notify another handler, that a handler can include import_tasks (which can do include_role with with_items), that Ansible's handler system is Turing-complete, that include_role handlers intersect with play handlers in curious ways, etc. All of this is clearly not "basics".

Although there is one particular WTF that is actually a feature and needs to be kept in mind. If your task is executed with delegate_to and it has notify, then the corresponding handler is executed without delegate_to, i.e. on the host the play is assigned to. (Although the handler, of course, may have its own delegate_to too.)
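
A sketch of that situation, with a hypothetical inventory host named otherhost and a throwaway file path. Per the behavior described above, the copy happens on otherhost, while the handler runs on the group1 host that the play is working on:

```yaml
- hosts: group1
  tasks:
    - name: Change something on another host
      ansible.builtin.copy:
        content: "hello\n"
        dest: /tmp/example.txt
      delegate_to: otherhost
      notify: report hostname
  handlers:
    # No delegate_to here, so this is not executed on otherhost.
    - name: report hostname
      ansible.builtin.command: hostname
```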

Separately, I want to say a few words about reusable roles. Before collections, there was an idea that you could make generic roles that you could just ansible-galaxy install and go. Works on every OS of every variant in every situation. So here's my take: it doesn't work. Any role with massive include_vars, supporting 100500 cases, is doomed to an abyss of corner-case bugs. They can be plugged with massive testing, but as with any testing, either you have a Cartesian product of input values and a total function, or you have "separate scenarios covered". My opinion is that it is much better if the role is linear (cyclomatic complexity 1).

The fewer ifs (explicit or declarative, in the form of when or of include_vars keyed on a set of variables), the better the role. Sometimes you have to branch, but, I repeat, the fewer, the better. So a seemingly good role from Galaxy (it works!) with a bunch of when conditions may be less preferable than your "own" role of five tasks. The moment when the Galaxy role is better is when you are starting to write something. The moment when it gets worse is when something breaks and you suspect it is because of the "role from Galaxy". You open it, and there are five includes, eight task lists, and a stack of whens... And you have to sort all that out. Instead of 5 tasks in a linear list where there is nothing to break.

In the following parts

  • A little about inventory, group variables, the host_group_vars plugin, hostvars. How to tie a Gordian knot out of spaghetti. Variable scope and precedence, the Ansible memory model. "So where do you store the username for the database?".
  • jinja: {{ jinja }} - nosql notype nosense soft plasticine. It is everywhere, even where you don't expect it. A little about !!unsafe and tasty yaml.

Source: habr.com
