Ansible: Migrating 120 VM configuration from CoreOS to CentOS in 18 months

This is a transcript of the talk given at DevopsConf 2019-10-01 and SPbLUG 2019-09-25.

This is the story of a project that used a home-grown configuration management system, and of why the move to Ansible took 18 months.

Day #-XXX: Before the beginning

Initially, the infrastructure was a set of stand-alone hosts running Hyper-V. Creating a virtual machine required many steps: putting disks in the right place, registering DNS, reserving a DHCP address, and putting the VM configuration in a git repository. The process was only partially mechanized: VMs, for example, were distributed between hosts by hand. On the other hand, developers could change a VM's configuration in git and apply it by rebooting the VM.

Custom Configuration Management Solution

The original idea, I suspect, was conceived as IaC: a lot of stateless VMs that reset their state to zero on reboot. What did VM configuration management look like? Schematically, it was simple:

  1. A static MAC address was pinned to the VM.
  2. An ISO with CoreOS and a boot disk were attached to the VM.
  3. CoreOS downloaded a customization script from a web server, selected by the VM's IP address, and ran it.
  4. The script downloaded the VM configuration via SCP, again selected by IP address.
  5. A long sheet of systemd unit files and a long sheet of bash scripts were launched.
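
Steps 3 and 4 amount to a lookup keyed by the VM's IP address. A minimal sketch of that idea follows; the server name, paths, and user are hypothetical, since the talk does not name the real ones.

```python
import ipaddress
from pathlib import PurePosixPath

# Hypothetical locations: the talk does not name the real servers or paths.
WEB_SERVER = "http://config.example.internal"
CONFIG_HOST = "config.example.internal"
CONFIG_ROOT = PurePosixPath("/srv/vm-configs")


def bootstrap_script_url(ip: str) -> str:
    """Return the URL of the customization script for the VM with this IP."""
    addr = ipaddress.ip_address(ip)  # raises ValueError on a malformed address
    return f"{WEB_SERVER}/bootstrap/{addr}.sh"


def config_scp_source(ip: str, user: str = "deploy") -> str:
    """Return the SCP source from which the VM configuration is copied."""
    addr = ipaddress.ip_address(ip)
    return f"{user}@{CONFIG_HOST}:{CONFIG_ROOT / str(addr)}"
```

Because everything hangs off the IP address, the static MAC and the DHCP reservation effectively become part of the configuration key, which is one reason creating or migrating a VM involved so much hidden machinery.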

This solution had many obvious problems:

  1. The CoreOS ISO was deprecated.
  2. Many complex automated actions and a lot of magic when migrating or creating a VM.
  3. Updating was difficult, as was getting a specific version of some software. Kernel modules were even more fun.
  4. The VMs turned out not to be so stateless after all: VMs appeared with an additional disk mounted for user data.
  5. Someone was constantly breaking systemd unit dependencies, and CoreOS hung on reboot. This was hard to catch with the tools available in CoreOS.
  6. Secret management.
  7. There was effectively no configuration management, only bash and CoreOS YML configs.

To apply a VM configuration you had to reboot the VM, and it might not come back up. That sounds like an easy problem to debug, but with no persistent disks there was nowhere to save the logs. Fine, let's add kernel boot options so that the logs are sent elsewhere. But no, even that turned out to be difficult.

Day #0: Recognizing the Problem

It was the usual development infrastructure: Jenkins, test environments, monitoring, a registry. CoreOS was designed for hosting k8s clusters, so the problem lay in how CoreOS was being used here. The first step was choosing a stack. We settled on:

  1. CentOS as the base distribution, because it is the closest distribution to the production environments.
  2. Ansible for configuration management, because the team had extensive expertise with it.
  3. Jenkins as the framework for automating existing processes, because it was already actively used for development processes.
  4. Hyper-V as the virtualization platform. The reasons go beyond the scope of this story, but in short: we cannot use clouds and must use our own hardware.

Day #30: Capturing existing agreements: Agreements as Code

Once the stack was clear, preparations for the move began: capturing the existing agreements in the form of code (Agreements as Code!). The transition went manual labor -> mechanization -> automation.

1. Configure VMs

Ansible does a great job of this. With minimal effort, you can take control of the VM configurations:

  1. We create a git repository.
  2. We put the list of VMs in the inventory, and the configurations in playbooks and roles.
  3. We set up a dedicated Jenkins slave from which Ansible can be run.
  4. We create a job and configure Jenkins.
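
As an illustration of what such a Jenkins job might execute, here is a minimal sketch that assembles an `ansible-playbook` command line. The file names `site.yml` and `inventory/hosts.ini` and the group name `jenkins` are assumptions for the example, not taken from the talk; the flags `-i`, `--limit`, `--check` and `--diff` are standard `ansible-playbook` options.

```python
import shlex
from typing import List, Optional


def ansible_playbook_command(playbook: str, inventory: str,
                             limit: Optional[str] = None,
                             check: bool = False) -> List[str]:
    """Build the argument list for one ansible-playbook run."""
    cmd = ["ansible-playbook", "-i", inventory, playbook]
    if limit:
        # Restrict the run to a subset of hosts or groups from the inventory.
        cmd += ["--limit", limit]
    if check:
        # Dry run: report what would change without changing anything.
        cmd += ["--check", "--diff"]
    return cmd


# Hypothetical file and group names, for illustration only.
cmd = ansible_playbook_command("site.yml", "inventory/hosts.ini",
                               limit="jenkins", check=True)
print(shlex.join(cmd))
```

Keeping the command as a list rather than a string avoids quoting problems if it is later passed to `subprocess.run`.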

The first process is ready. The agreements are recorded.

2. Create new VM

This part was less convenient: creating VMs on Hyper-V from Linux is awkward. One of the attempts to mechanize the process looked like this:

  1. Ansible connects via WinRM to a Windows host.
  2. Ansible runs a PowerShell script.
  3. The PowerShell script creates a new VM.
  4. When the VM is created, the hostname in the guest OS is configured using Hyper-V/SCVMM tools.
  5. The VM sends its hostname when renewing its DHCP lease.
  6. The standard DDNS and DHCP integration on the domain controller side creates the DNS record.
  7. The VM can then be added to the inventory and configured with Ansible.
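
Step 3 can be sketched as rendering the PowerShell command that the script would run. This example assumes the plain Hyper-V `New-VM` cmdlet; the talk does not show the actual script, and an SCVMM-based setup would use different cmdlets. The VM name, switch name, and disk path are hypothetical.

```python
def ps_quote(value: str) -> str:
    """Quote a string as a PowerShell single-quoted literal."""
    # Inside single quotes, PowerShell escapes a quote by doubling it.
    return "'" + value.replace("'", "''") + "'"


def new_vm_command(name: str, memory_bytes: int, switch: str,
                   vhd_path: str, generation: int = 2) -> str:
    """Render a Hyper-V New-VM command for an existing virtual disk."""
    return (
        f"New-VM -Name {ps_quote(name)} "
        f"-MemoryStartupBytes {memory_bytes} "
        f"-Generation {generation} "
        f"-SwitchName {ps_quote(switch)} "
        f"-VHDPath {ps_quote(vhd_path)}"
    )


# Hypothetical values, for illustration only.
print(new_vm_command("dev-ci-01", 4 * 1024**3, "Internal",
                     r"D:\VMs\dev-ci-01.vhdx"))
```

Rendering the command on the Linux side keeps VM parameters in the same repository as the rest of the configuration, while the Windows host only executes what it is given.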

3. Create VM template

Here we did not invent anything: we took Packer.

  1. We add the Packer config and the kickstart config to the git repository.
  2. We set up a dedicated Jenkins slave with Hyper-V and Packer.
  3. We create a job and configure Jenkins.

How this combination works:

  1. Packer creates an empty VM and attaches the ISO.
  2. The VM boots, and Packer types a command into the bootloader so that it uses our kickstart file from a floppy or over HTTP.
  3. Anaconda starts with our config and performs the initial OS setup.
  4. Packer waits for the VM to become available.
  5. Packer runs Ansible in local mode inside the VM.
  6. Ansible uses exactly the same roles that it applies in step #1 (Configure VMs).
  7. Packer exports the VM template.
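
The flow above maps onto a Packer template with one builder and one provisioner. The following is a minimal sketch that generates such a JSON template, assuming the `hyperv-iso` builder and the `ansible-local` provisioner; the real template is not shown in the talk, so the kickstart file name, directory names, and boot command are assumptions, and credentials are deliberately left out.

```python
import json


def packer_template(iso_url: str, iso_checksum: str, playbook: str) -> dict:
    """Build a Packer template: Hyper-V builder plus local Ansible run."""
    return {
        "builders": [{
            "type": "hyperv-iso",
            "iso_url": iso_url,
            "iso_checksum": iso_checksum,
            # Packer serves this directory over HTTP during the build.
            "http_directory": "http",
            # Typed into the bootloader so Anaconda fetches the kickstart file.
            "boot_command": [
                "<tab> inst.ks=http://{{ .HTTPIP }}:{{ .HTTPPort }}/ks.cfg<enter>"
            ],
            "communicator": "ssh",
            "ssh_username": "root",  # password or key intentionally omitted
            "ssh_timeout": "1h",
            "shutdown_command": "shutdown -P now",
        }],
        "provisioners": [{
            # Runs ansible-playbook inside the VM against localhost.
            "type": "ansible-local",
            "playbook_file": playbook,
            "role_paths": ["roles"],  # hypothetical roles directory
        }],
    }


# Placeholder arguments, for illustration only.
template = packer_template("<iso url>", "<iso checksum>", "template.yml")
print(json.dumps(template, indent=2))
```

Pointing the provisioner at the same roles used for running VMs is what makes step 6 work: the template and the live machines are configured by one code base.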

Day #75: Refactoring the agreements without breaking them = Testing Ansible + Test Kitchen

Capturing agreements in code may not be enough. If you want to change something deep inside the process, you can break something. That is why, with infrastructure, testing of that very infrastructure appears. To synchronize knowledge within the team, we began to test Ansible roles. I will not go into detail here; there is an article describing the events at that point in time, "Test me if you can or do YML programmers dream of ansible testing?" (spoiler: that was not the final version, and later everything became more complicated, see "How to start testing Ansible, refactor a project in a year and not go crazy").

Day #130: Or maybe CentOS + Ansible is not needed? Maybe OpenShift?

It must be understood that the infrastructure rollout was not the only process under way; there were side subprojects. For example, a request came in to launch our application in OpenShift, and this resulted in more than a week of research (see "We run the application in Openshift and compare the existing toolkit"), which slowed the move down. In the end it turned out that OpenShift does not cover all our needs: we need real hardware, or at least the ability to play with the kernel.

Day #170: OpenShift doesn't fit, take a chance with Windows Azure Pack?

Hyper-V is not very friendly, and SCVMM doesn't make it much better. There is, however, Windows Azure Pack, an add-on for SCVMM that mimics Azure. In reality the product looks abandoned: the documentation is scarce and full of broken links. Still, while studying options to simplify life with our own cloud, we looked at it too.

Day #250: Windows Azure Pack is not so good. Staying on SCVMM

Windows Azure Pack looked promising, but we decided not to bring WAP and its complexity into the system for the sake of features we did not need, and stayed on SCVMM.

Day #360: Eating an elephant piece by piece

Only a year in was the target platform ready, and the move itself began. A SMART goal was set for it. We listed all the VMs and began working through their configurations one by one: describing each in Ansible and covering it with tests.

Day #450: What system did we get?

The process itself is not interesting; it is routine. It is worth noting that most of the configurations were relatively simple or isomorphic and, in line with the Pareto principle, 80% of the VM configurations took 20% of the time. By the same principle, 80% of the time was spent preparing for the move and only 20% on the move itself.

Day #540: Final

What happened in 18 months?

  1. The agreements became code.
  2. Manual labor -> Mechanization -> Automation.

Source: habr.com
