How not to shoot yourself in the foot using Liquibase

This has never happened before, and here it is again!

On our next project, we decided to use Liquibase from the very beginning to avoid problems down the road. It turned out that not all of the younger team members knew how to use it correctly, so I ran an internal workshop, which I later decided to turn into this article.

This article contains helpful tips and describes the most obvious pitfalls you can fall into when working with relational database migration tools, Liquibase in particular. It is aimed at Junior and Middle level Java developers; more experienced developers may find it useful as a structured recap of what they most likely already know.


Liquibase and Flyway are the main competing technologies for version control of relational database structures in the Java world. The first is completely free and, in practice, is chosen more often, which is why Liquibase became the hero of this publication. However, some of the practices described here are generic and may apply regardless of the tool, depending on the architecture of your application.

Relational migrations are a forced way of dealing with the limited flexibility of relational data stores. Back when OOP was in fashion, the style of working with a database assumed that we would describe the schema once and never touch it again. But in reality things always change, and changes to table structures are required quite often. Naturally, the process itself is painful and unpleasant.

I will not go deep into describing the technology or instructions for adding the library to your project; enough articles have already been written on that topic.

In addition, there is already a great article with useful tips on the subject.

Tips

I want to share advice and comments born of the sweat, blood, and pain of solving migration problems.

1. Before starting, read the best practices section on the Liquibase website

It describes simple but very important things without which using the library can complicate your life. For example, an unstructured approach to changeset management will sooner or later lead to confusion and broken migrations. If you do not roll out mutually dependent changes to the database structure and to service logic at the same time, there is a high probability of red tests or a broken environment. The recommendations on the official website also contain a paragraph about developing and verifying rollback scripts along with the main migration scripts, and the article https://habr.com/ru/post/178665/ has code examples for migrations and the rollback mechanism.
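
To illustrate the point about rollback scripts, here is a minimal sketch of a changeset that ships its own rollback (table and author names are hypothetical; for changes like createTable Liquibase can generate the rollback automatically, but for raw SQL changes an explicit rollback block is the only option):

```xml
<changeSet id="1" author="petya">
    <createTable tableName="customer">
        <column name="id" type="bigint" autoIncrement="true">
            <constraints primaryKey="true" nullable="false"/>
        </column>
        <column name="email" type="varchar(255)"/>
    </createTable>
    <!-- verified rollback shipped together with the migration itself -->
    <rollback>
        <dropTable tableName="customer"/>
    </rollback>
</changeSet>
```

Writing and testing the rollback at the same time as the migration is exactly what the best practices section asks for.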

2. Once you start using migration tools, do not allow manual corrections to the database structure

As the advertisement goes: "Once Persil, always Persil." Once your application's database is managed by Liquibase, any manual change instantly leads to an inconsistent state, and trust in the changesets drops to zero. The potential cost is several hours spent restoring the database, or, in the worst case, a dead server. If your team has an "old school" DBA architect, patiently and thoughtfully explain to him how bad things will get if he keeps editing the database his own way from, say, SQL Developer.

3. If a changeset has already been pushed to the repository, avoid editing it

If another developer has pulled and applied a changeset that is later edited, he will certainly remember you with a "kind word" when he gets an error at application startup. If an edited changeset somehow leaks into the development environment, you will have to go down the slippery slope of hotfixes. The root of the problem is checksum validation of changes, Liquibase's main mechanism: when you edit a changeset's code, its checksum changes. Editing changesets is acceptable only when the entire database can be redeployed from scratch without losing data. In that case, refactoring the SQL or XML code can, on the contrary, make life easier and migrations more readable. An example is the early stage of a project, when the schema of the initial database is still being agreed on within the team.

4. Have verified database backups if possible

Here, I think, everything is clear: if a migration fails, everything can be rolled back. Liquibase has a rollback mechanism, but rollback scripts are written by the same developer and can contain problems with the same probability as the main changeset scripts. Playing it safe with backups is useful in any case.

5. Use verified database backups in development if possible

If it does not violate any contracts or privacy requirements, there is no personal data in the database, and it does not weigh as much as two suns, then before applying a migration on live servers you can check how it behaves on the developer's machine and catch nearly 100% of potential migration problems.

6. Communicate with other developers in the team

In a well-organized development process, everyone on the team knows who is doing what. In reality this is often not the case, so if you are preparing changes to the database structure as part of your task, it is wise to notify the whole team about it. If someone is making changes in parallel, agree carefully on the order of your changesets. It is also worth talking to colleagues at the end of the work, not only at the start: many potential changeset problems can be caught at the code review stage.

7. Think what you are doing!

Seemingly self-evident advice, applicable to any situation. Still, many problems could have been avoided if the developer had taken one more look at what he was doing and what it might affect. Working with migrations always requires extra attention and care.

Traps

Now let's look at the typical traps you can fall into if you ignore the advice above, and what should actually be done in each case.

Situation 1. Two developers are trying to add new changesets at the same time

Vasya and Petya each want to create a version 4 changeset without knowing about each other. They both changed the database structure and opened pull requests with different changeset files.

How to solve

  1. Somehow, the colleagues must agree on the order in which their changesets should be applied; let's say Petya's goes first.
  2. One of them should merge in the other's changes and renumber Vasya's changeset to version 5. This can be done with a cherry-pick or a careful merge.
  3. After the changes, be sure to verify that everything is still valid.

Strictly speaking, the Liquibase mechanisms will let you keep two version 4 changesets in the repository as long as they have different names, so you could leave everything as it is. But with that approach, navigating database versions becomes very difficult later.
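
The agreed order is ultimately fixed in the master changelog, since Liquibase applies included files top to bottom. A sketch of what the result might look like (file names are hypothetical):

```xml
<databaseChangeLog
    xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
        http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-3.8.xsd">

    <include file="changesets/003-previous.xml"/>
    <!-- Petya's changeset, agreed to go first -->
    <include file="changesets/004-petya-orders.xml"/>
    <!-- Vasya's changeset, renumbered from 4 to 5 during the merge -->
    <include file="changesets/005-vasya-customers.xml"/>
</databaseChangeLog>
```

Reviewing this file in the merged pull request is the easiest way to verify step 3.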

In addition, Liquibase, like a hobbit's home, keeps many secrets. One of them is the validCheckSum tag, which appeared in version 1.7 and lets you declare a checksum valid for a specific changeset regardless of what is stored in the database. The documentation https://www.liquibase.org/documentation/changeset.html says the following:

Add a checksum that is considered valid for this changeSet, regardless of what is stored in the database. Used primarily when you need to change a changeSet and don't want errors thrown on databases on which it has already run (not a recommended procedure)

Yes, this is not recommended. But sometimes a strong light magician also masters dark techniques.
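
For illustration, such a "dark technique" might look like this (the changeset and the checksum value are hypothetical placeholders; a real value can be taken from the MD5SUM column of the DATABASECHANGELOG table):

```xml
<changeSet id="4" author="petya">
    <!-- checksum of the previous revision, so databases that already
         ran the old version do not fail validation after the edit -->
    <validCheckSum>8:d41d8cd98f00b204e9800998ecf8427e</validCheckSum>
    <addColumn tableName="customer">
        <column name="phone" type="varchar(32)"/>
    </addColumn>
</changeSet>
```

Use it consciously: it silences exactly the validation that normally protects you.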

Situation 2. Data-dependent migrations


Suppose you cannot use database backups from live servers. Petya created a changeset, tested it locally, and, fully confident he was right, opened a pull request. Just in case, the project lead asked whether Petya had checked it, and then merged it. But the deployment on the development server failed.

In fact, this is possible, and no one is immune. It happens when modifications to the table structure are somehow tied to specific data in the database. Obviously, if Petya's database contains only test data, it may not cover all the problem cases. For example, when deleting a table it may turn out that records in other tables reference the deleted rows by foreign key. Or, when changing a column's type, it may turn out that not 100% of the data can be converted to the new type.

How to solve

  • Write special scripts that are applied once along with the migration and bring the data into the proper form. This is the general way to migrate data into new structures after applying a migration, but something similar can also be applied before it in special cases. This path is not always available, because editing data on live servers can be dangerous and even fatal.
  • Another, trickier way is to edit the existing changeset. The difficulty is that every database where it has already been applied in its existing form will have to be restored; it is quite possible the entire backend team will have to roll up their local databases from scratch.
  • The most universal way is to reproduce the data problem in the developer's environment, recreate the same situation, and add a new changeset after the broken one that works around the problem.

In general, the closer the database is in content to the production database, the less likely migration problems are to slip through. And, of course, before sending a changeset to the repository, think several times about whether it will break something.
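
One way to guard a data-dependent change is a precondition that halts (or skips) the changeset when the data is not in the expected state. A sketch with hypothetical table and column names:

```xml
<changeSet id="12" author="petya">
    <!-- halt the migration instead of failing mid-change
         if any rows would violate the new constraint -->
    <preConditions onFail="HALT">
        <sqlCheck expectedResult="0">
            SELECT COUNT(*) FROM orders WHERE amount IS NULL
        </sqlCheck>
    </preConditions>
    <addNotNullConstraint tableName="orders"
                          columnName="amount"
                          columnDataType="numeric(12,2)"/>
</changeSet>
```

This does not fix the data by itself, but it turns a cryptic mid-migration failure into an explicit, diagnosable stop.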

Situation 3. Liquibase starts to be used after it goes into production

Suppose the team lead asked Petya to introduce Liquibase into the project, but the project is already in production with an existing database structure.

Accordingly, the problem is that on any new server or developer machine the tables must be created from scratch, while the already existing environments must remain in a consistent state, ready to accept new changesets.

How to solve

There are also several ways:

  • The first and most obvious is to keep a separate script that is applied manually when initializing a new environment.
  • The second, less obvious, is to put that migration into a separate Liquibase context and apply it only where needed. You can read more about Liquibase contexts here: https://www.liquibase.org/documentation/contexts.html. In general, this is an interesting mechanism that can also be used successfully, for example, for testing.
  • The third path consists of several steps. First, create a migration for the existing tables. Then apply it on some environment to obtain its checksum. Next, initialize empty Liquibase tables on the non-empty server and manually insert an "as if applied" record for that changeset into the changeset history table. As a result, on the already existing server the history will start from version 2, and all new environments will behave identically.
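
The third path is, roughly, what the changelog-sync command automates: it records changesets in the history table as executed without actually running them. A sketch of the flow (changelog file name is hypothetical; connection parameters are omitted and would normally come from liquibase.properties or the environment):

```shell
# 1. Generate a baseline changelog from the existing production schema
liquibase --changelog-file=baseline.xml generate-changelog

# 2. On the already-populated server: mark the baseline as applied
#    without executing it, so the history tables become consistent
liquibase --changelog-file=baseline.xml changelog-sync

# 3. On fresh environments: apply everything normally from scratch
liquibase --changelog-file=baseline.xml update
```

In older Liquibase versions the same commands are spelled in camelCase (generateChangeLog, changelogSync).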

Situation 4. Migrations become huge and take too long

At the beginning of a service's development, Liquibase is typically used as an application dependency, and all migrations run when the application starts. Over time, however, you may run into the following cases:

  • Migrations become huge and take a long time to complete.
  • There is a need to migrate in distributed environments, for example on several database server instances at the same time.

In these cases, applying migrations for too long results in a timeout at application startup, and applying migrations per application instance can leave different servers out of sync.

How to solve

At this point your project is already large, perhaps even grown up, and Liquibase begins to act as a separate external tool. The point is that Liquibase is packaged as a jar file and can work both as a dependency inside the project and standalone.

In standalone mode, applying migrations can be left to your CI/CD environment or to the strong shoulders of your sysadmins/deployers. For this you need the Liquibase command line: https://www.liquibase.org/documentation/command_line.html. In this mode, the application can be launched only after all the necessary migrations have completed.
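
A typical CI/CD step with the standalone CLI might look like this (the JDBC URL, file paths, and variable names are placeholders; credentials are usually injected from the pipeline's secret store):

```shell
# run migrations as a separate deploy step, before starting the app
liquibase \
  --changelog-file=db/changelog-master.xml \
  --url="jdbc:postgresql://db-host:5432/app" \
  --username="$DB_USER" \
  --password="$DB_PASS" \
  update
# only if this step succeeds does the pipeline start the application instances
```

Running the update once, centrally, also removes the risk of several application instances racing to apply the same migrations.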

Conclusion

In fact, there are many more pitfalls in working with database migrations, and many of them require a creative approach. It is important to understand that if you use the tool correctly, most of these traps can be avoided. I personally ran into all of the problems listed, in various forms, and some of them were the result of my own mistakes. Mostly this happens through simple inattention, but sometimes through a criminal inability to use the tool.

Source: habr.com
