Postgres: bloat, pg_repack and deferred constraints

Table and index bloat is a well-known effect, and not only in Postgres. There are ways to deal with it out of the box, such as VACUUM FULL or CLUSTER, but they lock tables while they run and therefore cannot always be used.

This article covers some theory on how bloat occurs and how you can deal with it, as well as deferred constraints and the problems they create when using the pg_repack extension.

It is based on my talk at PgConf.Russia 2020.

Why bloat occurs

Postgres is built on the multi-version concurrency model (MVCC). Its essence is that each row in a table can exist in several versions, while each transaction sees at most one of them, and not necessarily the same one. This lets several transactions run simultaneously with little to no effect on each other.

Obviously, all these versions have to be stored somewhere. Postgres works with data page by page: a page is the minimum amount of data that can be read from or written to disk. Let's walk through a small example to see how this happens.

Let's say we have a table into which we have inserted several records. The new data lands on the first page of the file where the table is stored. These are live row versions that become available to other transactions after the commit (for simplicity, we will assume the Read Committed isolation level).

We then updated one of the entries and thus marked the old version as out of date.

Step by step, updating and deleting row versions, we got a page in which about half of the data is “garbage”. This data is not visible to any transaction.

Postgres: bloat, pg_repack and deferred constraints

Postgres has the VACUUM mechanism, which cleans up outdated row versions and frees space for new data. But if it is not configured aggressively enough or is busy working on other tables, the "garbage data" remains, and we have to use additional pages for new data.

So in our example, at some point the table will consist of four pages, only half of which contain live data. As a result, when accessing the table we will read much more data than necessary.

Even if VACUUM now removes all the outdated row versions, the situation will not improve dramatically. We will have free space in pages, or even entire free pages, for new rows, but we will still read more data than we need.
By the way, if a completely empty page (the second one in our example) were at the end of the file, VACUUM could truncate it. But since it sits in the middle, nothing can be done with it.

When the number of such empty or sparsely filled pages becomes large (this is what is called bloat), it starts to affect performance.

Everything described above is how bloat arises in tables. In indexes it happens in much the same way.

Do I have bloat?

There are several ways to determine whether you have bloat. The first is to use Postgres internal statistics, which contain approximate information about the number of rows in tables, the number of "live" rows, and so on. There are many variations of ready-made scripts on the Internet. We took as a basis a script from PostgreSQL Experts that can evaluate table bloat together with TOAST, as well as btree index bloat. In our experience, its error is 10-20%.
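
If you just want a quick first look, you can also query the statistics views directly. The query below is only a rough illustration of the idea (the share of dead tuples per table); it is not the PostgreSQL Experts script, and it only reflects what the statistics collector has counted, not free space that has already been vacuumed:

SELECT schemaname,
       relname,
       n_live_tup,
       n_dead_tup,
       round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 2) AS dead_pct
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;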

Another way is to use the pgstattuple extension, which lets you look inside the pages and get both an estimated and an exact bloat value. In the second case, though, you will have to scan the entire table.
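
For reference, basic usage looks roughly like this ('tablename' and 'index_name' are placeholders): pgstattuple_approx gives a cheap estimate, pgstattuple gives exact numbers at the price of a full table scan, and pgstatindex reports btree index statistics:

CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Exact figures: reads the whole table.
SELECT * FROM pgstattuple('tablename');

-- Cheaper estimate that avoids a full scan where possible.
SELECT * FROM pgstattuple_approx('tablename');

-- Free space and fragmentation of a btree index.
SELECT * FROM pgstatindex('index_name');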

A small amount of bloat, up to 20%, is acceptable. It can be considered an analogue of the fillfactor for tables and indexes. At 50% and above, performance problems may begin.

Ways to deal with bloat

Postgres offers several out-of-the-box ways to deal with bloat, but they are not always applicable and may not suit everyone.

Tune AUTOVACUUM so that bloat does not appear, or, more precisely, so that it stays at a level acceptable to you. This sounds like Captain Obvious advice, but in reality it is not always easy to achieve. For example, you may have active development with regular changes to the data schema, or a data migration may be in progress. As a result, your load profile can change frequently and is usually different for different tables. This means you constantly need to stay a little ahead of the curve and adjust AUTOVACUUM to the changing profile of each table, which is clearly not easy.
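
As an illustration of per-table tuning (the table name and the thresholds are made up for this example, not a recommendation), the global autovacuum settings can be overridden with storage parameters:

-- Vacuum this hot table after ~1% of its rows change instead of the default 20%.
ALTER TABLE tablename SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_threshold    = 1000,
    autovacuum_vacuum_cost_delay   = 0     -- do not throttle the worker on this table
);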

Another common reason AUTOVACUUM cannot keep up with tables is long-running transactions, which prevent it from cleaning up data that is still visible to those transactions. The recommendation here is also obvious: get rid of "hanging" transactions and minimize the duration of active ones. But if the load on your application is a mix of OLAP and OLTP, you may simultaneously have many frequent updates and short queries as well as long operations, such as building a report. In such a situation it is worth thinking about spreading the load across different databases, which would let you tune each of them separately.

Another example: even if the profile is homogeneous but the database is under very high load, even the most aggressive AUTOVACUUM may not cope, and bloat will appear. Scaling (vertical or horizontal) is the only solution.

So what do you do when AUTOVACUUM is configured but bloat keeps growing?

The VACUUM FULL command rebuilds the contents of tables and indexes, leaving only up-to-date data in them. It eliminates bloat perfectly, but while it runs it holds an exclusive lock on the table (AccessExclusiveLock), which blocks all queries to that table, even SELECTs. If you can afford to stop your service, or part of it, for some time (from tens of minutes to several hours, depending on the size of the database and your hardware), this option is the best one. Unfortunately, we cannot fit a VACUUM FULL run into our scheduled maintenance window, so this method does not suit us.

The CLUSTER command rebuilds the contents of a table just like VACUUM FULL, but additionally lets you specify an index by which the data will be physically ordered on disk (the order is not guaranteed for rows added later). In certain situations this is a good optimization for queries that read several records via that index. The drawback of the command is the same as VACUUM FULL's: it locks the table while it runs.

The REINDEX command is similar to the previous two, but it rebuilds a specific index or all the indexes on a table. The locks are slightly weaker: a ShareLock on the table (prevents modifications but allows SELECTs) and an AccessExclusiveLock on the index being rebuilt (blocks queries that use this index). However, Postgres 12 introduced the CONCURRENTLY option, which allows an index to be rebuilt without blocking concurrent inserts, updates, or deletes.
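
For reference, invoking these commands looks roughly like this (tablename and index_name are placeholders):

VACUUM FULL tablename;                  -- AccessExclusiveLock for the whole run
CLUSTER tablename USING index_name;     -- same lock, physically orders rows by index_name
REINDEX TABLE tablename;                -- rebuilds all indexes on the table
REINDEX INDEX CONCURRENTLY index_name;  -- Postgres 12+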

In earlier versions of Postgres, you can achieve a result similar to REINDEX CONCURRENTLY using CREATE INDEX CONCURRENTLY. It lets you build an index without a strong lock (with ShareUpdateExclusiveLock, which does not interfere with parallel queries), then swap the old index for the new one and drop the old one. This lets you eliminate index bloat without disturbing your application. Keep in mind that rebuilding indexes puts extra load on the disk subsystem.
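
A sketch of that swap for a plain index (the names are hypothetical; an index that backs a constraint would instead need to be attached with ALTER TABLE ... ADD CONSTRAINT ... USING INDEX):

-- Build a fresh copy of the bloated index without blocking writes.
CREATE INDEX CONCURRENTLY ix_orders_customer_new ON orders (customer_id);

-- Drop the old bloated index and take over its name.
DROP INDEX CONCURRENTLY ix_orders_customer;
ALTER INDEX ix_orders_customer_new RENAME TO ix_orders_customer;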

So, while indexes can be rid of bloat on the fly, tables cannot. This is where external extensions come into play: pg_repack (formerly pg_reorg), pgcompact, pgcompacttable and others. I will not compare them within this article and will only talk about pg_repack, which, after some modification, we use ourselves.

How pg_repack works

Let's say we have a completely ordinary table: with indexes, constraints and, unfortunately, with bloat. As a first step, pg_repack creates a log table to record all the changes made while it is running; a trigger replicates every insert, update and delete into it. Then a table is created that matches the original in structure but has no indexes or constraints, so as not to slow down data insertion.

Next, pg_repack transfers the data from the old table to the new one, automatically filtering out all outdated rows, and then builds indexes on the new table. While all these operations run, changes keep accumulating in the log table.

The next step is to apply those changes to the new table. This is done in several iterations, and once fewer than 20 entries remain in the log table, pg_repack acquires a strong lock, applies the latest data, and replaces the old table with the new one in the Postgres system catalogs. This is the only, and very short, moment when you cannot work with the table. After that, the old table and the log table are dropped and the space is returned to the file system. The process is complete.
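
To make the sequence above more concrete, here is a very rough sketch of the same idea in plain SQL. This is not pg_repack's actual schema or code (the extension keeps its objects in its own repack schema and does most of the work in C); the names and the jsonb log format are invented purely for illustration:

-- A log table and a trigger that capture changes while the copy is running.
CREATE TABLE repack_log (id bigserial PRIMARY KEY, op text, row_data jsonb);

CREATE FUNCTION repack_log_fn() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        INSERT INTO repack_log (op, row_data) VALUES (TG_OP, to_jsonb(OLD));
    ELSE
        INSERT INTO repack_log (op, row_data) VALUES (TG_OP, to_jsonb(NEW));
    END IF;
    RETURN NULL;  -- the return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER repack_log_trg AFTER INSERT OR UPDATE OR DELETE
    ON tablename FOR EACH ROW EXECUTE FUNCTION repack_log_fn();

-- A structural copy without indexes or constraints, filled with live rows only.
CREATE TABLE tablename_new (LIKE tablename INCLUDING DEFAULTS);
INSERT INTO tablename_new SELECT * FROM tablename;
-- ...then build the indexes, replay repack_log in batches, and swap the tables.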

In theory everything looks great, but what about in practice? We tested pg_repack without load and under load, and checked its behavior when stopped prematurely (in other words, with Ctrl+C). All the tests passed.

Then we went to production, and that is where things stopped going the way we expected.

First attempt in production

On the first cluster, we got an error about a unique constraint violation:

$ ./pg_repack -t tablename -o id
INFO: repacking table "tablename"
ERROR: query failed: 
    ERROR: duplicate key value violates unique constraint "index_16508"
DETAIL:  Key (id, index)=(100500, 42) already exists.

The constraint named index_16508 was auto-generated by pg_repack; from the columns it includes we identified "our" constraint that corresponds to it. The problem turned out to be that this is not quite an ordinary constraint, but a deferred one: its validation is performed later than the SQL statement itself (at commit time), which leads to unexpected consequences.

Deferred constraints: why they are needed and how they work

A bit of theory about deferred constraints.
Consider a simple example: a catalog table of cars with two attributes, the car's name and its order in the catalog.

create table cars
(
  name text constraint pk_cars primary key,
  ord integer not null constraint uk_cars unique
);
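
To make the following queries concrete, assume the catalog contains rows like these (the specific values are purely illustrative):

insert into cars (name, ord) values
  ('audi', 1),
  ('bmw',  2),
  ('ford', 3);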



Suppose we need to swap the first and second cars. The straightforward solution is to update the first value to the second and the second to the first:

begin;
  update cars set ord = 2 where name = 'audi';
  update cars set ord = 1 where name = 'bmw';
commit;

But when we run this code, we predictably get a constraint violation, because the ord values in the table must be unique:

[23505] ERROR: duplicate key value violates unique constraint "uk_cars"
Detail: Key (ord)=(2) already exists.

How can we do it differently? Option one: first replace one of the values with a value that is guaranteed not to exist in the table, for example -1. In programming this is known as "swapping two variables through a third". The only downside of this method is the extra update.
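
With the sample data above, that variant looks roughly like this (using -1 as the temporary value):

begin;
  update cars set ord = -1 where name = 'audi';  -- park 'audi' on a value that cannot clash
  update cars set ord = 1  where name = 'bmw';   -- 1 is now free
  update cars set ord = 2  where name = 'audi';  -- 2 is now free
commit;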

Option two: redesign the table to use a floating-point data type for the order value instead of an integer. Then, when updating a value from 1 to, say, 2.5, the first record will automatically slot in between the second and the third. This solution works, but there are two limitations. First, it will not work for you if the value is used somewhere in the interface. Second, depending on the precision of the data type, you will only have a limited number of possible inserts before the values of all the records have to be recalculated.

Option three: make the constraint deferred so that it is checked only at the time of the commit:

create table cars
(
  name text constraint pk_cars primary key,
  ord integer not null constraint uk_cars unique deferrable initially deferred
);

Since the logic of our original query guarantees that all the values are unique by the time of the commit, the commit will succeed.
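
As an aside, for a constraint declared DEFERRABLE the check mode can also be switched explicitly inside a transaction with SET CONSTRAINTS; a small illustration:

begin;
  -- force the deferred check to fire at the end of each statement again:
  set constraints uk_cars immediate;
  update cars set ord = 2 where name = 'audi';  -- fails right here, not at commit
rollback;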

The example discussed above is, of course, very synthetic, but it illustrates the idea. In our application, we use deferred constraints to implement the logic responsible for resolving conflicts when users simultaneously work with shared widget objects on the board. Using such constraints lets us keep the application code a little simpler.

In general, depending on the constraint type, Postgres validates constraints at one of three levels of granularity: row, transaction, or expression.
Constraint validation granularity by constraint type (source: begriffs).

CHECK and NOT NULL are always checked at the row level; for other constraints, as the table shows, there are different options. You can read more about this here.

To sum up briefly, in a number of situations deferred constraints give more readable code and fewer statements. However, you pay for this with more complicated debugging, since the moment the error occurs and the moment you learn about it are separated in time. Another possible problem is that the planner cannot always build an optimal plan when a deferred constraint is involved in the query.

Modifying pg_repack

We've covered what deferred constraints are, but how do they relate to our problem? Recall the error we got earlier:

$ ./pg_repack -t tablename -o id
INFO: repacking table "tablename"
ERROR: query failed: 
    ERROR: duplicate key value violates unique constraint "index_16508"
DETAIL:  Key (id, index)=(100500, 42) already exists.

It occurs when data is copied from the log table to a new table. This looks strange, because the data in the log table is committed along with the data in the original table. If they satisfy the original table's constraints, how can they violate the same constraints in the new one?

As it turned out, the root of the problem lies in an earlier step of pg_repack, which creates only indexes, not constraints: the old table had a unique constraint, while on the new table a unique index was created in its place.

It is important to note here that if the constraint is an ordinary, non-deferred one, the unique index created in its place is equivalent to it, because unique constraints in Postgres are implemented by creating a unique index. But in the case of a deferred constraint the behavior is not the same, because an index cannot be deferred and is checked as soon as the SQL statement is executed.

So the essence of the problem is the timing of the check: in the original table it happens at commit time, and in the new table at the time the SQL statement is executed. We need to make sure the check is performed the same way in both cases: either always deferred or always immediate.

So what ideas did we have?

Create an index similar to the deferred constraint

The first idea was to perform both checks in immediate mode. This may produce a few false positive constraint violations, but if there are few of them, it should not affect users, since such conflicts are a normal situation for them. They occur, for example, when two users start editing the same widget at the same time and the second user's client has not yet received the information that the widget is already locked for editing by the first user. In that situation the server responds to the second user with a rejection, and the client rolls back the changes and locks the widget. A little later, when the first user finishes editing, the second one learns that the widget is no longer locked and can repeat the action.

To make sure the check is always performed in immediate (non-deferred) mode, we created a new index similar to the original deferred constraint:

CREATE UNIQUE INDEX CONCURRENTLY uk_tablename__immediate ON tablename (id, index);
-- run pg_repack
DROP INDEX CONCURRENTLY uk_tablename__immediate;

In the test environment we got only a few expected errors. Success! We ran pg_repack on production again and got 5 errors on the first cluster within an hour of work, an acceptable result. However, on the second cluster the number of errors increased significantly and we had to stop pg_repack.

Why did this happen? The probability of an error depends on how many users are working with the same widgets at the same time. Apparently, at that moment there were far fewer concurrent changes to the data stored on the first cluster than on the others; we simply got "lucky".

The idea did not work. At that point we saw two other solutions: rewrite the application code to drop deferred constraints, or "teach" pg_repack to work with them. We chose the second.

Replace indexes in the new table with deferred constraints from the original table

The goal of the modification was obvious: if the original table has a deferred constraint, then for the new table the same constraint must be created rather than an index.
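
In SQL terms the difference looks roughly like this (the actual change lives in the extension's C code; the names below are taken from our error message purely for illustration):

-- What stock pg_repack effectively built on the new table:
CREATE UNIQUE INDEX index_16508 ON tablename_new (id, "index");

-- What is needed instead: the same deferred constraint as on the original table,
-- here expressed by attaching the index to a constraint.
ALTER TABLE tablename_new
    ADD CONSTRAINT uk_tablename UNIQUE USING INDEX index_16508
    DEFERRABLE INITIALLY DEFERRED;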

To test our changes, we wrote a simple test:

  • a table with a deferred constraint and one record;
  • in a loop, insert data that conflicts with the existing record;
  • run an update, after which the data no longer conflicts;
  • commit the changes.

create table test_table
(
  id serial,
  val int,
  constraint uk_test_table__val unique (val) deferrable initially deferred 
);

INSERT INTO test_table (val) VALUES (0);
DO $$
DECLARE
  v_id int;
BEGIN
  FOR i IN 1..10000 LOOP
    INSERT INTO test_table (val) VALUES (0) RETURNING id INTO v_id;
    UPDATE test_table set val = i where id = v_id;
    COMMIT;  -- Postgres 11+ allows COMMIT inside a DO block
  END LOOP;
END $$;

The original version of pg_repack always failed on the first insert, while the modified version worked without errors. Great.

We went to production and again got an error at the same stage of copying data from the log table into the new one:

$ ./pg_repack -t tablename -o id
INFO: repacking table "tablename"
ERROR: query failed: 
    ERROR: duplicate key value violates unique constraint "index_16508"
DETAIL:  Key (id, index)=(100500, 42) already exists.

A classic situation: everything works in the test environments but not in production?!

APPLY_COUNT and the boundary between two batches

We started analyzing the code literally line by line and found an important detail: data is transferred from the log table to the new one in batches, and the APPLY_COUNT constant sets the batch size:

for (;;)
{
    num = apply_log(connection, table, APPLY_COUNT);

    if (num > MIN_TUPLES_BEFORE_SWITCH)
        continue;  /* there might be still some tuples, repeat. */
    ...
}

The problem is that the data of an original transaction in which several operations could potentially violate the constraint may land on the boundary between two batches during the transfer: half of its statements get committed in the first batch and the other half in the second. And here it comes down to luck: if the statements in the first batch do not violate anything, everything is fine, but if they do, an error occurs.

APPLY_COUNT is 1000 records, which explains why our tests passed: they did not cover the batch-boundary case. We used two statements, an insert and an update, so exactly 500 two-statement transactions always fit into a batch and we had no problems. After adding a second update, our modified version stopped working:

DO $$
DECLARE
  v_id int;
BEGIN
  FOR i IN 1..10000 LOOP
    INSERT INTO test_table (val) VALUES (1) RETURNING id INTO v_id;
    UPDATE test_table set val = i where id = v_id;
    UPDATE test_table set val = i where id = v_id; -- one more update
    COMMIT;
  END LOOP;
END $$;

So the next task is to make sure that data from the original table that was changed within a single transaction also makes it into the new table within a single transaction.

Abandoning batching

Once again we had two options. First: abandon batching altogether and transfer the data in a single transaction. In favor of this solution was its simplicity: the required code changes are minimal (by the way, older versions of pg_reorg worked exactly this way). But there is a problem: we would be creating a long-running transaction, and that, as mentioned earlier, is a threat of new bloat.

The second solution is more complicated but probably more correct: add a column to the log table with the identifier of the transaction that added the data. Then, when copying, we can group the data by this attribute and guarantee that related changes are transferred together. A batch would be formed from several transactions (or one large one), and its size would vary depending on how much data those transactions changed. It is important to note that since the data from different transactions ends up in the log table in arbitrary order, it would no longer be possible to read it sequentially as before. A seqscan for every request filtered by tx_id is too expensive; an index would help, but it would also slow the method down because of the overhead of maintaining it. As always, something has to be sacrificed.
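
As a hypothetical sketch of that second option (reusing the repack_log illustration from earlier; pg_repack's real log table is different), changes could be tagged with their transaction id and applied in whole-transaction groups:

-- Tag every logged change with the id of the transaction that produced it.
ALTER TABLE repack_log ADD COLUMN tx_id bigint DEFAULT txid_current();

-- Apply changes grouped by transaction: take the N oldest transactions,
-- then replay all of their rows together, in their original order.
SELECT id, op, row_data
FROM repack_log
WHERE tx_id IN (
    SELECT tx_id FROM repack_log GROUP BY tx_id ORDER BY min(id) LIMIT 100
)
ORDER BY id;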

So we decided to start with the first option, as the simpler one. First we had to understand whether a long transaction would be a real problem. Since the main transfer of data from the old table to the new one also happens in one long transaction, the question became: how much longer will we make that transaction? The duration of the first transaction depends mainly on the size of the table. The duration of the new one depends on how many changes accumulate in the table during the transfer, that is, on the load intensity. The pg_repack run happened at a time of minimal service load, and the volume of changes was incomparably small relative to the original table size. We decided we could neglect the time of the new transaction (for comparison, roughly 1 hour versus 2-3 minutes).

The experiments went well, and so did the production run. For clarity, here is the size of one of the databases after the run:

Chart: the size of one of the databases after the pg_repack run.

Since this solution suited us completely, we did not try to implement the second one, but we are considering discussing it with the extension's developers. Unfortunately, our current modification is not yet ready for publication, since we only solved the problem for unique deferred constraints; a full-fledged patch would need to support other constraint types as well. We hope to get to this in the future.

You may be wondering why we got involved in modifying pg_repack at all instead of, say, using one of its alternatives. At some point we considered that too, but our earlier positive experience of using it on tables without deferred constraints motivated us to try to understand the root of the problem and fix it. Besides, adopting other solutions also takes time for testing, so we decided that we would first try to fix the problem in pg_repack, and only start looking at alternatives if we realized we could not do it in a reasonable time.

Conclusions

What we can recommend based on our own experience:

  1. Monitor your bloat. Based on the monitoring data, you will be able to understand how well autovacuum is tuned.
  2. Set AUTOVACUUM to keep bloat at an acceptable level.
  3. If bloat keeps growing and you cannot overcome it with out-of-the-box tools, don't be afraid to use external extensions. The main thing is to test everything well.
  4. Don't be afraid to modify external solutions to suit your needs - sometimes it can be more efficient and even easier than changing your own code.

Source: habr.com
