Index health in PostgreSQL through the eyes of a Java developer

Hey.

My name is Vanya, and I am a Java developer. It so happens that I work a lot with PostgreSQL: I tune databases, optimize structure and performance, and play DBA a little on weekends.

Lately I have tidied up the databases of several of our microservices and written a Java library, pg-index-health, which makes this job easier, saves me time, and helps avoid some common developer mistakes. That library is what this article is about.

Disclaimer

The main version of PostgreSQL I work with is 10. All the SQL queries I use have also been tested on version 11. The minimum supported version is 9.6.

Backstory

It all started almost a year ago with a situation that baffled me: a concurrent index creation failed out of the blue, and the index itself, as usual, was left in the database in an invalid state. Log analysis showed that we had hit temp_file_limit. And away we went… Digging deeper, I discovered a whole bunch of problems in the database configuration and, rolling up my sleeves, with a gleam in my eyes, set about fixing them.

Problem One - The Default Configuration

The metaphor about Postgres being able to run on a coffee maker is probably worn out by now, but… the default configuration really does raise a number of questions. At the very least, attention should be paid to maintenance_work_mem, temp_file_limit, statement_timeout and lock_timeout.

In our case maintenance_work_mem was at its default of 64 MB, and temp_file_limit was somewhere around 2 GB: we simply did not have enough memory to build an index on a large table.

Therefore, in pg-index-health I added a check for a set of parameters that, in my opinion, should be configured for every database.
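For reference, here is a minimal sketch of what tuning these settings might look like. The values are purely illustrative assumptions, not recommendations, and must be adjusted to your hardware and workload:

    -- Illustrative values only - tune for your own hardware and workload.
    alter system set maintenance_work_mem = '1GB';  -- memory for index builds and vacuum
    alter system set temp_file_limit = '10GB';      -- cap on temporary file usage per process
    alter system set statement_timeout = '30s';     -- abort statements that run too long
    alter system set lock_timeout = '3s';           -- abort statements stuck waiting for a lock
    select pg_reload_conf();                        -- apply the changes without a restart

Note that global timeouts are a blunt instrument: statement_timeout in particular is often set per role or per session rather than cluster-wide.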

Problem Two - Duplicate Indexes

Our databases live on SSDs, and we use an HA configuration with multiple data centers, a master host, and a number of replicas. Disk space is a very valuable resource for us, no less important than performance and CPU consumption. So, on the one hand, we need indexes for fast reads; on the other, we do not want extra indexes in the database, since they eat up space and slow down data updates.

And so, having rebuilt all the invalid indexes and having watched my fill of Oleg Bartunov's talks, I decided to stage a "great" purge. It turned out that developers do not like to read database documentation. They really don't. This leads to two typical mistakes: a manually created index on a primary key, and a similar "manual" index on a unique column. The thing is, such indexes are not needed: Postgres creates them itself. They can be safely deleted, and the duplicated_indexes diagnostic appeared for exactly this.
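To illustrate, a minimal sketch with a hypothetical client table (the names are mine, not from a real schema): the primary key and the unique constraint already create indexes under the hood, so the two manual ones below are pure dead weight:

    -- Hypothetical table: the primary key and unique constraint
    -- already create unique indexes automatically.
    create table client (
        id    bigint primary key,
        email varchar(255) not null unique
    );

    -- Both of these duplicate the implicit indexes and can be safely dropped:
    create index idx_client_id on client (id);
    create index idx_client_email on client (email);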

Problem Three - Overlapping Indexes

Most novice developers create indexes on a single column. Gradually, having acquired a proper taste for the craft, people begin to optimize their queries and add more complex indexes spanning several columns. This is how indexes on columns A, A + B, A + B + C, and so on appear. The first two of these indexes can be safely thrown out, since they are prefixes of the third. This too saves a decent amount of disk space, and there is the intersected_indexes diagnostic for it.
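A minimal sketch of the situation, with hypothetical table and column names:

    -- Assuming an orders table with columns client_id, status, created_at.
    create index idx_orders_a   on orders (client_id);
    create index idx_orders_ab  on orders (client_id, status);
    create index idx_orders_abc on orders (client_id, status, created_at);

    -- The first two are prefixes of the third and can usually go:
    drop index idx_orders_a;
    drop index idx_orders_ab;

The usual caveats apply: a unique prefix index, or a much smaller index that is hot in cache, may still be worth keeping.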

Problem Four - Foreign Keys Without Indexes

Postgres lets you create a foreign key constraint without a supporting index on the referencing column. In many situations this is not a problem and may not even manifest itself in any way… for the time being…

So it was with us: at some point, a job that ran on a schedule and cleared the database of test orders began to bring the master host to its knees. CPU and IO shot through the roof, requests slowed down and were killed by timeouts, and the service started returning 500s. A quick look at pg_stat_activity showed that the culprits were queries like:

    delete from <table> where id in (…)

At the same time there was, of course, an index on id in the target table, and very few records matched the condition. It seemed everything should work, but, alas, it did not.

The wonderful explain analyze came to the rescue and revealed that, in addition to deleting records from the target table, there is also a referential integrity check, and on one of the related tables this check degrades into a sequential scan because there is no suitable index. This is how the foreign_keys_without_index diagnostic was born.
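A minimal sketch of the trap, with hypothetical table names:

    -- order_item references orders, but order_id itself has no index.
    create table order_item (
        id       bigint primary key,
        order_id bigint not null references orders (id)
    );

    -- Deleting from orders now triggers a sequential scan of order_item
    -- for the referential integrity check. An index on the FK column fixes it:
    create index concurrently idx_order_item_order_id on order_item (order_id);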

Problem Five - Null Values in Indexes

By default, Postgres includes null values in btree indexes, but they are generally not needed there. So I diligently throw these nulls out (the indexes_with_null_values diagnostic), creating partial indexes on nullable columns of the form where <A> is not null. This way I managed to shrink one of our indexes from 1877 MB to 16 KB. And in one of our services the database shrank by 16% overall (4.3 GB in absolute terms) thanks to excluding null values from indexes. Enormous disk space savings from very simple changes. πŸ™‚
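A minimal sketch of the technique, with hypothetical names; imagine a deleted_at column that is null for almost every row:

    -- A regular btree index on a mostly-null column stores all the nulls:
    --   create index idx_orders_deleted_at on orders (deleted_at);

    -- A partial index excludes them and can be dramatically smaller:
    create index idx_orders_deleted_at on orders (deleted_at)
        where deleted_at is not null;

Queries must repeat the where deleted_at is not null condition for the planner to pick such an index up.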

Problem Six - Lack of Primary Keys

Due to the specifics of the MVCC mechanism in Postgres, a situation known as bloat is possible, where the size of a table grows rapidly due to a large number of dead tuples. I naively believed that this did not threaten us, that it could never happen to our database, because, come on, we are decent developers… How stupid and naive I was…

One fine day, one wonderful migration went and updated every record in a large, actively used table. We got +100 GB of table size out of the blue. It was damn embarrassing, but our misadventures did not end there. When the autovacuum on this table finished 15 hours later, it became clear that the disk space was not coming back. We could not stop the service and run VACUUM FULL, so we decided to use pg_repack. It then turned out that pg_repack cannot process tables without a primary key or some other uniqueness constraint, and our table had no primary key. This is how the tables_without_primary_key diagnostic was born.
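For illustration, a sketch of the kind of fix involved, with a hypothetical table name (this is one possible approach, not what we actually ran): give the table a surrogate primary key so that pg_repack can work with it.

    -- Caution: adding a column and a primary key to a large table takes
    -- heavy locks and should be planned as a maintenance operation.
    alter table orders_archive add column id bigserial;
    alter table orders_archive add primary key (id);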

Version 0.1.5 of the library added the ability to collect bloat data on tables and indexes and to react to it in a timely manner.

Problems Seven and Eight - Missing Indexes and Unused Indexes

The next two diagnostics, tables_with_missing_indexes and unused_indexes, appeared in their final form only relatively recently. The thing is that they could not simply be dropped in as-is.

As I already mentioned, we use a multi-replica configuration, and the read load differs fundamentally from host to host. As a result, some tables and indexes are practically unused on some hosts, so for the analysis you have to collect statistics from every host in the cluster. Resetting statistics is likewise needed on every host; you cannot do it on the master alone.
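The underlying idea can be sketched with plain statistics queries (this is only the idea behind the diagnostics, not the library's actual queries); an index counts as unused only if it is unused on every host:

    -- Run on every host in the cluster: usage counters are per-host.
    select schemaname, relname, indexrelname, idx_scan
    from pg_stat_user_indexes
    where idx_scan = 0;  -- unused on this host only

    -- Resetting the counters also has to happen on every host:
    select pg_stat_reset();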

This approach allowed us to save several tens of gigabytes by removing indexes that were never used, and to add missing indexes to rarely used tables.

In Conclusion

Of course, almost every diagnostic lets you configure an exclusion list. This way you can quickly introduce the checks into your application, preventing new errors from appearing, and then gradually fix the old ones.

Some diagnostics can be run as part of functional tests, right after the database migrations have been applied. And this is perhaps one of the most powerful features of my library. A usage example can be found in the demo.

The checks for unused or missing indexes, as well as for bloat, only make sense against a real database. The collected values can be written to ClickHouse or sent to a monitoring system.

I really hope that pg-index-health proves useful and needed. You can also contribute to the library's development by reporting issues you find and suggesting new diagnostics.

Source: habr.com
