Upgrade for the lazy: how PostgreSQL 12 improves performance

Upgrade for the lazy: how PostgreSQL 12 improves performance

PostgreSQL 12, the latest release of "the world's best open source relational database", is coming out in a couple of weeks (if all goes according to plan). This follows the usual schedule - a new version with a lot of new features comes out once a year, and, frankly, it's impressive. That's why I became an active member of the PostgreSQL community.

In my opinion, unlike past releases, PostgreSQL 12 does not contain one or two revolutionary features (such as partitioning or query parallelism). I once joked that the main feature of PostgreSQL 12 is greater stability. Isn't that what you need when you manage your business' critical data?

But PostgreSQL 12 is not limited to this: with new features and improvements, applications will work better, All you need to do is upgrade!

(Well, maybe even rebuild indexes, but in this release it's not as scary as we're used to.)

It would be great to upgrade PostgreSQL and immediately enjoy significant improvements without unnecessary gestures. Several years ago, I analyzed the upgrade from PostgreSQL 9.4 to PostgreSQL 10 and saw how much faster the application was due to the improved query parallelism in PostgreSQL 10. And, most importantly, almost nothing was required of me (just set the configuration parameter max_parallel_workers).

Agree, it is convenient when applications work better immediately after the upgrade. And we try very hard to please users, because PostgreSQL has more and more of them.

And how does a simple upgrade to PostgreSQL 12 make you happy? Now I'll tell you.

Major indexing improvements

Without indexing, the database will not go far. How else can you find information quickly? The fundamental PostgreSQL indexing system is called B-tree. This type of index is optimized for storage systems.

We just use the operator CREATE INDEX ON some_table (some_column), and PostgreSQL does a great job of keeping the index up to date while we are constantly inserting, updating, and deleting values. Everything works by itself, like magic.

But PostgreSQL indexes have one problem - they bloated and take up extra disk space, and the performance of extracting and updating data is reduced. By "bloat" I mean inefficient maintenance of the index structure. This may or may not be related to the garbage tuples that VACUUM (thanks to Peter Gagan for the info)Peter Geoghegan)). Index bloat is especially noticeable in workloads where the index is actively changing.

PostgreSQL 12 greatly improves the performance of B-tree indexes, and experiments with tests like TPC-C have shown that space is now used, on average, 40% less. Now we spend less time not only maintaining the B-tree indexes (that is, writing operations), but also retrieving data, because the indexes have become much smaller.

Applications that actively update their tables are typically OLTP applications (real-time transaction processing) will be much more efficient in terms of disk usage and query processing. The more disk space, the more room the database has to grow without infrastructure upgrades.

Some upgrade strategies require you to rebuild B-tree indexes to take advantage of these benefits (for example, pg_upgrade will not rebuild indexes automatically). In previous versions of PostgreSQL, rebuilding large indexes on tables resulted in significant downtime because no changes could be made during that time. But PostgreSQL 12 has another cool feature: now you can rebuild indexes in parallel with the command REINDEX CONCURRENTLYto completely avoid downtime.

PostgreSQL 12 has other improvements to the indexing infrastructure. Another thing where there was some magic - write-ahead log, aka WAL (write-ahead log). The write-ahead log writes every transaction to PostgreSQL in case of failure and replication. Applications use it for archiving and point-in-time recovery. Of course, the write-ahead log is written to disk, and this can affect performance.

PostgreSQL 12 has reduced the overhead of WAL records that are created by GiST, GIN, and SP-GiST indexes when an index is built. This has several tangible benefits: WAL records take up less disk space, and data is replayed faster, such as during failover or point-in-time recovery. If you use such indexes in your applications (for example, PostGIS-based geospatial applications use the GiST index a lot), this is another feature that will greatly improve performance without any effort on your part.

Partitioning - Bigger, Better, Faster

PostgreSQL 10 introduced declarative partitioning. In PostgreSQL 11, it has become much easier to use. In PostgreSQL 12, you can scale partitions.

In PostgreSQL 12, the performance of the partitioning system has improved significantly, especially when there are thousands of partitions in a table. For example, if a query affects only a few partitions in a table with thousands of them, it will run much faster. Performance improvements are not limited to these types of queries. You will also notice how much faster INSERT operations are on tables with many partitions.

Writing data using COPY - by the way, this is a great way bulk data upload and here is an example receiving JSON - to partitioned tables in PostgreSQL 12 has also become more efficient. Everything was fast with COPY, but in PostgreSQL 12 it flies completely.

These benefits make it possible for PostgreSQL to store even larger datasets and make them easier to retrieve. And no effort on your part. If the application has many sections, for example, it writes time series data, a simple upgrade will significantly improve its performance.

And while this is not exactly an upgrade-and-rejoice improvement, in PostgreSQL 12 you can create foreign keys that refer to partitioned tables to make working with partitioning a pleasure.

WITH queries just got a lot better

When a patch for inline common table expressions has been applied (aka CTE, aka WITH queries), I was itching to write an article about how how delighted application developers were with PostgreSQL. This is one of those features that will speed up the application. Unless, of course, you are using CTE.

I often notice that SQL beginners love to use CTEs: if you write them in a certain way, you feel like you are writing an imperative program. Personally, I liked to rewrite these queries to get around without CTE and increase productivity. Now everything is different.

PostgreSQL 12 allows you to inline a specific CTE type without side effects (SELECT), which is used only once near the end of the request. If I kept track of the CTE queries I rewrote, most of them would fall into this category. This helps developers write clear code that is now also fast.

Moreover, PostgreSQL 12 optimizes SQL execution itself, you don't have to do anything. While I probably won't need to optimize such queries now, it's great that PostgreSQL continues to work on query optimization.

Just-in-Time (JIT) - now the default

On PostgreSQL 12 systems with support LLVM JIT compilation is enabled by default. First, you get support JIT for some internal operations, and secondly, queries with expressions (the simplest example is x + y) in select lists (which you have after SELECT), aggregates, expressions with WHERE clauses, and others can use JIT to improve performance.

Since JIT is enabled by default in PostgreSQL 12, performance will improve on its own, but I suggest testing the application in PostgreSQL 11, where JIT was first introduced, to measure query performance and see if anything needs tweaking.

But what about the rest of the new features of PostgreSQL 12?

PostgreSQL 12 has a ton of cool new features, from the ability to inspect JSON data using standard SQL/JSON route expressions to multi-factor authentication with a clientcert=verify-full, generated columns, and more. Enough for a separate post.

Like PostgreSQL 10, PostgreSQL 12 will improve overall performance immediately after the upgrade. Of course, you can have your own way - test the application under similar conditions on a production system before enabling improvements, as I did with PostgreSQL 10. Even if PostgreSQL 12 is already more stable than I expected, do not be lazy to test applications well, before releasing them into production.

Source: habr.com

Add a comment