An Industrial Approach to PostgreSQL Tuning: Database Experiments. Nikolay Samokhvalov

Below is a transcript of Nikolay Samokhvalov's talk "An Industrial Approach to PostgreSQL Tuning: Database Experiments".

shared_buffers = 25%: is that a lot, a little, or just right? And how do you know whether this rather outdated recommendation suits your particular case?

It's time to approach the selection of postgresql.conf parameters like adults: not with blind "auto-tuners" or outdated advice from articles and blogs, but based on:

  1. rigorous experiments on the database, performed automatically, in large numbers and under conditions as close as possible to production;
  2. a deep understanding of how the DBMS and the OS actually work.

Using Nancy CLI (https://gitlab.com/postgres.ai/nancy), we will walk through a specific example - the notorious shared_buffers - in different situations and different projects, and try to figure out how to choose the optimal setting for our infrastructure, database and workload.

This talk is about experiments on databases. It's a story that has been going on for a little over six months.

A little about me. I have been working with Postgres for more than 14 years. I have founded a number of companies, social networks among them, and Postgres has been used everywhere.

There is also the RuPostgres group on Meetup, the second largest in the world, and it keeps growing. RuPostgres.org.

I have also served on the program committees of various conferences, including HighLoad++, where I have been responsible for the database track, Postgres in particular, from the very beginning.

And in the last few years I have restarted my Postgres consulting practice, eleven time zones away from here.

When I did that a few years ago, I had been away from hands-on Postgres work since about 2010, and I was surprised how little the DBA's daily routine had changed and how much manual labor was still involved. I immediately thought that something was wrong here: we need to automate much more.

And since it was all remote work, most of the clients were in the clouds, where a lot is obviously automated already. More on this later. All of this led to the idea that there should be a set of tools, a kind of platform, that automates almost all DBA actions so that you can manage a large number of databases.

What this talk will not contain:

  • Silver bullets or blanket statements like "set shared_buffers to 8 GB or 25% and you'll be fine." There won't even be that much about shared_buffers itself.
  • Hardcore internals.

What will it contain?

  • The optimization principles that we apply and develop, all sorts of ideas we run into along the way, and the various tools we build, for the most part in open source; the foundation is open source. Moreover, the issue tracker and practically all communication are open too, so you can see what we are doing now, what will be in the next release, and so on.
  • And also some experience of applying these principles and tools in a number of companies, from small start-ups to large enterprises.

How is it all developing?

First of all, the main task of a DBA, besides provisioning instances, deploying backups and so on, is to find bottlenecks and optimize performance.

Today it usually works like this: we watch the monitoring, we notice something, but some details are missing. We start digging deeper, usually by hand, and figure out one way or another what to do about it.

There are two approaches: pg_stat_statements, the default solution for identifying slow queries, and parsing Postgres logs with pgBadger.

Each approach has serious drawbacks. With pg_stat_statements, all the parameter values have been thrown away. We see query groups like SELECT * FROM table WHERE column = ? (or $1 since Postgres 10), and we don't know whether that is an index scan or a seq scan; a lot depends on the parameter. Substitute a rarely occurring value and you get an index scan; substitute a value that covers 90% of the table and a seq scan is obvious, because Postgres knows the statistics. This is a big drawback of pg_stat_statements, although some work is underway to improve it.

The biggest drawback of log analysis is that, as a rule, you cannot afford log_min_duration_statement = 0 (we'll come back to that). So you do not see the whole picture: a query that is individually very fast can consume a huge amount of resources in total, but you won't see it because it is below your threshold.
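
To illustrate the first drawback, here is a minimal sketch with a hypothetical orders table and status column (it assumes an index on status): the same query text that pg_stat_statements groups together produces completely different plans depending on the parameter value.

    psql -c "EXPLAIN SELECT * FROM orders WHERE status = 'refunded'"    # rare value: the planner will likely choose an index scan
    psql -c "EXPLAIN SELECT * FROM orders WHERE status = 'completed'"   # value covering ~90% of the table: a seq scan is expected

In pg_stat_statements both executions fall into the same group, SELECT * FROM orders WHERE status = ?, so the difference is invisible.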

How do DBAs solve the problems they find?

For example, we found some problem. What is usually done? If you are a developer, you test it on some instance that is nowhere near production size. If you are a DBA, you have staging; there is usually only one, and it may be six months behind. So you figure you will go to production, and even experienced DBAs then check things in production, on a replica. It happens that they create a temporary index, make sure it helps, drop it, and hand it to the developers to put into the migration files. This is the kind of mess that is happening right now, and it is a problem.

Essentially, there are four options:

  • Tune the configuration.
  • Optimize the set of indexes.
  • Change the SQL query itself (this is the most difficult way).
  • Add capacity (the easiest way in most cases).

There is a lot involved in each of these. Postgres has many knobs; you need to know a lot. Postgres has many index types, thanks in part to the organizers of this conference. All of this has to be known, and that is what makes non-DBAs feel that DBAs practice black magic: you supposedly need to study for ten years before you start to understand it all properly.

And I fight against this black magic. I want everything to be technology, with no intuition involved.

Real-life examples

I have seen this in at least two projects, including my own. A blog post tells us that a default_statistics_target of 1,000 is good. Okay, let's try it in production.

And here we are, two years later: using our tool and the database experiments we are talking about today, we can compare what it was like before and what it became.

For that we need to set up an experiment. It consists of four parts.

  • The first is the environment. We need hardware. When I come to a company and sign a contract, I ask them to give me the same hardware as in production: for each of your primaries I need at least one machine of the same kind, either an equivalent virtual machine instance in Amazon or Google, or exactly the same physical box. I want to recreate the environment. And the major version of Postgres is also part of the environment.
  • The second part is the object of our research: the database. It can be created in several ways; I'll show you how.
  • The third part is the load. This is the trickiest part.
  • And the fourth part is what we check, that is, what we compare with what. Say, we can change one or more parameters in the config, or we can create an index, and so on.

We start an experiment. Here is pg_stat_statements. On the left is what it was before; on the right, what it became.

On the left, default_statistics_target = 100; on the right, 1,000. We can see that, overall, this helped: things got better.

But if we scroll down, we will see the query groups, from pgBadger or from pg_stat_statements (there are two options). We will see that one query group degraded by 88%. And here comes the engineering approach: we can dig deeper, because we want to know why it degraded. We need to understand what happened to the statistics and why more buckets in the statistics led to a worse result.

Or we can skip the digging, run "ALTER TABLE ... ALTER COLUMN" and return 100 buckets to the statistics of that column. With one more experiment we can make sure this patch helped. That's it. This is an engineering approach that lets us see the big picture and make decisions based on data rather than intuition.
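
In SQL, that per-column override looks roughly like this (a minimal sketch; the table and column names are hypothetical):

    psql -c "ALTER TABLE orders ALTER COLUMN status SET STATISTICS 100"   # pin this one column back to 100 buckets
    psql -c "ANALYZE orders"                                              # rebuild the statistics with the new setting

The rest of the database keeps the new default_statistics_target; only the regressed column is pinned.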

A couple of examples from other areas. CI testing has been around for many years now, and no project in its right mind lives without automated tests.

In other industries, such as aviation or the automotive industry, when aerodynamics are tested there is also the opportunity to experiment. Nobody launches something straight from the drawing board into space, and nobody puts a new car straight onto the track. There is, for example, the wind tunnel.

From observing other industries we can draw conclusions.

First, there is a special environment. It is close to production, but it is not production. Its main property is that it should be cheap, repeatable and as automated as possible. And there should be special tools for detailed analysis.

Once the plane has taken off and is flying, we have far less opportunity to study every millimeter of the wing surface than we do in a wind tunnel. In the tunnel we have more diagnostic tools; we can afford to attach all kinds of heavy instrumentation that the plane could never carry in the air. It is the same with Postgres: during experiments we can, in some cases, enable full query logging, which we would never want to do in production. We can even log the execution plans using auto_explain.
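
Turning on that kind of heavy diagnostics in the lab could look roughly like this (a sketch: these are standard Postgres and auto_explain parameters, but the combination and values are illustrative and far too expensive for production):

    # log every statement and its plan; acceptable overhead in a disposable lab instance
    echo "shared_preload_libraries = 'pg_stat_statements,auto_explain'" >> "$PGDATA/postgresql.conf"
    echo "log_min_duration_statement = 0"    >> "$PGDATA/postgresql.conf"
    echo "auto_explain.log_min_duration = 0" >> "$PGDATA/postgresql.conf"
    echo "auto_explain.log_analyze = on"     >> "$PGDATA/postgresql.conf"
    pg_ctl -D "$PGDATA" restart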

And, as I said, a high level of automation means we press a button and repeat. That is how it should be, so that there are many experiments and they become routine.

Nancy CLI - the foundation of the "database lab"

So we built something like this. I first talked about these ideas in June, almost a year ago, and we already have the so-called Nancy CLI in open source. It is the foundation for building the database lab.

Nancy is open source, on GitLab. You can go and try it; I have included the link in the slides. You can click through and explore it in every respect.

Of course, a lot is still in development and there are plenty of ideas, but this is what we use almost daily. When an idea comes up - say, why does this DELETE make everything bottleneck on IO - we can run an experiment, look at it in more detail, understand what is happening and then try to fix it on the go. That is, we run an experiment: we tweak something and see what we get in the end. And we do not do it in production. That is the essence of the idea.

Where can it run? It can run locally, anywhere; you can even run it on a MacBook. All we need is Docker, and off we go. You can run it on some instance, on bare metal or in a virtual machine, anywhere.

There is also the ability to run remotely in Amazon, on EC2 spot instances, and that is a very cool option. For example, yesterday we ran over 500 experiments on i3 instances, from the smallest up to i3.16xlarge. And 500 experiments cost us $64; each lasted 15 minutes. Because spot instances are used it is very cheap: roughly a 70% discount, and Amazon bills per second. You can do a lot. You can do real research.

Three major versions of Postgres are supported. It would not be hard to add some older ones, and the new version 12, as well.

We can define the object of the experiment, the database, in three ways:

  • A dump / SQL file.
  • The main way is to clone the PGDATA directory. As a rule, it is taken from the backup server: if you have normal binary backups, you can create clones from them. If you are in the cloud, a provider like Amazon or Google will do it for you. This is the most important way to get real production clones; this is how we unroll the database.
  • And the last method is suitable for research, when you want to figure out how some particular thing works in Postgres: pgbench. You can generate the database with pgbench; it is a single "db-pgbench" option where you specify the scale, and everything is generated in the cloud as requested (see the sketch right after this list).
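
A rough sketch of what launching an experiment with Nancy CLI looks like. The "db-pgbench" option and the delta config are mentioned in this talk; the exact spelling of the other options is from memory and may differ from the current documentation:

    # the three moving parts: the database (pgbench, scale 100), the load (a pgbench run),
    # and the change under test (the delta config applied on top of the baseline)
    nancy run \
      --db-pgbench "-s 100" \
      --workload-pgbench "-c 16 -j 4 -T 600" \
      --delta-config "shared_buffers = 2GB"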

And the load:

  • We can execute the load as a single SQL thread. This is the most primitive way.
  • Or we can emulate the real load. The primary way to do that is to collect all the logs, which is painful (I'll show you why), and replay them with pgreplay, which is built into Nancy.
  • Or another option: the so-called crafted load, which takes a certain amount of effort. By analyzing the current load on the production system, we pull out the top query groups, and with pgbench we emulate this load in the lab.

And finally, what we check:

  • It can be some SQL: we test a migration, create an index, run ANALYZE, look at what things were like before and after VACUUM; any SQL, really.
  • Or we change one or more parameters in the config. We can ask it to check, say, 100 values on Amazon for our terabyte database, and in a few hours you have the result. As a rule, deploying a terabyte database takes several hours, but there is a patch in development that makes a series possible: you can reuse the same PGDATA on the same server and check value after value. Postgres restarts, the caches are flushed, and you drive the load again.

As a result of an experiment we get a set of artifacts:

  • A directory arrives with a bunch of files, starting with snapshots of the pg_stat_* views. The most interesting ones are pg_stat_statements and pg_stat_kcache, two extensions that track queries. And pg_stat_bgwriter contains not only bgwriter statistics but also checkpoint statistics and how the backends themselves evict dirty buffers. All of this is interesting to look at: for example, when we tune shared_buffers, it is very interesting to see how much each of them has evicted.
  • Postgres logs also arrive, two of them: the log of the preparation phase and the log of the load replay.
  • A relatively new feature: flame graphs.
  • Also, if you used the pgreplay or pgbench load replay options, their native output is included, so you will see latency and TPS and be able to understand how the run went.
  • Information about the system.
  • Basic CPU and IO checks. This matters mostly for EC2 instances on Amazon: when you want to spin up 100 identical instances and run 100 different runs across them, you end up with 10,000 experiments, and you need to make sure you have not landed on a defective instance that someone else is already hammering, with other tenants active on the same hardware and little resource left for you. Such results are better discarded. So, using sysbench by Alexey Kopytov, we run a few short checks whose results can be compared with the others, and you understand how the CPU behaves and how IO behaves.

What technical difficulties come up? Let me use different companies as examples.

Say we want to replay the real load using the logs. There is a great open-source tool for this, pgreplay, and we use it. But for it to work well, you must enable full statement logging, with parameters and timing.

There are some nuances around duration versus timestamp; we will skip those details. The big question is: can you afford this or not?
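
A sketch of the logging settings needed to capture a replayable log (the authoritative list is in the pgreplay documentation; treat these as illustrative rather than exhaustive):

    psql -c "ALTER SYSTEM SET logging_collector = on"           # takes effect after a restart
    psql -c "ALTER SYSTEM SET log_destination = 'csvlog'"       # pgreplay can parse CSV logs
    psql -c "ALTER SYSTEM SET log_min_duration_statement = 0"   # log every statement with its duration
    psql -c "ALTER SYSTEM SET log_connections = on"
    psql -c "ALTER SYSTEM SET log_disconnections = on"
    psql -c "ALTER SYSTEM SET lc_messages = 'C'"                # pgreplay expects English log messages
    pg_ctl -D "$PGDATA" restart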

https://gist.github.com/NikolayS/08d9b7b4845371d03e195a8d8df43408

The problem is that you may not be able to afford it. First of all, you need to understand what volume of writes will go to the log. If you have pg_stat_statements, you can use this query (the link is in the slides) to estimate how many bytes per second will be written.

We look at the query length. We ignore the fact that the parameters are missing, but we know the length of the query text and we know how many times per second it was executed. From that we can estimate how many bytes per second will be written. We might be off by a factor of two, but we will definitely get the order of magnitude this way.

We can see that this query is executed 802 times per second, and that roughly 300 kB/s will be written, give or take. As a rule, we can afford such a stream.
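
A sketch in the spirit of the linked gist (not the gist itself; it assumes pg_stat_statements is installed and that you may create a scratch table): take two samples of pg_stat_statements a known interval apart, multiply each group's call delta by its query text length, and divide by the interval.

    psql -c "CREATE TABLE pgss_snap AS
               SELECT userid, dbid, queryid, calls, length(query) AS query_len
               FROM pg_stat_statements"
    sleep 60
    psql -Atc "SELECT round(sum((s2.calls - s1.calls) * s1.query_len) / 60) AS approx_bytes_per_sec
               FROM pgss_snap s1
               JOIN pg_stat_statements s2 USING (userid, dbid, queryid)"
    psql -c "DROP TABLE pgss_snap"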

But! The thing is, there are different logging systems, and by default people usually have syslog.

And if you have syslog, you can end up with the following picture. We take pgbench, enable statement logging, and see what happens.
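
The comparison itself can be reproduced roughly like this (a sketch; the scale, client counts and durations are illustrative):

    pgbench -i -s 100                                            # initialize a test database
    psql -c "ALTER SYSTEM SET log_min_duration_statement = -1"   # statement logging off
    psql -c "SELECT pg_reload_conf()"
    pgbench -S -c 16 -j 4 -T 60                                  # select-only baseline, note the TPS
    psql -c "ALTER SYSTEM SET log_min_duration_statement = 0"    # now log every statement
    psql -c "SELECT pg_reload_conf()"
    pgbench -S -c 16 -j 4 -T 60                                  # the same test again, compare the TPS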

Without logging (the column on the left) we got 161,000 TPS. With syslog, on Ubuntu 16.04 on Amazon, TPS drops several-fold. And if we switch to the two other logging methods, the situation is much better. We expected it to drop, but not by that much.

And on CentOS 7, where journald is also involved, converting the logs to a binary format for convenient searching and so on, it is a nightmare: TPS drops 44-fold.

And that is what people live with. In companies, especially large ones, this is often very hard to change. If you can get away from syslog, please do.
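
Moving Postgres from syslog to its own log files is a small configuration change (a sketch):

    psql -c "ALTER SYSTEM SET log_destination = 'stderr'"   # or 'csvlog'
    psql -c "ALTER SYSTEM SET logging_collector = on"       # takes effect only after a restart
    pg_ctl -D "$PGDATA" restart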

To sum up this part:

  • Estimate the IOPS and the write volume.
  • Check your logging system.
  • If the predicted load is excessive, consider sampling.

We have pg_stat_statements; as I said, it is a must. We can describe each query group in its own special file. Then we can use a very convenient pgbench feature: the ability to pass several files with the "-f" option.

It accepts many "-f" options, and with "@" at the end you can specify what share each file should get; say, this one runs in 10% of cases and that one in 20%. This brings us closer to what we see in production.
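
For example (a sketch with made-up file names; each file holds one crafted query group, and the weights after "@" set the proportions):

    pgbench -c 16 -j 4 -T 600 \
      -f top1_select.sql@50 \
      -f top2_update.sql@30 \
      -f top3_insert.sql@20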

And how do we find out what we have in production, which shares and how much? A small digression here. We have another product, postgres-checkup. It is also open source, and we are actively developing it now.

It was born for slightly different reasons: because monitoring is not enough. You come, look at the database, look at the problems it has, and, if you are an experienced DBA, you do a health check: index usage and so on. If you have OKmeter, great; it is a cool monitoring tool for Postgres. OKmeter.io, please install it, everything there is very good. It is a paid product.

If you do not have it, you usually do not have much. Monitoring typically gives you CPU and IO, and beyond that only with caveats, and that is it. And we need more: we need to see how autovacuum works, how checkpoints work; in IO we need to separate the checkpointer from the bgwriter and from the backends, and so on.

The problem is that when you help a big company, they cannot implement anything quickly. They cannot buy OKmeter quickly; maybe they will buy it in six months. They cannot quickly roll out packages either.

So we came up with the idea of a special tool that requires nothing to be installed, that is, you do not have to put anything on production at all. Install it on your laptop, or on an observer server, and launch it from there. It analyzes a lot of things: the operating system, the file system and Postgres itself, running some lightweight queries that can safely be executed directly on production without anything falling over.

We named it postgres-checkup. In medical terms it is a regular health check; in automotive terms it is like an MOT: you service a car every six months or a year, depending on the make. Do you service your database? Do you do a deep inspection regularly? It must be done. If you make backups, then a checkup is no less important.

And we have such a tool. It only started taking shape about three months ago. It is still young, but there is already a lot in it.

Collecting the most "influential" query groups: report K003 in postgres-checkup

There is a group of reports, K; three reports so far. One of them is K003: a top list from pg_stat_statements sorted by total_time.

When we sort query groups by total_time, at the top we see the group that loads our system the most, i.e. consumes the most resources. Why do I say query groups? Because the parameters have been thrown away: these are no longer individual queries but groups of queries, i.e. abstract templates.
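
The underlying idea of K003, expressed as a plain query (a sketch; on Postgres 13 and later the column is called total_exec_time instead of total_time):

    psql -c "SELECT queryid, calls, round(total_time::numeric) AS total_ms, rows,
                    shared_blks_hit, shared_blks_read
             FROM pg_stat_statements
             ORDER BY total_time DESC
             LIMIT 20"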

If we optimize from top to bottom, we free up resources and push back the moment when we need to upgrade. This is a very good way to save money.

It may not be the best way to take care of users, because we can miss rare but very annoying cases where someone waited 15 seconds; in total they are so rare that we do not see them. But here we are dealing with resources.

What is in this table? We took two snapshots. postgres-checkup gives you a delta for each metric: total_time, calls, rows, shared_blks_read and so on; the delta is already calculated. The big problem with pg_stat_statements is that it does not remember when it was reset. pg_stat_database remembers this, pg_stat_statements does not. You see that there is some number, but you do not know from what point it was accumulated.

Here, however, we know: we have two snapshots, and we know the interval between them was 56 seconds in this case, a very small gap. The list is sorted by total_time. And then we can differentiate, i.e. divide all the metrics by the duration. Dividing each metric by the duration gives us, for example, the number of calls per second.

Next, total_time per second, my favorite metric. It is measured in seconds per second: how many seconds of work the system spent executing this query group per second of wall-clock time. If you see more than one second per second there, it means more than one core had to be devoted to it. This is a very good metric: you can see that this particular group, for example, needs at least three cores.
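
A worked example of the arithmetic (the numbers are made up): if a query group accumulated 180,000 ms of total_time between two snapshots taken 60 seconds apart, that is 3 seconds of database time per wall-clock second, i.e. roughly three busy cores.

    psql -Atc "SELECT (180000 / 1000.0) / 60 AS seconds_per_second"   # prints 3: about three cores' worth of work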

This is our know-how; I have not seen it anywhere else. Pay attention, it is a very simple thing: a second per second. Sometimes, when your CPU is at 100%, it can be half an hour per second, i.e. half an hour of database time was spent, every second, just executing these queries.

Next we see rows per second: how many rows per second the group returned.

And then another interesting thing: how many blocks per second were read into shared_buffers (shared_blks_read). Hits mean the blocks were already in shared_buffers; reads mean we took them from the operating system cache or from disk. The first option is fast, while the second may or may not be, depending on the situation.

The second way of differentiating is dividing by the number of calls in the group. The calls column then always shows one, one call per call. And then it gets interesting: how many milliseconds per call, i.e. how this query behaves on average. Each call took 101 milliseconds. This is the traditional latency metric we need to understand.

Then, how many rows each call returned on average: we see this group returns 8. And how much, on average, was taken from the cache versus read: we see that everything is cached nicely, solid hits for the first group.

And the fourth sub-row in each line is the percentage of the total. Take calls: we can see how much each group contributes. In this case the first group accounts for a negligible share of all calls; it is executed so rarely that it is invisible in the overall call count. The second group is about 0.01% of all calls.

The total_time percentages are also interesting: we spent 14% of all time on the first query group, 11% on the second, and so on.

I will not go into details, but there are subtleties. We display an error estimate at the top, because between snapshots the set of groups can drift: some query groups may drop out and simply be absent from the second snapshot, and new ones may appear. We compute the error from that. If you see 0, that is good: no error. An error of up to 20% is OK.

Back to our topic. We need to craft the workload. We go from the top down until we cover 80% or 90% of total_time; usually that is 10-20 groups. And we make files for pgbench. We use random() there for the parameters; sometimes, unfortunately, that does not work. And in Postgres 12 there will be more ways to apply this approach.

So we cover 80-90% of total_time this way. What do we put after the "@"? We look at calls, see what percentage each group has, and understand how much weight it deserves. From these percentages we work out the balance between the files. After that we run pgbench and get to work.

We also have K001 and K002.

K001 is one big row with four sub-rows: a characteristic of our entire load. Look at the second column, second sub-row: roughly one and a half seconds per second. So with two cores it will be fine, at about 75% utilization, and that is how it will run. With 10 cores we can be completely relaxed. That is how we can size resources.

K002 is what I call the query classes: SELECT, INSERT, UPDATE, DELETE. And SELECT FOR UPDATE separately, because it takes locks.

And here we can conclude that ordinary SELECT reads are 82% of all calls but only 74% of total_time. That is, they are called a lot but consume proportionally fewer resources.

And we come back to the question: how do we choose the right shared_buffers? I observe that most benchmarks are built around the idea of measuring throughput, usually in TPS or QPS.

And we try to squeeze as many transactions per second as possible out of the machine by tuning parameters. Here it is 311 per second for selects.

But nobody drives to work and back at full speed; that would be silly. It is the same with databases: we should not run at full speed, and nobody does. Nobody runs production at 100% CPU. Well, maybe someone does, but that is not good.

The idea is that we usually drive at about 20% of capacity, preferably not above 50%. And we try, first of all, to optimize response time for our users. That is, we should turn our knobs so that latency is minimal at, conditionally, 20% speed. This is an idea we also try to use in our experiments.
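
One way to measure latency at a fraction of peak throughput rather than at full speed is pgbench's rate limit (a sketch; the 2,000 TPS cap stands in for roughly 20% of whatever maximum you measured):

    pgbench -S -c 16 -j 4 -T 600 --rate=2000   # fixed arrival rate; the report then shows latency at this load level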

And, to finish, some recommendations:

  • Be sure to build a Database Lab.
  • If possible, make it on demand, so that it spins up for a while: you play with it and throw it away. If you are in the cloud this goes without saying; do not keep a lot of hardware standing idle.
  • Be curious. If something seems off, check with experiments how it actually behaves. Nancy can be used to teach yourself how the database works.
  • Aim for the minimum response time.
  • Do not be afraid of the Postgres source code. When you work with the sources you need to know English; there are a lot of comments there, and everything is explained.
  • And check the health of your database regularly, at least once every three months, manually or with postgres-checkup.

Questions

Thanks a lot! Very interesting stuff.

Two questions.

Yes, two. Only I did not quite get it: when we work with Nancy, can we tweak just one parameter, or a whole group?

There is a delta config parameter: you can change as many settings there at once as you like. But you need to understand that when you change many things at the same time, you can draw the wrong conclusions.

Yes. That is why I asked: it is hard to experiment when you change only one parameter at a time. You tweak it, see how it works, set it, then start on the next one.

You can change several at the same time, but it depends on the situation, of course. It is better to test one idea at a time. Just yesterday we were in a similar situation: there were two configs, and we could not figure out why there was such a big difference. The idea came up to use bisection to narrow the difference down step by step: make half of the parameters the same, then a quarter, and so on. Everything is flexible.

And another question. The project is young and still developing. Is the documentation ready, is there a detailed description?

There is a dedicated page describing the parameters; it exists, but a lot is still missing. I am looking for like-minded people, and I find them when I speak, which is very cool. Some people are already working with me, some have helped and contributed things. If you are interested in this topic, give feedback on what is missing.

As the laboratory takes shape, maybe there will be more feedback. We will see. Thank you!

Hello! Thanks for the talk! I saw that Amazon is supported. Is GCP support planned?

Good question. We started on it and have paused for now, because we want to save money. There is support for running on localhost: you can create an instance yourself and work locally. By the way, that is how we do it; I run it in GitLab, on GCP. But we do not yet see the point in building that kind of orchestration, because Google does not have cheap spot capacity. There are preemptible instances, but they have limitations. First, they only ever have a fixed discount of about 70%, and you cannot play with the price there. With AWS spots we raise our bid by 5-10% to reduce the likelihood of being killed: you save money with spots, but you can be taken away at any time, and if you bid a little higher than others you will be killed later. Google is completely different. And there is another very bad limitation: they live only 24 hours. Sometimes we want to run an experiment for 5 days, and with spots that is possible; spots sometimes live for months.

Hello! Thanks for the talk! You mentioned checkup. How do you calculate the pg_stat_statements error?

A very good question; I could show and explain it in great detail. In short, we look at how the set of query groups has drifted: how many have dropped out and how many new ones have appeared. Then we look at two metrics, total_time and calls, so there are two errors. We look at the contribution of the drifting groups; there are two subgroups, those that left and those that arrived, and we see how they contribute to the big picture.

Aren't you afraid that a group will be evicted and come back two or three times in the time between snapshots?

You mean it got re-registered, or what?

For example, this query was already evicted once, then it came back and was evicted again, then came back and was evicted once more. You calculated something here, but where did all of that go?

Good question to look into.

I did a similar thing, simpler of course, and I did it alone. But I had to reset pg_stat_statements and, at the moment of the snapshot, make sure it was below a certain percentage of the ceiling, of how many entries pg_stat_statements can hold. And then I assume that, most likely, nothing was evicted.

Yes, yes.

But I don't understand how to do it differently.

Unfortunately, I do not remember exactly whether we use the query text or the queryid from pg_stat_statements there and key on it. If we key on the queryid, then in theory we are comparing comparable things.

No, it can be evicted several times between snapshots and come back again.

With the same id?

Yes.

We will study this. Good question; it needs investigating. But for now, for what we see, we either write 0 ...

It is, of course, a rare case, but I was shocked when I found out that pg_stat_statements can evict entries there.

There can be a lot of things in pg_stat_statements. We have run into the fact that if you have track_utility = on, then your SET commands are tracked as well.

Yes, of course.

And if you have Java Hibernate issuing those with random values, the hash table starts getting locked all the time. As soon as you turn that off on a very busy application, you are down to 50-100 groups, and everything is more or less stable there. One way to combat this is to increase pg_stat_statements.max.

Yes, but you have to know by how much, and you have to keep an eye on it. I do: I know my pg_stat_statements.max, and I check that at the time of the snapshot I have not reached 70% of it. Okay, so we have not lost anything. We do a reset and accumulate again. If the next snapshot is also below 70%, then most likely nothing has been lost again.
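
That check is easy to script (a sketch):

    psql -Atc "SHOW pg_stat_statements.max"                 # the ceiling on tracked query groups
    psql -Atc "SELECT count(*) FROM pg_stat_statements"     # how many groups are currently tracked
    # if the count stays well below the ceiling (say, under 70%), evictions since the last reset are unlikely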

Yes. The default is now 5,000, and for many that is enough.

Usually yes.

P.S. I will add on my own behalf that if your Postgres contains confidential data that must not end up in the test environment, you can use PostgreSQL Anonymizer to mask it before experimenting.

Source: habr.com
