Linux tuning to improve PostgreSQL performance. Ilya Kosmodemyansky

Transcription of the 2015 report by Ilya Kosmodemyansky "Linux tuning to improve PostgreSQL performance"

Disclaimer: note that this report dates from November 2015; more than four years have passed since then. Version 9.4, discussed in the report, is no longer supported. Over those four years there have been 5 new PostgreSQL releases and 15 Linux kernel versions, so some details are out of date. But the fundamentals of Linux tuning for PostgreSQL covered here are still relevant today.



My name is Ilya Kosmodemyansky. I work for PostgreSQL-Consulting company. And now I will talk a little about what to do with Linux in relation to databases in general and PostgreSQL in particular, because the principles are quite similar.

What will be discussed? If you are dealing with PostgreSQL, then to some extent you need to be a UNIX admin. What does that mean? To compare: with Oracle you need to be 80% database admin (DBA) and 20% Linux admin.

PostgreSQL is a bit harder. With PostgreSQL you need a much better idea of how Linux works. And at the same time you have to keep chasing the train, because lately everything has been updated at a brisk pace: new kernels come out, new functionality appears, performance improves, and so on.

Why are we talking about Linux? Not because we are at the Linux Piter conference, but because today Linux is one of the most justified operating systems for running databases in general and PostgreSQL in particular. FreeBSD, unfortunately, is developing in a rather strange direction, and there will be problems both with performance and with many other things. PostgreSQL performance on Windows is a separate, harsh topic: Windows does not have shared memory in the UNIX sense, while PostgreSQL, being a multi-process system, depends on it heavily.

And exotics like Solaris, I think, are of less interest to everyone, so let's go.


A modern Linux distribution has on the order of a thousand sysctl options, depending on how the kernel is built. And if we look at the various other knobs, there are many more ways to adjust things: file system options, mount options, and further up the chain, what to enable in the BIOS, how to configure the hardware, and so on.

This is a very large topic that could be talked about for days, not in one short report, so I will focus on the important things: how to avoid the rakes that, left unfixed, will not let you run a database on Linux well. An important point here is that many default settings are not the right ones for a database. That is, by default it will work badly, or not at all.


What are the traditional tuning targets on Linux? I think that since you are all dealing with Linux administration, there is no need to explain what targets are.

You can tune:

  • CPU.
  • Memory.
  • Storage.
  • Other things, which we will save for dessert at the end. Even settings such as the power saving policy can affect performance in a very unpredictable and unpleasant way.


What is specific about PostgreSQL, and databases in general? The problem is that you cannot tweak one particular knob and watch performance improve dramatically.

Yes, there are such knobs, but a database is a complicated thing. It interacts with all the resources the server has, and prefers to use them fully. If you look at Oracle's current guidelines on how to set up the host OS, it's like the joke about the Mongolian cosmonaut: feed the dog and don't touch anything. Give the database all the resources, and the database will sort everything out itself.

In principle, the situation with PostgreSQL is much the same. The difference is that the database is not able to take all the resources for itself, so somewhere at the Linux level you need to sort it all out yourself.

The main idea is not to pick a single target, such as memory or CPU, and start tuning it, but to analyze the workload and improve throughput as much as possible, so that the load which good programmers, and our users, create for us is served well.


Here is a picture to explain what this is. There is the Linux OS buffer, there is shared memory, and there are PostgreSQL shared buffers. PostgreSQL, unlike Oracle, works only through the kernel buffer: for a page to get from disk into its shared memory, it must pass through the kernel buffer, and exactly the same on the way back.

Disks live below this system. I drew them as disks; in reality there may be a RAID controller and so on.

And all input-output, one way or another, happens through this stack.

PostgreSQL is a classic database. It is page-oriented internally, and all input-output happens in pages. We read blocks into memory as pages. If nothing happened to them and we just read them, they gradually sink out of this cache, out of shared buffers, without further ado.

If we changed something somewhere, the whole page is marked dirty; I marked them blue here. And that means the page must be synchronized with block storage. When we dirtied it, we made a record in the WAL. And at some fine moment an event called a checkpoint occurs, and its arrival is recorded in that log. At that point, all the dirty pages that were in shared buffers are synchronized with the storage disk using fsync, through the kernel buffer.

What is this for? So that if we lose power, we do not end up with all data lost. Persistent memory, which everyone keeps telling us about, is so far a bright future in database theory; we strive for it and we like it, but for now databases still live 20 years in the past in this respect. And, of course, all this needs to be monitored.

And the task of maximizing throughput is to tune all these stages so that everything moves back and forth quickly. Shared memory is basically a page cache. We send PostgreSQL a SELECT, it fetches the data from disk, and the data lands in shared buffers. Accordingly, for this to work better, there must be a lot of memory.

In order for all this to work well and fast, you need to configure the operating system correctly at every stage, and choose balanced hardware. If there is an imbalance somewhere, you can add a lot of memory, but it will be served at insufficient speed.

Let's go through each of these points.


In order for these pages to travel back and forth faster, you need to achieve the following:

  • First, you need to work with memory more efficiently.
  • Secondly, the transition when pages go from memory to disk should be more efficient.
  • And, thirdly, there must be good disks.

If you have 512 GB of RAM in the server and it all lands on a SATA hard disk without any cache, the whole database server turns not just into a pumpkin, but into a pumpkin with a SATA interface. You will run straight into that disk, and nothing will save you.


As for the first point, memory, there are three things that can make life very difficult.

The first one is NUMA. NUMA is a feature made to improve performance; depending on the workload, it can optimize different things. But in its current form it is not good for applications, such as databases, that make intensive use of the page cache and shared buffers.


In a nutshell: how do you tell that something is wrong with NUMA? You get an unpleasant symptom: suddenly some CPU is overloaded, yet when you analyze the queries in PostgreSQL you see nothing similar there; those queries should not be so CPU-intensive. You can chase this for a long time. It is easier to apply the right advice on setting up NUMA for PostgreSQL from the start.


What is really going on? NUMA stands for Non-Uniform Memory Access. What is the point? Each CPU has its own local memory next to it, and via memory interconnects it can also pull in memory attached to other CPUs.

If you run numactl --hardware, you get a big listing. Among other things there will be a distances field with numbers like 10 and 20. These numbers are nothing but the cost, in hops, of fetching that remote memory and using it locally. Basically a good idea, and it improves performance nicely for a number of workloads.

Now imagine one CPU first using its local memory, then pulling in more memory over the interconnect for something. And your entire PostgreSQL page cache lands on that one CPU, however many gigabytes it is. You always get the worst case, because the memory attached directly to that CPU's node is usually small, and everything else is served over the interconnects. It turns out slow and sad: the processor serving that node is constantly overloaded, and the access time to that memory is bad. This is not the situation you want on a database server.

Therefore, the more correct option for a database is for the Linux memory policy not to be clever about this at all: just address memory uniformly, as it comes.

Why is that? It would seem it should be the other way around. For one simple reason: we need a lot of memory for the page cache, tens or hundreds of gigabytes.

And if we allocate all of that and cache our data there, the gain from using the cache is significantly greater than the gain from such cunning memory access. We win incomparably more this way than by accessing memory more efficiently with NUMA.

So at the moment, until the bright future arrives and the database itself can figure out which CPUs it runs on and where it needs to pull data from, there are two approaches.


The first correct approach is to disable NUMA altogether, e.g. with a kernel boot parameter on reboot. In most cases the win is of such an order that there is no question at all which is better.

There is another option. We use it more often than the first one, because when a client comes to us for support, restarting the server is a whole undertaking for them; they have a business running, and they are suffering because of NUMA. So we try to disable it in less invasive ways than a reboot. But be careful to check that it is actually disabled. As experience shows, disabling NUMA on the PostgreSQL parent process is good, but it does not necessarily take effect. You need to check and see that it is really off.
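For illustration, here is a sketch of the less invasive kind of approach meant here. The data directory path is a hypothetical example, and distributions differ in details; treat this as a starting point, not a recipe:

```shell
# Sketch only: mitigate NUMA effects without a reboot (run as root).
sysctl -w vm.zone_reclaim_mode=0        # don't insist on node-local page reclaim
# Start the postmaster with memory interleaved across all NUMA nodes:
numactl --interleave=all pg_ctl -D /var/lib/pgsql/data start
# Verify the policy actually took effect for the running postmaster:
grep -m1 interleave /proc/$(head -1 /var/lib/pgsql/data/postmaster.pid)/numa_maps
```

The last line is exactly the "check that it is really off" step: if the postmaster's memory mappings do not show the interleave policy, the trick did not work.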

There is a good post by Robert Haas, one of the PostgreSQL committers and one of the key developers of all the low-level internals. If you follow the links from that post, you will find several colorful stories about how NUMA made people's lives difficult. Look at it, study it, and build a system administrator's checklist of what needs to be configured on the server for the database to work well. These settings need to be recorded and checked, because otherwise things will not go well.

I draw your attention to the fact that this applies to all the settings I will talk about. Databases are usually set up in master-slave mode for fault tolerance. Don't forget to apply these settings on the slave too, because one day there will be an accident, you will fail over to the slave, and it will become the master.

In an emergency, when everything is very bad, your phone is constantly ringing and your boss is coming at you with a big stick, you will have no time for checking, and the results can be disastrous.


The next topic is huge pages. Huge pages are difficult to test in isolation, and there is little point in doing so, although there are benchmarks that can. They are easily googled.

What's the point? You have a not particularly expensive server with a lot of RAM, say more than 30 GB. If you are not using huge pages, you definitely have memory overhead, and far from the most pleasant kind.


Why is that? And what's going on? The operating system allocates memory in small chunks; that is convenient and historical. If you go into the details, the OS must translate virtual addresses into physical ones, and since this is not the cheapest operation, it caches the result in the Translation Lookaside Buffer (TLB).

And since the TLB is a cache, all the usual cache problems arise. First, if you have a lot of RAM and it is all allocated in small chunks, the translation structures become very large, and a large cache is slower to search. The overhead is sizeable and takes up memory of its own; that is, RAM is being consumed by the wrong thing. That's one.

Two: the bigger such a cache grows, the more likely you are to get cache misses, and its efficiency drops rapidly as it grows. So operating systems came up with a simple approach: huge pages. Linux has had them for a long time; FreeBSD got them not so long ago. But we're talking about Linux.

And it is worth noting that huge pages, as an idea, were originally pushed by communities that included Oracle and IBM. That is, database vendors thought hard about this being useful, databases included.


So how do you make them work with PostgreSQL? First, huge pages must be enabled in the Linux kernel.

Secondly, their number must be specified explicitly with a sysctl parameter. The numbers here are from some old server; you can calculate approximately how large your shared buffers are, so that the huge pages cover them.

And if the whole server is dedicated to PostgreSQL, a good starting point is to give 25% of RAM to shared buffers, or 75% if you are sure your database will definitely fit in those 75%. Start with the first. So, if you have 256 GB of RAM, you will have 64 GB of shared buffers. Calculate the figure with some margin.
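As a rough illustration of that arithmetic, here is a sketch for the 256 GB example. The 2 MB huge page size and the 1 GB margin are assumptions; check Hugepagesize in /proc/meminfo on your machine:

```shell
# Sketch: estimate vm.nr_hugepages for 64 GB of shared_buffers,
# assuming the common 2 MB huge page size and ~1 GB of extra margin.
shared_buffers_mb=65536   # 64 GB
hugepage_mb=2             # verify with: grep Hugepagesize /proc/meminfo
margin_mb=1024            # headroom for other shared memory allocations
nr_hugepages=$(( (shared_buffers_mb + margin_mb) / hugepage_mb ))
echo "vm.nr_hugepages = $nr_hugepages"   # then set it via sysctl as root
```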

Before version 9.2 (and, if I'm not mistaken, since version 8.2) it was possible to make PostgreSQL use huge pages with a third-party library. And both halves of the job always need doing: first, the kernel must be able to allocate huge pages properly; second, the application working with that memory must actually use them, because it won't happen by itself. Since PostgreSQL allocated memory System V style, this could be done with libhugetlbfs, which is the library's full name.

Version 9.3 reworked PostgreSQL's memory handling and dropped the System V allocation method. Everyone was very happy, because before that you'd try to run two PostgreSQL instances on the same machine and it would say there was not enough shared memory, go fix sysctl, and some of those sysctls even required a reboot, and so on. So everyone was delighted. But the switch to mmap allocation broke the use of huge pages. Most of our clients use large shared buffers, and we strongly recommended against moving to 9.3, because the overhead started to be measured in noticeable percentages.

But on the other hand, the community noticed the problem, and in 9.4 this was reworked very well. In 9.4 a parameter appeared in postgresql.conf that can be set to try, on or off.

Try is the safest option. When PostgreSQL starts and allocates shared memory, it tries to take that memory from huge pages, and if that fails, it falls back to the usual allocation. On FreeBSD or Solaris you can also set try; it is always safe.

With on, PostgreSQL simply does not start if it could not allocate from huge pages. Which one to use is a matter of taste. But if you use try, check that what you need really got allocated, because there is plenty of room for error. Currently this functionality works only on Linux.
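A minimal sketch of what that check might look like. The huge_pages parameter is the 9.4 postgresql.conf setting discussed above; the rest just reads standard /proc/meminfo counters:

```shell
# postgresql.conf (9.4+):
#   huge_pages = try    # or 'on' to refuse startup without huge pages
# After starting PostgreSQL, confirm pages were actually taken:
grep -E 'HugePages_(Total|Free)' /proc/meminfo
# If HugePages_Free has not dropped noticeably below HugePages_Total,
# 'try' has silently fallen back to ordinary pages.
```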

One more small note before we move on. Transparent huge pages are not for PostgreSQL yet; it cannot use them properly. With transparent huge pages and this kind of workload, where you need one large chunk of shared memory, the benefits appear only at very large volumes. If you have terabytes of memory, they may play a role. For more everyday setups, with 32, 64, 128 or 256 GB of memory in the machine, ordinary huge pages are fine, and we simply turn transparent huge pages off.
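For reference, a sketch of turning transparent huge pages off at runtime. The sysfs paths are the mainline-kernel ones; some distributions place them slightly differently:

```shell
# Sketch, run as root; not persistent across reboots by itself.
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
cat /sys/kernel/mm/transparent_hugepage/enabled   # active value shown in brackets
```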


And the last thing about memory is not directly related to throughput, but it can ruin your life badly: all your throughput will suffer if the server is constantly swapping.

And it will be very unpleasant at certain moments. The main trouble is that modern kernels behave slightly differently from older Linux kernels here, and this is an unpleasant thing to step on, because misbehaving swap ends with the untimely arrival of the OOM killer. And an OOM killer that arrives at the wrong time and shoots down PostgreSQL is unpleasant: everyone will know about it, down to the last user.


What happens? You have a large amount of RAM and everything works well, but for some reason the server sits in swap and slows down because of it. It would seem there is plenty of memory, but it happens.


Previously, we advised setting vm.swappiness to zero, i.e. disabling swap. In those days 32 GB of RAM and the corresponding shared buffers seemed like a huge amount. The main purpose of swap then was to have somewhere to dump a core if we crashed, which is not much use anyway: what will you do with that core? So it was a situation where it was not very clear why swap was needed at all, especially of such a size.

But in more modern, 3.x-era kernels, the behavior changed. If you set swappiness to zero, effectively turning swap off, then sooner or later, even with some RAM still free, the OOM killer will come for the most intensive consumers. It reasons that under such a workload we have only a little left and are about to run out, so rather than kill a system process, let's kill something less important. That less important thing will be the heaviest consumer of shared memory, namely the postmaster. After that you will be lucky if the database does not have to be restored.

Therefore, as far as I remember, the default in most distributions is now around 60, a threshold governing how eagerly swap is used depending on how much memory is left. We now advise setting vm.swappiness = 1: it practically disables swap, but without the side effects of an unexpected OOM killer coming and killing everything.
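As a sketch, assuming your distribution reads /etc/sysctl.d at boot (the file name is a common convention, not universal):

```shell
# Keep swapping to the bare minimum without removing the safety valve.
sysctl -w vm.swappiness=1
# Persist across reboots:
echo 'vm.swappiness = 1' > /etc/sysctl.d/99-postgresql.conf
```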


What's next? When we talk about database performance and gradually get to the disks, everyone starts clutching their heads. The truth that disks are slow and memory is fast has been familiar to everyone since childhood, and everyone knows the database will have disk performance issues.

The main PostgreSQL performance problem, checkpoint spikes, does not occur because the disk is slow. It happens, rather, because memory bandwidth and disk bandwidth are not balanced. The imbalance, though, can sit in different places: PostgreSQL is not configured, the OS is not configured, the hardware is misconfigured or simply wrong. The problem is absent only when everything is as it should be: either there is no load, or the settings and hardware are well chosen.


What is it and what does it look like? Usually, people who work with PostgreSQL have run into it more than once. I'll explain. As I said, PostgreSQL periodically makes checkpoints to flush the dirty pages in shared memory to disk. If we have a large amount of shared memory, the checkpoint begins to hit the disk intensively: the pages arrive in the kernel buffer and are written to disk using fsync. And if the volume is large, we observe an unpleasant effect, namely very high disk utilization.

Here I have two pictures; I will now explain what they are. They are two time-correlated graphs. The first graph is disk utilization: here it reaches almost 90% at this point in time. If your database sits on physical disks with a RAID controller at 90% utilization, that is bad news: a little more and it hits 100, and input/output stops.

With a disk array the story is slightly different; there it depends on how it is configured, what kind of array it is, and so on.

And in parallel, a graph plotted from an internal postgres view shows how the checkpoint proceeds. The green shows how many dirty-page buffers arrived for synchronization at that checkpoint. And that is the main thing to see here: we have a lot of pages, and at some point we hit a ceiling; we wrote and wrote, and the disk system is clearly very busy, so our checkpoint affects the disk very strongly. Ideally, the picture should look different: less written at a time, so that utilization stays small while we keep steadily writing something. And we can achieve that with settings.

Why does this problem need overcoming? Because if I/O under the database has stalled, all the users who came to execute their queries will be waiting.


From the Linux point of view: even if you bought good hardware, configured it correctly, and configured PostgreSQL properly so that it checkpoints less often and spreads checkpoints out in time, you still step on the default Debian parameters. For most Linux distributions the picture is: vm.dirty_ratio=20, vm.dirty_background_ratio=10.

What does that mean? Since kernel 2.6 there has been a flusher daemon, pdflush (or its successors, depending on the kernel), which handles background writeback of dirty pages from the kernel buffer, and also writeback no matter what when the background writing no longer helps.

When does background writeback kick in? When dirty pages in the kernel buffer occupy 10% of the total RAM on the server, a special flush function is called in the background. Why background? It takes as a parameter how many pages to write out; it writes off, say, N pages, then sleeps for a while, then comes back and writes off some more.

This is an extremely simple story, like the school problem about a pool with water flowing in through one pipe and out through another. If a checkpoint arrives and sends only a modest number of dirty pages, pdflush will gradually and neatly drain the whole thing from the kernel buffer.

If the dirty pages keep accumulating and reach 20%, the OS's priority becomes writing them all to disk, because the power might go out and everything would be bad for us; we would lose that data, for example.

What's the catch? The catch is that in the modern world these thresholds, 20% and 10% of all the RAM in the machine, are absolutely monstrous relative to the throughput of any disk system you have.

Imagine you have 128 GB of RAM: 12.8 GB descends on your disk system at once. Whatever cache you have there, whatever array you have there, it will not withstand that much.


Therefore, we recommend adjusting these numbers immediately to the capabilities of your RAID controller. The recommendation I gave here is for a controller with 512 MB of cache.

The arithmetic is very simple. You can set vm.dirty_background_bytes and vm.dirty_bytes in bytes, and these settings override the previous two: either the ratio defaults apply, or, if the byte variants are set, the byte variants take effect. Since I am a consulting DBA and work with different clients, I try to play it safe, so: if in bytes, then in bytes. Nobody guarantees that a good admin will not add memory to the server, reboot it, and leave the ratio figures unchanged. Just calculate these numbers so that everything fits with a guarantee.
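To illustrate the arithmetic, here is a sketch for the 512 MB controller cache mentioned above. The fractions are illustrative assumptions, not a universal rule:

```shell
# Size writeback thresholds from the RAID controller cache, not from RAM.
raid_cache_bytes=$(( 512 * 1024 * 1024 ))            # 512 MB battery-backed cache
dirty_background_bytes=$(( raid_cache_bytes / 4 ))   # start flushing early
dirty_bytes=$(( raid_cache_bytes ))                  # hard ceiling ~ cache size
echo "$dirty_background_bytes $dirty_bytes"
# Apply as root; these override vm.dirty_background_ratio / vm.dirty_ratio:
#   sysctl -w vm.dirty_background_bytes=$dirty_background_bytes
#   sysctl -w vm.dirty_bytes=$dirty_bytes
```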

What happens if you don't fit? I wrote that it "effectively stops any flushing", and that is only partly a figure of speech. The operating system has a big problem: too many dirty pages. So the I/O your clients generate effectively stops. The application has sent a SQL query to the database and is waiting; its I/O gets the lowest priority, because the database is busy with the checkpoint, and when that will finish is anyone's guess. Once you have reached non-background flushing, all your I/O is occupied by it, and until it finishes, you will get nothing done.

Two more important points are beyond the scope of this report. These settings must match the checkpoint settings in postgresql.conf, and your disk system must be configured adequately. If your RAID has a cache, it must have a battery; people buy RAID with a good cache and no battery. If you have SSDs in the RAID, they must be server-grade ones, with capacitors. Here is the expanded checklist: at this link there is my report on tuning disk performance for PostgreSQL, and all those checklists are in it.


What else can make life very difficult? Two options. They are relatively new, they can be enabled by default in various setups, and they can complicate life just as much if set incorrectly.


There are two relatively new things; they appeared in the 3.x kernels. These are sched_migration_cost, measured in nanoseconds, and sched_autogroup_enabled, which is 1 by default.

And how do they spoil life? What is sched_migration_cost? The Linux scheduler can migrate a process from one CPU to another, and for PostgreSQL executing queries it is completely unclear why it should be migrated. From the operating system's point of view, when you switch windows between OpenOffice and a terminal, that may be fine, but for a database it is very bad. A reasonable policy is therefore to set migration_cost to some large value, on the order of several milliseconds.

What does this mean for the scheduler? That during this interval the process is still considered hot. So if you have a long transaction doing something for a long time, the scheduler will understand that: it will assume that until this timeout passes, the process should not be migrated anywhere. If the process is doing work, it will not be migrated; it will calmly finish on the CPU it was given. And the result is excellent.

The second point is autogroup. There is a good idea for certain workloads unrelated to modern databases: group processes by the virtual terminal they are launched from. For some tasks it is convenient. In practice, PostgreSQL is a prefork multi-process system launched from a single terminal: the WAL writer, the checkpointer, and all your client backends get grouped into one scheduler group per CPU. And there they wait together for it to free up, getting in each other's way and keeping it busy longer. This story is completely unnecessary for such a load, so it should be turned off.
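A sketch of both knobs. The exact sysctl name varies across kernel versions (3.x-era kernels expose kernel.sched_migration_cost_ns, older ones kernel.sched_migration_cost), and the 5 ms figure is a commonly cited value, not gospel:

```shell
# Run as root.
sysctl -w kernel.sched_migration_cost_ns=5000000   # ~5 ms: keep backends "hot"
sysctl -w kernel.sched_autogroup_enabled=0         # don't group by terminal
```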


My colleague Alexey Lesovsky ran tests with a simple pgbench, increasing migration_cost by an order of magnitude and turning autogroup off. On mediocre hardware the difference was almost 10%. And there is a discussion on the postgres mailing list where people report that similar changes affected query speed by 50%. There are quite a few such stories.


And finally, about the power saving policy. It's nice that Linux can now be used on a laptop and will supposedly be easy on the battery. But suddenly it turns out that the same can happen on a server.

Moreover, if you rent servers from a hoster, the "good" hosters do not care about your performance; their task is to utilize their hardware as efficiently as possible. So by default they may enable laptop-style power saving in the operating system.

If you are running a heavily loaded database server on this, then your choice is acpi_cpufreq + performance. Even with ondemand there will already be problems.

intel_pstate is a slightly different driver, and nowadays it is the preferred one, as the later and better-working of the two.

And, accordingly, the governor is performance only. Ondemand, powersave and all the rest are not for you.

The results of PostgreSQL's EXPLAIN ANALYZE can differ by orders of magnitude if you enable powersave, because in practice the CPU under the database will be throttled in a completely unpredictable way.

These things can be on by default. Check carefully whether they are. This can be a really big problem.
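A quick sketch of performing that check and pinning the governor. The sysfs path is standard; cpupower ships with most distributions' kernel tools packages:

```shell
# See what each CPU is currently doing:
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor | sort | uniq -c
# Pin everything to full speed (as root):
cpupower frequency-set -g performance
```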


And in closing, I want to thank the guys from our PostgreSQL-Consulting DBA team, namely Max Boguk and Alexey Lesovsky, who deal with this business every day, while we try to do our best for our clients. It's like aviation safety instructions: everything here is written in blood. Each of these knobs was discovered in the course of some problem. I am happy to share them with you.

Questions:

Thank you! Suppose a company wants to save money and host the database and application logic on the same server, or follows the fashionable trend of microservice architectures where PostgreSQL runs in a container. What then? Sysctl affects the entire kernel globally. I have not heard of sysctls being virtualized to apply separately per container; there are only cgroups, and they control only part of this. How does one live with that? Or, if you want performance, do you run PostgreSQL on a dedicated bare-metal server and tune that?

I would answer your question in roughly three parts. If we are not talking about a bare-metal server that can be tuned and so on, then relax: everything will work fine without these settings. If your load is such that these settings matter, you will arrive at a bare-metal server before you arrive at these settings.

What is the problem? On a virtual machine you will most likely hit many other problems first, for example the fact that most virtual machines have rather inconsistent disk latency. Even if disk throughput is good, a single failed I/O transaction that barely affects average throughput, if it happens at the moment of a checkpoint or of a WAL write, will make the database suffer badly. And you will notice that before you run into these problems.

If you have NGINX on the same server, you will likewise hit that problem first: it will fight for shared memory, and you will not get as far as the problems described here.

But on the other hand, some of these parameters will still be relevant to you. For example, set dirty_ratio via sysctl to something less crazy; in any case, that will help. One way or another you will interact with the disk, and with the defaults I showed it will go wrong, so it is better to change them regardless.

And with NUMA there can be problems. VMware, for example, works fine with NUMA with exactly the opposite settings. Here you have to choose: a bare-metal server or not.

I have a question about Amazon AWS. They have preconfigured images; one of them is called Amazon RDS. Are there any custom operating system settings for them?

There are settings, but they are different settings. Here we configure the operating system in terms of how the database will use it. There, the parameters determine things like resource shaping: we need this many resources, we will eat them up now; after that, Amazon RDS throttles those resources and performance drops. There are separate stories of how people start hacking around this, sometimes even quite successfully, but that has nothing to do with OS settings; it is more like hacking the cloud. A different story.

Why do transparent huge pages give no effect compared to hugetlb?

They don't. This can be explained in many ways, but in fact they just don't. What is the story with PostgreSQL? At startup it allocates one big chunk of shared memory; whether the pages are transparent or not matters not at all. The fact that it is all allocated at startup explains everything. Transparent huge pages would become relevant if there were a lot of memory and the shared memory segment had to be rebuilt, but in PostgreSQL it is simply allocated at startup in one huge piece, and then nothing special happens to it. You can, of course, use them, but there is a chance of getting shared memory corruption when the kernel re-allocates something; PostgreSQL does not know about it.

Source: habr.com
