Tuning Linux Kernel Options to Optimize PostgreSQL

Optimal PostgreSQL performance depends on properly configured operating system settings. Poorly tuned OS kernel parameters can degrade database server performance, so these settings should be configured to match the database server and its workload. In this post, we will discuss some of the important Linux kernel parameters that affect database server performance and how to tune them.

SHMMAX / SHMALL

SHMMAX is a kernel parameter that defines the maximum size, in bytes, of a single shared memory segment a Linux process can allocate. Before version 9.3, PostgreSQL allocated its main shared memory segment via System V (SysV) IPC, which is subject to the SHMMAX limit. From 9.3 onward, PostgreSQL switched to POSIX/anonymous shared memory for the main segment, so only a small amount of System V shared memory is still required.

Prior to version 9.3, SHMMAX was the most important kernel parameter. The SHMMAX value is given in bytes.

Similarly, SHMALL is a kernel parameter that limits the total amount of System V shared memory that can be allocated system-wide, measured in pages. To view the current SHMMAX, SHMALL, and SHMMIN values, use the ipcs command.

SHM* Details - Linux

$ ipcs -lm

------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 1073741824
max total shared memory (kbytes) = 17179869184
min seg size (bytes) = 1

SHM* Details - MacOS X

$ ipcs -M
IPC status from  as of Thu Aug 16 22:20:35 PKT 2018
shminfo:
	shmmax: 16777216	(max shared memory segment size)
	shmmin:       1	(min shared memory segment size)
	shmmni:      32	(max number of shared memory identifiers)
	shmseg:       8	(max shared memory segments per process)
	shmall:    1024	(max amount of shared memory in pages)

PostgreSQL uses System V IPC to allocate shared memory, so on older releases this is one of the most important kernel settings. If you see the error messages below, you are running an older version of PostgreSQL with a SHMMAX value that is too low. In that case, increase SHMMAX to match the amount of shared memory you intend to use.

Possible Misconfiguration Errors

If SHMMAX is configured incorrectly, you may get an error when trying to initialize a PostgreSQL cluster with the command initdb.

initdb Failure
DETAIL: Failed system call was shmget(key=1, size=2072576, 03600).

HINT: This error usually means that PostgreSQL's request for a shared memory segment exceeded your kernel's SHMMAX parameter. 
You can either reduce the request size or reconfigure the kernel with larger SHMMAX. To reduce the request size (currently 2072576 bytes),
reduce PostgreSQL's shared memory usage, perhaps by reducing shared_buffers or max_connections.

If the request size is already small, it's possible that it is less than your kernel's SHMMIN parameter,
in which case raising the request size or reconfiguring SHMMIN is called for.

The PostgreSQL documentation contains more information about shared memory configuration.
child process exited with exit code 1

Similarly, you may get an error when starting the PostgreSQL server using the command pg_ctl.

pg_ctl Failure
DETAIL: Failed system call was shmget(key=5432001, size=14385152, 03600).

HINT: This error usually means that PostgreSQL's request for a shared memory segment exceeded your kernel's SHMMAX parameter.

You can either reduce the request size or reconfigure the kernel with larger SHMMAX. To reduce the request size (currently 14385152 bytes), reduce PostgreSQL's shared memory usage, perhaps by reducing shared_buffers or max_connections.

If the request size is already small, it's possible that it is less than your kernel's SHMMIN parameter,
in which case raising the request size or reconfiguring SHMMIN is called for.

The PostgreSQL documentation contains more information about shared memory configuration.

Understanding Differences in Definitions

The names of the SHMMAX/SHMALL parameters differ slightly between Linux and MacOS X:

  • Linux: kernel.shmmax, kernel.shmall
  • MacOS X: kern.sysv.shmmax, kern.sysv.shmall

The sysctl command can be used to change a value temporarily. To make a value permanent, add an entry to /etc/sysctl.conf. Details are below.

Changing Kernel Options on MacOS X

# Get the value of SHMMAX
sudo sysctl kern.sysv.shmmax
kern.sysv.shmmax: 4096

# Get the value of SHMALL
sudo sysctl kern.sysv.shmall 
kern.sysv.shmall: 4096

# Set the value of SHMMAX
sudo sysctl -w kern.sysv.shmmax=16777216
kern.sysv.shmmax: 4096 -> 16777216

# Set the value of SHMALL 
sudo sysctl -w kern.sysv.shmall=16777216
kern.sysv.shmall: 4096 -> 16777216

Changing Kernel Options on Linux

# Get the value of SHMMAX
sudo sysctl kernel.shmmax
kernel.shmmax: 4096

# Get the value of SHMALL
sudo sysctl kernel.shmall
kernel.shmall: 4096

# Set the value of SHMMAX
sudo sysctl -w kernel.shmmax=16777216
kernel.shmmax: 4096 -> 16777216

# Set the value of SHMALL 
sudo sysctl -w kernel.shmall=16777216
kernel.shmall: 4096 -> 16777216

Do not forget: to make these changes permanent, add the corresponding entries to /etc/sysctl.conf.
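As a sketch (the values below are the illustrative figures from the examples above, not recommendations for production), the permanent entries in /etc/sysctl.conf could look like this:

```shell
# Example /etc/sysctl.conf entries (illustrative values, not recommendations)
#   kernel.shmmax = 16777216
#   kernel.shmall = 16777216
# After editing the file, reload it without a reboot:
sudo sysctl -p
```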

Huge Pages

Linux uses 4 kB pages by default, BSD uses Super Pages, and Windows uses Large Pages. A page is a chunk of RAM allocated to a process; a process can own many pages, depending on its memory requirements. The more memory a process needs, the more pages are allocated to it, and the OS maintains a page table for each process to track those allocations. The smaller the page size, the larger the table, and the longer it takes to look up a page in it. Huge pages therefore allow a large amount of memory to be used with reduced overhead: fewer page-table lookups, fewer page faults, and faster read/write operations through larger buffers. The result is improved performance.

PostgreSQL supports huge pages only on Linux. Because Linux defaults to 4 kB memory pages, workloads that perform many memory operations benefit from configuring larger pages. Huge pages of 2 MB, and up to 1 GB, bring a noticeable performance improvement; the huge page size can be set at boot time. You can easily check the huge page settings and their usage on your Linux machine with the command cat /proc/meminfo | grep -i huge.

Getting huge page information (Linux only)

Note: this applies to Linux only; other operating systems do not provide this information.

$ cat /proc/meminfo | grep -i huge
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

In this example, although the huge page size is set to 2048 kB (2 MB), the total number of huge pages is 0, which means huge pages are disabled.

Script for determining the number of huge pages

This simple script returns the required number of huge pages. Run it on your Linux server while PostgreSQL is running, and make sure the $PGDATA environment variable points to the PostgreSQL data directory.

Getting the Number of Huge Pages Required

#!/bin/bash
# Read the postmaster PID from the data directory
pid=$(head -1 "$PGDATA/postmaster.pid")
echo "Pid:            $pid"
# Peak virtual memory usage of the postmaster, in kB
peak=$(grep ^VmPeak "/proc/$pid/status" | awk '{ print $2 }')
echo "VmPeak:         $peak kB"
# Huge page size, in kB
hps=$(grep ^Hugepagesize /proc/meminfo | awk '{ print $2 }')
echo "Hugepagesize:   $hps kB"
# Number of huge pages needed to cover the peak usage
hp=$((peak / hps))
echo "Set Huge Pages: $hp"

The output of the script looks like this:

Script Output

Pid:            12737
VmPeak:         180932 kB
Hugepagesize:   2048 kB
Set Huge Pages: 88

The script recommends 88 huge pages, so set vm.nr_hugepages to 88.
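The script's recommendation is simply integer division of VmPeak by the huge page size. Reproducing the calculation with the example output above:

```shell
# Reproducing the script's calculation with the example values above
peak=180932   # VmPeak in kB
hps=2048      # Hugepagesize in kB
hp=$((peak / hps))   # integer division: 180932 / 2048 -> 88
echo "Set Huge Pages: $hp"
```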

Setting the number of huge pages

sysctl -w vm.nr_hugepages=88

Check the huge pages again: you will see that none are in use yet (HugePages_Free = HugePages_Total).

Huge page information again (Linux only)

$ cat /proc/meminfo | grep -i huge
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
HugePages_Total:      88
HugePages_Free:       88
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

Now set the huge_pages option to "on" in $PGDATA/postgresql.conf and restart the server.
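For reference, the relevant configuration line and restart command might look like this (a sketch; valid huge_pages values are try, on, and off, and your pg_ctl path may differ):

```shell
# In $PGDATA/postgresql.conf (valid values: try / on / off):
#   huge_pages = on
# Restart the server so the setting takes effect:
pg_ctl -D "$PGDATA" restart
```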

Huge page information again (Linux only)

$ cat /proc/meminfo | grep -i huge
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
HugePages_Total:      88
HugePages_Free:       81
HugePages_Rsvd:       64
HugePages_Surp:        0
Hugepagesize:       2048 kB

Now you can see that a few huge pages are already in use. Let's try adding some data to the database.

Some database operations to consume huge pages

postgres=# CREATE TABLE foo(a INTEGER);
CREATE TABLE
postgres=# INSERT INTO foo VALUES(generate_series(1,10000000));
INSERT 0 10000000

Let's see if we are using more huge pages than before.

More huge page information (Linux only)

$ cat /proc/meminfo | grep -i huge
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
HugePages_Total:      88
HugePages_Free:       18
HugePages_Rsvd:        1
HugePages_Surp:        0
Hugepagesize:       2048 kB

Now you can see that most of the huge pages are in use.

Note: the HugePages value used in this example is very low and is not typical for a production machine. Estimate the number of pages required for your system and set it accordingly, based on the workload and available resources.

vm.swappiness

vm.swappiness is another kernel parameter that can affect database performance. It controls how aggressively Linux swaps pages in and out of memory. The value ranges from 0 to 100: 0 means swapping is avoided, and 100 means aggressive swapping.

You can get good performance by setting lower values.

On newer kernels, setting it to 0 can cause the OOM killer (the Linux mechanism that kills processes when memory is exhausted) to terminate the PostgreSQL process, so it is safer to set the value to 1 if you want to minimize swapping. The default on Linux is 60. A higher value makes the kernel use swap space more aggressively, while a lower value keeps more data and code in RAM.

A lower value is a good bet for performance improvements in PostgreSQL.
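As a sketch, checking the current value and lowering it to the conservative value of 1 discussed above might look like this:

```shell
# Check the current value
sysctl vm.swappiness
# Temporarily set it to 1 (minimal swapping without the risks of 0)
sudo sysctl -w vm.swappiness=1
# To make it permanent, add this line to /etc/sysctl.conf:
#   vm.swappiness = 1
```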

vm.overcommit_memory / vm.overcommit_ratio

Applications acquire memory and release it when it is no longer needed. But in some cases an application allocates too much memory and does not release it, which can invoke the OOM killer. Here are the possible values of the vm.overcommit_memory parameter, with a description of each:

  • 0 — heuristic overcommit (the default): the kernel uses a heuristic to decide whether to allow each allocation
  • 1 — always overcommit: allocation requests are never refused up front
  • 2 — don't overcommit: never exceed the commit limit defined by the overcommit ratio

URL: https://www.kernel.org/doc/Documentation/vm/overcommit-accounting

vm.overcommit_ratio is the percentage of RAM counted toward the commit limit when vm.overcommit_memory = 2. The limit is swap space plus this percentage of RAM: at a ratio of 50%, a system with 2 GB of RAM and 2 GB of swap can commit up to 3 GB.
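The arithmetic behind that figure is commit limit = swap + RAM × overcommit_ratio / 100. A sketch with the 2 GB RAM figure above, assuming 2 GB of swap and the default ratio of 50:

```shell
# CommitLimit = swap + RAM * overcommit_ratio / 100 (all in kB)
# Assumed example figures: 2 GB RAM, 2 GB swap, ratio 50
ram_kb=2097152
swap_kb=2097152
ratio=50
commit_limit_kb=$(( swap_kb + ram_kb * ratio / 100 ))
echo "CommitLimit: ${commit_limit_kb} kB"   # 3145728 kB = 3 GB
```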

Setting vm.overcommit_memory to 2 gives the best results for PostgreSQL. It lets the server process use as much RAM as possible while keeping allocations within the commit limit, so there is no significant risk of the process being killed by the OOM killer. A value of 2 therefore offers both better performance and better reliability than the default of 0: memory beyond the allowed limit is simply never overcommitted, which eliminates the risk of the OOM killer terminating the process.

On systems without swap space, setting vm.overcommit_memory to 2 can cause problems; see:

https://www.postgresql.org/docs/current/static/kernel-resources.html#LINUX-MEMORY-OVERCOMMIT

vm.dirty_background_ratio / vm.dirty_background_bytes

vm.dirty_background_ratio is the percentage of memory that may be filled with dirty pages before they start being written out to disk; this flushing is done in the background. The value ranges from 0 to 100, but values below 5 may be ineffective and are not supported by some kernels. The default on most Linux systems is 10. For write-intensive workloads, a smaller value can improve performance, because Linux then starts flushing dirty pages in the background earlier.

Alternatively, you can set vm.dirty_background_bytes, an absolute threshold in bytes, based on the speed of your disk.

There are no universally "good" values for these two parameters, since both depend on the hardware. However, setting vm.dirty_background_ratio to 5, or vm.dirty_background_bytes to about 25% of your disk's write speed, improves performance by up to ~25% in most cases.
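As a sketch of the 25%-of-disk-speed rule of thumb (the 100 MB/s write speed is an assumed example, not a measurement; substitute your own disk's figure):

```shell
# Assume the disk sustains ~100 MB/s of writes (example value)
disk_write_mb_per_s=100
# 25% of that, converted to bytes, as a vm.dirty_background_bytes candidate
dirty_bg_bytes=$(( disk_write_mb_per_s * 25 / 100 * 1024 * 1024 ))
echo "vm.dirty_background_bytes = $dirty_bg_bytes"   # 26214400 (25 MB)
```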

vm.dirty_ratio / vm.dirty_bytes

These work the same way as vm.dirty_background_ratio / vm.dirty_background_bytes, except that once this threshold is reached, flushing is done synchronously in the calling process, blocking the application. vm.dirty_ratio should therefore be higher than vm.dirty_background_ratio: this ensures background flushing starts earlier, so the application is blocked as rarely as possible. You can tune the gap between the two ratios based on your disk I/O load.
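A sketch of one possible pairing in /etc/sysctl.conf (illustrative, hardware-dependent values, keeping dirty_ratio above dirty_background_ratio as described above):

```shell
# Example /etc/sysctl.conf entries (illustrative, hardware dependent):
#   vm.dirty_background_ratio = 5    # background flushing starts early
#   vm.dirty_ratio = 10              # synchronous, blocking flushes kick in later
```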

Conclusion

Other settings can also be tweaked, but the improvements they bring will be minimal. Keep in mind that not all options apply to all types of applications: some applications benefit from tuning certain settings while others do not. You must find the right balance between these settings for the expected workload and application type, and keep OS behavior in mind while tuning. Tuning kernel parameters is not as easy as tuning database parameters, and it is harder to give general recommendations.

Source: habr.com
