PostgreSQL Database: Managing Kernel Resources

18.4. Managing Kernel Resources

PostgreSQL sometimes exhausts the various resource limits of the operating system, especially when multiple copies of the server are running on the same system or in a very large installation. This section explains the kernel resources used by PostgreSQL and the steps you can take to solve problems related to kernel resource consumption.

18.4.1. Shared memory and semaphores

PostgreSQL requires the operating system to provide inter-process communication (IPC) features, specifically shared memory and semaphores. Unix-derived systems usually provide "System V" IPC, "POSIX" IPC, or both. Windows has its own implementation of these features, which is not discussed here.

The complete lack of these functions usually manifests as an "Illegal system call" error when the server starts. In this case, there is no choice but to reconfigure the kernel. PostgreSQL cannot work without them. However, this situation is rare in modern operating systems.

When starting the server, PostgreSQL usually allocates a small amount of System V shared memory and a large amount of POSIX (mmap) shared memory. In addition, a large number of semaphores are created when the server starts, and these semaphores can be System V or POSIX style. Currently, POSIX semaphores are used in Linux and FreeBSD systems, while other platforms use System V semaphores.

Before PostgreSQL 9.3, only System V shared memory was used, so the amount of System V shared memory required to start the server was larger. If you are running an older version of the server, please refer to the documentation for that server version.

System V IPC features are typically constrained by system-wide allocation limits. When PostgreSQL exceeds one of these limits, the server will refuse to start and should leave an instructive error message describing the problem and what to do about it (see also Section 18.3.1). The relevant kernel parameters are named consistently across different systems; Table 18.1 gives an overview. The methods for setting them, however, vary. Suggestions for some platforms are given below.

Table 18.1. System V IPC parameters
[Table not reproduced: it was an image in the source.]

PostgreSQL requires a few bytes of System V shared memory (usually 48 bytes on 64-bit platforms) for each server copy. On most modern operating systems, this amount is easily available. However, if you run many server copies, or other applications are also using System V shared memory, you may need to increase SHMALL (the total amount of System V shared memory system-wide). Note that on many systems, SHMALL is measured in pages rather than bytes.

The minimum size of a shared memory segment (SHMMIN) is unlikely to cause problems; for PostgreSQL it should be at most about 32 bytes (it is usually just 1). The maximum number of segments system-wide (SHMMNI) or per process (SHMSEG) is also unlikely to cause problems unless your system has them set to zero.

When using System V semaphores, PostgreSQL uses one semaphore per allowed connection (max_connections), per allowed autovacuum worker process (autovacuum_max_workers), and per allowed background process (max_worker_processes), in sets of 16. Each such set also contains a 17th semaphore, which holds a "magic number" used to detect collisions with semaphore sets used by other applications. The maximum number of semaphores in the system is set by SEMMNS, which consequently must be at least as high as max_connections plus autovacuum_max_workers plus max_worker_processes, plus one extra for each 16 allowed connections plus workers (see the formula in Table 18.1).

The parameter SEMMNI determines the limit on the number of semaphore sets that can exist on the system at one time. This parameter must therefore be at least ceil((max_connections + autovacuum_max_workers + max_worker_processes + 5) / 16).
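The two semaphore requirements can be checked with a little shell arithmetic. This is only a sketch: the function names semmni_min and semmns_min are made up for this example, the sample values (100 connections, 3 autovacuum workers, 8 background workers) are assumptions, and the SEMMNS figure simply multiplies the number of sets by 17 (16 semaphores plus the "magic number" semaphore) without leaving room for other applications.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; they are not part of PostgreSQL.

# Minimum SEMMNI: ceil((max_connections + autovacuum_max_workers
#                       + max_worker_processes + 5) / 16)
semmni_min() {
    total=$(( $1 + $2 + $3 + 5 ))
    echo $(( (total + 15) / 16 ))   # integer ceiling of total / 16
}

# Semaphores consumed when every set holds 16 semaphores plus the
# 17th "magic number" semaphore.
semmns_min() {
    sets=$(semmni_min "$1" "$2" "$3")
    echo $(( sets * 17 ))
}

# Assumed sample settings: 100 connections, 3 autovacuum workers,
# 8 background workers.
semmni_min 100 3 8
semmns_min 100 3 8
```

With these sample values the sum is 116, which needs 8 sets and therefore 136 semaphores.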

Lowering the number of allowed connections is a temporary workaround for failures from the function semget, which are usually reported with the confusing message "No space left on device".

In some cases it might also be necessary to increase SEMMAP to be at least on the order of SEMMNS. If the system has this parameter (many do not), it defines the size of the semaphore resource map, in which each contiguous block of available semaphores needs an entry. When a semaphore set is freed, it is either added to an existing entry that is adjacent to the freed block or registered under a new map entry. If the map is full, the freed semaphores are lost (until reboot). Fragmentation of the semaphore space could therefore, over time, leave fewer semaphores available than there should be.

Various other settings related to "semaphore undo", such as SEMMNU and SEMUME, do not affect PostgreSQL.

When using POSIX semaphores, the number of semaphores needed is the same as for System V, that is, one semaphore per allowed connection (max_connections), per allowed autovacuum worker process (autovacuum_max_workers), and per allowed background process (max_worker_processes). On the platforms where this option is preferred, there is no specific kernel limit on the number of POSIX semaphores.

AIX
At least as of version 5.1, no special configuration should be necessary for parameters such as SHMMAX, which appears to be configured to allow all memory to be used as shared memory. This is the sort of configuration commonly used for other databases such as DB/2.

However, you may need to modify the global ulimit information in /etc/security/limits. The default file size hard limit (fsize) and the number of files (nofiles) may be too low.

FreeBSD
The default IPC settings can be changed using the sysctl or loader interfaces. The following parameters can be set using sysctl:

# sysctl kern.ipc.shmall=32768
# sysctl kern.ipc.shmmax=134217728
# sysctl kern.ipc.semmap=256

To keep these settings after restart, please modify /etc/sysctl.conf.

These semaphore-related settings are read-only as far as sysctl is concerned, but they can be set in /boot/loader.conf:

kern.ipc.semmni=256
kern.ipc.semmns=512

After modifying the configuration file, a restart is required to make the new settings take effect.

You may also want your kernel to lock shared memory in RAM and prevent it from being paged to the swap partition. This can be done using the sysctl setting kern.ipc.shm_use_phys.

If you run in FreeBSD jails by enabling sysctl's security.jail.sysvipc_allowed, postmasters running in different jails should be run by different operating system users. This improves security because it prevents non-root users from interfering with shared memory or semaphores in different jails, and it allows the PostgreSQL IPC cleanup code to function properly. (In FreeBSD 6.0 and later, the IPC cleanup code does not properly detect processes in other jails, which prevents running postmasters on the same port in different jails.)

Versions prior to FreeBSD 4.0 work similarly to the old version of OpenBSD (see below).

NetBSD
In NetBSD 5.0 and later, IPC parameters can be adjusted using sysctl, for example:

$ sysctl -w kern.ipc.semmni=100

To keep these settings after restart, please modify /etc/sysctl.conf.

You will usually want to increase kern.ipc.semmni and kern.ipc.semmns, because NetBSD's default values for them are too small.

You may also want your kernel to lock shared memory in RAM and prevent it from being paged to the swap partition. This can be done using the sysctl setting kern.ipc.shm_use_phys.

NetBSD versions before 5.0 work like old versions of OpenBSD (see below), except that kernel parameters should be set with the keyword options instead of option.

OpenBSD
In OpenBSD 3.3 and later, IPC parameters can be adjusted using sysctl, for example:

# sysctl kern.seminfo.semmni=100

To keep these settings after restart, please modify /etc/sysctl.conf.

You will usually want to increase kern.seminfo.semmni and kern.seminfo.semmns, because OpenBSD's default values for them are too small.

In earlier versions of OpenBSD, you need to compile a customized kernel to modify these IPC parameters. Also make sure that the SYSVSHM and SYSVSEM options are enabled. (These two items are enabled by default.) Here are some examples of how to set these parameters in the kernel configuration file:

option SYSVSHM
option SHMMAXPGS=4096
option SHMSEG=256
option SYSVSEM
option SEMMNI=256
option SEMMNS=512
option SEMMNU=256

HP-UX
The default settings tend to suffice for normal installations. On HP-UX 10, the factory default for SEMMNS is 128, which might be too low for larger database sites.

IPC parameters can be set in the System Administration Manager (SAM) under Kernel Configuration → Configurable Parameters. Choose Create A New Kernel when you are done.

Linux
The default maximum segment size is 32 MB, and the default maximum total size is 2097152 pages. A page is almost always 4096 bytes, except in unusual kernel configurations with "huge pages" (use getconf PAGE_SIZE to verify).

The shared memory size setting can be changed through the sysctl interface. For example, to allow 16 GB:

$ sysctl -w kernel.shmmax=17179869184
$ sysctl -w kernel.shmall=4194304
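The two values above are related: kernel.shmmax is given in bytes, while kernel.shmall is counted in pages. As a rough sketch, the page count can be derived from the byte count as shown below. The helper name bytes_to_pages is made up for this example, and the fallback page size of 4096 is an assumption used only when getconf is unavailable.

```shell
#!/bin/sh
# Hypothetical helper: convert a size in bytes to a number of pages,
# rounding up so that a partial page still counts as a whole page.
bytes_to_pages() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Ask the system for its page size; fall back to 4096 (assumption).
page_size=$(getconf PAGE_SIZE 2>/dev/null || echo 4096)

# 16 GB expressed in pages of the local page size.
bytes_to_pages 17179869184 "$page_size"
```

With a 4096-byte page this yields 4194304, the kernel.shmall value used in the example above.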

In addition, these settings can be preserved between reboots in the file /etc/sysctl.conf. Doing that is highly recommended.

Ancient distributions might not have the sysctl program, but equivalent changes can be made by manipulating the /proc file system:

$ echo 17179869184 >/proc/sys/kernel/shmmax
$ echo 4194304 >/proc/sys/kernel/shmall

The remaining defaults are quite generously sized and usually do not require changes.

macOS
The recommended method for configuring shared memory in macOS is to create a file named /etc/sysctl.conf, containing variable assignments such as:

kern.sysv.shmmax=4194304
kern.sysv.shmmin=1
kern.sysv.shmmni=32
kern.sysv.shmseg=8
kern.sysv.shmall=1024

Note that in some versions of macOS, all five shared memory parameters must be set in /etc/sysctl.conf, otherwise the value will be ignored.

Beware that recent versions of macOS ignore attempts to set SHMMAX to a value that is not an exact multiple of 4096.

On this platform, SHMALL is measured in 4 kB pages.

In older macOS versions, you will need to reboot for changes to the shared memory parameters to take effect. As of 10.5, all parameters except SHMMNI can be changed on the fly using sysctl. It is still best to set your preferred values in /etc/sysctl.conf, so that they are kept across reboots.

The file /etc/sysctl.conf is only honored in macOS 10.3.9 and later. If you are running a release prior to 10.3.x, you must edit the file /etc/rc and change the values in the following commands:

sysctl -w kern.sysv.shmmax
sysctl -w kern.sysv.shmmin
sysctl -w kern.sysv.shmmni
sysctl -w kern.sysv.shmseg
sysctl -w kern.sysv.shmall

Note that /etc/rc is usually overwritten by macOS system updates, so you should redo these edits after each update.

In macOS 10.2 and earlier, edit these commands in the file /System/Library/StartupItems/SystemTuning/SystemTuning instead.

Solaris 2.6 to 2.9 (Solaris 6 to Solaris 9)
The relevant settings can be changed in /etc/system, for example:

set shmsys:shminfo_shmmax=0x2000000
set shmsys:shminfo_shmmin=1
set shmsys:shminfo_shmmni=256
set shmsys:shminfo_shmseg=256
set semsys:seminfo_semmap=256
set semsys:seminfo_semmni=512
set semsys:seminfo_semmns=512
set semsys:seminfo_semmsl=32

You need to reboot for the changes to take effect. For information about shared memory under older versions of Solaris, see http://sunsite.uakom.sk/sunworldonline/swol-09-1997/swol-09-insidesolaris.html.

Solaris 2.10 (Solaris 10) and later, OpenSolaris
In Solaris 10 and later, and in OpenSolaris, the default shared memory and semaphore settings are good enough for most PostgreSQL applications. Solaris now sets the default value of SHMMAX to one quarter of system RAM. To adjust this setting further, use a project setting associated with the postgres user. For example, run the following command as root:

projadd -c "PostgreSQL DB User" -K "project.max-shmmemory=(privileged,8GB,deny)" -U postgres -G postgres user.postgres

This command adds the user.postgres project and sets the shared memory maximum for the postgres user to 8GB. It takes effect the next time that user logs in, or when you restart PostgreSQL (a reload is not enough). The above assumes that PostgreSQL is run by the postgres user in the postgres group. No reboot of the machine is required.

For database servers that will have a huge number of connections, the other kernel setting modifications we recommend are:

project.max-shm-ids=(priv,32768,deny)
project.max-sem-ids=(priv,4096,deny)
project.max-msg-ids=(priv,4096,deny)

In addition, if you are running PostgreSQL inside a zone, you may need to raise the zone's resource usage limits as well. For more information on projects and prctl, see "Chapter2: Projects and Tasks" in the System Administrator's Guide.

18.4.2. systemd RemoveIPC

If you are using systemd, you must take care that IPC resources (shared memory and semaphores) are not removed prematurely by the operating system. This is of particular concern when installing PostgreSQL from source code. Users of PostgreSQL distribution packages are less likely to be affected, because the postgres user is then normally created as a system user.

The RemoveIPC setting in logind.conf controls whether IPC objects are removed when a user fully logs out. System users are exempt. This setting defaults to on in stock systemd, but some operating system distributions default it to off.

When this setting is turned on, the typical observed effect is that the semaphore objects used by the PostgreSQL server are removed at apparently random times, causing the server to crash with log messages such as:

LOG: semctl(1234567890, 0, IPC_RMID, ...) failed: Invalid argument

Different types of IPC objects (shared memory and semaphores, System V and POSIX) are treated slightly differently by systemd, so you might observe that some IPC resources are not removed in the same way as others. It is not advisable to rely on these subtle differences.

A "user logging out" might happen as part of a maintenance job, or manually when an administrator logs in as the postgres user or something similar, so it is hard to prevent in general.

What counts as a "system user" is determined at systemd compile time from the SYS_UID_MAX setting in /etc/login.defs. Packaging and deployment scripts should take care to create the postgres user as a system user by using useradd -r, adduser --system, or equivalent.

Alternatively, if the user account was created incorrectly or cannot be changed, it is recommended to set

RemoveIPC=no

in /etc/systemd/logind.conf or another appropriate configuration file.

Ensure at least one of these two things, otherwise the PostgreSQL server will be very unreliable.
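A quick way to see whether an existing postgres account is likely to be treated as a system user is to compare its UID with SYS_UID_MAX. The following is only a sketch under stated assumptions: the helper name is_system_uid is made up, the account is assumed to be called postgres, the fallback of 999 is an assumed default, and the value currently in /etc/login.defs may differ from the one systemd was compiled with.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the UID ($1) is at or below the
# system-user ceiling ($2).
is_system_uid() {
    [ "$1" -le "$2" ]
}

# Read SYS_UID_MAX from /etc/login.defs. This is the current file
# content, not necessarily the value systemd was compiled with.
sys_uid_max=$(awk '$1 == "SYS_UID_MAX" {print $2; exit}' /etc/login.defs 2>/dev/null)
sys_uid_max=${sys_uid_max:-999}   # assumed default if the entry is missing

# The account name "postgres" is an assumption.
pg_uid=$(id -u postgres 2>/dev/null)

if [ -z "$pg_uid" ]; then
    echo "no postgres user found"
elif is_system_uid "$pg_uid" "$sys_uid_max"; then
    echo "postgres (uid $pg_uid) looks like a system user"
else
    echo "postgres (uid $pg_uid) is a regular user; consider RemoveIPC=no"
fi
```

If the check reports a regular user, either recreating the account as a system user or setting RemoveIPC=no addresses the problem described in this section.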

18.4.3. Resource Limits

Unix-like operating systems enforce various kinds of resource limits that might interfere with the operation of your PostgreSQL server. Of particular importance are the limits on the number of processes per user, the number of open files per process, and the amount of memory available to each process. Each of these has a "hard" and a "soft" limit. The soft limit is what actually counts, but the user can change it up to the hard limit. The hard limit can only be changed by the root user. The system call setrlimit is responsible for setting these parameters. The shell's built-in command ulimit (Bourne shells) or limit (csh) is used to control the resource limits from the command line.

On BSD-derived systems, the file /etc/login.conf controls the various resource limits set during login. See the operating system documentation for details. The relevant parameters are maxproc, openfiles, and datasize. For example:

default:\
...
        :datasize-cur=256M:\
        :maxproc-cur=256:\
        :openfiles-cur=256:\
...

(-cur is the soft limit. Append -max to set the hard limit.)

Kernels can also have system-wide limits on some resources.

  • On Linux, /proc/sys/fs/file-max determines the maximum number of open files that the kernel will support. It can be changed by writing a different number into the file or by adding an assignment in /etc/sysctl.conf. The maximum number of open files per process is fixed when the kernel is compiled; see /usr/src/linux/Documentation/proc.txt for more information.

The PostgreSQL server uses one process for each connection, so you should have at least as many processes as allowed connections, plus the number of processes required by the rest of the system. Usually this is not a problem, but if you run multiple servers on one machine, resource usage may be tight.

The factory default limit for opening files is usually set to a "socially friendly" value, which allows many users to coexist on a machine without causing disproportionate use of system resources. If you run many servers on one machine, this may be what you want, but on dedicated servers, you may need to increase this limit.

On the other hand, some systems allow individual processes to open large numbers of files; if more than a few processes do so, the system-wide limit can easily be exceeded. If you find this happening and do not want to alter the system-wide limit, you can set PostgreSQL's max_files_per_process configuration parameter to limit the consumption of open files.
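To get a feel for whether the system-wide limit is at risk, the worst case can be estimated as one server process per connection, each opening up to max_files_per_process files. The sketch below is Linux-specific and illustrative only: the function names are made up, the sample values (100 connections, 1000 files per process) are assumptions, and the estimate ignores files opened by everything else on the machine.

```shell
#!/bin/sh
# Hypothetical helper: worst-case number of open files if every
# backend ($1 of them) reaches its per-process cap ($2).
worst_case_open_files() {
    echo $(( $1 * $2 ))
}

# Hypothetical helper: succeeds when that worst case fits within a
# system-wide limit ($3).
fits_file_limit() {
    [ "$(worst_case_open_files "$1" "$2")" -le "$3" ]
}

# Linux only: read the kernel-wide limit; 0 if it cannot be read.
file_max=$(cat /proc/sys/fs/file-max 2>/dev/null || echo 0)

# Assumed sample settings: 100 connections, 1000 files per process.
if fits_file_limit 100 1000 "$file_max"; then
    echo "100 backends x 1000 files fits under file-max ($file_max)"
else
    echo "100 backends x 1000 files could exceed file-max ($file_max)"
fi
```

If the estimate comes out too close to the limit, lowering max_files_per_process is the knob this section suggests.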

18.4.4. Linux memory overcommit

In Linux 2.4 and later, the default virtual memory behavior is not optimal for PostgreSQL. Due to the method of memory overcommitment implemented by the kernel, if the memory requirements of PostgreSQL or other processes cause the system to run out of virtual memory, the kernel may terminate the postmaster process of PostgreSQL (the main server process).

If this happens, you will see a kernel message like the following (refer to your system documentation and configuration to see where you can see such a message):

Out of Memory: Killed process 12345 (postgres).

This indicates that the postgres process was terminated due to memory pressure. Although existing database connections will continue to function normally, new connections will not be accepted. To recover, PostgreSQL should be restarted.

One way to avoid this problem is to run PostgreSQL on a machine where you can be sure that other processes will not run the machine out of memory. If memory is tight, increasing the swap space of the operating system can help avoid the problem, because the out-of-memory (OOM) killer is invoked only when physical memory and swap space are exhausted.

If PostgreSQL itself is the cause of the system running out of memory, you can avoid the problem by changing your configuration. In some cases, it may help to lower memory-related configuration parameters, particularly shared_buffers and work_mem. In other cases, the problem may be caused by allowing too many connections to the database server itself. In many cases, it is better to reduce max_connections and make use of external connection-pooling software instead.

In Linux 2.6 and later, it is possible to modify the kernel's behavior so that it will not "overcommit" memory. Although this setting will not prevent the OOM killer from being invoked altogether, it will lower the chances significantly and will therefore lead to more robust system behavior. This is done by selecting strict overcommit mode via sysctl:

sysctl -w vm.overcommit_memory=2

or by placing an equivalent entry in /etc/sysctl.conf. You might also wish to modify the related setting vm.overcommit_ratio. For details, see the kernel documentation file https://www.kernel.org/doc/Documentation/vm/overcommit-accounting.

Another approach, which can be used with or without altering vm.overcommit_memory, is to set the process-specific OOM score adjustment value for the postmaster process to -1000, thereby guaranteeing that it will not be targeted by the OOM killer. The simplest way to do this is to execute the following in the postmaster's startup script:

echo -1000 > /proc/self/oom_score_adj

This must be executed just before invoking the postmaster. Note that this action must be done as root, or it will have no effect, so a root-owned startup script is the easiest place to do it. If you do this, you should also set these environment variables in the startup script before invoking the postmaster:

export PG_OOM_ADJUST_FILE=/proc/self/oom_score_adj
export PG_OOM_ADJUST_VALUE=0

These settings cause postmaster child processes to run with the normal OOM score adjustment of zero, so that the OOM killer can still target them when needed. You can use some other value for PG_OOM_ADJUST_VALUE if you want the child processes to run with a different OOM score adjustment. (PG_OOM_ADJUST_VALUE can also be omitted, in which case it defaults to zero.) If you do not set PG_OOM_ADJUST_FILE, the child processes will run with the same OOM score adjustment as the postmaster, which is unwise since the whole point is to ensure that the postmaster has a preferential setting.

Older Linux kernels do not offer /proc/self/oom_score_adj, but may have a previous version of the same functionality called /proc/self/oom_adj. This works the same way, except that the value that disables the OOM killer is -17 instead of -1000.

Some vendors' Linux 2.4 kernels are reported to have early versions of the 2.6 overcommit sysctl parameter. However, setting vm.overcommit_memory to 2 on a 2.4 kernel that does not have the relevant code will make things worse, not better. It is recommended that you inspect the actual kernel source code (see the function vm_enough_memory in the file mm/mmap.c) to verify what is supported in your kernel before you try this in a 2.4 installation. The presence of the overcommit-accounting documentation file should not be taken as evidence that the feature is there. If in any doubt, consult a kernel expert or your kernel vendor.

18.4.5. Linux huge pages

Using huge pages reduces overhead when PostgreSQL uses large contiguous chunks of memory, particularly with large values of shared_buffers. To use this feature in PostgreSQL, you need a kernel built with CONFIG_HUGETLBFS=y and CONFIG_HUGETLB_PAGE=y. You will also have to adjust the kernel setting vm.nr_hugepages. To estimate the number of huge pages needed, start PostgreSQL without huge pages enabled and use the /proc file system to check the size of the postmaster's anonymous shared memory segment, as well as the system's huge page size. This might look like:

$ head -1 $PGDATA/postmaster.pid
4170
$ pmap 4170 | awk '/rw-s/ && /zero/ {print $2}'
6490428K
$ grep ^Hugepagesize /proc/meminfo
Hugepagesize: 2048 kB

6490428 / 2048 is approximately 3169.154, so in this example at least 3170 huge pages are needed, which can be set with:

$ sysctl -w vm.nr_hugepages=3170

If other programs on the machine also need huge pages, a larger setting would be appropriate. Do not forget to add this setting to /etc/sysctl.conf so that it is reapplied after reboots.
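The rounding in the estimate above can be reproduced with integer shell arithmetic. This is a minimal sketch: the helper name hugepages_needed is made up for this example, and the two inputs are the sample values from the pmap and /proc/meminfo output shown earlier, both in kB.

```shell
#!/bin/sh
# Hypothetical helper: number of huge pages needed to cover a shared
# memory segment, rounding up. Both arguments are in kB.
hugepages_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Sample values from the example: a 6490428 kB segment and
# 2048 kB huge pages.
hugepages_needed 6490428 2048
```

For the sample values this prints 3170, the figure passed to vm.nr_hugepages in the example.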

Sometimes the kernel is not able to allocate the desired number of huge pages immediately, so it might be necessary to repeat the command or to reboot. (Immediately after a reboot, most of the machine's memory should be available to convert into huge pages.) To verify the huge page allocation, use:

$ grep Huge /proc/meminfo

It may also be necessary to give the database server's operating system user permission to use huge pages by setting vm.hugetlb_shm_group via sysctl, and/or to give it permission to lock memory with ulimit -l.

The default behavior for huge pages in PostgreSQL is to use them when possible and to fall back to normal pages when that fails. To enforce the use of huge pages, you can set huge_pages to on in postgresql.conf. Note that with this setting PostgreSQL will fail to start if not enough huge pages are available.

A detailed description of the Linux huge page feature can be found at https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt.

Origin blog.csdn.net/weixin_42528266/article/details/108593549