Linux kernel tuning

One, optimizing the maximum number of open files and processes

To prevent a runaway process from degrading system performance, Unix and Linux track most of the resources a process uses and let system administrators impose resource limits on users and processes, such as the maximum number of files a user may open or the maximum number of processes a user may run. These limits come in two kinds: soft limits and hard limits.
Note: the soft limit is the value actually enforced, and a process may raise its own soft limit up to the hard limit. In practice the two are usually set to the same value.
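
As a rough illustration of the two kinds of limit, the following shell sketch prints the soft and hard open-file limits of the current shell. The variable names soft and hard are chosen only for this example; -S and -H are the standard ulimit options for selecting the soft or hard value.

```shell
# Soft limit: the value currently enforced for this shell and its children.
soft=$(ulimit -Sn)
# Hard limit: the ceiling up to which the soft limit may be raised.
hard=$(ulimit -Hn)
echo "soft nofile limit: $soft"
echo "hard nofile limit: $hard"
```

If the two values differ, an unprivileged user can still raise the soft limit up to the hard one, which is one reason the two are usually configured to be equal.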

1, setting the maximum number of open files

  • Temporary method: the setting applies only to the current shell session and is lost after a restart. Run the following command in a terminal:
ulimit -n 65535
  • Permanent method:

Append the following lines to the end of the limits file /etc/security/limits.conf:

* 	soft 	nproc			65535
* 	hard 	nproc			65535
* 	soft 	nofile			65535
* 	hard 	nofile			65535

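"*" means all users; changing it to root applies the limit to the root user. Wildcard entries are not applied to root, so root needs its own lines.
If you need to limit the maximum number of open files for the whole system, modify the value of /proc/sys/fs/file-max, which is the system-wide total number of open file handles, for example: echo 3865161233 >/proc/sys/fs/file-max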

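To see how close the system is to the file-max ceiling, the kernel also exposes /proc/sys/fs/file-nr, whose three fields are the allocated handles, the allocated but unused handles, and the system-wide maximum. The sketch below assumes a Linux system; file_nr_usage is a hypothetical helper name, and its optional argument (an alternative file to read) is an assumption added only so the function can be tried on sample data.

```shell
# Summarize a file in /proc/sys/fs/file-nr format:
# field 1 = allocated handles, field 2 = unused handles, field 3 = maximum.
file_nr_usage() {
    awk '{ printf "%s of %s file handles allocated\n", $1, $3 }' "${1:-/proc/sys/fs/file-nr}"
}

# Report the live values when the file is readable.
if [ -r /proc/sys/fs/file-nr ]; then
    file_nr_usage
fi
```

The third field should match the value written to /proc/sys/fs/file-max.
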
2, file descriptors

When we start a service it spawns many processes, and those processes in turn open many files, such as configuration files and log files. A file descriptor is the index the kernel creates to manage opened files efficiently.

Now look at all the file descriptors of the nginx master process:
[Figure: file descriptors of the nginx master process]
Note: a file descriptor is a small non-negative integer.


Further reading (optional):
Linux maintains a file descriptor table for each process. Values in this table start from zero, so the same descriptor number shows up in different processes; the same number may refer to the same file or to different files. For file operations the kernel maintains three conceptual data structures:

  • the per-process file descriptor table;
  • the system-wide open file table;
  • the file system's i-node table.

The differences among the three are left for the reader to look up.
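
A minimal sketch of the per-process descriptor table, assuming a Linux system with /proc mounted. Descriptor number 9 and /dev/null are arbitrary choices made for this example, and fd9_open is just an illustrative variable name.

```shell
# Descriptors 0, 1 and 2 of this shell are stdin, stdout and stderr.
ls -l "/proc/$$/fd"

# Open /dev/null on descriptor 9 and check that the entry appears.
# The [ builtin runs inside the shell, so /proc/self is the shell itself.
exec 9</dev/null
if [ -e /proc/self/fd/9 ]; then fd9_open=yes; else fd9_open=no; fi
echo "descriptor 9 open: $fd9_open"

# Close descriptor 9 again.
exec 9<&-
```

Each entry under /proc/<pid>/fd is a symbolic link to the file the descriptor refers to, which makes it easy to see that the same number can point to different files in different processes.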

Two, TCP three-way handshake and four-way teardown

[Figure: TCP three-way handshake and four-way teardown]
Use tcpdump to capture and analyze the packets. Make sure Nginx has been deployed successfully first.

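tcpdump -i ens33 -nn host 172.16.193.201 and port 80 # capture all traffic passing through ens33 whose source or destination address:port is 172.16.193.201:80
-i: the local network interface to listen on;
-n: show addresses numerically instead of as host names;
-nn: in addition to -n, show ports numerically instead of as service names;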

Now access Nginx.
Three-way handshake:
[Figure: tcpdump output of the three-way handshake]
Four-way teardown:
[Figure: tcpdump output of the four-way teardown]

Three, kernel parameter optimization

Most kernel parameters are exposed under the /proc/sys directory and can be changed while the system is running, but changes made there are generally lost when the machine restarts. The variables in the kernel configuration file /etc/sysctl.conf correspond to the files under /proc/sys, so modifying sysctl.conf in effect modifies the parameters under /proc/sys, and the setting persists across restarts.
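
The correspondence between a sysctl.conf key and its file under /proc/sys can be sketched as a simple name translation: dots become slashes. The helper name sysctl_key_to_path is hypothetical, and the sketch ignores the corner case of interface names that themselves contain dots.

```shell
# Convert a sysctl key such as net.ipv4.tcp_syncookies into its /proc/sys path.
sysctl_key_to_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

sysctl_key_to_path net.ipv4.tcp_syncookies   # prints /proc/sys/net/ipv4/tcp_syncookies
```

Reading that file with cat shows the live value, and writing to it as root changes the value until the next restart.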

1, a complete set of kernel parameters from a BAT production environment:

net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
kernel.sysrq = 0
kernel.core_uses_pid = 1
net.ipv4.tcp_syncookies = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.shmmax = 68719476736
kernel.shmall = 4294967296
net.ipv4.tcp_max_tw_buckets = 10000
net.ipv4.tcp_sack = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_rmem = 4096 87380 4194304
net.ipv4.tcp_wmem = 4096 16384 4194304
net.core.wmem_default = 8388608
net.core.rmem_default = 8388608
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.netdev_max_backlog = 262144
net.core.somaxconn = 262144
net.ipv4.tcp_max_orphans = 3276800
net.ipv4.tcp_max_syn_backlog = 262144
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_synack_retries = 1
net.ipv4.tcp_syn_retries = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_mem = 94500000 915000000 927000000
net.ipv4.tcp_fin_timeout = 1
net.ipv4.tcp_keepalive_time = 30
net.ipv4.ip_local_port_range = 1024 65535

2, common Linux kernel parameters explained

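net.ipv4.tcp_timestamps = 1 // enables the RFC 1323 TCP timestamp option;
net.ipv4.tcp_sack = 1 // selective acknowledgment (SACK) is an optional TCP feature that can improve use of the available bandwidth on some networks;
net.ipv4.tcp_fack = 1 // enables FACK (Forward ACK) congestion avoidance and fast retransmit;
net.ipv4.tcp_retrans_collapse = 1 // enables repacketizing segments on retransmission; 0 disables it;
net.ipv4.tcp_syn_retries = 5 // for a new outgoing connection, how many SYN requests the kernel sends before giving up;
net.ipv4.tcp_synack_retries = 5 // how many times the kernel retransmits the SYN,ACK reply to an incoming SYN before giving up;
net.ipv4.tcp_max_orphans = 131072 // maximum number of TCP sockets not attached to any process that the system will handle;
net.ipv4.tcp_max_tw_buckets = 5000 // maximum number of TIME_WAIT sockets kept at the same time; beyond this number, TIME_WAIT sockets are destroyed immediately and a warning is printed; the default is 180000, and a smaller value caps the number of TIME_WAIT sockets so that they cannot drag the server down;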

net.ipv4.tcp_keepalive_time = 30
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 3
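// after a TCP connection has been idle for 30 seconds, the kernel starts sending probes;
// if 3 probes (3 seconds apart, the tcp_keepalive_intvl value) all fail, the kernel gives up and considers the connection dead;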

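net.ipv4.tcp_retries1 = 3 // number of unacknowledged retransmissions after which TCP suspects a problem and reports it to the network layer;
net.ipv4.tcp_retries2 = 15 // number of retransmissions before an established (active) TCP connection is dropped;
net.ipv4.tcp_fin_timeout = 30 // if the socket is closed by the local end, how long it stays in the FIN-WAIT-2 state;
net.ipv4.tcp_tw_recycle = 1 // enables fast recycling of TIME-WAIT sockets; the default is 0 (off); this option breaks clients behind NAT and was removed in Linux 4.12;
net.ipv4.tcp_max_syn_backlog = 8192 // length of the SYN queue; the default is 1024, and raising it to 8192 lets the queue hold more connections waiting to be established;
net.ipv4.tcp_syncookies = 1 // during the three-way handshake, when the server receives the initial SYN it checks whether the application's syn_backlog queue is full; with syncookies enabled, connections can still be accepted when it is, which addresses "Can't Connect" problems under very high concurrency. However, it causes the TIME_WAIT state to fall back to lasting 2MSL, which at peak times can leave clients with no reusable connection and unable to reach the server;
net.ipv4.tcp_orphan_retries = 0 // how many times to retry before a TCP connection closed on the local side is killed;
net.ipv4.tcp_mem = 178368  237824     356736
net.ipv4.tcp_mem[0]:  // below this value (in memory pages), TCP is under no memory pressure;
net.ipv4.tcp_mem[1]:  // at this value, TCP enters the memory pressure stage;
net.ipv4.tcp_mem[2]:  // above this value, TCP refuses to allocate sockets;
net.ipv4.tcp_tw_reuse = 1 // enables reuse, allowing TIME-WAIT sockets to be reused for new outgoing TCP connections; requires tcp_timestamps to be enabled;
net.ipv4.ip_local_port_range = 1024 65000 // the port range used for outgoing connections;
net.ipv4.ip_conntrack_max = 655360 // maximum number of "tasks" (connection tracking entries) that netfilter can handle at once in kernel memory; newer kernels name this net.netfilter.nf_conntrack_max;
net.ipv4.icmp_ignore_bogus_error_responses = 1 // enables protection against bogus ICMP error messages;
net.ipv4.tcp_syncookies = 1 // enables SYN flood protection.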
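
The three keepalive parameters listed above combine into a single detection time: the idle period before the first probe, plus one interval for each failed probe. A small worked example with the values from this section (the variable names are only for illustration; keepalive applies only to sockets that enable SO_KEEPALIVE):

```shell
keepalive_time=30     # net.ipv4.tcp_keepalive_time
keepalive_probes=3    # net.ipv4.tcp_keepalive_probes
keepalive_intvl=3     # net.ipv4.tcp_keepalive_intvl

# Idle time before probing, plus one interval per unanswered probe.
dead_after=$((keepalive_time + keepalive_probes * keepalive_intvl))
echo "dead peer detected after about ${dead_after}s"   # prints: dead peer detected after about 39s
```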

Four, Linux kernel error analysis

1, time wait bucket table overflow error

Sep 23 04:45:55 localhost kernel: TCP: time wait bucket table overflow
Sep 23 04:45:55 localhost kernel: TCP: time wait bucket table overflow
Sep 23 04:45:55 localhost kernel: TCP: time wait bucket table overflow
Sep 23 04:45:55 localhost kernel: TCP: time wait bucket table overflow
Sep 23 04:45:55 localhost kernel: TCP: time wait bucket table overflow
Sep 23 04:45:55 localhost kernel: TCP: time wait bucket table overflow

According to the TCP three-way handshake and four-way teardown, the side that actively closes a socket enters the TIME_WAIT state, which lasts for two MSL (Maximum Segment Lifetime).
The warning appears when net.ipv4.tcp_max_tw_buckets is set too low: once the number of TIME_WAIT sockets exceeds the configured value, the kernel prints the message above. Increasing net.ipv4.tcp_max_tw_buckets makes the warning go away.
The value should not be set too high either. On a server that handles a large number of short-lived connections, if the server is the side that closes the connection, a large number of sockets pile up on the server in the TIME_WAIT state, sometimes many times more than the sockets in the ESTABLISHED state. This seriously reduces the server's capacity and can even exhaust the available sockets and stop the service. TIME_WAIT exists so that a reallocated socket is not affected by delayed segments left over from the previous connection; it is necessary logic for reliable TCP transport.
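
One way to judge whether TIME_WAIT sockets are piling up is to count sockets per TCP state. The sketch below assumes the ss tool from iproute2 is installed; count_tcp_states is a hypothetical helper that reads `ss -tan` output on standard input, where the state is the first column and the first line is a header.

```shell
# Count TCP sockets per state from `ss -tan`-style input.
count_tcp_states() {
    awk 'NR > 1 { n[$1]++ } END { for (s in n) print s, n[s] }' | sort
}

# On a live system, feed it the real socket table.
if command -v ss >/dev/null 2>&1; then
    ss -tan | count_tcp_states
fi
```

Comparing the TIME-WAIT count with the value of net.ipv4.tcp_max_tw_buckets shows how much headroom is left before the warning is triggered.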

2, Too many open files error

Benchmarking localhost (be patient)
socket: Too many open files (24)
socket: Too many open files (24)
socket: Too many open files (24)
socket: Too many open files (24)
socket: Too many open files (24)

Append the following lines to the limits file /etc/security/limits.conf, then log in again or restart the server for them to take effect:

* 	soft 	nproc			65535
* 	hard 	nproc			65535
* 	soft 	nofile			65535
* 	hard 	nofile			65535
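
To confirm that a process is actually near its limit, the number of descriptors it holds can be compared with its soft limit. This is a rough sketch for the current shell, assuming /proc is mounted; the variable names are illustrative, and the count can be off by one because the command substitution itself briefly holds a pipe.

```shell
# Soft limit on open files for this shell ("unlimited" is possible).
limit=$(ulimit -Sn)
# Approximate number of descriptors this shell currently holds.
in_use=$(ls "/proc/$$/fd" 2>/dev/null | wc -l)
echo "open descriptors: $in_use of $limit"
```

For another process, the same information is available from /proc/<pid>/fd and from the "Max open files" line of /proc/<pid>/limits.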

3, DDoS attack protection: "possible SYN flooding on port 80. Sending cookies." error

May 31 14:20:14 localhost kernel: possible SYN flooding on port 80. Sending cookies.
May 31 14:21:28 localhost kernel: possible SYN flooding on port 80. Sending cookies.
May 31 14:22:44 localhost kernel: possible SYN flooding on port 80. Sending cookies.
May 31 14:25:33 localhost kernel: possible SYN flooding on port 80. Sending cookies.
May 31 14:27:06 localhost kernel: possible SYN flooding on port 80. Sending cookies.
May 31 14:28:44 localhost kernel: possible SYN flooding on port 80. Sending cookies.
May 31 14:28:51 localhost kernel: possible SYN flooding on port 80. Sending cookies.
May 31 14:31:01 localhost kernel: possible SYN flooding on port 80. Sending cookies.

This message appears because the SYN queue is full, which triggers SYN cookies. It is usually caused by a large volume of legitimate traffic or by malicious traffic; the latter is known as a SYN flood attack, which is one type of DDoS attack.

There are two ways to defend against DDoS attacks: a dedicated hardware firewall, or simple protection based on the Linux kernel. If the attack traffic is particularly heavy, kernel parameters alone cannot withstand it, and a professional-grade hardware firewall is needed.

The following kernel parameters are tuned to help protect against DDoS; add them to /etc/sysctl.conf:

net.ipv4.tcp_fin_timeout = 30 
net.ipv4.tcp_keepalive_time = 1200 
net.ipv4.tcp_syncookies = 1 
net.ipv4.tcp_tw_reuse = 1 
net.ipv4.tcp_tw_recycle = 1 
net.ipv4.ip_local_port_range = 1024 65000 
net.ipv4.tcp_max_syn_backlog = 8192 
net.ipv4.tcp_max_tw_buckets = 8000 
net.ipv4.tcp_synack_retries = 2 
net.ipv4.tcp_syn_retries = 2
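
After editing the configuration, it can be useful to check that the running kernel really uses the intended values. The sketch below is one possible way to do that; check_sysctl is a hypothetical helper, and its optional third argument (an alternative root directory instead of /proc/sys) is an assumption added only so the function can be exercised without touching the live system.

```shell
# Compare the live value of a sysctl key with an expected value.
# $1 = key, $2 = expected value, $3 = sysctl root (default /proc/sys).
check_sysctl() {
    key=$1; expected=$2; root=${3:-/proc/sys}
    path="$root/$(printf '%s' "$key" | tr '.' '/')"
    if [ ! -r "$path" ]; then
        echo "$key: not available on this kernel"
        return 2
    fi
    # Multi-value keys are tab-separated in /proc/sys; normalize to single spaces.
    actual=$(tr -s ' \t' ' ' < "$path" | sed 's/^ //; s/ $//')
    if [ "$actual" = "$expected" ]; then
        echo "$key: ok ($actual)"
    else
        echo "$key: expected $expected, found $actual"
        return 1
    fi
}

check_sysctl net.ipv4.tcp_syncookies 1 || true
check_sysctl net.ipv4.tcp_tw_recycle 1 || true
```

On kernels where a key no longer exists, such as tcp_tw_recycle on recent versions, the helper reports it as not available instead of failing.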

Origin blog.csdn.net/weixin_44571270/article/details/104879180