Linux kernel TCP performance tuning

Foreword:

Anyone who has done network development knows how important the TCP/IP protocol is to network programming. Beyond the limits imposed by hardware and by the architecture of the network programs we write, performance can also be improved significantly by tuning the TCP/IP kernel parameters.

Below I list some TCP/IP kernel parameters, explain what they mean, and show how to modify them to optimize our network programs, mainly for high-concurrency scenarios.

Here, "network program" mainly refers to the server side.

1. fs.file-max

The maximum number of file descriptors that can be opened; note that this is a system-wide limit.

On a server, every time a connection is created the system opens a file descriptor, so the maximum number of open file descriptors also determines our maximum number of connections.

The limit on open file descriptors is also one reason select is replaced under high concurrency: select is capped at a maximum number of descriptors (FD_SETSIZE). Although that cap can be modified, doing so is generally not recommended; see the select section of UNP (Unix Network Programming) for details.
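
As a sketch, assuming a Linux host where you have root, you could inspect and raise this limit roughly like this (the value 2097152 is purely illustrative):

    # Current system-wide limit and current usage (allocated, free, max)
    cat /proc/sys/fs/file-max
    cat /proc/sys/fs/file-nr

    # Raise it for the running kernel
    sysctl -w fs.file-max=2097152

Note that individual processes are additionally bound by the per-process limit (ulimit -n), which must be raised separately.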

2. net.ipv4.tcp_max_syn_backlog

The maximum length of the TCP SYN queue. The TCP three-way handshake starts when a client issues the connect system call. The server kernel maintains two queues for TCP: the SYN queue and the accept queue. The SYN queue holds connections that have completed only the first step of the handshake; the accept queue holds connections that have completed the entire three-way handshake. Increase net.ipv4.tcp_max_syn_backlog to accommodate more in-progress connections.

Note that the SYN queue is what a SYN flood attack targets: the attacker sends a stream of SYN packets that never complete the handshake, filling the SYN queue so that the server can no longer accept other connections.

You can refer to this article: http://tech.uc.cn/?p=1790

Let's take a look at what the man page (listen(2)) says:

The behavior of the backlog argument on TCP sockets changed with Linux 2.2. Now it specifies the queue length for completely established sockets waiting to be accepted, instead of the number of incomplete connection requests. The maximum length of the queue for incomplete sockets can be set using /proc/sys/net/ipv4/tcp_max_syn_backlog. When syncookies are enabled there is no logical maximum length and this setting is ignored. See tcp(7) for more information. If the backlog argument is greater than the value in /proc/sys/net/core/somaxconn, then it is silently truncated to that value; the default value in this file is 128. In kernels before 2.4.25, this limit was a hard coded value, SOMAXCONN, with the value 128.

So since Linux kernel version 2.2, the backlog argument is the maximum length of the completed-connection queue, while the size of the incomplete-connection queue is determined by /proc/sys/net/ipv4/tcp_max_syn_backlog. The completed-connection queue is additionally capped by SOMAXCONN, so its effective size is min(backlog, somaxconn).
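
As an illustrative sketch, the two limits can be inspected and raised like this (the values are examples only):

    # Half-open (SYN) queue limit
    cat /proc/sys/net/ipv4/tcp_max_syn_backlog
    sysctl -w net.ipv4.tcp_max_syn_backlog=8192

    # Cap on the accept (completed-connection) queue
    cat /proc/sys/net/core/somaxconn
    sysctl -w net.core.somaxconn=4096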

3. net.ipv4.tcp_syncookies

Enabling this parameter can effectively mitigate the SYN flood attack mentioned above.

How it works: when the TCP server receives a SYN packet, it replies with SYN+ACK without allocating a dedicated data area for the connection; instead it computes a cookie value from the SYN packet. When the ACK later arrives, the server checks the packet's validity against that cookie value, and only if it is legitimate does it allocate the data area and handle the TCP connection from then on.

The default is 0; setting it to 1 enables it.
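
A minimal way to turn it on (takes effect immediately, not persisted across reboots):

    sysctl -w net.ipv4.tcp_syncookies=1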

4. net.ipv4.tcp_keepalive_time

TCP's keepalive heartbeat mechanism detects whether a connection has gone dead. We can change the default idle time to alter how often keepalive probes are sent.

Keepalive probes are generally sent from the server to the client to check whether the client is still online, since the server allocates certain resources for each client. TCP's keepalive mechanism is somewhat controversial, though, because the probes consume a certain amount of bandwidth.

For the details of TCP keepalive, see TCP/IP Illustrated, Volume 1, Chapter 23.
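
The probing behavior is actually governed by three sysctls; a sketch with illustrative values (the kernel defaults are 7200, 75, and 9):

    # Idle seconds before the first keepalive probe is sent
    sysctl -w net.ipv4.tcp_keepalive_time=600
    # Seconds between subsequent probes
    sysctl -w net.ipv4.tcp_keepalive_intvl=30
    # Unanswered probes before the connection is declared dead
    sysctl -w net.ipv4.tcp_keepalive_probes=3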

5. net.ipv4.tcp_tw_reuse

I wrote about the TIME_WAIT state in my previous article. A large number of connections stuck in TIME_WAIT wastes resources; they tie up local ports and other server resources.

Enabling this parameter allows sockets in TIME_WAIT to be reused for new connections (it applies to new outgoing connections and requires TCP timestamps).

The default is 0; setting it to 1 enables it.

6. net.ipv4.tcp_tw_recycle

Also aimed at the TIME_WAIT state: this parameter makes sockets in TIME_WAIT be recycled quickly. (Caution: it is known to break clients behind NAT, and it was removed entirely in Linux 4.12.)

The default is 0; setting it to 1 enables it.

7. net.ipv4.tcp_fin_timeout

Commonly described as shortening the lifetime of the TIME_WAIT state (2MSL by default); strictly speaking, on Linux this parameter controls how long an orphaned connection stays in FIN_WAIT_2.

Note: TIME_WAIT exists, and survives for 2MSL, for good reasons (see my earlier post on why the TIME_WAIT state exists), so shortening it carries some risk. Analyze your specific situation before changing it.

8. net.ipv4.tcp_max_tw_buckets

The maximum number of sockets allowed in TIME_WAIT. If it is exceeded, the excess sockets are destroyed immediately and a warning is logged.

9. net.ipv4.ip_local_port_range

The range of local ports used for outgoing connections.

10. somaxconn

Earlier I mentioned the maximum length of the SYN queue; the somaxconn parameter caps the length of the accept queue. When the listen function is called, its backlog argument determines the accept queue length, subject to this cap. If the value is too small, it also limits the maximum number of concurrent connections: if only a few connections that have completed the three-way handshake can be queued at once, the server accepts connections more slowly. On the server side, calling accept simply removes a connection that has completed the three-way handshake from the accept queue.

The accept queue and the SYN queue are created and maintained by the kernel when listen is called.

somaxconn can be modified via /proc/sys/net/core/somaxconn.

Each of the parameters above really deserves an article of its own; here I have only summarized some of them. Note that when modifying TCP parameters, always decide based on your actual needs and your test results.
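
To pull the parameters together, here is a hypothetical /etc/sysctl.conf fragment for a high-concurrency server. Every value is illustrative and must be validated against your own workload and kernel version; tcp_tw_recycle is deliberately left out:

    # Illustrative values only -- test before adopting
    fs.file-max = 2097152
    net.ipv4.tcp_max_syn_backlog = 8192
    net.ipv4.tcp_syncookies = 1
    net.ipv4.tcp_keepalive_time = 600
    net.ipv4.tcp_tw_reuse = 1
    net.ipv4.tcp_fin_timeout = 30
    net.ipv4.tcp_max_tw_buckets = 200000
    net.ipv4.ip_local_port_range = 1024 65000
    net.core.somaxconn = 4096

Apply the file with sysctl -p.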

 

Problem Description

Scenario: a Java client and server communicate over a socket; the server uses NIO.

1. Intermittently, the three-way handshake between client and server completes, but the server's selector never fires for the connection.

2. When the problem occurs, many connections hit it at the same time.

3. The selector is not destroyed and rebuilt; a single one is used throughout.

4. Some cases appear right when the program starts, and they recur intermittently afterwards.

 

Analyzing the problem

The three-way handshake process of normal TCP connection establishment:

 

Step 1: the client sends a SYN to the server to initiate the handshake;

Step 2: on receiving the SYN, the server responds to the client with SYN+ACK;

Step 3: after the client receives the SYN+ACK, it responds to the server with an ACK confirming receipt of the SYN+ACK (at this point, the client side already considers the connection established).

From the symptoms, this looks as though the full connection queue (the accept queue, described later) was full while TCP connections were being established, especially given symptoms 2 and 4. To prove it, I immediately checked the queue overflow statistics with netstat -s | egrep "listen":
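
The original screenshots are not reproduced here; run a couple of times in a row, the output looked roughly like this (the counts are illustrative):

    $ netstat -s | egrep "listen"
        1576 times the listen queue of a socket overflowed
    $ netstat -s | egrep "listen"
        1982 times the listen queue of a socket overflowed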

Reading it several times in a row, I found that the overflowed count kept increasing, so it was clear that the server's full connection queue was overflowing.

Then I looked at how the OS handles the overflow:

If tcp_abort_on_overflow is 0, then when the full connection queue is full at the third step of the three-way handshake, the server simply discards the ACK sent by the client (the server still considers the connection not yet established).

To prove that the client's application-level exceptions were related to the full connection queue, I first set tcp_abort_on_overflow to 1. A value of 1 means that if the full connection queue is full at the third step, the server sends a RST packet to the client, aborting both the handshake and the connection (from the server's point of view the connection was never established).
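
Flipping the behavior for the test is a one-liner (either form works):

    echo 1 > /proc/sys/net/ipv4/tcp_abort_on_overflow
    sysctl -w net.ipv4.tcp_abort_on_overflow=1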

After this change, the test showed many "connection reset by peer" errors in the client's exceptions, which proved that the client's errors were caused by this behavior (the key point: rigorous logic lets you prove the cause quickly).

The developers then went through the Java source code and found that the socket's default backlog (the value that controls the size of the fully connected queue, described later) is 50. I raised it and ran the stress test again: after more than 12 hours under load the error never reappeared, and the overflowed counter stopped increasing.

At this point the problem was solved. In short, after the TCP three-way handshake there is an accept queue, and only connections that make it into this queue can move from LISTEN handling to accept. The default backlog of 50 fills easily; when it is full, the server ignores the ACK the client sends in the third handshake step (while periodically retransmitting the SYN+ACK of the second step). If the connection never enters the queue, the client ends up with an exception.

Rather than stopping at the fix, it is worth reviewing the troubleshooting process: which knowledge points along the way were missing or poorly understood? And beyond the exception messages above, are there clearer local indicators for observing and confirming this problem?

A deeper look at the connection process and queues in the TCP handshake

As described above, there are two queues: the SYN queue (half connection queue) and the accept queue (full connection queue).

In the first step of the three-way handshake, after the server receives the client's SYN, it puts the connection's information into the half connection queue and replies to the client with SYN+ACK (the second step);

In the third step, the server receives the client's ACK. If the full connection queue is not full at that moment, it takes the connection's information out of the half connection queue and puts it into the full connection queue; otherwise it behaves as tcp_abort_on_overflow dictates.

If the full connection queue is full and tcp_abort_on_overflow is 0, the server simply sends SYN+ACK to the client again after a while (that is, it redoes the second step of the handshake). If the client's timeout is short, the client easily ends up with an exception.

In our OS, the default number of retries of the second step is 2 (CentOS defaults to 5); this is governed by net.ipv4.tcp_synack_retries.
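
You can check it like this:

    $ cat /proc/sys/net/ipv4/tcp_synack_retries
    2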

 

If the TCP connection queues overflow, which indicators can we look at?

The troubleshooting above was a bit roundabout. The next time a similar problem shows up, is there a faster, clearer way to confirm it? (Concrete, tangible observations strengthen our understanding and absorption of the underlying knowledge.)

netstat -s
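
The screenshot from the original post showed output along these lines (the exact wording varies between net-tools versions):

    $ netstat -s | egrep "listen"
        667399 times the listen queue of a socket overflowed
        667399 SYNs to LISTEN sockets ignored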

Here, the 667399 is the number of times the full connection queue has overflowed. Run the command every few seconds: if the number keeps increasing, the full connection queue must be filling up from time to time.

ss command
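
Illustrative output for a listening socket (the address and port are made up):

    $ ss -lnt
    State   Recv-Q  Send-Q  Local Address:Port  Peer Address:Port
    LISTEN  0       50      *:8080              *:*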

For sockets in the LISTEN state, the Send-Q value (50 here) is the maximum size of the full connection queue for that listening port, and Recv-Q is how much of the full connection queue is currently in use.

The size of the full connection queue depends on min(backlog, somaxconn): backlog is passed in when the socket is created, and somaxconn is an OS-level system parameter.

At this point we can connect this back to our code. For example, when you create a ServerSocket in Java, you are invited to pass in the backlog value:
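
A minimal sketch (the port and backlog values are illustrative):

    import java.net.ServerSocket;
    import java.net.Socket;

    public class BacklogDemo {
        public static void main(String[] args) throws Exception {
            // The second argument is the backlog: the requested length of the
            // accept (full connection) queue, capped by net.core.somaxconn.
            // If omitted, the JDK defaults it to 50.
            ServerSocket server = new ServerSocket(8080, 1024);
            while (true) {
                // accept() removes a completed connection from the accept queue
                Socket client = server.accept();
                client.close();
            }
        }
    }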

The size of the half connection queue depends on max(64, /proc/sys/net/ipv4/tcp_max_syn_backlog); the exact formula differs somewhat between OS versions.

When we wrote the code, we mostly never thought about this backlog, or never gave it a value (in which case the default is 50); we simply ignored it. First, this was a knowledge blind spot. Second, you might have seen this parameter in some article, been briefly impressed, and then forgotten it after a while, because the knowledge was isolated rather than systematic. But if you first suffer through a problem like this one, are then driven by the pressure and pain to find out why, and can trace the explanation from the code layer down to the OS layer, then the knowledge point can be considered genuinely grasped, and it becomes a solid building block of your knowledge of TCP and performance.

netstat command

Like ss, netstat can also show Send-Q and Recv-Q status information, but note: if the connection is not in the LISTEN state, Recv-Q is the amount of received data still sitting in the buffer, not yet read by the process, in bytes, and Send-Q is the number of bytes in the send queue not yet acknowledged by the remote host.

 

The Recv-Q shown by netstat -tn has nothing to do with the full or half connection queues. It is called out here because it is easy to confuse with the Recv-Q of ss -lnt; contrasting the two, by the way, helps consolidate the related knowledge points.

For example, a large backlog in the Recv-Q shown by netstat -t is generally caused by the CPU being unable to keep up with processing:
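
Illustrative output (the addresses and byte counts are made up):

    $ netstat -tn
    Proto Recv-Q Send-Q Local Address          Foreign Address        State
    tcp   218656      0 10.0.0.5:8080          10.0.0.9:41234         ESTABLISHED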

Verifying the above understanding in practice

Change the backlog in the Java code to 10 (the smaller it is, the easier it overflows) and keep the stress test running. The client starts reporting exceptions again; observe on the server with the ss command:
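
Output along these lines (illustrative):

    $ ss -lnt
    State   Recv-Q  Send-Q  Local Address:Port  Peer Address:Port
    LISTEN  11      10      *:3306              *:*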

According to the earlier explanation, this shows that the maximum full connection queue on port 3306 is 10, but there are now 11 entries in (or waiting to enter) the queue. There must be one that cannot get in and will overflow, and indeed the overflowed counter can be seen increasing steadily.

Accept queue parameters in Tomcat and Nginx

Tomcat uses short connections by default. Its backlog (in Tomcat terminology, acceptCount) defaults to 200 in Ali-Tomcat and 100 in Apache Tomcat.
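
In Apache Tomcat this is the acceptCount attribute on the Connector in server.xml; a sketch with an illustrative value:

    <Connector port="8080" protocol="HTTP/1.1"
               connectionTimeout="20000" acceptCount="1024" />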

Nginx's default is 511.

Because Nginx runs in multi-process mode, port 8085 showed up multiple times in the listing: multiple worker processes listen on the same port, which avoids context switching and improves performance.
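
In nginx the queue length can be overridden per listen directive in nginx.conf; a sketch (the value is illustrative):

    server {
        # backlog= overrides nginx's default of 511 for this socket
        listen 8085 backlog=2048;
    }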

To sum up

Overflow of the full connection queue and the half connection queue is easy to overlook, but it matters, especially for applications using short connections (such as Nginx and PHP, though both also support long connections). Once overflow happens, CPU and thread state look normal on the server, yet the load will not ramp up. From the client's point of view, rt (response time) is relatively high (rt = network + queueing + real service time), while the real service time recorded in the server logs is very short.

Some frameworks such as the JDK and Netty have a relatively small default backlog, which can cause performance problems in some situations.

I hope this article helps you understand the concept, principle, and role of the half connection queue and the full connection queue in TCP connection establishment, and more importantly, which indicators let you see these problems clearly (engineering efficiency reinforces the understanding of theory).

In addition, each concrete problem is a great learning opportunity; reading a book alone is certainly not deep enough. Cherish each concrete problem: after you run into one, work out its ins and outs, and each problem becomes a good chance to nail down a concrete knowledge point.

 

Original articles:

https://mp.weixin.qq.com/s?__biz=MzI3NzE0NjcwMg==&mid=2650121835&idx=1&sn=db8ef06ee4ea237c92ffb7fa969b4135&chksm=f36bbb4ac41c325cf49bb7a29fb078e2a65e6b96abc1511d87c0522092847118c7aaf95dc728&mpshare=1&scene=1&srcid=0812GDo6iaiSfjhXahMk87ND#rd

 

https://blog.csdn.net/wwh578867817/article/details/46707389
