TCP parameter optimization under Linux (detailed explanation)

Introduction

TCP is a connection-oriented transport protocol designed for wide-area networks. Its purpose is to provide, between two communication endpoints that may be separated by multiple networks, a communication method with the following characteristics:
     (1) stream-based;
     (2) connection-oriented;
     (3) reliable;
     (4) when network conditions are poor, it tries to reduce the bandwidth overhead caused by retransmissions;
     (5) connection maintenance involves only the two endpoints, regardless of the intermediate network segments and nodes.

To deliver these characteristics, the TCP protocol makes the following provisions:

       1. Data segmentation: user data is segmented at the sending end and reassembled at the receiving end. TCP determines the segment size and controls segmentation and reassembly;
       2. Arrival acknowledgment: when the receiving end receives a segment, it sends an acknowledgment to the sender based on the segment's sequence number;
       3. Timeout retransmission: the sender starts a timer when it sends a segment; if the corresponding acknowledgment has not arrived when the timer expires, the segment is retransmitted;
       4. Sliding window: the receive buffer on each side of a TCP connection has a fixed size, and the receiving end only allows the other end to send as much data as its buffer can hold. TCP provides flow control based on this sliding window, preventing a faster host from overflowing a slower host's buffer;
       5. Out-of-order handling: TCP segments transmitted as IP datagrams may arrive out of order. TCP reorders the received data and delivers it to the application layer in the correct sequence;
       6. Duplicate handling: TCP segments transmitted as IP datagrams may be duplicated, and the TCP receiver must discard duplicate data;
       7. Data checksum: TCP maintains a checksum over its header and data. This end-to-end checksum is designed to detect any change to the data in transit. If the checksum of a received segment is wrong, TCP discards the segment without acknowledging it, causing the peer to time out and retransmit.

The concepts above are taken from Baidu Encyclopedia:

https://baike.baidu.com/item/TCP/33012?fr=aladdin


If you are already familiar with these concepts, or simply don't want to read any more theory, let me talk about something practical instead.

[Figure: TCP three-way handshake and four-way teardown]

The figure above describes the TCP three-way handshake and four-way teardown in detail. It is well drawn, and it is not mine; I won't embarrass myself by redrawing it (and it's definitely not laziness). The parameter descriptions and optimizations below can be read against this figure.

Below is the optimization scheme for the specific TCP parameters.

Please tune them according to your actual situation!!!
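
A quick reminder of the mechanics before diving in (a generic sketch, not specific to this article): values written with sysctl -w last only until reboot, while entries in /etc/sysctl.conf (or a file under /etc/sysctl.d/) persist.

# Apply everything in /etc/sysctl.conf after editing it
sysctl -p

# Or change a single value at runtime (lost on reboot)
sysctl -w net.core.somaxconn=262144

# Verify the value currently in effect
sysctl net.core.somaxconn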

# Upper limit of a listening socket's backlog (connections that have completed the handshake but have not yet been accepted by the application wait in this queue).
# It limits the size of the listen queue for new TCP connections. For a high-load web environment that frequently handles new connections, the default of 128 is too small; raising it to 1024 or more is recommended for most environments. Server processes that limit the listen queue themselves (for example, sendmail(8) or Apache) often have an option in their configuration file to set the queue size. A large listen queue also helps mitigate denial-of-service (DoS) attacks.

net.core.somaxconn = 262144
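
To judge whether this backlog is actually a bottleneck, you can inspect the listen queues and the kernel's overflow counters; a rough sketch (the exact wording of the netstat -s output varies between versions):

# For listening sockets, ss shows the current accept queue (Recv-Q) and its limit (Send-Q)
ss -lnt

# Cumulative listen-queue overflow/drop counters since boot
netstat -s | grep -i listen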

# Enable reuse: allow sockets in the TIME-WAIT state to be reused for new TCP connections. The default is 0 (disabled). This only affects outgoing connections and requires TCP timestamps to be enabled.

net.ipv4.tcp_tw_reuse = 1

# Enable fast recycling of TIME-WAIT sockets. The default is 0 (disabled). Note: this option is known to break clients behind NAT and was removed entirely in Linux 4.12; leaving it at 0, as here, is the safe choice.

net.ipv4.tcp_tw_recycle = 0

# How long a connection must sit idle, in seconds, before TCP starts sending keepalive probes (default 7200)

net.ipv4.tcp_keepalive_time = 900

# If the socket is closed by a request from the local end, this parameter determines how long it stays in the FIN-WAIT-2 state (it can be set as low as 30; generally speaking, FIN-WAIT-2 connections are very rare)

net.ipv4.tcp_fin_timeout = 15

# Local port range used for outgoing connections

net.ipv4.ip_local_port_range = 10000 65500
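
This range provides 65500 - 10000 + 1 = 55501 ephemeral ports for outgoing connections to any one destination. To inspect the setting and get a feel for current usage (a sketch; :3306 is just a hypothetical destination port):

sysctl net.ipv4.ip_local_port_range

# Count connections to one destination port, all drawing from this range
ss -ant 'dport = :3306' | wc -l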

# Ports reserved so they are never handed out as local ports; separate multiple ports with commas

net.ipv4.ip_local_reserved_ports = 50010,10050,32275

# Length of the queue for connections (SYN requests) that have not yet been acknowledged by the client. The default is 1024; increasing it to 819200 accommodates more connections waiting to complete the handshake.

net.ipv4.tcp_max_syn_backlog = 819200

# Cap on the number of TIME_WAIT sockets
# Maximum number of TIME_WAIT sockets the system maintains at the same time. Beyond this number, TIME_WAIT sockets are destroyed immediately and a warning is printed. The default is 180000; here it is raised to 8192000. For servers such as Apache and Nginx, the parameters above already reduce the number of TIME_WAIT sockets, but for Squid the effect is limited; this parameter caps the number of TIME_WAIT sockets and prevents a Squid server from being dragged down by them.

net.ipv4.tcp_max_tw_buckets = 8192000

# Maximum number of TCP sockets in the system that are not attached to any user file handle. Beyond this number, orphaned sockets are reset immediately and a warning is printed. This limit exists only to defend against simple DoS attacks; when system memory is sufficient, the value can be increased:

net.ipv4.tcp_max_orphans = 3276800

# CONNTRACK_MAX: the maximum number of connection-tracking entries, i.e. the number of connections netfilter can track simultaneously in kernel memory

net.netfilter.nf_conntrack_max = 250000
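
To see how close the system is to this ceiling, compare the live count against the maximum; when the table fills up, new connections are dropped and the kernel logs complaints (the exact message wording varies by version):

cat /proc/sys/net/netfilter/nf_conntrack_count
cat /proc/sys/net/netfilter/nf_conntrack_max

# Evidence of a full table in the kernel log
dmesg | grep -i conntrack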

# tcp_synack_retries controls how many times the kernel retransmits the SYN+ACK reply to a SYN request (the second step of the three-way handshake) before giving up, i.e. how many times the system retries a TCP connection initiated by the remote end. The value must be a positive integer not exceeding 255. Retransmissions back off exponentially, so with the traditional 3-second initial RTO, the default of 5 retries means each passive connection attempt takes roughly 180 seconds (3 minutes) to time out.

net.ipv4.tcp_synack_retries = 2

# For a new outgoing connection, how many SYN requests the kernel sends before giving up. Should not be greater than 255; the default is 5, which corresponds to roughly 180 seconds. (For a heavily loaded network with good physical connectivity this value is too high and can be lowered to 2. It applies only to outgoing connections; for incoming connections the retry count is governed by tcp_synack_retries above.)

net.ipv4.tcp_syn_retries = 2
# Conntrack timeouts for four TCP states, in seconds
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 30
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 30
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 15
net.netfilter.nf_conntrack_tcp_timeout_established = 86400 
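
To list every conntrack TCP timeout currently in effect in one go (a simple sketch):

sysctl -a 2>/dev/null | grep nf_conntrack_tcp_timeout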

# When a keepalive probe goes unacknowledged, how often to resend it. The default is 75 seconds.

net.ipv4.tcp_keepalive_intvl = 15

# How many keepalive probes are sent before the connection is declared dead. The default is 9. This value multiplied by tcp_keepalive_intvl determines how long a connection may remain unresponsive after keepalive probing begins.

net.ipv4.tcp_keepalive_probes = 5
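
Taken together, the three keepalive knobs determine how long a dead peer can linger: tcp_keepalive_time + tcp_keepalive_probes * tcp_keepalive_intvl = 900 + 5 * 15 = 975 seconds with the values above, versus 7200 + 9 * 75 = 7875 seconds with kernel defaults. To read all three values in effect:

sysctl net.ipv4.tcp_keepalive_time net.ipv4.tcp_keepalive_intvl net.ipv4.tcp_keepalive_probes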

# How many times to retry before this end gives up closing a TCP connection. The default of 7 corresponds to 50 seconds to 16 minutes, depending on RTO. If your machine is an overloaded web server, consider lowering this value, since such sockets consume significant resources. See also tcp_max_orphans. (Note: the kernel special-cases 0 to mean its built-in default, not zero retries.)

net.ipv4.tcp_orphan_retries = 0

# Enable TCP window scaling. This must be set to 1 whenever the TCP window needs to exceed 65535 bytes (64 KB).

net.ipv4.tcp_window_scaling = 1

# When the three-way handshake completes and the connection moves to the ESTABLISHED state to be handed to the application's accept queue, the kernel checks whether that queue is full. If it is, the usual behavior is to drop the final ACK as if it had been accidentally lost, so that the client retransmits it after a timeout and the connection gets another chance to enter the ESTABLISHED state; this serves as a repair/retry mechanism. If tcp_abort_on_overflow is enabled, the kernel instead sends an RST packet directly to the client when the queue is full, terminating the connection; the client program then receives error 104, "Connection reset by peer".

net.ipv4.tcp_abort_on_overflow = 1
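
Before enabling this, it is worth confirming that accept-queue overflows are really occurring; a sketch using nstat from iproute2 (counter availability can vary by kernel):

# TcpExtListenOverflows: accept queue was full; TcpExtListenDrops: SYNs dropped as a result
nstat -az TcpExtListenOverflows TcpExtListenDrops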

# Enable TCP selective acknowledgment (SACK), which lets the receiver tell the sender exactly which sequence ranges are missing from the byte stream, reducing the number of segments that must be retransmitted when segments are lost. SACK is very beneficial when segment loss is frequent.

net.ipv4.tcp_sack = 1

# Disable slow start after idle, i.e. do not reset the congestion window back to its initial value after the connection has sat idle for a while (0 = disabled).

net.ipv4.tcp_slow_start_after_idle = 0

# Maximum number of packets allowed to queue per network interface when packets arrive faster than the kernel can process them

net.core.netdev_max_backlog = 300000
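
Drops caused by this queue filling up are recorded in the second column of /proc/net/softnet_stat (hexadecimal, one row per CPU); a rough check:

# Column 2 is the per-CPU count of packets dropped from the backlog queue
cat /proc/net/softnet_stat | awk '{print "cpu" NR-1, "dropped(hex)=" $2}'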


# Memory the kernel allocates to TCP, in pages (1 page = 4096 bytes; check with: getconf PAGESIZE)
# The first number: while TCP uses fewer than 1048576 pages, the kernel does not interfere.
# The second number: once TCP uses more than 1310720 pages, the kernel enters "memory pressure" mode.
# The third number: once TCP uses more than 1572864 pages (equivalent to about 6 GB of memory), the kernel reports: Out of socket memory

net.ipv4.tcp_mem = 1048576 1310720 1572864
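
A quick sanity check of the page arithmetic (assuming the usual 4096-byte page; verify with getconf):

getconf PAGESIZE                         # normally 4096
echo $((1572864 * 4096 / 1024 / 1024))   # 6144 MB, i.e. 6 GB

# Current TCP memory use, in pages, appears in the 'mem' field here
cat /proc/net/sockstat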

# Size of the read buffer allocated to each TCP connection, in bytes
# The first number is the minimum memory allocated to a TCP connection
# The second number is the default memory allocated to a TCP connection
# The third number is the maximum memory allocated to a TCP connection
# Allocation generally follows the default. The example below uses 8 KB for reading; with a matching 8 KB for writing, that is 16 KB per connection.
# 1572864 * 16 KB = 25165824 KB, equivalent to about 24 GB of memory

net.ipv4.tcp_rmem = 4096 8192 16384

# Default receive buffer size for sockets, in bytes (TCP sockets use the tcp_rmem default instead)

net.core.rmem_default = 1048576

# Maximum receive buffer size a socket may request, in bytes

net.core.rmem_max = 15728640

# Send-buffer sizes used for each TCP socket's automatic tuning.
# The first value is the minimum number of bytes allocated for the socket's send buffer.
# The second value is the default (it overrides wmem_default); the buffer can grow to this size when the system is not under heavy load.
# The third value is the maximum number of bytes of send-buffer space (it overrides wmem_max).

net.ipv4.tcp_wmem = 256000 768000 4194304

# Default send buffer size for all socket types

net.core.wmem_default = 1048576

# Maximum send buffer size for all socket types

net.core.wmem_max = 5242880

# Reboot automatically 20 seconds after a kernel panic

kernel.panic = 20

# Number of file handles that can be opened system-wide. This is a limit on the whole system, not per user.

fs.file-max = 6553560
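
To compare this limit with actual usage, /proc/sys/fs/file-nr reports three numbers: allocated handles, allocated-but-unused handles, and the maximum:

cat /proc/sys/fs/file-nr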

Many of the parameters above are worth adjusting, but none are absolutely necessary; you still have to weigh them against actual needs.

How should you tune these parameters, and how do you know whether a change is reasonable? My suggestion is to monitor after the basic optimization: check TCP's resource consumption and find where things are actually getting stuck. Below are some rough methods for obtaining TCP metrics on Linux, for reference only.

View the current number of tracked connections (on newer kernels the slab may be named nf_conntrack instead); the code is as follows:

grep ip_conntrack /proc/slabinfo
ip_conntrack 38358 64324 304 13 1 : tunables 54 27 8 : slabdata 4948 4948 216

Count the connections currently in each TCP state; the code is as follows:

# netstat -an | awk '/^tcp/ {++state[$6]} END   {for (key in state) print key,"\t",state[key]}'
TIME_WAIT        1832
CLOSE_WAIT       360
FIN_WAIT2        12
ESTABLISHED      3588
SYN_RECV         148
CLOSING          7
LAST_ACK         19
LISTEN   59
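
On servers with very many connections, ss is considerably faster than netstat; an equivalent sketch of the same tally:

# ss prints the state in column 1; skip the header line
ss -ant | awk 'NR>1 {++state[$1]} END {for (key in state) print key, "\t", state[key]}'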

Rank the most frequent addresses in the conntrack table; the code is as follows:

$ cat /proc/net/nf_conntrack | cut -d ' ' -f 10 | cut -d '=' -f 2 | sort | uniq -c | sort -nr | head -n 10

Summary

It is best to adjust parameters according to the actual situation; that is the most scientific approach. Don't blindly increase values or apply a one-size-fits-all recipe.

Origin: blog.51cto.com/14839701/2551553