How to optimize TCP performance

This time I'd like to talk about TCP performance optimization.

TCP stands for Transmission Control Protocol, and every engineer has at least a passing familiarity with it. TCP is a lower-layer protocol: for most developers it is transparent, and they never need to care about its implementation details.

However, if you want to do serious performance optimization, TCP is a layer you cannot avoid. Before talking about optimizing TCP, we first need to review some of its details. Let's start with the TCP header format.

The header format of the TCP segment


The first 20 bytes of the TCP header are fixed; they may be followed by options whose total length is always a multiple of 4 (4n bytes, n an integer). The minimum TCP header length is therefore 20 bytes.

  • Sequence number: the sequence number of the first data byte carried in this segment
  • Acknowledgment number: the sequence number of the first byte the sender of this segment expects to receive next. An acknowledgment number of N means: all bytes up to and including sequence number N-1 have been received correctly
  • ACK: the acknowledgment number field is valid only when ACK = 1 and is ignored when ACK = 0. TCP requires that every segment sent after the connection is established has ACK set to 1
  • SYN: used to synchronize sequence numbers when establishing a connection. SYN = 1 with ACK = 0 marks a connection request; if the other side agrees to establish the connection, it replies with SYN = 1 and ACK = 1. A set SYN bit therefore indicates either a connection request or a connection acceptance.
  • Window: tells the other side how much data it is currently allowed to send. The window value changes dynamically.
  • Options:
    • Maximum segment length MSS:

      • The largest Ethernet frame is 1518 bytes. Subtracting the 14-byte Ethernet header and the 4-byte CRC trailer (18 bytes in total) leaves at most 1500 bytes for the upper-layer payload. This limit is called the MTU.
      • To achieve the best transmission performance, TCP negotiates the MSS of both sides when establishing a connection, and implementations usually derive this value from the MTU. The MSS is generally 1420-1460 bytes; 1460 comes from 1500 - 20 (IP header) - 20 (TCP header).
    • Window scale option: the window field in the TCP header is only 16 bits, so the maximum window is 64 KB. The window scale option raises the maximum window to 2^(16+14) - 1 = 2^30 - 1 bytes.
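To make the header layout concrete, here is a minimal Python sketch that unpacks the 20-byte fixed header described above using `struct` (the field layout follows the TCP specification; the sample segment at the bottom is fabricated for demonstration):

```python
import struct

def parse_tcp_header(data: bytes) -> dict:
    """Parse the 20-byte fixed part of a TCP header."""
    (src_port, dst_port, seq, ack,
     offset_flags, window, checksum, urgent) = struct.unpack("!HHIIHHHH", data[:20])
    data_offset = (offset_flags >> 12) & 0xF   # header length in 32-bit words
    flags = offset_flags & 0x1FF               # the 9 flag bits
    return {
        "src_port": src_port, "dst_port": dst_port,
        "seq": seq, "ack": ack,
        "header_len": data_offset * 4,         # in bytes: 20 (no options) to 60
        "SYN": bool(flags & 0x02), "ACK": bool(flags & 0x10),
        "window": window,
    }

# A hand-crafted SYN segment header: ports 12345 -> 80, seq x = 1000,
# data offset 5 (20 bytes, no options), SYN flag set, window 65535.
hdr = struct.pack("!HHIIHHHH", 12345, 80, 1000, 0, (5 << 12) | 0x02, 65535, 0, 0)
info = parse_tcp_header(hdr)
print(info)
```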

Three-way handshake

Principle

All TCP connections go through a three-way handshake at the beginning, as shown in the following figure:


  • SYN: the client picks a random sequence number x and sends a SYN packet, which may also carry other TCP flags and options.
  • SYN-ACK: the server increments x by 1, picks its own random sequence number y, adds its own flags and options, and sends back the response.
  • ACK: the client increments both x and y by 1 and completes the handshake by sending the final ACK packet.
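The sequence-number bookkeeping above can be sketched as a toy model (this mimics only the arithmetic, it performs no real networking):

```python
import random

def three_way_handshake(x: int, y: int):
    """Return the (seq, ack) pairs exchanged in a TCP three-way handshake."""
    syn     = {"seq": x,     "ack": None}   # client -> server
    syn_ack = {"seq": y,     "ack": x + 1}  # server -> client
    ack     = {"seq": x + 1, "ack": y + 1}  # client -> server
    return syn, syn_ack, ack

# Both sides choose random 32-bit initial sequence numbers.
x, y = random.getrandbits(32), random.getrandbits(32)
syn, syn_ack, ack = three_way_handshake(x, y)
print(syn, syn_ack, ack)
```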

We have all read this in books many times; this time let's use Wireshark to capture the packets and look at the details:

The local IP is 192.168.1.102 and the server IP is 122.51.162

SYN

(Wireshark capture of the client's SYN segment)

SYN-ACK

(Wireshark capture of the server's SYN-ACK segment)

ACK

(Wireshark capture of the client's final ACK segment)

After the three-way handshake is completed, the client and server can communicate.

This startup process applies to every TCP connection, and it therefore has significant performance implications for any application running over TCP: no application data can be transferred until a full round trip has completed.

Optimization

The delay imposed by the three-way handshake makes creating a new TCP connection expensive, which is why reusing connections is key to improving the performance of TCP applications.

TCP fast open

**TFO (TCP Fast Open)** allows the server and client to exchange data during the handshake phase of connection establishment, saving the application one RTT of latency.

But TFO can introduce problems of its own, so the protocol requires TCP implementations to disable it by default. When TFO is needed on a service port, the application must enable it explicitly.

View : sysctl net.ipv4.tcp_fastopen

Setting : sysctl -w net.ipv4.tcp_fastopen=0x203

Limitations : TFO does not solve every problem. Although it helps eliminate one round trip of the handshake, it only works in certain situations: the payload carried in the SYN packet has a maximum size, only certain types of HTTP requests can be sent this way, and because it relies on an encrypted cookie, it applies only to repeat connections.

Effect : After traffic analysis and network simulation, Google researchers found that TFO can reduce the network latency of HTTP transactions by 15% on average and the entire page load time by more than 10%. In some cases with very long delays, the reduction can even reach 40%.

Try to reuse established TCP connections as much as possible

Persistent connections (Keep-Alive)

Keep-Alive, enabled by default since HTTP/1.1, means multiple requests and responses can be sent over a single TCP connection without closing it.

Keep-Alive requires server-side support: an HTTP daemon such as nginx needs keepalive_timeout configured.

  • keepalive_timeout=0: establish TCP connection + send HTTP request + execution time + send HTTP response + close TCP connection + 2MSL
  • keepalive_timeout>0: establish TCP connection + (time of last response - time of first request) + close TCP connection + 2MSL

In addition, TCP itself also has a Keep-Alive: a keepalive mechanism for probing whether a TCP connection is still alive

  • net.ipv4.tcp_keepalive_time: how many seconds the connection must be idle before keepalive probes start

  • net.ipv4.tcp_keepalive_intvl: the interval between one probe packet and the next

  • net.ipv4.tcp_keepalive_probes: the number of probes sent before the connection is declared dead
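These system-wide defaults can also be overridden per socket. A minimal Python sketch follows; the TCP_KEEP* options are Linux-specific (hence the hasattr guards), and the 60/10/5 values are arbitrary examples:

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Enable TCP keepalive on this socket.
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

# Per-socket equivalents of the sysctl knobs above (Linux-specific).
if hasattr(socket, "TCP_KEEPIDLE"):
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)   # tcp_keepalive_time
if hasattr(socket, "TCP_KEEPINTVL"):
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)  # tcp_keepalive_intvl
if hasattr(socket, "TCP_KEEPCNT"):
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)     # tcp_keepalive_probes

enabled = s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
print("SO_KEEPALIVE:", enabled)
s.close()
```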

Load balancing

Basic principle: a client (say ClientA) performs the three-way handshake with the load balancer and sends an HTTP request. On receiving the request, the load balancer checks whether it already holds an idle persistent connection to the backend server; if not, it opens a new one. When the HTTP response completes, the client closes its connection to the load balancer, but the load balancer keeps its connection to the server open. When another client (say ClientB) sends an HTTP request, the load balancer forwards it over an idle connection it already maintains to the server, avoiding the latency and server resources a new TCP connection would cost.

Receive window rwnd

Flow control is a mechanism that prevents the sender from overwhelming the receiver, which may be unable to keep up because it is busy, under heavy load, or short of buffer space. To implement flow control, each side of a TCP connection advertises its own receive window (rwnd), which describes how much buffer space it has available for incoming data.


When the connection is first established, both ends advertise an rwnd taken from their system defaults. Every ACK packet then carries the latest rwnd value, letting both ends dynamically adjust the data rate to the capacity and processing speed of sender and receiver.

The original TCP specification allocates 16 bits to the advertised window field, capping the window of both sender and receiver at 65,535 bytes (2^16 - 1). To lift this limit, RFC 1323 defines the "TCP window scaling" option, which raises the maximum receive window from 65,535 bytes to 1 GB.

Window scaling is negotiated during the three-way handshake: a scale value records how many bits the 16-bit window field in future ACKs should be shifted left.
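The shift arithmetic is simple enough to sketch in Python (a toy calculation, not a real API):

```python
def effective_window(raw_window: int, scale_shift: int) -> int:
    """Effective receive window = 16-bit window field << scale factor (RFC 1323)."""
    return raw_window << scale_shift

# With the maximum shift of 14, a full 16-bit window grows to almost 1 GB.
print(effective_window(65535, 0))    # no scaling: 65535 bytes
print(effective_window(65535, 14))   # 1073725440 bytes, roughly 1 GB
```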

Optimization

The maximum amount of data that can be transferred between the client and the server is the minimum of the rwnd and cwnd variables.

Enable window scaling

Enabling window scaling raises the receive window limit from 2^16 to 2^30 bytes, giving much better transmission performance.

View : sysctl net.ipv4.tcp_window_scaling

Setting : sysctl -w net.ipv4.tcp_window_scaling=1

Effect : Compared with scaling disabled, the available bandwidth can be fully utilized.

This is where the bandwidth-delay product comes in. **BDP (bandwidth-delay product)** is the capacity of a data link multiplied by its end-to-end delay: the maximum amount of unacknowledged data that can be in flight at any moment.

If either the sender or the receiver is frequently forced to stop and wait for ACKs of earlier packets, gaps appear in the data flow, which inevitably caps the maximum throughput of the connection.

Regardless of the actual or advertised bandwidth, a window that is too small will limit the throughput of the connection.

Knowing the round-trip time and the actual bandwidth at both ends can also calculate the optimal window size. This time we assume that the round trip time is 100 ms, the available bandwidth at the sending end is 10 Mbps, and the receiving end is 100 Mbps+. Also assuming that there is no network congestion between the two ends, our goal is to make full use of the client's 10 Mbps bandwidth:

10 Mbps × 0.1 s = 1,000,000 bits ÷ 8 = 125,000 bytes ≈ 122.1 KB

The window needs to be at least 122.1 KB to saturate the 10 Mbps link. Without window scaling, the maximum TCP receive window is only 64 KB: no matter how good the network is, the bandwidth can never be fully utilized.
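The BDP arithmetic from the example can be checked with a few lines of Python (the 10 Mbps and 100 ms figures come from the text):

```python
def bdp_bytes(bandwidth_bps: float, rtt_seconds: float) -> float:
    """Bandwidth-delay product: bytes that can be in flight unacknowledged."""
    return bandwidth_bps * rtt_seconds / 8

# The example from the text: 10 Mbps link, 100 ms round-trip time.
window = bdp_bytes(10e6, 0.100)
print(f"{window:.0f} bytes = {window / 1024:.1f} KB")  # 125000 bytes = 122.1 KB
```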

Slow start and congestion avoidance

The receiving window is important to performance, but the congestion window is more important than the receiving window.

The maximum amount of data that can be transmitted between the client and the server (without ACK confirmation) is the minimum value of the rwnd and cwnd variables. The cwnd at the beginning is very small, and it continues to increase through the slow start algorithm.

There are many slow-start and congestion-avoidance algorithms. We will use TCP Tahoe here for the demonstration; it was the first TCP version with congestion control, and its congestion-avoidance algorithm is AIMD (Additive Increase, Multiplicative Decrease).


  • SS: Slow Start, the slow-start phase. When TCP first starts transmitting, the rate ramps up from a low starting point; unless packet loss occurs, it keeps growing exponentially.
  • CA: Congestion Avoidance. Once the congestion window exceeds ssthresh, cwnd grows more slowly: linearly, no longer exponentially as in SS.
  • Timeout: when the sender detects packet loss, it records the current cwnd and derives a new ssthresh, generally half of the cwnd at the time of the timeout. It then drops cwnd back to its initial value and grows it again: exponentially (SS) up to ssthresh, and linearly (CA) after that.

The server will have a default cwnd initial value. Initially, the value of cwnd was only 1 TCP segment. In April 1999, RFC 2581 increased it to 4 TCP segments. In April 2013, RFC 6928 increased it again to 10 TCP segments.

Calculation example

Question : how long does it take for cwnd to reach size N?

Solution : during slow start, cwnd doubles once per round trip, so

Time = RTT × ⌈log₂(N / initial cwnd)⌉

Let's take a look at an example, assuming:

• The receiving window of the client and server is 65535 bytes (64 KB);

• The initial congestion window: 4 segments (RFC 2581);

• The round trip time is 56 ms (London to New York);

65,535 bytes ÷ 1,460 bytes per segment ≈ 45 segments
Time = 56 ms × ⌈log₂(45 / 4)⌉ = 56 ms × 4 = 224 ms

This example shows that even under good network conditions it takes 224 ms to reach the maximum transfer rate. Slow start limits the available throughput, which is especially bad for small file transfers: the transfer is often complete while congestion control is still in the slow-start phase.
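Plugging the numbers into the formula, as a quick Python check (the 1460-byte segment size, 56 ms RTT, and initial cwnd of 4 are taken from the text):

```python
import math

def slow_start_time(rtt_ms: int, target_segments: int, initial_cwnd: int = 4) -> int:
    """Time (ms) for cwnd, doubling once per RTT, to reach the target window."""
    rounds = math.ceil(math.log2(target_segments / initial_cwnd))
    return rounds * rtt_ms

# 64 KB receive window / 1460-byte segments, 56 ms London-New York RTT.
segments = math.ceil(65535 / 1460)       # 45 segments
print(slow_start_time(56, segments))     # 224 ms
```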

Optimization

Make sure the cwnd size is 10

View :

  1. Probe the kernel with a SystemTap script:

    probe kernel.function("tcp_init_cwnd").return
    {
        printf("tcp_init_cwnd return: %d\n", $return)
    }

  2. Upgrade the server kernel to the latest version (Linux: 3.2+)

Increase the initial congestion window of TCP

Setting : Patch the kernel to add a proc parameter that controls initcwnd, /proc/sys/net/ipv4/tcp_initcwnd. This method applies to all TCP connections.

Restriction : The initial congestion window cannot be set too large; otherwise buffers at intermediate switching nodes fill up, excess packets must be dropped, hosts inject more and more retransmitted copies into the network, and the whole network can become paralyzed. The major CDN vendors have all tuned init_cwnd, generally to a value between 10 and 20.


Disable slow start restart

Term explanation : SSR (Slow-Start Restart) resets the connection's congestion window after the connection has been idle for a certain period of time.

Reason : network conditions may change while the connection is idle, so to avoid congestion the congestion window is reset to a "safe" default value.

View : sysctl net.ipv4.tcp_slow_start_after_idle

Setting : sysctl -w net.ipv4.tcp_slow_start_after_idle=0

Effect : This matters most for long-lived TCP connections with bursty traffic separated by idle periods, such as HTTP keep-alive connections. The actual improvement depends on network conditions and the amount of data transferred.

Change the congestion avoidance algorithm

Congestion control algorithms have a great impact on TCP performance. In addition to the AIMD algorithm mentioned above, there are many other algorithms.

PRR (Proportional Rate Reduction) is a new algorithm specified in RFC 6937 whose goal is to speed up recovery after packet loss.

Effect : According to Google's measurement, after implementing the new algorithm, the average connection delay caused by packet loss is reduced by 3% to 10%.

Setting : Upgrade the server. PRR is the default recovery algorithm in Linux 3.2+ kernels.

Reduce the amount of data transferred

Scheme :

  1. Reduce the transmission of redundant data

  2. Compress the data to be transmitted: gzip, protobuf, webp, etc.

  3. No bit is faster than one that is not sent: don't send what you don't need to
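As a quick illustration of point 2, compressing a repetitive JSON payload with Python's gzip module (the payload is fabricated for demonstration):

```python
import gzip
import json

# A repetitive JSON payload, typical of API responses.
payload = json.dumps([{"id": i, "status": "ok", "region": "eu-west-1"}
                      for i in range(200)]).encode()

compressed = gzip.compress(payload)
print(len(payload), "->", len(compressed), "bytes")
```

Repetitive structured data like this typically shrinks by an order of magnitude, which directly reduces the number of round trips the congestion window must grow through.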

Reduce round trip time

Scheme :

  1. Deploy servers in multiple computer rooms
  2. Use CDN

Head of line blocking

Head-of-line (HOL) blocking : if one packet is lost on the way to the receiver, all subsequent packets must be held in the receiver's TCP buffer until the lost packet is retransmitted and arrives. All of this happens inside the TCP layer: the application knows nothing about the retransmissions or the queued packets, and must wait for the missing packet before it can access any of the data. Until then, all the application perceives when reading from the socket is delay.
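The buffering behaviour can be mimicked with a toy model (sequence numbers count whole packets rather than bytes, purely for illustration):

```python
def deliver_in_order(packets):
    """Toy receive buffer: hand packets to the app only in sequence order."""
    expected, buffered, delivered = 0, {}, []
    for seq, data in packets:           # packets as they arrive off the wire
        buffered[seq] = data
        while expected in buffered:     # release any contiguous run
            delivered.append(buffered.pop(expected))
            expected += 1
    return delivered

# Packet 1 is lost and retransmitted last; packets 2 and 3 sit in the
# buffer (head-of-line blocked) until it finally arrives.
arrivals = [(0, "a"), (2, "c"), (3, "d"), (1, "b")]
print(deliver_in_order(arrivals))   # ['a', 'b', 'c', 'd']
```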


Advantages : The application does not need to care about grouping and reorganization, thus keeping the code concise.

Disadvantage : packet arrival times become unpredictably variable. This variation, commonly called jitter, is a major factor in application performance.

Optimization

UDP

It cannot be optimized away . Head-of-line blocking follows from TCP's in-order delivery guarantee; within TCP there is currently no way around it.

Applications that do not need in-order delivery or can tolerate packet loss, as well as applications that are sensitive to delay or jitter, are better off choosing a protocol such as UDP.

Real-time audio and games are typical applications that may choose UDP.

Summary

Optimization suggestions for TCP

  1. Server configuration tuning
    • Upgrade the server to the latest kernel version
    • Increase TCP's initial congestion window
    • Disable slow-start restart
    • Enable window scaling (RFC 1323)
    • Enable TCP Fast Open
    • Use the ss command or sysctl -a | grep tcp to inspect the current configuration
  2. Application behavior tuning
    • No bit is faster than one that is not sent: don't send what you don't need to
    • We can't make the data travel faster, but we can make it travel a shorter distance
    • Reusing TCP connections is the key to improving performance
  3. Performance checklist
    • Upgrade the server kernel to the latest version (Linux: 3.2+);
    • Ensure that the size of cwnd is 10;
    • Disable slow start after idle;
    • Ensure window scaling is enabled;
    • Reduce the transmission of redundant data;
    • Compress the data to be transmitted;
    • Put the server close to the user to reduce the round trip time;
    • Try to reuse established TCP connections as much as possible.

References

  1. Web definitive performance guide
  2. TCP sliding window and window scaling factor
  3. TCP sliding window and congestion window
  4. Web performance optimization-TCP
  5. Just want you to understand TCP-Performance Optimization Encyclopedia
  6. The header format of the TCP segment
  7. TCP Socket communication detailed process
  8. TCP three-way handshake and SYN, ACK, Seq are not explained in detail
  9. Wireshark packet analysis
  10. Wireshark network analysis is that simple
  11. TCP-fastopen(TFO)
  12. TCP Series 40—Congestion Control—3. Overview of Slow Start and Congestion Avoidance
  13. TCP Series 41—Congestion Control—4. Slow Start and Congestion Avoidance in Linux (1)
  14. What is HTTP Keep-Alive? how to work? (Understand the TCP life cycle)
  15. Nginx-KeepAlive explained in detail

At last

If you like my articles, you can follow my WeChat public account (Programmer Mala Tang).


Origin blog.csdn.net/shida219/article/details/107849308