Unknown Network Programming (11): Starting From the Bottom, an In-Depth Analysis of TCP Connection Time

This article was written by Zhang Yanfei; the original title, "Talking about the time consumed by TCP connections", has been slightly changed here.

1 Introduction

For Internet-based communication applications (such as IM chat and push systems), the TCP protocol is the usual choice for data transmission. Among the transport-layer protocols of the TCP/IP suite, TCP offers reliable connections, retransmission on error, and congestion control, so it is currently far more widely used than UDP in application scenarios.

You have surely also heard that TCP has its downsides, the perennial one being that its overhead is somewhat high. Yet most technical blogs simply assert that the overhead is "high" or "low" and rarely give any concrete quantitative analysis. Frankly, that kind of discussion is empty talk with little substance.

Reflecting on my daily work, what I really want to know is how big the overhead of TCP actually is and whether it can be quantified. How long does it take to establish a TCP connection: how many milliseconds, or how many microseconds? Can we at least make a rough quantitative estimate? Of course, many factors affect TCP timing, such as network packet loss. Today I will only share the situations I have encountered most frequently in my own work.

A note before we start: thanks to the open-source nature of the Linux kernel, the low-level, kernel-level code discussed in this article is all based on Linux.

2. Series of articles

This article is the 11th in a series of articles, the outline of which is as follows:

  1. " Unknown Network Programming (1): Analysis of Intractable Diseases in the TCP Protocol (Part 1) "
  2. " Unknown Network Programming (2): Analysis of Intractable Diseases in the TCP Protocol (Part 2) "
  3. " Unknown Network Programming (3): Why TIME_WAIT, CLOSE_WAIT When Closing a TCP Connection "
  4. " Unknown Network Programming (4): In-depth study and analysis of abnormal shutdown of TCP "
  5. " Unknown Network Programming (5): UDP Connectivity and Load Balancing "
  6. " Unknown Network Programming (6): In-depth understanding of the UDP protocol and making good use of it "
  7. " Unknown Network Programming (7): How to Make Unreliable UDP Reliable?
  8. " Unknown Network Programming (8): Deep Decryption of HTTP from the Data Transport Layer "
  9. " Unknown Network Programming (9): Combining theory with practice, comprehensive and in-depth understanding of DNS "
  10. " Unknown Network Programming (10): Deepening the Operating System and Understanding the Receiving Process of Network Packets from the Kernel (Linux) "
  11. " Unknown Network Programming (11): Starting from the bottom, in-depth analysis of the secrets of TCP connection time-consuming " (this article)
  12. " Unknown Network Programming (12): Thoroughly Understand the KeepAlive Mechanism of the TCP Protocol Layer "
  13. " Unknown Network Programming (13): Go deep into the operating system and thoroughly understand 127.0.0.1 local network communication "
  14. " Unknown Network Programming (14): Unplug the network cable and plug it in again, is the TCP connection still there? Understand in one sentence!

3. Time-consuming analysis of TCP connections under ideal conditions

To understand where the time goes when a TCP connection is established, we need to look at the connection establishment process in detail.

In the previous article " Go deep into the operating system and understand the receiving process of network packets from the kernel (Linux) ", we described how data packets are handled on the receiving side: a packet leaves the sender and reaches the receiver's network card over the network; after the receiver's network card DMAs the packet into the RingBuffer, the kernel processes it through hard-interrupt and soft-interrupt mechanisms (if it carries user data, it is eventually placed in the socket's receive queue and the user process is woken up).

In the soft interrupt, when the kernel picks a packet off the RingBuffer, it is represented by the struct sk_buff structure (see the kernel header include/linux/skbuff.h). The data member holds the received bytes. As the protocol stack processes the packet layer by layer, each protocol layer finds the data it cares about simply by adjusting pointers to different offsets within that same buffer.
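To make the idea of "moving pointers over the same buffer" concrete, here is a condensed and simplified sketch of the relevant kernel pieces (based on include/linux/skbuff.h and include/linux/tcp.h; the real sk_buff has many more fields and configuration-dependent types):

/* Condensed and simplified from include/linux/skbuff.h and include/linux/tcp.h */
struct sk_buff {
    /* ... many fields omitted ... */
    __u16          transport_header;  /* offset of the transport (e.g. TCP) header */
    __u16          network_header;    /* offset of the network (IP) header */
    unsigned char *head;              /* start of the packet buffer */
    unsigned char *data;              /* current layer's view of the data */
};

/* Each protocol layer locates "its" header by computing a pointer into the same buffer. */
static inline unsigned char *skb_transport_header(const struct sk_buff *skb)
{
    return skb->head + skb->transport_header;
}

static inline struct tcphdr *tcp_hdr(const struct sk_buff *skb)
{
    return (struct tcphdr *)skb_transport_header(skb);
}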

For TCP packets, there is an important set of fields in the header: the flag bits.

As shown below:

By setting different flag bits, a TCP packet is marked as SYN, FIN, ACK, RST and so on (a condensed look at the kernel's definition follows the list below):

  • 1) The client uses the connect system call to instruct the kernel to send SYN, ACK and other packets in order to establish a TCP connection with the server;
  • 2) On the server side, many connection requests may arrive, so the kernel also needs some auxiliary data structures: the half-connection queue and the full-connection queue.
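Back to the flags themselves: below is a condensed excerpt of the kernel's TCP header definition (from include/uapi/linux/tcp.h, showing only the little-endian bit-field layout), where each flag is a single bit in the header:

/* Condensed from include/uapi/linux/tcp.h (little-endian bit-field layout only) */
struct tcphdr {
    __be16  source;    /* source port */
    __be16  dest;      /* destination port */
    __be32  seq;       /* sequence number */
    __be32  ack_seq;   /* acknowledgment number */
    __u16   res1:4,
            doff:4,    /* header length in 32-bit words */
            fin:1,     /* FIN: no more data, close */
            syn:1,     /* SYN: synchronize, used in the handshake */
            rst:1,     /* RST: reset the connection */
            psh:1,     /* PSH: push data up to the application */
            ack:1,     /* ACK: acknowledgment field is valid */
            urg:1,     /* URG: urgent pointer is valid */
            ece:1,     /* ECE/CWR: congestion notification bits */
            cwr:1;
    __be16  window;    /* receive window */
    __sum16 check;     /* checksum */
    __be16  urg_ptr;   /* urgent pointer */
};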

Let's take a look at the entire connection process:

In this connection process, let's briefly analyze the time-consuming of each step:

  • 1) The client sends a SYN packet: the client generally sends the SYN through the connect system call, which involves the CPU overhead of the local system call and a soft interrupt;
  • 2) The SYN is transmitted to the server: the SYN leaves the client's network card and begins to "cross the mountains and the seas, and the crowds of people too..."; this is a long-distance network transmission;
  • 3) The server processes the SYN packet: the kernel receives the packet via a soft interrupt, places an entry in the half-connection queue, and then sends out the SYN/ACK response. More CPU overhead;
  • 4) The SYN/ACK is transmitted to the client: after the SYN/ACK leaves the server, it too crosses many mountains and possibly many seas to reach the client. Another long-distance network journey;
  • 5) The client processes the SYN/ACK: the client kernel receives the packet, handles the SYN/ACK, and after a few us of CPU processing sends out the ACK. Again, soft-interrupt processing overhead;
  • 6) The ACK is transmitted to the server: like the SYN packet, it travels almost the same distance. Yet another long-distance network journey;
  • 7) The server receives the ACK: the server kernel receives and processes the ACK, then moves the corresponding entry from the half-connection queue into the full-connection queue. One more round of soft-interrupt CPU overhead;
  • 8) The server-side user process wakes up: the user process blocked in the accept system call is woken up and takes the established connection out of the full-connection queue. The CPU overhead of one context switch.

The above steps can be simply divided into two categories:

  • The first category: CPU consumed by the kernel to receive, send or process packets, including system calls, soft interrupts and context switches. These each take on the order of a few us;
  • The second category: network transmission. Once a packet leaves a machine, it has to travel through network cables and various switches and routers along the way. Network transmission therefore takes far longer than local CPU processing: depending on how far the network stretches, it generally ranges from a few ms to several hundred ms.

Since 1 ms equals 1000 us, the network transmission time is roughly 1,000 times larger than the CPU overhead at the two ends, and can even be 100,000 times larger.

Therefore, when estimating the time needed to establish a normal TCP connection, we generally only need to consider the network delay.

PS: One RTT is the round-trip delay of a packet traveling from one machine to another and back.

So, looking at the whole picture: establishing a TCP connection requires about three network transmissions, plus a little CPU overhead on each side, for a total of slightly more than 1.5 RTT.

However, from the client's point of view, the kernel considers the connection established as soon as the final ACK has been sent. So if you measure TCP connection setup time on the client side, you only need two transmissions, that is, a little more than 1 RTT. (Likewise, measured from the server side, the time from receiving the SYN to receiving the ACK is also about one RTT.)

4. Time-consuming analysis of TCP connections in extreme cases

As can be seen in the previous section, from the client's perspective a TCP connection normally takes about one network RTT in total. If everything were that simple, I don't think this article would be necessary. But things are not always that nice, and surprises are inevitable.

In some cases, it may lead to increased network transmission time during TCP connection, increased CPU processing overhead, or even connection failure. In this section, I will analyze the time-consuming situation of TCP connections in extreme cases based on the various personal experiences I have encountered online.

4.1 The case where the client's connect call takes wildly longer than expected

A normal system call takes a few us (microseconds). However, in my article " Tracking the Murderer Who Exhausted Server CPU! ", one of my servers ran into a situation: an operations colleague reported that the service's CPU was no longer enough and more machines were needed.

The server monitoring at that time is as follows:

The service had been handling about 2,000 qps, and CPU idle had consistently stayed above 70%. Why was the CPU suddenly not enough?

What was even stranger: when CPU idle hit bottom, the load was not high (the server is a 4-core machine, and a load of 3-4 is quite normal).

After investigation, it turned out that when the TCP client had roughly 30,000 connections in TIME_WAIT, leaving relatively few available ports, the CPU overhead of the connect system call rose by more than 100 times, to about 2,500 us (microseconds) per call, i.e. into the millisecond range.

 

When this problem occurs, the connection setup time only grows by about 2 ms, which by itself looks acceptable for the overall TCP connection time. The real problem is that those extra 2+ ms are spent burning CPU cycles, so the issue is not small at all.

The fix is also simple, and there is more than one option: adjust the kernel parameter net.ipv4.ip_local_port_range to reserve more port numbers, or switch to long-lived connections.
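If you want to observe this kind of connect overhead on your own machines, a minimal sketch in C that times each connect call with clock_gettime might look like the following. Note that the address 10.0.0.2:8080 is just a placeholder to be replaced with a real listening server, the loop count is arbitrary, and what is measured is the wall-clock time of connect (the syscall's CPU cost plus the network RTT); error handling is kept minimal:

/* Minimal sketch: time each connect() call. 10.0.0.2:8080 is a placeholder. */
#include <arpa/inet.h>
#include <stdio.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    struct sockaddr_in addr = { .sin_family = AF_INET, .sin_port = htons(8080) };
    inet_pton(AF_INET, "10.0.0.2", &addr.sin_addr);

    for (int i = 0; i < 10000; i++) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        struct timespec t1, t2;

        clock_gettime(CLOCK_MONOTONIC, &t1);
        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) != 0)
            perror("connect");
        clock_gettime(CLOCK_MONOTONIC, &t2);

        long us = (t2.tv_sec - t1.tv_sec) * 1000000L +
                  (t2.tv_nsec - t1.tv_nsec) / 1000L;
        printf("connect #%d took %ld us\n", i, us);
        close(fd);
    }
    return 0;
}

Running something like this while TIME_WAIT connections pile up lets you watch the per-connect time climb as the available ports run out.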

4.2 The case where the TCP half/full connection queue is full

If either queue is full during connection establishment, the SYN or ACK sent by the client is simply discarded. After waiting a long time with no response, the client then issues a TCP retransmission.

Take the half-connection queue as an example:

Note that the TCP handshake timeout retransmission intervals above are measured in seconds. In other words, once a full queue on the server side causes a handshake attempt to fail, establishing the connection takes at least several seconds. Within the same data center it would normally take less than 1 millisecond, so this is roughly 1,000 times slower.

For programs that provide real-time services to users, this seriously hurts the user experience. If the handshake does not even succeed on the retransmission, it is quite possible that the user will not wait for a second retry, and the request will simply time out.

There is another worse situation: it may affect other users.

If you use a process/thread pool model to provide services, for example php-fpm, remember that an fpm worker is blocking: while it is serving one user request, it cannot respond to any other request. Suppose you run 100 processes/threads, and during some period 50 of them are stuck in the handshake with a Redis or MySQL server (note: in this case your server is the client side of the TCP connection). During that period you effectively have only 50 normally working processes/threads, and those 50 may simply not keep up, so your service becomes congested. If the situation lasts a little longer, an avalanche may follow and the whole service can be affected.

Since the consequences can be this severe, how do we tell whether the service at hand is suffering because the half/full connection queue is full?

On the client side: you can capture packets and look for SYN retransmissions. Even occasional SYN retransmissions suggest that the corresponding server's connection queues may have a problem.

On the server side it is even easier to check. netstat -s shows how many packets the system has dropped because the half-connection queue was full, but this number is a cumulative total, so you need the watch command to observe it dynamically. If the following number keeps changing while you watch, the server is dropping packets because its half-connection queue is full, and you may need to increase the half-connection queue length.

$ watch 'netstat -s | grep LISTEN'

    8 SYNs to LISTEN sockets ignored

For a fully connected queue, the viewing method is similar:

$ watch 'netstat -s | grep overflowed'

    160 times the listen queue of a socket overflowed

If your service is dropping packets because a queue is full, one option is to increase the length of the half/full connection queue. In the Linux kernel, the half-connection queue length is mainly influenced by tcp_max_syn_backlog, which can be raised to an appropriate value:

# cat /proc/sys/net/ipv4/tcp_max_syn_backlog

1024

# echo "2048" > /proc/sys/net/ipv4/tcp_max_syn_backlog

The full-connection queue length is the smaller of the backlog passed to listen by the application and the kernel parameter net.core.somaxconn, so you may need to tune both your application and this kernel parameter.
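For the application side of that min(backlog, somaxconn) rule, here is a minimal sketch of a listening server in C; the port 8080 and the backlog value 1024 are arbitrary examples, and error handling is omitted:

/* Minimal sketch of a listening server. The effective full-connection (accept)
 * queue length is min(backlog, net.core.somaxconn). */
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);

    bind(fd, (struct sockaddr *)&addr, sizeof(addr));
    listen(fd, 1024);   /* asks for 1024, but somaxconn may silently clamp it lower */

    for (;;) {
        int conn = accept(fd, NULL, NULL);   /* takes connections off the full-connection queue */
        if (conn >= 0)
            printf("accepted fd %d\n", conn);
    }
    return 0;
}

If somaxconn is still at its default of 128, asking for a backlog of 1024 here is silently clamped to 128, which is exactly why both knobs need attention.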

# cat /proc/sys/net/core/somaxconn

128

# echo "256" > /proc/sys/net/core/somaxconn

After the modification, we can confirm the final effective length through the Send-Q output by the ss command:

$ ss -nlt

Recv-Q Send-Q Local Address:Port  Peer Address:Port

0      128    *:80                *:*

For a listening socket, Recv-Q tells us how much of the full-connection queue is currently in use. If Recv-Q is already approaching Send-Q, don't wait for packets to be dropped; start preparing to enlarge the full-connection queue.

If there are still very occasional queue overflows after increasing the queue, we can tolerate it for the time being.

What if there is still a long period of time that cannot be processed?

Another way is to report an error directly, and don't let the client wait for a timeout.

For example, set the kernel parameter tcp_abort_on_overflow to 1 on backend servers such as Redis and MySQL. When the queue is full, the server then sends a reset straight back to the client, telling the calling process/thread not to wait around pointlessly. The client will see the error "connection reset by peer". Sacrificing one user's request is far better than letting the whole site go down.
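To make that concrete, enabling it uses the same /proc interface as the other parameters shown above (this is the standard Linux path for the parameter):

# echo "1" > /proc/sys/net/ipv4/tcp_abort_on_overflow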

5. Measured analysis of TCP connection time

5.1 Preparation before the test

I wrote a very simple piece of code to measure, on the client side, how long it takes to create TCP connections:

<?php
$ip    = '{server ip}';
$port  = {server port};
$count = 50000;

function buildConnect($ip, $port, $num){
    for($i = 0; $i < $num; $i++){
        // create the TCP socket
        $socket = socket_create(AF_INET, SOCK_STREAM, SOL_TCP);
        if($socket == false) {
            echo "$ip $port socket_create() failed because: ".socket_strerror(socket_last_error($socket))."\n";
            sleep(5);
            continue;
        }

        // the three-way handshake happens inside socket_connect
        if(false == socket_connect($socket, $ip, $port)){
            echo "$ip $port socket_connect() failed because: ".socket_strerror(socket_last_error($socket))."\n";
            sleep(5);
            continue;
        }
        socket_close($socket);
    }
}

$t1 = microtime(true);
buildConnect($ip, $port, $count);
$t2 = microtime(true);
echo (($t2 - $t1) * 1000).'ms';

Before testing, we need enough available local ports on the Linux client. If the range is too small, widen it first:

# echo "5000   65000" /proc/sys/net/ipv4/ip_local_port_range

5.2 Testing under normal conditions

Note: do not choose machines that are running production services, for either the client or the server, otherwise your test may affect normal user access.

First: my client was located in an IDC data center in Huailai, Hebei, and the chosen server was a machine in the company's Guangdong data center. The delay reported by ping was about 37 ms. After using the above script to establish 50,000 connections, the average connection time was also 37 ms.

This is because, as we said earlier, for the client the handshake counts as successful as soon as the third handshake packet is sent, so only one RTT (two transmissions) is needed. Although there are system call and soft-interrupt overheads on both the client and the server in between, they normally amount to only a few us (microseconds) and have little effect on the total connection setup delay.

Next, I switched to a different target server, located in a data center in Beijing. It is some distance from Huailai, but much closer than Guangdong. The RTT reported by ping was about 1.6~1.7 ms. After the client established 50,000 connections, the average came out to 1.64 ms per connection.

Do another experiment: The server and client selected for this experiment are directly located in the same computer room, and the ping delay is about 0.2ms~0.3ms. After running the above script, the experimental result is that 50000 TCP connections consume a total of 11605ms, with an average of 0.23ms each time.

Online architecture reminder: here we see that the latency within a single data center is only a few tenths of a millisecond, yet going to a nearby data center already makes the TCP handshake alone 4 times more expensive, and going cross-region to Guangdong makes it a hundred times more expensive. When deploying in production, the ideal approach is to place the MySQL, Redis and other services your own service depends on in the same region and the same data center as yourself (or, pushing it a little further, even the same rack). That way all network packet transmission, including TCP connection establishment, is much faster. Avoid long-distance, cross-region data center calls as much as possible.

5.3 Test in the case of TCP connection queue overflow

We have now tested cross-region, cross-data-center and cross-machine cases. This time, in pursuit of speed, what happens if we establish connections directly to the local machine?

Pinging the local IP or 127.0.0.1 shows a delay of about 0.02 ms; the local machine's RTT is certainly shorter than that of any other machine. I expected the connections to be very fast, so let's run the experiment.

Establishing 50,000 TCP connections in a row: the total time was 27,154 ms, about 0.54 ms each on average.

Wait, what?! How can this be so much longer than connecting to another machine?

With the theory covered earlier, we should be able to guess why: because the local RTT is so short, the burst of connection requests arriving at the same instant is very large, which fills up the full-connection or half-connection queue. Once a queue is full, the connection requests that hit it at that moment face 3+ seconds of connection setup delay. That is why the average time in the experiment above looks so much higher than the RTT.

During the experiment I captured packets with tcpdump and saw the following scene: a small number of handshakes did take 3+ seconds, because the half-connection queue was full and the client retransmitted the SYN after a timeout.

I then changed the test to sleep for 1 second after every 500 connections. This time there were no more stalls (alternatively, you could increase the length of the connection queues).

The conclusion: measured on the client, 50,000 TCP connections to the local machine took 102,399 ms in total. After subtracting the 100 seconds of sleep, each TCP connection took 0.048 ms on average, slightly higher than the ping latency.

This is because once the RTT becomes small enough, the kernel's CPU overhead starts to show. In addition, a TCP connection is more complex than the ICMP protocol used by ping, so it is normal for it to be about 0.02 ms slower than ping.

6. Summary of this article

When TCP connection establishment goes wrong, it can take several seconds. One downside is that this hurts the user experience and may even cause the current request to time out. Another downside is that it can trigger an avalanche.

So when your server uses short connections to access data: be sure to monitor whether connection establishment on your servers ever goes abnormal, and if it does, learn to optimize the problem away. Of course, you can also use a local in-memory cache, or use a connection pool to maintain long-lived connections; both approaches avoid the various overheads of the TCP handshake altogether.

Beyond that, under normal circumstances the delay of establishing a TCP connection is about one RTT between the two machines, and that cannot be avoided. But you can control the physical distance between the two machines to shrink this RTT: for example, deploy the Redis you need to access as close as possible to the backend machines, so that the RTT can drop from tens of ms to as low as a few tenths of a ms.

Finally, let's think again: if we deploy the server in Beijing, is it feasible for users in New York to access it?

Whether within one data center or across nearby data centers, the propagation time of the electrical signal can basically be ignored (because the physical distance is very short), and the network delay is essentially the forwarding time of the equipment along the way. But once we span half the globe, we have to account for the signal's propagation time. The great-circle distance from Beijing to New York is about 15,000 km, so even ignoring device forwarding delay, one round trip at the speed of light (RTT is round-trip time, so the signal must cover the distance twice) takes 15,000 km x 2 / 300,000 km/s = 0.1 s = 100 ms. The actual delay is larger than this, generally over 200 ms. With that kind of baseline delay, it is hard to offer services that users experience as fast. So for overseas users, it is best to build a local data center or buy overseas servers.

Study and exchange:

- Introductory article on mobile IM development: " One entry is enough for beginners: developing mobile IM from scratch "

- Open source IM framework source code: https://github.com/JackJiang2011/MobileIMSDK ( click here for alternate address )

( This article has been published simultaneously at: http://www.52im.net/thread-3265-1-1.html )
