04 | The TCP wave: why does the Nginx log report connection reset by peer?

Today we are going to study TCP connection teardown (the "wave") through real cases, and deepen our understanding of it in practice.

When troubleshooting applications, we often see TCP-related errors in the logs. For example, the Nginx log may contain errors such as connection reset by peer. Literally, "the connection was reset by the peer" seems clear enough, yet it still leaves us with awkward questions:

Will this reset affect our business? Was this transaction successful?

At what specific stage does this reset occur? Is it a normal TCP disconnection?

What can we do to avoid this reset?

To answer these questions, the Nginx log alone may not be enough.

In fact, the advantage of network layering is that each layer can concentrate on its own job. But there is a downside too, and this case shows it: the application layer only knows what the operating system tells it, namely "Hey, your connection has been reset." Why was it reset? The application layer has no way of knowing. Only the operating system knows, but the operating system merely handles the event, adds 1 to an internal reset counter, and records nothing about the context of this reset.

Therefore, to figure out what exactly happens behind a connection reset by peer, we need to climb out of the well of the application layer and look at the wider network world.
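To see how little the operating system keeps, you can look at its TCP counters. Here is a minimal Python sketch, assuming a Linux host, that reads the reset-related counters from /proc/net/snmp (the same numbers that netstat -s formats):

# Minimal sketch (Linux only): the OS keeps bare counters about resets,
# with no context about which connection was reset or why.
def tcp_counters(path="/proc/net/snmp"):
    with open(path) as f:
        lines = [line.split() for line in f if line.startswith("Tcp:")]
    header, values = lines[0][1:], lines[1][1:]
    return dict(zip(header, (int(v) for v in values)))

counters = tcp_counters()
# EstabResets: established connections torn down abruptly (typically via RST);
# OutRsts: RST segments sent; AttemptFails: failed connection attempts.
for name in ("EstabResets", "OutRsts", "AttemptFails"):
    print(name, counters.get(name))

The counters go up, but nothing here tells you which connection was reset or by whom, which is exactly the gap that packet capture has to fill.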

Build a bridge between the application layer and the network layer

First of all, you need to understand what connection reset by peer means. If you are familiar with TCP, you will probably guess that the peer sent a TCP RST (the "reset" here) and terminated the TCP connection. This is in fact the first key point of network troubleshooting: "translating" application-layer information into transport-layer and network-layer information.

In other words, we need to accomplish something that is sometimes challenging: connecting the information of the application layer with the information of the transport layer and network layer below it.

The "application layer information" mentioned here may be the following:

Application layer log, including success log, error log, etc.;

Application layer performance data, such as RPS (requests per second), transaction time (processing time), etc.;

Application layer payload, such as HTTP request and response header, body, etc.

The "transport layer/network layer information" may be the following:

Transport layer: TCP sequence number, acknowledgment number, MSS (Maximum Segment Size), receive window, congestion window, latency, duplicate ACK (DupAck), selective acknowledgment (SACK), retransmission, packet loss, etc.

Network layer: IP TTL, MTU, hops, routing table, etc.

It can be seen that the perspectives and metrics of these two categories of information (application vs network) are completely different, so there is almost no way to directly connect them. And this has created two major gaps in problem solving.

The gap between application phenomena and network phenomena: You may understand the application layer logs, but you don’t know what specifically happened on the network.

The gap between tool output and protocol understanding: you can read the output of tools such as Wireshark and tcpdump, but you cannot quite map that output onto your understanding of the protocol itself.

In other words, you need the ability to bridge these two gaps, that is, the ability to "translate" between the two major categories of information (application information and network information). This is the core capability of network troubleshooting.

Since this is a case course, learning this through cases is the most efficient way. Next, let's look at two cases together.

Case 1: connection reset by peer?

A few years ago, a customer reported that they were seeing many connection reset by peer errors on their Nginx servers. They were worried that the issue was affecting their business and wanted our help in finding the cause. The customer's application was an ordinary web service served by Nginx, and another set of their machines acted as clients calling that web service.

The architecture diagram is as follows: 

As mentioned earlier, it is very hard to determine the underlying cause of connection reset by peer from the application-layer logs alone. So we started capturing packets. The specific steps were:

Choose one end on which to capture packets; this time it was the client;

Check the application log and confirm that connection reset by peer errors occurred within the few minutes of the capture;

Compare the error logs and packet capture files to look for clues.

Let’s first take a look at what these error logs look like:

2015/12/01 15:49:48 [info] 20521#0: *55077498 recv() failed (104: Connection reset by peer) while sending to client, client: 10.255.252.31, server: manager.example.com, request: "POST /WebPageAlipay/weixin/notify_url.htm HTTP/1.1", upstream: "http:/10.4.36.207:8080/WebPageAlipay/weixin/notify_url.htm", host: "manager.example.com"
2015/12/01 15:49:54 [info] 20523#0: *55077722 recv() failed (104: Connection reset by peer) while sending to client, client: 10.255.252.31, server: manager.example.com, request: "POST /WebPageAlipay/app/notify_url.htm HTTP/1.1", upstream: "http:/10.4.36.207:8080/WebPageAlipay/app/notify_url.htm", host: "manager.example.com"
2015/12/01 15:49:54 [info] 20523#0: *55077710 recv() failed (104: Connection reset by peer) while sending to client, client: 10.255.252.31, server: manager.example.com, request: "POST /WebPageAlipay/app/notify_url.htm HTTP/1.1", upstream: "http:/10.4.36.207:8080/WebPageAlipay/app/notify_url.htm", host: "manager.example.com"
2015/12/01 15:49:58 [info] 20522#0: *55077946 recv() failed (104: Connection reset by peer) while sending to client, client: 10.255.252.31, server: manager.example.com, request: "POST /WebPageAlipay/app/notify_url.htm HTTP/1.1", upstream: "http:/10.4.36.207:8080/WebPageAlipay/app/notify_url.htm", host: "manager.example.com"
2015/12/01 15:49:58 [info] 20522#0: *55077965 recv() failed (104: Connection reset by peer) while sending to client, client: 10.255.252.31, server: manager.example.com, request: "POST /WebPageAlipay/app/notify_url.htm HTTP/1.1", upstream: "http:/10.4.36.207:8080/WebPageAlipay/app/notify_url.htm", host: "manager.example.com"

Supplement: because the logs involve customer data security and privacy, they have been anonymized.

The most "conspicuous" thing seems to be the sentence connection reset by peer. In addition, we can also pay attention to other information in the error log, which can also help us obtain a more comprehensive context.

recv() failed: recv() here is a system call, part of the Linux network programming interface. Its purpose is just what the name says: it receives data. You can run man recv to see the details of this system call, including its various error codes.

104: this number is also related to the system call. It is the error code given by the operating system when the recv() call fails. On Linux, 104 corresponds to ECONNRESET, the case where a TCP connection is abnormally closed by an RST segment (a small code sketch after these field notes shows how this surfaces in application code).

upstream: in the terminology of reverse proxies such as Nginx, upstream refers to the backend server. In other words, the client sends the request to Nginx, Nginx forwards the request to the upstream, and after the latter replies with an HTTP response, Nginx relays the response back to the client. Note that "client <-> Nginx" and "Nginx <-> upstream" are two independent TCP connections, as shown in the figure below:

Additional: you may find it strange that the request clearly comes in from the outside, yet the inside is called upstream. It is like this: from a network operations perspective we focus on the flow of packets, and since HTTP requests come in from the outside, from that angle the upstream is the client. But from an application perspective we care more about the flow of data, and generally speaking the HTTP data (the response) flows from the inside out, so from this angle the upstream of the data is the backend server. Likewise, in the HTTP RFCs, upstream also refers to the server.

Nginx and Envoy are both application gateways, so in their terminology upstream refers to the backend. There is no right or wrong here; just know the convention and follow it.
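Before moving on to the capture, here is a minimal Python sketch tying recv() and error 104 together (the port and payload are placeholders, not the customer's setup): a recv() that returns zero bytes means the peer closed normally with FIN, while an RST from the peer turns into ECONNRESET (104):

import errno
import socket

# Toy server: surfaces ECONNRESET (104) the same way Nginx's recv() does.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("0.0.0.0", 8080))        # placeholder port
srv.listen(5)

conn, peer = srv.accept()
try:
    while True:
        data = conn.recv(4096)     # the same recv() seen in the Nginx log
        if not data:               # empty read: peer sent FIN (normal close)
            print("peer closed normally (FIN)")
            break
except ConnectionResetError as e:  # peer sent RST
    assert e.errno == errno.ECONNRESET    # errno 104 on Linux
    print(f"recv() failed ({e.errno}: Connection reset by peer)")
finally:
    conn.close()
    srv.close()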

Now that the error log has been fully interpreted, let's move on to analyzing the packet capture file.

Write the filter first

Although Wireshark was already used in the previous lesson to analyze quite a few handshake-related cases, that usage was fairly simple. Starting from today's lesson we will use Wireshark in more depth. For example, the content below uses many Wireshark filters (also called filter expressions or filter conditions), and since there are a few more steps involved, the explanation takes a little longer.

Generally speaking, in a raw capture file the packets we actually care about are only a small fraction of the whole. Knowing how to locate the packets related to the problem is where the real skill lies.

In this case, we have application-layer logs and clear information such as the relevant IP addresses, which gives us something to filter on. We need to write a filter that uses the IP as its condition and first pull the packets related to this IP out of the original file.

In Wireshark, the common filter syntax based on IP conditions mainly includes the following: 

ip.addr eq my_ip: match packets whose source or destination IP is my_ip
ip.src eq my_ip: match packets whose source IP is my_ip
ip.dst eq my_ip: match packets whose destination IP is my_ip

However, this is only the first filter condition. With it alone, the packets that come out are still far more numerous than the ones we really care about. We also need a second condition: finding the TCP RST segments. That requires another class of filter, tcp.flags, where flags means the TCP flag bits such as SYN, ACK, FIN, PSH, and RST.

For RST packets, the filter condition is:

tcp.flags.reset eq 1

You can select any packet and pay attention to its TCP Flags part:

Open the packet capture file and enter this filter condition:

ip.addr eq 10.255.252.31 and tcp.flags.reset eq 1

You will find a lot of RST messages:

In the lower right corner of the Wireshark window you can see the number of packets matching the filter. There are 9122 of them, 4% of all packets, which is indeed a lot. From this we can infer that many of the errors in the log were probably caused by some of these RSTs. Let's pick one and take a look.

In Lecture 2 you learned how to use Wireshark to find all the other packets of a TCP stream starting from a single packet. Here we select packet No. 172, right-click, choose Follow -> TCP Stream, and bring up the entire TCP stream it belongs to:

Hey, this RST is in the handshake stage? The third packet of this handshake is not the expected ACK but RST+ACK, so the handshake failed.

However, you may ask: is this kind of RST in the handshake phase also related to the connection reset by peer in the Nginx log?

To answer this question, we first need to understand how the application interacts with the kernel's TCP protocol stack. Generally speaking, a client initiating a connection calls the following system calls in sequence:

socket()

connect()

A server that listens on a port and serves requests must call the following system calls in sequence:

socket()

bind()

listen()

accept()

For the user-space program on the server to receive requests over a TCP connection, it must first obtain the last interface above, that is, the return of the accept() call. And the prerequisite for accept() to return successfully is that the three-way handshake completes normally.
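As an aside, here is a minimal Python sketch of those two call sequences (the address and port are placeholders for illustration). The point to notice is that accept() only hands back connections whose three-way handshake has completed, so a handshake broken by an RST never becomes a connection that the server application can see:

import socket
import threading

ADDR = ("127.0.0.1", 8080)    # placeholder address, for illustration only

# Server side: socket() -> bind() -> listen() -> accept()
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)   # socket()
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(ADDR)                                            # bind()
srv.listen(128)                                           # listen()

def accept_one():
    conn, peer = srv.accept()   # returns only after a completed three-way handshake;
                                # a handshake broken by RST never shows up here
    print("server: established connection from", peer)
    conn.close()

t = threading.Thread(target=accept_one)
t.start()

# Client side: socket() -> connect()
cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)   # socket()
cli.connect(ADDR)   # connect() drives the three-way handshake; if the handshake
                    # fails, this call raises an error (e.g. ECONNRESET/ECONNREFUSED)
print("client: connected")
cli.close()
t.join()
srv.close()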

Look: this time the client's third packet in the handshake was not an ACK but an RST (more precisely, RST+ACK), so the handshake failed. Naturally, a failed handshake never turns into a valid connection, which is why Nginx does not even know that this failed handshake happened.

Of course, the failure of this handshake can still be recorded in the client's log, because the client is the initiator of the TCP connection and is the one calling connect(). If connect() fails, its ECONNRESET return code is reported to the application.

Let's look at this schematic of the relationship between system calls and TCP states:

Therefore, although the above is also an RST, it is not the kind of "RST that occurs after the connection is established" we are looking for.

Continue polishing the filter

It seems we need to refine the filter further and exclude the RSTs from the handshake phase. To do that, we must first figure out: what are the characteristics of an RST in the handshake phase?

Looking at the screenshot above, we can see that this RST has sequence number 1 and acknowledgment number 1. Therefore, we can append this condition to the original filter:

tcp.seq eq 1 and tcp.ack eq 1

So the filter conditions become:

ip.addr eq 10.255.252.31 and tcp.flags.reset eq 1 and !(tcp.seq eq 1 and tcp.ack eq 1)

Note that (tcp.seq eq 1 and tcp.ack eq 1) here is preceded by an exclamation mark (not works the same way), which negates the expression, that is, it excludes such packets.

Let's take a look at what the filtered packets look like now:

We find many RST packets with sequence number 2. What are these? We select packet No. 115 and use Follow -> TCP Stream to take a look:

It turns out this is an RST from the teardown (wave) phase, and the data-exchange phase was not captured, so it has nothing to do with the error in the log and can be excluded as well. In that case, we can change the and in the previous condition to or, which excludes the RSTs from both the handshake phase and the wave phase at the same time. We enter the filter:

ip.addr eq 10.255.252.31 and tcp.flags.reset eq 1 and !(tcp.seq eq 1 or tcp.ack eq 1)

Get the following messages:

Although the RSTs from the handshake phase have now been excluded, there are still too many left. Where are the RSTs we are actually looking for, the ones that "caused the Nginx log errors"?

In order to find them, some more explicit search criteria are needed. Remember the two big gaps mentioned? One is the gap between application phenomena and network phenomena, and the other is the gap between tool tips and protocol understanding.

Now, to cross the first gap, we need to make the search conditions concrete. For this case, we look for packets based on the following two conditions:

Since these packets are directly tied to application-layer transactions, they should contain request-related data such as strings and numbers. This assumes, of course, that the data itself has not been specially encoded; otherwise the binary data in the packets looks nothing like the data the application layer sees after decoding.

Supplement: the most typical encoding scenario is TLS. If we do not decrypt, the packets captured directly by tcpdump or Wireshark are encrypted and look completely different from the application-layer data (such as HTTP), which makes troubleshooting considerably harder. How to decrypt captured TLS traffic will be covered in the TLS troubleshooting lesson of "Practice Part 2".

The time the packets were sent should match the log time.

For condition 1 we can use the URL and other information in the Nginx log; for condition 2 we use the log timestamp. The Nginx log shown at the beginning carries a clear time (2015/12/01 15:49:48). Although it is only accurate to the second, that is usually enough to narrow the scope further.

So how do we search in Wireshark for "packets within a specific time range"? This is another search trick I want to introduce: the frame.time filter. For example:

frame.time >="dec 01, 2015 15:49:48" and frame.time <="dec 01, 2015 15:49:49"

This locates the packets matching the time of the first entry in the Nginx log above. For easier reference, that log line is copied here again:

2015/12/01 15:49:48 [info] 20521#0: *55077498 recv() failed (104: Connection reset by peer) while sending to client, client: 10.255.252.31, server: manager.example.com, request: "POST /WebPageAlipay/weixin/notify_url.htm HTTP/1.1", upstream: "http:/10.4.36.207:8080/WebPageAlipay/weixin/notify_url.htm", host: "manager.example.com"

Combined with the previous search conditions, the following more precise filter conditions are obtained:

frame.time >="dec 01, 2015 15:49:48" and frame.time <="dec 01, 2015 15:49:49" and ip.addr eq 10.255.252.31 and tcp.flags.reset eq 1 and !(tcp.seq eq 1 or tcp.ack eq 1)

What a long filter! But that doesn't matter. A human may find it long, but Wireshark doesn't mind at all. It is like machine language: to a human it reads like arcane scripture, but the machine feels right at home, "this is my mother tongue!"
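As a side note, the same display filter can also be applied outside the Wireshark GUI. The sketch below uses the third-party pyshark package, a Python wrapper around tshark; this is my own illustration rather than part of the original case, it assumes tshark and pyshark are installed, and the capture file name is a placeholder:

import pyshark

DISPLAY_FILTER = (
    'frame.time >= "dec 01, 2015 15:49:48" and '
    'frame.time <= "dec 01, 2015 15:49:49" and '
    'ip.addr eq 10.255.252.31 and tcp.flags.reset eq 1 and '
    '!(tcp.seq eq 1 or tcp.ack eq 1)'
)

# Walk the capture with the same display filter and print each matching RST,
# plus its TCP stream index so we can Follow -> TCP Stream in the GUI.
cap = pyshark.FileCapture("capture.pcap", display_filter=DISPLAY_FILTER)
for pkt in cap:
    print(pkt.number, pkt.ip.src, "->", pkt.ip.dst, "stream", pkt.tcp.stream)
cap.close()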

Well, this time we have finally narrowed things down to only 3 RST packets:

What remains is much simpler: compare the application-layer data (that is, the HTTP requests and responses) in the TCP streams containing these three RSTs with the requests in the Nginx log, and we can find out which RST caused Nginx to report the error.

In-depth analysis of problem messages

Let's first look at the stream that packet No. 11393 belongs to:

Then look at the TCP stream that packet No. 11448 belongs to:

It turns out that 11448 and 11450 are in the same stream. Now it is clear: the 3 RSTs belong to 2 HTTP transactions.

Compare the red boxes in the two screenshots carefully. They are different: one corresponds to a request whose URL contains the string "weixin", the other to a request whose URL contains the string "app". So which URL does the log at this time point (15:49:48) refer to?

2015/12/01 15:49:48 [info] 20521#0: *55077498 recv() failed (104: Connection reset by peer) while sending to client, client: 10.255.252.31, server: manager.example.com, request: "POST /WebPageAlipay/weixin/notify_url.htm HTTP/1.1", upstream: "http:/10.4.36.207:8080/WebPageAlipay/weixin/notify_url.htm", host: "manager.example.com"

Drag the horizontal scrollbar to the right and you can see the "weixin" string in the POST URL. The request in the TCP stream containing the two RSTs, packets No. 11448 and 11450, also carries the "weixin" string, so these are the RSTs that match the log entry above!

If this is still not fully clear, here is a summary of why we can be sure this TCP stream corresponds to this log entry. There are three main reasons:

The time matches;

The RST behavior matches;

The URL paths match.

By interpreting the above TCP flow, we finally crossed the gap between "application phenomena and network messages":

Going a step further, let's draw the overall flow of this HTTP transaction, to understand why this RST caused Nginx to record a connection reset by peer error:

In other words, the handshake and the HTTP POST request and response were all normal, but right after the client ACKed the HTTP 200 response it immediately sent RST+ACK, and it is exactly this behavior that broke the normal TCP four-way wave. This RST caused the recv() call of Nginx on the server side to fail with ECONNRESET, which went into the Nginx log as connection reset by peer.

What impact does this have on the application? For the server, at least one error log entry is recorded. But interestingly, the POST itself succeeded and was processed normally; otherwise Nginx would not have replied with HTTP 200.

What about the client? It's hard to say, because we don't have the client's logs, and we can't rule out that the client thinks this is a failure and may retry, etc.

After we told the customer this conclusion, they were somewhat relieved: at least the POST data had been processed by the server. Of course, they still needed to find the problem in the client code and fix this abnormal RST behavior, but at least they no longer had to worry about whether the data was intact and the transaction had completed.

Now, back to the three questions we started with:

Will this reset affect our business? Was this transaction successful?

At what specific stage does this reset occur? Is it a normal TCP disconnection?

What can we do to avoid this reset?

We can now answer them:

Whether this reset affects the business still needs to be checked in the client application, but the server-side transaction was processed successfully.

This reset occurs after the transaction has completed, but it is not a normal TCP disconnection; the client code needs further investigation.

To avoid this reset, the client code needs to be fixed.

Supplement: it is not appropriate for the client to tear down the connection with an RST. The reason must be found in the code; for example, the client may be calling close() while there is still unread data in its receive buffer. The impact on the application depends on the specific application logic.
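To make that supplement concrete, here is a minimal Python sketch (placeholders only, not the customer's code) of one way such an abnormal RST arises: the client closes its socket while the server's response still sits unread in the receive buffer, so the kernel sends RST instead of FIN, and the server's next send()/recv() fails with a reset error, just like the Nginx log:

import socket
import threading
import time

ADDR = ("127.0.0.1", 8081)   # placeholder address, for illustration only

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(ADDR)
srv.listen(1)

def server_side():
    conn, _ = srv.accept()
    conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")
    time.sleep(0.5)                  # give the client time to close
    try:
        conn.sendall(b"more data")   # writing to a reset connection ...
        conn.recv(1024)              # ... or reading from it fails
    except (ConnectionResetError, BrokenPipeError) as exc:
        print("server saw:", exc)    # e.g. ECONNRESET, like the Nginx log
    conn.close()

t = threading.Thread(target=server_side)
t.start()

cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(ADDR)
cli.sendall(b"POST /x HTTP/1.1\r\nHost: x\r\nContent-Length: 0\r\n\r\n")
time.sleep(0.2)   # let the response arrive and sit unread in the receive buffer
cli.close()       # close() with unread data pending -> the kernel emits RST
t.join()
srv.close()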

There are many links in a network path: client, server, intermediate routers and switches, firewalls, load balancers or reverse proxies, and so on. Locating the specific problem node among so many links has always been a pain point for engineers. For example, an unstable network, or a few RSTs injected by a firewall, can also produce similar connection reset by peer problems.

Through packet capture analysis, we peeled the problem like an onion and determined that the faulty link was not Nginx, nor the network itself, but the client code. Precisely because of this kind of analysis, the developers could concentrate on fixing the code instead of constantly suspecting that the problem lay somewhere else.

Okay, having covered RST, you may ask: TCP normally uses FIN to wave goodbye, and that has not been discussed yet. Don't worry, the second case is about FIN.

Case 2: One FIN completes the TCP wave?

You probably know that TCP "waves four times"; it is almost a cliché. Let's first look at the regular four-way wave:

In the figure, the two sides are not called "client" and "server" but "initiator" and "receiver". This is because a TCP close can be initiated by either end; the right to start the wave is not fixed to the client or the server. This is different from the TCP handshake, which is always initiated by the client (that is precisely why it is called the client). In the handshake phase, the roles are clearly divided.

In addition, FIN and ACK each appear twice, which is also very clear.

But one day, a customer reported a strange phenomenon to me: they had noticed by chance that during the TCP close phase their application sent only one FIN, not two. That does not seem to fit common sense. I found it interesting too, so I took a look at their capture file:

Strange indeed, there really is only one FIN. How can the operating systems on both ends tolerate such a thing? For a moment my worldview collapsed: could TCP, always so rigorous, really break up this casually? "You were the one who wanted to break up anyway; fine, one FIN will do, tears fall"?

Soon I realized there was another possibility. When we introduced the TCP handshake in the previous lesson, we mentioned that one TCP packet can piggyback on another to improve transmission efficiency. For the same reason, a TCP teardown does not necessarily need four packets; with piggybacking it can appear as three packets, looking like a three-way wave:

In this case, what we saw in Wireshark were the last two packets, namely the FIN+ACK replied by the receiving end and the final ACK from the initiating end. So where is the first FIN? From the Wireshark screenshot alone it is really hard to tell.

Of course, from the Wireshark view we could even read it as the server initiating the close: it sent FIN+ACK, the client only replied with an ACK, and the connection ended. That interpretation is even stranger, yet it is also consistent with what Wireshark displays.

However, the main Wireshark window has another characteristic: when the Info column shows application-layer information, the TCP-layer control information of that packet is not shown there. That is why the Info column of the POST request above only shows the POST method and the URL; its TCP information, including sequence number, acknowledgment number, and flag bits, must be looked up in the packet details.

First select this POST message, and then go to the TCP details section in the middle of the interface to take a look:

It turns out that the first FIN did not appear on its own as usual, but was merged (piggybacked) into the POST packet! So the whole teardown is actually perfectly standard and fully follows the protocol specification; it was just a small misunderstanding caused by the way Wireshark displays things. There is still the question of "why there is no HTTP response packet", but the TCP teardown itself has been reasonably explained.

This also reminds us that TCP knowledge needs to be truly understood rather than copied mechanically. On the one hand, that requires careful study of the protocol; on the other hand, it is inseparable from the accumulation and digestion of real cases, so that quantitative change leads to qualitative change.

We also need the right attitude: most of the time, when we see what looks like "non-standard behavior" in TCP, we had better first ask whether our own grasp of TCP is deep enough, rather than doubting TCP itself. After all, TCP has been tested for a very long time, and the probability that it is correct is far higher than the probability that we are. So doing a bit of "self-examination" is actually a good deal, basically a guaranteed win.

Summary

By reviewing these cases, we have sorted out the technical details of the TCP wave. In Case 1, packet capture and analysis were used to bridge the two big gaps of "application symptoms vs. network symptoms" and "tool output vs. protocol understanding". Pay particular attention to the step-by-step technique used here:

First, based on the surface information from the application layer, extract the two filter conditions, the IP address and RST packets, and start the packet filtering work.

Analyze the filtering results of the first pass to obtain filtering conditions for further advancement (in this case, excluding RST in the handshake phase).

Combined with the log time range, keep narrowing the scope down to 3 RST packets. This range is small enough to analyze in detail, and we finally find the TCP stream related to the error. This "iterative" filtering can be repeated for several rounds until the problem packets are located.

Within that TCP stream, combine an understanding of TCP and HTTP to pin down the problem.

In addition, through this case, some Wireshark usage skills are also introduced, especially various filters:

With ip.addr eq my_ip, ip.src eq my_ip, or ip.dst eq my_ip, you can find the packets related to my_ip.

With tcp.flags.reset eq 1 you can find RST packets, and the other TCP flag bits can be filtered in the same way.

With tcp.ack eq my_num you can find packets whose acknowledgment number is my_num; tcp.seq eq my_num does the same for the sequence number.

Adding "!" or not before a filter expression has the opposite effect, that is, excluding these packets.

Through frame.time >="dec 01, 2015 15:49:48", you can filter packets based on time. You can use and or or between multiple filter conditions to form a composite filter.

Comparing the information in the application log (such as the URL path) with the TCP payload shown in Wireshark helps us locate the network packets related to that log entry.

In Case 2, we gained a new understanding of the "four-way wave". Through this real case, I hope you take away the following:

A TCP teardown may not appear as four packets; because of piggybacking, it may appear as three.

In some cases the first FIN is not directly visible in the Wireshark packet list. Do not simply treat the later FIN that Wireshark does show as the first FIN. Instead, select the packets around the teardown phase and check in the TCP details whether any of them carries the FIN flag. This is indeed an easy place to fall into a trap, so be warned.

Expanded knowledge: common misunderstandings about the TCP wave

We have now gone through two cases, and I believe you have a deeper understanding of both abnormal teardown (RST) and normal teardown (FIN). Next, I will go over a few common misunderstandings, in the spirit of "correct them if you have them, guard against them if you don't".

Is the connection close always initiated by the client?

Actually, no. The close can be initiated by either the client or the server. This misunderstanding is actually related to this diagram:

Notice that the first FIN in the diagram is initiated by the client. But doesn't the server ever actively initiate the close/wave? Of course it does; it simply is not shown in this diagram. The wave is different from the handshake: the handshake must be initiated by the client (which is why it is called the client), but the wave can be initiated by either side.

In fact, this diagram was also mentioned in the previous lesson. It comes from Richard Stevens' "UNIX Network Programming, Volume 1: The Sockets Networking API". Could Stevens himself have been mistaken? I think that possibility is several orders of magnitude lower than my chance of winning the lottery.

Of course Stevens knew that both sides can initiate the wave. He simply wanted to highlight the main point and did not cram every scenario into one diagram: the point of this figure is to show the TCP connection state transitions clearly, not to detail "who may initiate the wave".

Can't both sides initiate the wave at the same time?

Some students think the wave is initiated either by the client or by the server, but in any case never by both at once. So what if both sides do actively initiate the close at the same time? How does TCP handle it? Look at the diagram below:

After both sides initiate the close at the same time, they both enter the FIN_WAIT_1 state;

Then, because each receives the other's FIN, both enter the CLOSING state;

When each receives the other's ACK, both finally enter the TIME_WAIT state.

This also means that both ends must wait 2MSL before the five-tuple of this TCP connection can be reused. This situation is relatively rare, but protocol design has to consider behavior under all kinds of boundary conditions, which is far more than an ordinary application has to worry about. So some RFCs may look simple while hiding a great deal of complexity behind them.

When TCP waves, do both sides stop sending data at the same time?

Once one side sends FIN, the connection is starting to close, so neither side will send new data anymore, right? This is also a very common misunderstanding.

In fact, sending FIN by one party just means that this party will no longer send new data, but the other party can still send data.

In Richard Stevens' "TCP/IP Illustrated, Volume 1" it is clearly stated that TCP supports a "half-close", which works as follows (a small code sketch follows the steps):

One end (A) sends FIN, meaning "I want to close and will not send any new data, but I can still receive new data."

The other end (B) replies with ACK, meaning "I know you will not send any more, but I might."

B can continue to send new data to A, and A replies with ACKs to confirm receipt of that data.

When B has finished sending its data, it starts its own close, sending FIN to A, meaning "I am finally done too; I want to close, and no more new data will be sent."

At this point, both ends have truly closed the connection.
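Here is a minimal Python sketch of this half-close sequence (addresses and payloads are made up for illustration): A calls shutdown(SHUT_WR), which sends the FIN, yet A can still read the data that B keeps sending afterwards:

import socket
import threading

ADDR = ("127.0.0.1", 8082)   # placeholder address, for illustration only

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(ADDR)
srv.listen(1)

def side_b():
    conn, _ = srv.accept()
    while conn.recv(1024):        # read until A's FIN arrives (recv() returns b"")
        pass
    conn.sendall(b"late reply after A's FIN")   # B can still send after A's FIN ...
    conn.close()                  # ... and only then sends its own FIN

t = threading.Thread(target=side_b)
t.start()

a = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
a.connect(ADDR)
a.sendall(b"request from A")
a.shutdown(socket.SHUT_WR)        # A sends FIN: "no more data from me"
print("A received after its FIN:", a.recv(1024))   # A can still receive
a.close()
t.join()
srv.close()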

Once again I have borrowed Stevens' figure for reference; respect to Master Stevens one more time!
