Network Programming: Learning About TCP Half-Open Connections and TIME_WAIT

https://blog.csdn.net/chrisnotfound/article/details/80112736

The link above explains why, even with the SO_KEEPALIVE option available, an application-layer heartbeat protocol still has to be developed. Distributed systems likewise need to design their own application-layer heartbeat protocol.

Anyone familiar with high-performance, highly concurrent TCP server programming on Linux will know that every file descriptor in use takes a share of the server's resources, and so limits how many concurrent connections it can handle.

On a server that runs around the clock, all year, without interruption, every wasted descriptor counts. Waste enough of them and there is little left of high concurrency or high performance. Setting aside descriptors that are in normal use, what causes the number of descriptors available to us to keep shrinking?

 

 https://blog.csdn.net/gettogetto/article/details/76736371

 

What is a half-open connection?

Suppose a client and a server have established a normal TCP connection, and the client host then drops off the network (cable unplugged), loses power, or crashes. The server process never finds out: the usual select or epoll monitoring reports no disconnect or error event for that connection. Unless the server process actively cleans it up, or the system is rebooted, the server keeps the connection forever. However long it waits, it will never receive another response from the client. This is a half-open connection, and it wastes a file descriptor on the server side.

How do we deal with it?

Anyone familiar with the common socket options will already have thought of it: doesn't TCP have the keep-alive option SO_KEEPALIVE? If no data is exchanged in either direction on the socket for two hours, TCP automatically sends a keep-alive probe to the peer. If the peer responds to the probe with an RST, the peer host has crashed and rebooted; the socket's pending error is set to ECONNRESET and the socket is closed. If the probes get no response at all, the socket's pending error is set to ETIMEDOUT and the socket is closed.
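Switching the option on takes a single setsockopt call. The following is a minimal sketch using Python's standard socket module; the helper name make_keepalive_socket is made up for illustration, and the probe timing is left at whatever the operating system defaults to.

```python
import socket


def make_keepalive_socket() -> socket.socket:
    """Create a TCP socket with SO_KEEPALIVE enabled (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Ask the kernel to probe the peer once the connection has been idle
    # for the system's keep-alive time.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock


sock = make_keepalive_socket()
# Read the option back to confirm the kernel accepted it.
keepalive_enabled = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
```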

Indeed, this option can deal with the half-open connection problem described above, but isn't the default detection interval of two hours rather too long?

Of course, we can shorten the interval by modifying the kernel parameters. Perfect, right?

But note that most kernels maintain these timing parameters for the whole system rather than per socket. If the inactivity period is changed from two hours to, say, two minutes, the change affects every socket on the host that has this option enabled.
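Some platforms do expose per-socket overrides, which narrow the interval for one connection without touching the system-wide default. Linux provides TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT for this. The sketch below is hedged accordingly: the helper name tune_keepalive and the values 120/10/3 are illustrative assumptions, and any option the platform's socket module lacks is simply skipped.

```python
import socket


def tune_keepalive(sock, idle_s=120, interval_s=10, probes=3):
    """Enable keep-alive and apply per-socket timing where available.

    Hypothetical helper; returns the values the kernel reports back.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied = {}
    wanted = (
        ("TCP_KEEPIDLE", idle_s),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between unanswered probes
        ("TCP_KEEPCNT", probes),        # unanswered probes before giving up
    )
    for name, value in wanted:
        option = getattr(socket, name, None)
        if option is None:
            continue  # not exposed on this platform; keep the system default
        sock.setsockopt(socket.IPPROTO_TCP, option, value)
        applied[name] = sock.getsockopt(socket.IPPROTO_TCP, option)
    return applied


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
applied = tune_keepalive(sock)
sock.close()
```

Even with per-socket tuning, the probes are still answered by the peer's kernel, not by the peer's application.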

I doubt anyone is willing to accept that kind of uncertainty on a server. Moreover, a heartbeat should show not only that the application is alive (the process exists and the network path works) but, more importantly, that the application is working properly. SO_KEEPALIVE probing is handled by the operating system. Even if the process is deadlocked or otherwise malfunctioning, the operating system still sends and receives TCP keep-alive messages normally, so the peer has no way of learning about the fault.

No matter: we can imitate SO_KEEPALIVE at the application layer, using heartbeat packets in place of the keep-alive probes.

Because a server typically carries tens of thousands of concurrent connections, the heartbeats should be sent by the clients, which emulate the TCP keep-alive probes at the application layer. A client that gets no response from the server after several heartbeats can terminate the TCP connection. The server only has to monitor the clients' heartbeat packets: if no heartbeat arrives from a client within a certain interval, it can terminate that TCP connection. This effectively avoids half-open TCP connections.
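The server-side half of this scheme can be sketched as a table of last-heartbeat times that is swept periodically. This is only an illustration under stated assumptions: the class name HeartbeatMonitor, its methods, the 30-second timeout and the connection identifiers are all hypothetical, and a real server would call beat from its read handler and close the sockets returned by expire.

```python
import time


class HeartbeatMonitor:
    """Track the last heartbeat per connection (hypothetical sketch)."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock      # injectable so the logic can be tested
        self.last_seen = {}     # connection id -> time of last heartbeat

    def beat(self, conn_id):
        """Record that a heartbeat packet arrived from this connection."""
        self.last_seen[conn_id] = self.clock()

    def expire(self):
        """Forget and return connections silent for longer than timeout_s."""
        now = self.clock()
        dead = [c for c, t in self.last_seen.items()
                if now - t > self.timeout_s]
        for conn_id in dead:
            del self.last_seen[conn_id]
        return dead


# Demonstration with a fake clock instead of real waiting.
fake_now = [0.0]
monitor = HeartbeatMonitor(timeout_s=30.0, clock=lambda: fake_now[0])
monitor.beat("client-a")
monitor.beat("client-b")
fake_now[0] = 20.0
monitor.beat("client-b")    # client-b keeps sending heartbeats
fake_now[0] = 40.0
expired = monitor.expire()  # client-a has been silent for 40 seconds
```

Scanning every connection on each sweep is the simplest choice; with very large connection counts a timing wheel or a heap ordered by deadline would avoid the full scan.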

 

 

 

 

 

 

When developing highly concurrent TCP servers, some rules are followed almost as conventions, and many people follow them without knowing why. For example: with a highly concurrent TCP server, the side that actively closes the connection should preferably be the client, and the server program should enable the SO_REUSEADDR option. Why should we do this?


First, the diagram:

[Figure 203639589.jpg: TCP states on the active-close end and the passive-close end]

The diagram shows every state that the active-close end and the passive-close end pass through. Our focus today is the TIME_WAIT state, which, as the diagram shows, arises on the end that performs the active close.
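The two paths through the shutdown can also be written out as transition tables. This is a simplified sketch of the ordinary four-segment close only; it leaves out simultaneous close (the CLOSING state), and the names ACTIVE_CLOSE, PASSIVE_CLOSE and walk are illustrative.

```python
# (current state, event) -> next state, for the end that calls close first.
ACTIVE_CLOSE = {
    ("ESTABLISHED", "send FIN"): "FIN_WAIT_1",
    ("FIN_WAIT_1", "recv ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "recv FIN, send ACK"): "TIME_WAIT",
    ("TIME_WAIT", "2MSL timeout"): "CLOSED",
}

# The same, for the end that receives the first FIN.
PASSIVE_CLOSE = {
    ("ESTABLISHED", "recv FIN, send ACK"): "CLOSE_WAIT",
    ("CLOSE_WAIT", "send FIN"): "LAST_ACK",
    ("LAST_ACK", "recv ACK"): "CLOSED",
}


def walk(table, start="ESTABLISHED"):
    """Follow the single outgoing transition of each state until CLOSED."""
    path = [start]
    state = start
    while state != "CLOSED":
        state = next(dst for (src, _event), dst in table.items()
                     if src == state)
        path.append(state)
    return path


active_path = walk(ACTIVE_CLOSE)
passive_path = walk(PASSIVE_CLOSE)
```

Only the active path contains TIME_WAIT; the passive end goes from LAST_ACK straight to CLOSED.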

The TIME_WAIT state exists for two reasons:

  1. To terminate the full-duplex TCP connection reliably;

  2. To allow old duplicate segments to expire in the network.

For the first reason, refer to the diagram. Suppose the final ACK sent by the active-close end is lost. The peer will resend its FIN, and the active-close end can resend the final ACK only if it has kept the connection's state information. If it had not kept that state, it would respond with an RST, which the peer would treat as an error, so the connection could not be closed properly.

For the second reason, suppose we have a TCP connection between IP A, port B on one host and IP C, port D on another. We close the connection and, some time later, establish another connection between the same IP addresses and ports. Because the two connections share the same IP addresses and port numbers, old duplicate packets from the earlier connection that reappear would interfere with the new one. To prevent this, TCP will not initiate a new connection for a connection that is still in the TIME_WAIT state. Because TIME_WAIT lasts longer than the MSL (the maximum time an IP datagram can survive in the Internet), any old duplicates from the earlier connection will have expired in the network before a new connection can be established.

To satisfy both requirements, the TIME_WAIT state must last for a certain time. It is also called the 2MSL wait, and typically lasts between 1 and 4 minutes.
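The 1 to 4 minute range follows from doubling the MSL. A small worked sketch, assuming the commonly cited MSL values of 30 seconds (traditional Berkeley-derived implementations) and 2 minutes (the RFC recommendation); neither value comes from the text above, and a given system may use a different fixed value.

```python
def time_wait_seconds(msl_seconds):
    """TIME_WAIT lasts twice the maximum segment lifetime (2MSL)."""
    return 2 * msl_seconds


shortest = time_wait_seconds(30)    # MSL of 30 seconds -> 60 seconds
longest = time_wait_seconds(120)    # MSL of 2 minutes  -> 240 seconds
```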

Why the actively closing side of a highly concurrent TCP server should be the client: for a highly concurrent server, file descriptors and the resources behind each connection are precious. If the server had to sit through the long 2MSL TIME_WAIT for every connection, those resources could not be reused immediately and would be wasted. On the client, the TIME_WAIT state also occupies ports and handles, but clients rarely face concurrency resource constraints, so it is more appropriate for the client to perform the active close.
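On the server this means waiting for the client's FIN, which shows up as a zero-length read, and only then closing. The following loopback sketch illustrates the order of operations; the payload b"bye", the 5-second timeout and the single-process setup are assumptions made for the demonstration.

```python
import socket

# Server side: listen on an ephemeral loopback port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]

# Client side: connect, send a final message, then close first.
client = socket.create_connection(("127.0.0.1", port))
conn, _addr = listener.accept()
conn.settimeout(5.0)  # avoid blocking forever if something goes wrong

client.sendall(b"bye")
client.close()        # the client sends the first FIN: it is the active closer

# Server side: read until recv returns b"", which signals the peer's FIN.
chunks = []
while True:
    chunk = conn.recv(16)
    if not chunk:
        break
    chunks.append(chunk)
received = b"".join(chunks)

conn.close()          # passive close: this end does not enter TIME_WAIT
listener.close()
```

Because the client closed first, the TIME_WAIT entry for this connection sits on the client's side of the connection.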

Why the server program should enable the SO_REUSEADDR option: consider a production server program that shuts down because of a fault or an operational mistake. We certainly want to restart it immediately, but connections still in TIME_WAIT keep occupying the address and port, so the service cannot come back up. That is exactly when waiting is least acceptable. The SO_REUSEADDR option allows the address and port to be bound again in this situation.
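The option has to be set before bind is called. A minimal sketch of a listening socket set up this way; the helper name make_listener, the loopback address, the ephemeral port 0 and the backlog of 128 are illustrative choices.

```python
import socket


def make_listener(host="127.0.0.1", port=0, backlog=128):
    """Create a listening TCP socket with SO_REUSEADDR set (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must come before bind: lets the server bind its port again while
    # earlier connections on that port are still in TIME_WAIT.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(backlog)
    return sock


listener = make_listener()
reuse_enabled = listener.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR) != 0
bound_port = listener.getsockname()[1]
listener.close()
```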


Origin www.cnblogs.com/zhangkele/p/9494599.html