What is TCP keepalive

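As the name suggests (keep TCP alive), TCP keepalive lets you check a connected TCP socket and determine whether the connection is still up and running or has broken.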

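The keepalive concept is simple: when you set up a TCP connection, you associate a set of timers with it. Some of these timers deal with the keepalive procedure. When the keepalive timer reaches zero, you send your peer a keepalive probe packet that carries no data and has the ACK flag set. In return, you receive a reply from the remote host that likewise carries no data and has the ACK flag set.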

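If you receive a reply to your keepalive probe, you can conclude that the connection is still up and running, and the user-level implementation does not need to care. In fact, TCP presents the application with a stream, not packets, so a zero-length packet is harmless to the user program.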

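This procedure is useful because, if the other host loses its connection, you will notice that the connection is broken. If the keepalive probes go unanswered, you can conclude that the connection can no longer be considered valid and take the appropriate action.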

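The whole keepalive exchange is simple: the client sends the server a packet, and the server sends one back to the client. Note that the packets carry no data; only the ACK flag is set. It is a bit like a young couple chatting on QQ: the girl sends the boy a window nudge, the boy sends one back, which shows he is still there, and the chat goes on. If the boy does not respond, the chat ends.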
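At the socket level, keepalive is switched on per connection. The sketch below is a minimal Python illustration and not part of the original article: it sets the standard SO_KEEPALIVE socket option and reads it back. The helper name enable_keepalive is made up for this example.

```python
import socket


def enable_keepalive(sock: socket.socket) -> bool:
    """Turn on TCP keepalive for one socket and report whether it is enabled.

    Hypothetical helper, for illustration only.
    """
    # SO_KEEPALIVE is a standard socket-level option; 1 turns it on.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Read the option back; a non-zero value means keepalive is enabled.
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print(enable_keepalive(s))
```

With only this option set, the probe timing is whatever the operating system uses by default.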

Why use TCP keepalive

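Life is perfectly fine without keepalive, but since you are reading this article, you are probably trying to find out whether keepalive can solve the problem you have right now.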

Keepalive is non-invasive, but it does generate extra network traffic, which can have an impact on routers and firewalls.

Next, we distinguish between the two tasks keepalive is used for:

  • Checking for dead peers
  • Preventing disconnection due to network inactivity

Checking for dead peers

Keepalive can be used to advise you when your peer dies before it is able to notify you. This could happen for several reasons, like kernel panic or a brutal termination of the process handling that peer. Another scenario that illustrates when you need keepalive to detect peer death is when the peer is still alive but the network channel between it and you has gone down. In this scenario, if the network doesn't become operational again, you have the equivalent of peer death. This is one of those situations where normal TCP operations aren't useful to check the connection status.

Think of a simple TCP connection between Peer A and Peer B: there is the initial three-way handshake, with one SYN segment from A to B, the SYN/ACK back from B to A, and the final ACK from A to B. At this point we are in a stable state: the connection is established, and now we would normally wait for someone to send data over the channel. And here comes the problem: unplug the power supply from B and it goes down instantly, without sending anything over the network to notify A that the connection is about to be broken. A, for its part, is ready to receive data and has no idea that B has crashed. Now restore the power supply to B and wait for the system to restart. A and B are both up again, but while A still knows about an active connection with B, B has no idea. The situation resolves itself when A tries to send data to B over the dead connection and B replies with an RST packet, causing A to finally close the connection.

Keepalive can tell you when another peer becomes unreachable without the risk of false-positives. In fact, if the problem is in the network between two peers, the keepalive action is to wait some time and then retry, sending the keepalive packet before marking the connection as broken.


 _____                                                     _____
|     |                                                   |     |
|  A  |                                                   |  B  |
|_____|                                                   |_____|
   ^                                                         ^
   |--->--->--->-------------- SYN -------------->--->--->---|
   |---<---<---<------------ SYN/ACK ------------<---<---<---|
   |--->--->--->-------------- ACK -------------->--->--->---|
   |                                                         |
   |                                       system crash ---> X
   |
   |                                     system restart ---> ^
   |                                                         |
   |--->--->--->-------------- PSH -------------->--->--->---|
   |---<---<---<-------------- RST --------------<---<---<---|
   |                                                         |
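
The wait-and-retry behaviour described above also bounds how long a dead peer can go unnoticed on an idle connection. As a rough sketch (the function name is hypothetical, and the figures in the usage line are the usual Linux defaults for tcp_keepalive_time, tcp_keepalive_intvl and tcp_keepalive_probes, which the article itself does not state), the worst case is approximately the idle time plus one interval per unanswered probe:

```python
def worst_case_detection_seconds(idle: int, interval: int, probes: int) -> int:
    """Approximate time before an idle connection to a dead peer is dropped.

    idle:     seconds of silence before the first keepalive probe is sent
    interval: seconds between successive unanswered probes
    probes:   number of unanswered probes before the connection is marked broken
    """
    # The first probe goes out after `idle` seconds; each of the `probes`
    # probes is then given `interval` seconds to be answered.
    return idle + interval * probes


if __name__ == "__main__":
    # Assumed Linux defaults: 7200 s idle, 75 s interval, 9 probes.
    print(worst_case_detection_seconds(7200, 75, 9))  # prints 7875
```

With those defaults, a silent peer is only declared dead after a little more than two hours, which is why applications that rely on keepalive for dead-peer detection usually shorten the timers.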

Preventing disconnection due to network inactivity

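Preventing disconnection due to network inactivity (a long period with no packets) refers to the following: many network devices, especially NAT routers, cannot keep track of every connection passing through them because of hardware limits (for example memory and CPU capacity), so when necessary they evict some inactive connections from their connection table. The typical policy is LRU: the connection that has gone longest without traffic is dropped. With TCP keepalive (by adjusting its time parameter), a connection can be made to generate a few ACK packets at short intervals, which lowers the risk of being evicted. The price, of course, is extra network and CPU load.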

 _____           _____                                     _____
|     |         |     |                                   |     |
|  A  |         | NAT |                                   |  B  |
|_____|         |_____|                                   |_____|
   ^               ^                                         ^
   |--->--->--->---|----------- SYN ------------->--->--->---|
   |---<---<---<---|--------- SYN/ACK -----------<---<---<---|
   |--->--->--->---|----------- ACK ------------->--->--->---|
   |               |                                         |
   |               | <--- connection deleted from table      |
   |               |                                         |
   |--->- PSH ->---| <--- invalid connection                 |
   |               |                                         |
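
To keep an entry like this from being deleted, the first probe has to go out before the device's idle timeout expires. The following is a minimal sketch under stated assumptions, not a recommended configuration: the helper name keepalive_for_nat, the "half the NAT timeout" rule and the default interval and probe count are all made up for illustration. TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are the per-socket option names used on Linux; on platforms where Python does not expose one of them, the sketch skips it and leaves the system default in place.

```python
import socket


def keepalive_for_nat(sock: socket.socket, nat_idle_timeout: int,
                      interval: int = 10, probes: int = 5) -> int:
    """Enable keepalive so probes start well before a NAT entry would expire.

    Hypothetical helper. Returns the idle time (in seconds) it tried to set.
    """
    # Assumption for this sketch: send the first probe at half the NAT timeout.
    idle = max(1, nat_idle_timeout // 2)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket timers; each option is applied only if this platform has it.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", probes)):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return idle


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # Assumed NAT idle timeout of 300 seconds, purely as an example value.
        print(keepalive_for_nat(s, nat_idle_timeout=300))
```

The actual idle timeout of a given NAT device has to be looked up or measured; the value used here is only a placeholder.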


Reposted from just2do.iteye.com/blog/2186266