Baffled! Can a Simple HTTP Call Really Have Such High Latency?

In a recent test project I ran into a strange phenomenon: in the test environment, calling a backend HTTP service through Apache HTTP Client took 39.2 ms on average.

 


At first glance you might think: what's strange about that, isn't it normal? So let me give some background first.

The backend HTTP service has no business logic at all: it simply converts a string to uppercase and returns it, and the string is only 100 characters long. On top of that, the network Ping latency is only about 1.9 ms.

So in theory the call should take around 2-3 ms. Why does it average 39.2 ms?

 

[Figure: call latency]

[Figure: Ping latency]

Because of my day job, slow calls are nothing new to me; I often help business teams troubleshoot timeout issues in our internal RPC framework. A slow plain HTTP call, though, was a first.

Still, the troubleshooting routine is the same. The methodology is essentially outside-in, top-down investigation. Let's start with some peripheral metrics and see whether they offer any clues.

Peripheral metrics

System metrics

First, look at the main peripheral system metrics, such as load and CPU usage (note: check both the calling and the called machine). A single top command shows them at a glance.

This confirmed that both load and CPU were idle. No luck here, so let's move on.

Process metrics

For a Java process, the main things to check are GC activity and thread stacks (again, on both the calling and the called machine).

Young GC was infrequent and took less than 10 ms each time, so there were no long stop-the-world pauses.

Since the average call time of 39.2 ms is fairly large, if the cost came from the code itself, the thread stacks should reveal something.

After looking through them, there was nothing: the service-related threads were all idle in their thread pools waiting for tasks, meaning the threads were not busy at all.

I was starting to feel at the end of my rope. What next?

Local reproduction

If the problem can be reproduced locally (my local machine runs macOS), troubleshooting becomes much easier.

So I wrote a simple local test program with Apache HTTP Client that calls the backend HTTP service directly, and found the average time was about 55 ms.

Hey, that differs quite a bit from the 39.2 ms in the test environment. The main reason is that my local machine and the backend test server are in different regions, with a Ping latency of about 26 ms, which pushes the total latency up.

Still, the local result has the same problem: with a Ping latency of 26 ms and a backend whose logic is trivial and takes almost no time, the local call should average around 26 ms. Why is it 55 ms?

Confused and not knowing where to start? I began to suspect something was wrong with how Apache HTTP Client was being used.

So I wrote another simple test with the JDK's built-in HttpURLConnection, and got exactly the same result.
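Such a throwaway HttpURLConnection benchmark might look like the sketch below. The endpoint URL, path, and class name are hypothetical, not the project's actual code; it POSTs a 100-character body repeatedly and averages the round-trip time, matching the test described above.

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class LatencyTest {

    /** POSTs a 100-character body `rounds` times and returns the mean latency in ms. */
    static double averageMillis(String endpoint, int rounds) throws Exception {
        byte[] body = "a".repeat(100).getBytes();   // 100-character payload, as in the test
        long total = 0;
        for (int i = 0; i < rounds; i++) {
            long start = System.nanoTime();
            HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
            conn.setDoOutput(true);                  // turns the request into a POST
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body);
            }
            conn.getInputStream().readAllBytes();    // block until the full response arrives
            conn.disconnect();
            total += System.nanoTime() - start;
        }
        return total / (double) rounds / 1_000_000.0;
    }

    public static void main(String[] args) throws Exception {
        // Pass the backend address, e.g. http://10.48.159.165:8080/upper (hypothetical)
        System.out.printf("average: %.1f ms%n", averageMillis(args[0], 100));
    }
}
```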

Diagnosis

Locating the cause

From the peripheral system metrics, the process metrics, and the local reproduction, we can roughly conclude that the program itself is not the cause. What about the TCP layer?

Readers with network programming experience will surely know which TCP option can cause this symptom. Yes, you guessed it: TCP_NODELAY.

So which side, caller or callee, had left it unset? The caller uses Apache HTTP Client, where tcpNoDelay defaults to true.

Now look at the callee, our backend HTTP service, which uses the JDK's built-in HttpServer:

HttpServer server = HttpServer.create(new InetSocketAddress(config.getPort()), BACKLOGS); 

There is no tcpNoDelay setter exposed directly, so I dug into the source. Ah, there it is.

The sun.net.httpserver.ServerConfig class has a static block that reads startup parameters; ServerConfig.noDelay defaults to false:

static {
    AccessController.doPrivileged(new PrivilegedAction<Void>() {
        public Void run() {
            ServerConfig.idleInterval = Long.getLong("sun.net.httpserver.idleInterval", 30L) * 1000L;
            ServerConfig.clockTick = Integer.getInteger("sun.net.httpserver.clockTick", 10000);
            ServerConfig.maxIdleConnections = Integer.getInteger("sun.net.httpserver.maxIdleConnections", 200);
            ServerConfig.drainAmount = Long.getLong("sun.net.httpserver.drainAmount", 65536L);
            ServerConfig.maxReqHeaders = Integer.getInteger("sun.net.httpserver.maxReqHeaders", 200);
            ServerConfig.maxReqTime = Long.getLong("sun.net.httpserver.maxReqTime", -1L);
            ServerConfig.maxRspTime = Long.getLong("sun.net.httpserver.maxRspTime", -1L);
            ServerConfig.timerMillis = Long.getLong("sun.net.httpserver.timerMillis", 1000L);
            ServerConfig.debug = Boolean.getBoolean("sun.net.httpserver.debug");
            ServerConfig.noDelay = Boolean.getBoolean("sun.net.httpserver.nodelay");
            return null;
        }
    });
}

Verification

Restart the backend HTTP service with the extra startup parameter "-Dsun.net.httpserver.nodelay=true" and try again.

The effect is dramatic: the average time drops from 39.2 ms to 2.8 ms:

 

[Figure: call latency after optimization]

The problem is solved, but stopping here would be a waste of the case, like throwing it away after a single use.

Because a pile of questions is still waiting for you:

  • Why does adding TCP_NODELAY reduce the latency from 39.2 ms to 2.8 ms?
  • Why was the average latency of the local test 55 ms rather than the 26 ms Ping latency?
  • How exactly does TCP send packets?

Let's keep digging while we're at it.

Open questions

① What exactly is TCP_NODELAY?

In socket programming, the TCP_NODELAY option controls whether Nagle's algorithm is enabled.

In Java, setting it to true disables Nagle's algorithm, and false enables it. So what is Nagle's algorithm?
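This option maps directly onto java.net.Socket. A minimal sketch of turning Nagle's algorithm off on a plain socket (the class name is just for illustration):

```java
import java.net.Socket;

public class TcpNoDelayDemo {
    public static void main(String[] args) throws Exception {
        Socket socket = new Socket();     // not yet connected; options can still be set
        socket.setTcpNoDelay(true);       // true = disable Nagle's algorithm
        System.out.println("TCP_NODELAY enabled: " + socket.getTcpNoDelay());
        socket.close();
    }
}
```

Frameworks expose the same switch under similar names; Apache HTTP Client's tcpNoDelay setting mentioned above ultimately sets this very socket option.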

② What on earth is Nagle's algorithm?

Nagle's algorithm improves the efficiency of TCP/IP networks by reducing the number of packets sent over the network.

It is named after its inventor, John Nagle, who first used it in 1984 to alleviate network congestion at Ford Motor Company.

Imagine an application that produces data one byte at a time, with each byte sent to the remote server as a separate network packet; the network could easily be overloaded by the sheer number of packets.

In that typical case, transmitting a packet carrying only 1 byte of useful data costs 40 bytes of header overhead (20 bytes of IP header + 20 bytes of TCP header), an extremely poor payload utilization.

Nagle's algorithm is fairly simple; here it is in pseudocode:

if there is new data to send
    if the window size >= MSS and available data is >= MSS
        send complete MSS segment now
    else
        if there is unconfirmed data still in the pipe
            enqueue data in the buffer until an acknowledge is received
        else
            send data immediately
        end if
    end if
end if

In concrete terms:

  • If the data to send is at least one MSS, send it immediately.
  • If no previously sent packet is still unacknowledged, send immediately.
  • If a previously sent packet is still unacknowledged, buffer the data instead of sending it.
  • When an ACK arrives, immediately send the buffered data. (MSS is the maximum data segment a TCP packet can carry in one transmission.)

③ And what is Delayed ACK?

As everyone knows, to guarantee reliable delivery, TCP requires the receiver to send an acknowledgement when it receives a data packet.

But sending a bare acknowledgement on its own is relatively expensive (20 bytes of IP header + 20 bytes of TCP header).

TCP Delayed ACK (delayed acknowledgement) was designed to improve network performance and address exactly this problem.

It combines several ACK responses into a single one, or piggybacks the ACK on a response data packet, thereby reducing protocol overhead.

Concretely:

  • When there is response data to send, the ACK is sent to the peer immediately along with that data.
  • When there is no response data, the ACK is delayed, waiting to see whether response data shows up that it can ride along with. On Linux, this delay defaults to 40 ms.
  • If the peer's second packet arrives while the ACK is being delayed, an ACK must be sent immediately.

If the peer's packets arrive three in a row, whether an ACK is sent immediately when the third segment arrives is determined by the two rules above.

④ What chemistry happens when Nagle meets Delayed ACK?

Both Nagle's algorithm and Delayed ACK improve network transmission efficiency, but put them together and good intentions go wrong.

For example, consider this scenario: A and B are transferring data, where A runs Nagle's algorithm and B runs Delayed ACK.

If A sends a packet to B, B will not respond immediately because of Delayed ACK. Meanwhile A, running Nagle's algorithm, keeps waiting for B's ACK and will not send the second packet until the ACK arrives. If those two packets belong to the same request, that request is stalled for 40 ms.

⑤ Let's capture some packets

Let's verify this with a packet capture. Running the following on the backend HTTP server captures the traffic easily:

sudo tcpdump -i eth0 tcp and host 10.48.159.165 -s 0 -w traffic.pcap

The figure below shows the capture analyzed in Wireshark; the red box marks the handling of one complete POST request.

 

[Figure: test-environment packet analysis]

Notice the 40 ms gap between sequence numbers 130 and 149 (0.1859 - 0.1448 = 0.0411 s ≈ 41 ms): this is the chemical reaction between Nagle and Delayed ACK.

Here 10.48.159.165 is running Delayed ACK and 10.22.29.180 is running Nagle's algorithm.

10.22.29.180 is waiting for the ACK while 10.48.159.165 has triggered Delayed ACK, so both sides sit there foolishly for 40 ms.

That explains why the test environment took 39.2 ms: most of it was eaten by Delayed ACK's 40 ms.

But in the local reproduction, why was the average latency 55 ms rather than the 26 ms Ping latency? Let's capture packets there too.

In the figure below, the red box again marks one complete POST request. The gap between sequence numbers 8 and 9 is about 25 ms; subtracting the one-way network latency of roughly half the Ping time, 13 ms, leaves about 12 ms.

 

[Figure: local-environment packet analysis]

So the Delayed ACK here is about 12 ms (my local machine runs macOS, which differs somewhat from Linux):

  • Linux controls the Delayed ACK time through the /proc/sys/net/ipv4/tcp_delack_min system setting, defaulting to 40 ms.
  • macOS controls Delayed ACK through the net.inet.tcp.delayed_ack sysctl:

delayed_ack=0 responds after every packet (OFF)
delayed_ack=1 always employs delayed ack, 6 packets can get 1 ack
delayed_ack=2 immediate ack after 2nd packet, 2 packets per ack (Compatibility Mode)
delayed_ack=3 should auto detect when to employ delayed ack, 4 packets per ack. (DEFAULT)

That is: 0 disables delayed ACKs entirely, 1 always delays the ACK, 2 replies with one ACK for every two packets, and 3 lets the system auto-detect when to reply.

⑥ Why does TCP_NODELAY solve the problem?

tcpNoDelay disables Nagle's algorithm: the next packet is sent even if the previous packet's ACK has not arrived, which breaks the stall caused by Delayed ACK.

In network programming, enabling tcpNoDelay is strongly recommended to improve response time.

Of course, the problem could also be solved through the Delayed ACK-related system settings, but since that requires changing machine configuration, it is inconvenient, so that approach is not recommended.
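For this JDK HttpServer case specifically, the same fix can also be applied in code rather than via the launch flag, provided the property is set before any com.sun.net.httpserver class is loaded (the ServerConfig static block shown earlier reads it exactly once). A sketch under that assumption, not the original project's code:

```java
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;

public class NoDelayServer {
    public static void main(String[] args) throws Exception {
        // Must happen before HttpServer.create(): sun.net.httpserver.ServerConfig
        // reads "sun.net.httpserver.nodelay" in a static initializer, only once.
        System.setProperty("sun.net.httpserver.nodelay", "true");

        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.start();
        System.out.println("listening with TCP_NODELAY on port "
                + server.getAddress().getPort());
        server.stop(0);  // stop immediately here; a real service would keep running
    }
}
```

The command-line flag -Dsun.net.httpserver.nodelay=true remains the safer choice, since it is guaranteed to be visible before any class loads.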

Summary

This article started from a simple HTTP call with surprisingly high latency and walked through the troubleshooting process: analyzing the problem from the outside in, locating the cause, and verifying the solution.

Finally, it got to the bottom of Nagle's algorithm and Delayed ACK in TCP transmission, giving the case a thorough analysis.


Origin www.cnblogs.com/cangqinglang/p/11683246.html