Found an optimization opportunity in an open source project: claim it and it's yours

Hello everyone, I'm Xiaolou.

While slacking off and wandering around GitHub recently, I found a spot in one of Alibaba's open source projects where you can contribute code.

It is not the usual kind of contribution, like writing unit tests or fixing code formatting; it is something more challenging: performance optimization. Best of all, it is not difficult. Read this article carefully and you can do it with just a little background knowledge.

I'd bet that when you need a timestamp in everyday Java code, you write something like this:

long ts = System.currentTimeMillis();

Some of the readers are Gophers, so let's write it again in Go:

UnixTimeUnitOffset = uint64(time.Millisecond / time.Nanosecond)
ts := uint64(time.Now().UnixNano()) / UnixTimeUnitOffset

In general, or in 99% of cases, there is nothing wrong with this. But one expert took a close look at how timestamps are obtained in Java:

pzemtsov.github.io/2017/07/23/…

His conclusion: the higher the concurrency, the slower it gets to fetch a timestamp!

Without going into all the details, the rough reason is that there is only one global clock source, and highly concurrent or very frequent access to it causes serious contention.
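You can get a feel for this yourself with a rough micro-benchmark. This is my own sketch (the class and method names are made up, and it is nowhere near as rigorous as the linked article's measurements), but running it with increasing thread counts on a multi-core machine should show the per-call cost creeping up:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ClockBench {
    // Calls System.currentTimeMillis() `iters` times on each of `threads` threads
    // and returns the average cost per call in nanoseconds.
    static double nanosPerCall(int threads, int iters) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch done = new CountDownLatch(threads);
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                long sink = 0;
                for (int i = 0; i < iters; i++) {
                    sink += System.currentTimeMillis();
                }
                if (sink == 42) System.out.print(""); // keep the loop from being optimized away
                done.countDown();
            });
        }
        try {
            done.await(); // wait for all worker threads to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdown();
        return (System.nanoTime() - start) / ((double) threads * iters);
    }
}
```

Comparing, say, `nanosPerCall(1, 10_000_000)` against `nanosPerCall(16, 10_000_000)` is enough to see the trend, though the exact numbers depend heavily on the OS and hardware.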

Caching timestamps

My first encounter with the cached-timestamp optimization was in the Cobar project:

github.com/alibaba/cob…

Since Cobar is database middleware, its QPS can be very high, hence this optimization. Let's look at the implementation:

  • Start a dedicated thread that fetches the system timestamp every 20 ms and caches it
  • When a timestamp is needed, read the cached value directly

github.com/alibaba/cob…

/**
 * A weak-precision timer; for performance it deliberately uses no synchronization.
 * 
 * @author xianmao.hexm 2011-1-18 下午06:10:55
 */
public class TimeUtil {
    private static long CURRENT_TIME = System.currentTimeMillis();

    public static final long currentTimeMillis() {
        return CURRENT_TIME;
    }

    public static final void update() {
        CURRENT_TIME = System.currentTimeMillis();
    }
}

github.com/alibaba/cob…

timer.schedule(updateTime(), 0L, TIME_UPDATE_PERIOD); // TIME_UPDATE_PERIOD is 20 ms
...
// Scheduled task that refreshes the cached system time
private TimerTask updateTime() {
    return new TimerTask() {
        @Override
        public void run() {
            TimeUtil.update();
        }
    };
}

Cobar does this for two reasons. First, its QPS is often very high, and caching reduces the CPU cost and latency of fetching timestamps. Second, this timestamp is only used for Cobar's internal statistics, so a little inaccuracy does not matter; judging by the implementation, it really is "weak precision".

Later I saw similar implementations in other codebases, for example Sentinel (not Redis Sentinel, but Sentinel, Alibaba's open source rate-limiting and circuit-breaking framework).

As a rate-limiting and circuit-breaking tool, Sentinel naturally wants its own overhead to be as small as possible, so this project, also from Alibaba, adopted an implementation similar to Cobar's: caching the timestamp.

The reasoning is also simple: consume as few system resources as possible and make timestamp retrieval faster. But Sentinel cannot use a weak-precision timestamp the way Cobar does, because the timestamp it reads may well decide whether a request gets rate-limited or circuit-broken.

So the fix is simple too: shorten the cache refresh interval to 1 millisecond.
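In essence the idea looks something like this. Note this is my own simplified sketch of the pattern, not Sentinel's actual code (the class name `CachedClock` is made up; Sentinel's real implementation has more machinery around it):

```java
// A simplified sketch of the 1 ms cached-timestamp pattern (not Sentinel's actual code).
class CachedClock {
    // volatile so reader threads always see the latest cached value
    private static volatile long currentTimeMillis = System.currentTimeMillis();

    static {
        Thread updater = new Thread(() -> {
            while (true) {
                currentTimeMillis = System.currentTimeMillis();
                try {
                    Thread.sleep(1); // refresh the cache every millisecond
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }, "cached-clock");
        updater.setDaemon(true); // don't keep the JVM alive just for the clock
        updater.start();
    }

    // Readers pay only a volatile read instead of a system call.
    static long currentTimeMillis() {
        return currentTimeMillis;
    }
}
```

The trade-off is precision bounded by the refresh interval: any read can be up to roughly 1 ms stale, which is acceptable for rate-limiting decisions but not for anything needing exact time.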

Last year I wrote an article, "Fetching Timestamps with Low Overhead", which included this piece of Sentinel code:

Even the later Sentinel-Go adopted exactly the same logic:

I never gave it much thought before; it seemed perfectly reasonable.

Until a couple of nights ago, when I was idly browsing the Sentinel-Go community and came across an issue that stunned me:

github.com/alibaba/sen…

The author of the issue made a very insightful point right in the first paragraph:

He put it rather tactfully. What exactly is a "negative return"?

I searched a little more and found this issue:

github.com/alibaba/Sen…

TimeUtil was eating 50% of the CPU. That is the "negative return", and it is quite shocking!

Seeing that issue, I thought it over briefly:

  • Latency: in ordinary circumstances the cost of fetching a timestamp almost never affects the system, especially in the kind of business systems we usually write
  • CPU: suppose the timestamp is cached once per millisecond; setting other overhead aside, that is already 1,000 timestamp fetches per second. If each request fetches the timestamp only once, you need at least 1,000 QPS just to offset the cache's own cost, and that is before counting the other overhead
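That back-of-envelope break-even point can be written out explicitly. This is my own tiny helper (the names are made up) under the assumptions in the bullet above: the cache refreshes at a fixed rate regardless of traffic, and each request reads the timestamp a fixed number of times:

```java
class BreakEven {
    // The cache updater fetches the system time `refreshesPerSecond` times per
    // second no matter the traffic. With `readsPerRequest` timestamp reads per
    // request, we need at least this many QPS before the cache saves any calls.
    static long breakEvenQps(long refreshesPerSecond, long readsPerRequest) {
        return refreshesPerSecond / readsPerRequest;
    }
}
```

With a 1 ms refresh (1,000 refreshes/s) and one read per request, `breakEvenQps(1000, 1)` gives the 1,000 QPS figure from the bullet; the real break-even measured later in the article is higher still, because the sleep and thread overhead are not free either.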

But this is just conjecture; it would be far more convincing with data to back it up. Fortunately, the expert who raised the "negative return" point did a series of analyses and tests, so let's freeload off his results:

After reading them, I knelt on the spot and could not get up for a long time.

Let me play class monitor and summarize:

  • The biggest costs of caching the timestamp are the sleep and the timestamp fetch itself
  • In theory a single machine needs more than 4,800 QPS before there is a net gain; real tests likewise showed no net gain anywhere below 4,000 QPS
  • Without the cached timestamp, fetching a timestamp takes longer, but the increase is within an acceptable range
  • Since QPS rarely reaches 4K in ordinary situations, the final conclusion was to disable this feature by default in Sentinel-Go

After this whole effort, even the maintainers in the Sentinel community were impressed and gave it a thumbs-up:

And yet, after all those tests, the final change was just flipping a true to false:

An adaptive algorithm

I thought that expert's tests were already a great find, but more idle browsing turned up something even more remarkable.

Given the analysis above that the benefit only outweighs the cost at fairly high QPS, could we build an adaptive algorithm that reads the system clock directly at low QPS and reads the cache at high QPS?

Sure enough, Sentinel (the Java version, 1.8.2 and later) has already implemented one!

Issue for reference: github.com/alibaba/Sen…

Let's walk through the implementation:

Start with the core loop that caches the timestamp (it runs once per millisecond). The loop divides the cached-timestamp mechanism into three states:

  • RUNNING: execute the caching strategy and count the write QPS (read and write QPS on the cached timestamp are counted separately)
  • IDLE (the initial state): do nothing except sleep for 300 ms
  • PREPARE: cache the timestamp, but do not count QPS

How do the three states transition between each other? The answer lies in the check method called at the start of the loop:

First, check runs on an interval: a state transition is only attempted once every 3 seconds;

Second, if the current state is IDLE and the read QPS is greater than HITS_UPPER_BOUNDARY (1200), it switches to PREPARE;

And if the current state is RUNNING and the read QPS is less than HITS_LOWER_BOUNDARY (800), it switches to IDLE.

There seems to be no branch that switches into RUNNING. Look back at the loop above: the third branch, PREPARE, switches the state to RUNNING after running once.

Why? The PREPARE state exists purely to smooth the transition from IDLE to RUNNING. In IDLE the cached timestamp is no longer being updated, so switching straight to RUNNING without this warm-up could hand out a stale timestamp right after the switch.

Text alone may not be intuitive, so here is a state transition diagram:

With all that in place, a read does two things: count the read QPS, and fetch the timestamp. In IDLE or PREPARE it fetches the system time directly; in RUNNING it reads the cached timestamp.
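Putting the pieces together, the adaptive logic can be condensed roughly like this. This is my own re-implementation sketch of the idea described above, not Sentinel's actual code: the class name is made up, the real code uses LeapArray to measure QPS (a plain parameter here), and runs tick/check on background schedules rather than by manual calls:

```java
// A condensed sketch of the adaptive cached clock (not Sentinel's real code):
//   IDLE    -> PREPARE  when read QPS > 1200 (checked every 3 s)
//   PREPARE -> RUNNING  after one warm-up update cycle
//   RUNNING -> IDLE     when read QPS < 800  (checked every 3 s)
class AdaptiveClock {
    enum State { IDLE, PREPARE, RUNNING }

    static final long HITS_UPPER_BOUNDARY = 1200;
    static final long HITS_LOWER_BOUNDARY = 800;

    private volatile State state = State.IDLE;
    private volatile long cachedTime = System.currentTimeMillis();

    // One iteration of the background loop (the real loop runs every 1 ms;
    // in IDLE it instead sleeps 300 ms and skips updating).
    void tick() {
        switch (state) {
            case RUNNING:
                cachedTime = System.currentTimeMillis();
                // the real code also counts write QPS here
                break;
            case PREPARE:
                // warm the cache so RUNNING never serves a stale value...
                cachedTime = System.currentTimeMillis();
                state = State.RUNNING; // ...then promote after one cycle
                break;
            case IDLE:
            default:
                break; // do nothing
        }
    }

    // Runs every 3 seconds with the read QPS observed since the last check.
    void check(long readQps) {
        if (state == State.IDLE && readQps > HITS_UPPER_BOUNDARY) {
            state = State.PREPARE;
        } else if (state == State.RUNNING && readQps < HITS_LOWER_BOUNDARY) {
            state = State.IDLE;
        }
    }

    // The read path: cached time only in RUNNING, system time otherwise.
    // The real code also counts the read QPS here.
    long currentTimeMillis() {
        return state == State.RUNNING ? cachedTime : System.currentTimeMillis();
    }

    State state() { return state; }
}
```

Walking it through by hand: a burst of reads pushes the read QPS past 1200, `check` moves IDLE to PREPARE, the next `tick` warms the cache and promotes to RUNNING, and when traffic falls below 800 QPS a later `check` drops back to IDLE.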

When the program is fairly idle, the timestamp is not cached, which saves CPU; when QPS is high, caching the timestamp also saves CPU and lowers read latency. Two birds with one stone.

One thing puzzles me, though: I don't know how these QPS boundaries were chosen, whether by gut feeling or by benchmarking. The values are probably not universally correct, since they may depend on the machine's configuration, so I would prefer them to be configurable rather than hard-coded. On this point, the author of the code explained his reasoning:

Finally, you may ask: how is this QPS actually counted?

That is Sentinel's specialty: it uses a LeapArray. Since it is not the focus of this article, I won't expand on it here; if you're interested, see my earlier article "Sentinel-Go Source Series (3): An Engineering Implementation of the Sliding Time Window Algorithm". Although that article is about the Go version, the algorithm is exactly the same as Java's; the implementation is even a straight port.
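For readers who won't follow the link, the sliding-window idea behind that kind of counter can be illustrated with a much-simplified sketch. This is my own toy version (Sentinel's LeapArray is considerably more sophisticated, lock-free, and reuses buckets differently):

```java
// A much-simplified sliding time window counter (not Sentinel's LeapArray).
// The last `windowMs` is split into `buckets` buckets; a bucket is lazily
// reset when time wraps around onto it, and sum() adds only fresh buckets.
class SlidingWindowCounter {
    private final long bucketMs;
    private final long[] counts;
    private final long[] bucketStarts;

    SlidingWindowCounter(int buckets, long windowMs) {
        this.bucketMs = windowMs / buckets;
        this.counts = new long[buckets];
        this.bucketStarts = new long[buckets];
    }

    private int index(long now) {
        return (int) ((now / bucketMs) % counts.length);
    }

    synchronized void add(long now) {
        int i = index(now);
        long start = now - now % bucketMs; // start time this bucket should have
        if (bucketStarts[i] != start) {    // bucket holds stale data: reset it
            bucketStarts[i] = start;
            counts[i] = 0;
        }
        counts[i]++;
    }

    synchronized long sum(long now) {
        long total = 0;
        for (int i = 0; i < counts.length; i++) {
            // only count buckets whose start still falls inside the window
            if (now - bucketStarts[i] < bucketMs * counts.length) {
                total += counts[i];
            }
        }
        return total;
    }
}
```

Dividing `sum(now)` by the window length in seconds gives a smoothed QPS, which is exactly the kind of figure the check method above compares against its boundaries.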

Is there test data to support this? Another contributor posted his test results in the comments; let's take a look:

Under low load the drop in CPU consumption is especially noticeable, while under high load there is little change, which matches our expectations.

Reading this far, can you see where this is going? Exactly: Sentinel-Go has not yet implemented the adaptive algorithm described above. It is a perfect opportunity: technically interesting, and with a reference implementation (the Java version) to consult. Tempted yet?

There is an issue for it in the community:

github.com/alibaba/sen…

This issue was claimed by someone in August 2021, but no code has been contributed so far. Rounding up, that means they gave up. So you know what to do, right?

One last word

If you think this article is decent, please move those little fingers and tap Follow and Like; your encouragement is what keeps me creating!

By the way, if that wasn't enough for you, have a look at these related articles:

Thanks for reading, see you next time~


Search for and follow the WeChat public account "bug master" for back-end technology sharing: architecture design, performance optimization, source code reading, troubleshooting, and hands-on practice.


Origin juejin.im/post/7101520653442318366