Detailed explanation of the Linux timer time wheel algorithm

Time wheel implementation

Linux timers are divided into low-precision timers and high-precision timers, both of which are implemented by the kernel. This article discusses the low-precision timers that are common in application development. As a basic component, a timer is commonly implemented in one of four ways: with a sorted linked list, a min-heap, a red-black tree, or a time wheel. This article explains the time wheel, which has the best time complexity of the four and is also the method adopted by the Linux kernel.

For a timer implemented with a sorted linked list, adding a timer takes O(n); with a min-heap or a red-black tree, adding a timer takes O(log n). These structures cannot reach O(1) because all timer nodes hang on a single linked list (or a single tree), so every insertion has to search for its position. The core idea of the time wheel algorithm is to hash the timers onto multiple lists, which is a typical space-for-time strategy. The explanation below starts from a single time wheel and gradually extends it to the multi-level time wheel algorithm that Linux uses to implement timers.

Simple single time wheel

A simple single time wheel has only one wheel, made up of buckets. The time wheel shown in the figure below has 8 buckets, and each bucket links the timer nodes that expire at the corresponding time in the future. Assume the interval between the expiration times of adjacent buckets is slot = 1s. Starting from the current time 0s, a timer node that expires at 1s hangs under bucket[1], a timer node that expires at 2s hangs under bucket[2], and so on. When the tick check finds that the time has reached 1s, all nodes under bucket[1] perform their timeout actions; when the time reaches 2s, all nodes under bucket[2] perform their timeout actions, and so on.
[Figure 1: a simple single time wheel with 8 buckets]

Because the buckets form an array, the list of timer nodes for a given expiration time can be located directly by its index, so adding a node, deleting a node, and running expired timers are all O(1).
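The single wheel described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the class name SimpleTimeWheel and its methods are invented for this example, a Python list stands in for the linked list in each bucket, and deletion (which needs a doubly linked list to be O(1)) is left out.

```python
class SimpleTimeWheel:
    """Illustrative single time wheel: one bucket per slot."""

    def __init__(self, num_buckets=8):
        self.num_buckets = num_buckets
        self.buckets = [[] for _ in range(num_buckets)]
        self.current = 0  # index of the bucket the time pointer is on

    def add_timer(self, delay, callback):
        # A delay longer than one revolution cannot be represented.
        if not 1 <= delay <= self.num_buckets:
            raise ValueError("delay must be between 1 and num_buckets slots")
        index = (self.current + delay) % self.num_buckets
        self.buckets[index].append(callback)  # direct indexing, O(1)

    def tick(self):
        # Advance one slot and run every timer in the bucket reached.
        self.current = (self.current + 1) % self.num_buckets
        expired, self.buckets[self.current] = self.buckets[self.current], []
        for callback in expired:
            callback()
        return len(expired)
```

Calling add_timer(1, f) and add_timer(2, g) and then tick() twice runs f on the first tick and g on the second, without ever scanning the other buckets.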

But the limitation of this timer is also obvious: the expiration time of a timer to be added must be within 8s, which clearly cannot meet real needs. Of course, it is easy to extend: just increase the number of buckets. On Linux, we can set the slot to 1 jiffy (1/HZ seconds). Assuming the maximum expiration range is 2^32 jiffies, the single time wheel above would need 2^32 buckets, which would consume a huge amount of memory, so it clearly needs to be optimized.
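To get a feel for "huge", here is a rough back-of-the-envelope calculation. The sizes are assumptions made for this estimate only: a 64-bit machine with 8-byte pointers, and each bucket being the head of a doubly linked list (two pointers).

```python
POINTER_BYTES = 8                 # assumption: 64-bit pointers
BUCKET_BYTES = 2 * POINTER_BYTES  # assumption: a list head holds two pointers

single_wheel_buckets = 2 ** 32    # one bucket per jiffy in the range
single_wheel_bytes = single_wheel_buckets * BUCKET_BYTES
single_wheel_gib = single_wheel_bytes / 2 ** 30

print(f"{single_wheel_gib:.0f} GiB of empty bucket heads")
```

Under these assumptions the bucket array alone takes 64 GiB before a single timer is added.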

Improved single time wheel

The improved single time wheel is a compromise between time and space: it no longer has the O(1) time complexity of the simple single time wheel, but it also does not need a huge number of buckets. The principle is simple: bucket[slot] holds not only the timer with expire = slot but every timer with expire % N = slot (N is the number of buckets), which matches the cyclic nature of time. As shown in Figure 2, expire in a timer node is the expiration time, and rotation is the number of wheel revolutions after which the node expires. When the current time pointer reaches a bucket, the timeout actions cannot simply be run on every node under the bucket as in the simple time wheel. Instead, the linked list must be traversed to check, for each node, whether the wheel's revolution count equals the node's rotation value. Only when the two are equal is the timeout action performed.
[Figure 2: the improved single time wheel, where each timer node stores expire and rotation]
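A minimal sketch of the improved wheel follows. The class name RotatingTimeWheel and the choice to store rotation as expire // N (compared against the wheel's own revolution count on every tick) are assumptions made for illustration; other variants count the remaining revolutions down instead.

```python
class RotatingTimeWheel:
    """Illustrative single wheel whose nodes carry a rotation count."""

    def __init__(self, num_buckets=8):
        self.num_buckets = num_buckets
        self.buckets = [[] for _ in range(num_buckets)]
        self.now = 0  # absolute time in slots since the wheel started

    def add_timer(self, delay, callback):
        if delay < 1:
            raise ValueError("delay must be at least 1 slot")
        expire = self.now + delay
        index = expire % self.num_buckets      # expire % N picks the bucket
        rotation = expire // self.num_buckets  # revolution in which it expires
        self.buckets[index].append((rotation, callback))

    def tick(self):
        self.now += 1
        index = self.now % self.num_buckets
        current_rotation = self.now // self.num_buckets
        remaining, expired = [], []
        # The whole bucket must be scanned: this is why it is no longer O(1).
        for rotation, callback in self.buckets[index]:
            if rotation == current_rotation:
                expired.append(callback)
            else:
                remaining.append((rotation, callback))
        self.buckets[index] = remaining
        for callback in expired:
            callback()
        return len(expired)
```

With 8 buckets, timers with delays 3 and 11 land in the same bucket, but only the first runs on tick 3; the second waits one more revolution.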

Multi-level time wheels

Each time wheel described above works on its own, so it is hard to get the desired result in both time and space. The multi-level time wheel algorithm implemented by Linux borrows the way a water meter counts in daily life: when a lower-scale wheel completes a full turn, it drives the next higher-scale wheel forward by one mark, so a large range can be represented with only a few marks.
[Figure 3: the five Linux time wheels, tv1 to tv5]

The Linux timer time wheel is divided into 5 levels of wheels (tv1 to tv5), as shown in Figure 3. Each level has a different slot size, and the rule is that the slot of a higher-level wheel equals the sum of all the slots of the wheel one level below it. The slot unit of the Linux timer is 1 jiffy. The tv1 wheel is divided into 256 slots of 1 jiffy each. The tv2 wheel is divided into 64 slots of 256 jiffies each, which is exactly the range that the whole tv1 wheel can express. Only when adjacent wheels follow this rule does "one full turn of the lower wheel moves the higher wheel forward by one slot" hold. tv3, tv4, and tv5 are also divided into 64 slots each, so it is easy to calculate that the range expressible up to the highest-level wheel tv5 is 256 × 64 × 64 × 64 × 64 = 2^32 jiffies.
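The slot sizes and ranges of the five wheels follow mechanically from the rule above. The short sketch below derives them; the constant names TVR_BITS and TVN_BITS and the function wheel_layout are chosen for this illustration.

```python
TVR_BITS = 8  # tv1 has 2**8 = 256 slots
TVN_BITS = 6  # tv2 to tv5 each have 2**6 = 64 slots


def wheel_layout():
    """Return (name, slots, jiffies_per_slot, total_range) for each level."""
    layout = []
    slot = 1  # jiffies covered by one slot of the current level
    for level in range(5):
        slots = 1 << (TVR_BITS if level == 0 else TVN_BITS)
        total = slot * slots
        layout.append((f"tv{level + 1}", slots, slot, total))
        slot = total  # the next level's slot is this level's whole range
    return layout


for name, slots, per_slot, total in wheel_layout():
    print(f"{name}: {slots} slots x {per_slot} jiffies, covers up to {total}")
```

The last entry covers 2**32 jiffies, matching the product 256 × 64 × 64 × 64 × 64 in the text.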

The key operations of the Linux time wheel algorithm are adding a timer and migrating timer lists when a wheel carries. Let's look at adding a timer first. The key is to know the range of expiration times that each slot of each wheel represents. Figure 4 lists the range of jiffies measured by each level of time wheel. Suppose a timer expires after 1000 jiffies. Since 256 <= 1000 < 256 × 64, it is easy to see from Figure 4 that it belongs on the tv2 wheel. Each slot of tv2 covers 256 jiffies, so the timer is hung at index 1000/256 = 3 (rounded down), that is, bucket[3] of tv2, taking the current time as 0.
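The level and bucket selection can be sketched as a small function. This is a simplified illustration, not kernel code: the name locate_bucket, the 0-based level numbering (level 0 is tv1), and the rejection of out-of-range delays are assumptions of the example. The level is chosen from the remaining delay, while the index is taken from the bits of the absolute expiration time.

```python
TVR_BITS, TVN_BITS = 8, 6
TVR_MASK = (1 << TVR_BITS) - 1  # 0xff
TVN_MASK = (1 << TVN_BITS) - 1  # 0x3f


def locate_bucket(expires, now=0):
    """Return (level, index); level 0 is tv1 and level 4 is tv5."""
    delta = expires - now
    if not 0 <= delta < 1 << 32:
        raise ValueError("delay must be in [0, 2**32) jiffies")
    if delta < 1 << TVR_BITS:
        return 0, expires & TVR_MASK
    for level in range(1, 5):
        shift = TVR_BITS + (level - 1) * TVN_BITS  # lowest bit of this level
        if delta < 1 << (shift + TVN_BITS):
            return level, (expires >> shift) & TVN_MASK
    raise AssertionError("unreachable: delta is below 2**32")


print(locate_bucket(1000))  # the 1000-jiffy example from the text
```

For the example in the text, locate_bucket(1000) gives level 1 (tv2) and index 3.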

Linux's expiration check is also very clever. Suppose curr_time = 0x12345678; the next check time is then 0x12345679. If tv1.bucket[0x79] is not empty, the timer nodes on tv1.bucket[0x79] expire at that check. When curr_time reaches 0x12345700, the lower 8 bits are all zero, which means tv1 has produced a carry. At this point, the value of bits 8 to 13 of curr_time (the bits that correspond to the tv2 wheel) is used as the index, the timer list in that tv2 bucket is taken off, and its timers are re-added to the timer system. This completes one carry migration. Similarly, when bits 8 to 13 of curr_time are also 0, tv2 carries into tv3: the value of bits 14 to 19 of curr_time is used as the index, the corresponding timer list in tv3 is taken off, and its timers are re-added to the timer system. tv4 and tv5 work the same way. The timeout lists can be located from curr_time because the ranges of tv1 to tv5 cover the 32 bits of the integer in turn (counting from bit 0): tv1 (bits 0 to 7), tv2 (bits 8 to 13), tv3 (bits 14 to 19), tv4 (bits 20 to 25), tv5 (bits 26 to 31). As curr_time is incremented, each carry from the low bits to the high bits is exactly a lower-level wheel completing a full turn and driving the higher-level wheel forward by one slot.
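Putting the add and carry steps together, the following is a compact sketch of a five-level wheel. It is a teaching model under assumptions, not the kernel's implementation: the class name CascadingTimerWheel and its methods are invented here, Python lists replace the kernel's linked lists, and self.now plays the role of curr_time.

```python
TVR_BITS, TVN_BITS = 8, 6
TVR_MASK, TVN_MASK = (1 << TVR_BITS) - 1, (1 << TVN_BITS) - 1


class CascadingTimerWheel:
    def __init__(self, now=0):
        self.now = now  # current time in jiffies (curr_time in the text)
        sizes = [1 << TVR_BITS] + [1 << TVN_BITS] * 4
        self.tv = [[[] for _ in range(size)] for size in sizes]  # tv[0] is tv1

    def _shift(self, level):
        # Lowest bit of curr_time that indexes this level (level >= 1).
        return TVR_BITS + (level - 1) * TVN_BITS

    def _insert(self, expires, callback):
        delta = expires - self.now
        if delta < 1 << TVR_BITS:
            level, index = 0, expires & TVR_MASK
        else:
            level = 1
            while level < 4 and delta >= 1 << (self._shift(level) + TVN_BITS):
                level += 1
            index = (expires >> self._shift(level)) & TVN_MASK
        self.tv[level][index].append((expires, callback))

    def add_timer(self, delay, callback):
        if not 1 <= delay < 1 << 32:
            raise ValueError("delay must be in [1, 2**32) jiffies")
        self._insert(self.now + delay, callback)

    def _cascade(self, level):
        # Take one bucket off a higher wheel and re-add its timers,
        # which drops them into lower wheels now that they are closer.
        index = (self.now >> self._shift(level)) & TVN_MASK
        timers, self.tv[level][index] = self.tv[level][index], []
        for expires, callback in timers:
            self._insert(expires, callback)
        return index

    def tick(self):
        self.now += 1
        index = self.now & TVR_MASK
        if index == 0:
            # tv1 wrapped: carry into tv2, and onward while each index is 0.
            level = 1
            while level <= 4 and self._cascade(level) == 0:
                level += 1
        expired, self.tv[0][index] = self.tv[0][index], []
        for _, callback in expired:
            callback()
        return len(expired)
```

In this model a timer added with a delay of 1000 jiffies first sits in tv2 bucket 3, is moved into tv1 when the time reaches 768 (0x300), and runs at tick 1000. Most ticks only touch one tv1 bucket; the migration work happens once every 256 ticks.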

Comparison

Finally, compare the time and space complexity of the multi-level time wheel with the simple single time wheel. Linux uses only 256 + 64 + 64 + 64 + 64 = 512 buckets to cover a timeout range of [0, 2^32) jiffies. Compared with the simple single time wheel, the extra time cost is only about 1/256 (an approximation that counts the carry into tv2 once every 256 ticks and ignores the rarer carries above tv2). Adding a timer node, deleting a timer node, and checking for expiration can therefore all be considered O(1).

Origin blog.csdn.net/qq_40989769/article/details/112234070