Linux time subsystem (1)

A brief introduction to the Linux time subsystem, covering clocksource, timekeeper and timers.

1 Introduction

The time subsystem is an integral part of the operating system. Linux's time subsystem has two jobs: keeping the current time and managing timers. As shown in the figure below, in the overall time framework of the Linux kernel, the timekeeper maintains the current time, while tick_device or hrtimer implements timers. In what follows we discuss how the timekeeper and timer parts are implemented and used in the kernel.

2 timekeeper

2.1 clocksource

At the hardware level, a clock source is usually a counter driven at a fixed clock frequency; the counter only increases monotonically until it overflows and wraps around. In the Linux kernel, a clock source is encapsulated by struct clocksource. This structure records information about the clock source, including its frequency-related conversion factors (mult and shift), its precision rating, and a read callback that returns a cycle_t value. Through this callback we can obtain the hardware counter's value at a given moment. Examining the fields of struct clocksource shows how a clock source works.

2.1.1 struct clocksource

struct clocksource {
    /*
     * Hotpath data, fits in a single cache line when the
     * clocksource itself is cacheline aligned.
     */
    cycle_t (*read)(struct clocksource *cs);  /* returns the current counter value */
    cycle_t cycle_last;                       /* counter value saved at the last update */
    cycle_t mask;                             /* bitmask for counters narrower than 64 bits */
    u32 mult;                                 /* cycles-to-ns multiplier */
    u32 shift;                                /* ns = (cycles * mult) >> shift */
    u64 max_idle_ns;                          /* maximum idle (deferment) time in ns */

#ifdef CONFIG_IA64
    void *fsys_mmio;                          /* used by fsyscall asm code */
#define CLKSRC_FSYS_MMIO_SET(mmio, addr) ((mmio) = (addr))
#else
#define CLKSRC_FSYS_MMIO_SET(mmio, addr) do { } while (0)
#endif

    const char *name;
    struct list_head list;
    int rating;                               /* quality rating; higher is better */
    cycle_t (*vread)(void);
    int (*enable)(struct clocksource *cs);
    void (*disable)(struct clocksource *cs);
    unsigned long flags;
    void (*suspend)(struct clocksource *cs);
    void (*resume)(struct clocksource *cs);

#ifdef CONFIG_CLOCKSOURCE_WATCHDOG
    /* Watchdog related data, used by the framework */
    struct list_head wd_list;
    cycle_t cs_last;
    cycle_t wd_last;
#endif
} ____cacheline_aligned;

Looking at just a few important fields of clocksource is enough to get a basic understanding of how a clock source works.

2.1.1.1 rating

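rating indicates the precision of a clock source. A single machine can have several clock sources. A clock source's precision is related to its clock frequency: the higher the frequency, the higher its rating. When several clock sources are registered, the kernel prefers the one with the highest rating. The rating places a clock source in one of the following ranges: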

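* 0-99: not suitable as a real clock source; used only during boot or for testing.

* 100-199: basic level; usable as a real clock source, but not recommended.

* 200-299: good precision; usable as a real clock source.

* 300-399: very good; a precise clock source.

* 400-499: ideal; if such a clock source is available, it must be chosen.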
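The rule "prefer the highest rating" can be sketched in user-space C. This is a minimal sketch, not the kernel's clocksource_select(): the struct demo_cs type, the pick_best() helper and the clock source names in the example are hypothetical, and the real kernel also honors a user override (for example the clocksource= boot parameter), which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down clock source record: only name and rating. */
struct demo_cs {
    const char *name;
    int rating;
};

/* Return the clock source with the highest rating, or NULL if n == 0.
 * Ties keep the entry that appears first in the array. */
static const struct demo_cs *pick_best(const struct demo_cs *list, size_t n)
{
    const struct demo_cs *best = NULL;

    for (size_t i = 0; i < n; i++) {
        assert(list[i].rating >= 0);   /* ratings are non-negative */
        if (best == NULL || list[i].rating > best->rating)
            best = &list[i];
    }
    return best;
}
```

With hypothetical sources rated 1, 200, 250 and 300, pick_best() returns the one rated 300, which falls in the "very good" band above.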

2.1.1.2 The read callback

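A clock source does not generate interrupts by itself. The only way to obtain its count is to actively call its read callback, which returns the current counter value. A read callback is implemented by reading the clock source's hardware register(s) that hold the count.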
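A minimal user-space sketch of a read callback, assuming a hypothetical 32-bit counter register simulated by the variable fake_counter_reg. The demo_clocksource type and demo_read() are illustrative stand-ins, not kernel code; a real driver would read a memory-mapped register (for example with readl()) or a CPU counter instruction instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t cycle_t;   /* stand-in for the kernel's cycle_t */

/* Minimal stand-in for struct clocksource: only the fields used here. */
struct demo_clocksource {
    const char *name;
    cycle_t (*read)(struct demo_clocksource *cs);
    cycle_t mask;
};

/* Hypothetical "hardware register" holding the free-running count. */
static volatile uint32_t fake_counter_reg;

/* read callback: sample the counter register and return its value.
 * Nothing notifies the caller; the count is only seen when read() is called. */
static cycle_t demo_read(struct demo_clocksource *cs)
{
    assert(cs != NULL);   /* a real driver may use cs to locate its registers */
    return (cycle_t)fake_counter_reg;
}

static struct demo_clocksource demo_cs = {
    .name = "demo_counter",
    .read = demo_read,
    .mask = 0xFFFFFFFFu,  /* 32-bit counter */
};
```

Because the clock source never pushes anything, the timekeeping code has to poll read() whenever it needs the current count.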

2.1.1.3 mult and shift

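The read function only returns a raw cycle count from the clock source hardware. To convert it into time we need the clock source's frequency F, so that t = cycle / F. F can be obtained when the clock source is initialized (for example from the clock source's registers), but because the kernel does not use floating-point arithmetic, it replaces the division with a multiplication and a shift: t = (cycle * mult) >> shift is used instead of t = cycle / F. For t in nanoseconds, mult and shift are chosen so that mult / 2^shift ≈ 10^9 / F.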
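The conversion can be sketched as follows. mult is computed once, when F is known, so the only division happens at initialization and every later conversion is a multiply plus a shift. cyc2ns() and calc_mult() are hypothetical helper names for this sketch (the kernel's own conversion helper is clocksource_cyc2ns()), and here the caller picks the shift.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* ns = (cycles * mult) >> shift: one integer multiply and one shift.
 * The caller must keep cycles * mult below 2^64. */
static uint64_t cyc2ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
    return (cycles * mult) >> shift;
}

/* Precompute mult for a clock of freq_hz and a chosen shift so that
 * mult / 2^shift approximates NSEC_PER_SEC / freq_hz (rounded to nearest). */
static uint32_t calc_mult(uint32_t freq_hz, uint32_t shift)
{
    assert(freq_hz != 0 && shift < 32);
    uint64_t mult = ((NSEC_PER_SEC << shift) + freq_hz / 2) / freq_hz;
    assert(mult <= UINT32_MAX);   /* mult is a u32 field in struct clocksource */
    return (uint32_t)mult;
}
```

For a 1 MHz clock with shift 20, mult = 1000 * 2^20 and 1000 cycles convert to 1,000,000 ns; for a 32768 Hz clock with shift 10, mult = 31250000 and 32768 cycles convert to exactly one second. The shift cannot be raised freely: the same 32768 Hz clock with shift 20 would need a mult wider than 32 bits, which calc_mult() rejects.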

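For conversion precision, the larger mult (together with shift) is, the better. However, mult cannot be too large, or cycle * mult will overflow during the calculation. The kernel therefore assumes that the largest time a cycle count will ever need to be converted into is 10 minutes. The main reason is that while the CPU is in the IDLE state, time information is not updated. As long as the CPU leaves IDLE within 10 minutes, the clock source's cycle count can still be converted correctly into time, and the system time can then be updated correctly. The limit is not necessarily 10 minutes: it is calculated by the function clocksource_max_deferment and stored in the max_idle_ns field. Here we use the 10-minute assumption to derive suitable values of mult and shift.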
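Choosing mult and shift under the 10-minute limit can be sketched as below. calc_mult_shift() is a hypothetical user-space function that follows the approach of the kernel's clocks_calc_mult_shift(): it first works out how many bits mult may use so that maxsec seconds' worth of cycles multiplied by mult stays within 64 bits, then tries shifts from 32 downward and keeps the first (largest, hence most precise) one whose mult fits. Treat it as an illustration, not a copy of any particular kernel version.

```c
#include <assert.h>
#include <stdint.h>

/* Pick mult/shift converting "from" Hz ticks into "to" units per second
 * (to = 10^9 for nanoseconds), valid for up to maxsec seconds of cycles. */
static void calc_mult_shift(uint32_t *mult, uint32_t *shift,
                            uint32_t from, uint32_t to, uint32_t maxsec)
{
    uint64_t tmp;
    uint32_t sft, sftacc = 32;

    assert(from != 0);

    /* Largest cycle count to convert is maxsec * from; each bit it needs
     * above 32 is one bit fewer that mult may use. */
    tmp = ((uint64_t)maxsec * from) >> 32;
    while (tmp) {
        tmp >>= 1;
        sftacc--;
    }

    /* Try shifts from 32 downward; stop at the first whose mult fits
     * in sftacc bits (or at shift 0). */
    for (sft = 32; ; sft--) {
        tmp = ((uint64_t)to << sft) + from / 2;   /* round to nearest */
        tmp /= from;
        if ((tmp >> sftacc) == 0 || sft == 0)
            break;
    }
    assert((tmp >> sftacc) == 0);   /* inputs too extreme if this fails */

    *mult = (uint32_t)tmp;
    *shift = sft;
}
```

With a 1 MHz counter and maxsec = 600 (10 minutes), the sketch picks shift = 22 and mult = 1000 * 2^22, and 600 seconds of cycles still convert without overflow. A 3 GHz counter needs more bits for its cycle count, leaving only 23 bits for mult; the sketch then picks shift = 24 and mult = 5592405.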

2.1.1.4 max_idle_ns

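The function clocksource_max_deferment calculates a reasonable value for max_idle_ns. The kernel's tickless code must take this value into account so that, in a NO_HZ configuration, the system does not stay idle so long that the cycle count overflows.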
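The reasoning behind max_idle_ns can be sketched as follows. max_deferment_ns() is a hypothetical helper based on the idea of clocksource_max_deferment(): the longest safe idle period is limited both by overflow of cycles * mult and by wraparound of the counter (its mask), and a safety margin is then subtracted. The 2^63 overflow bound and the 12.5% margin are assumptions of this sketch; details differ between kernel versions.

```c
#include <assert.h>
#include <stdint.h>

/* floor(log2(x)); x must be non-zero. */
static unsigned int ilog2_u64(uint64_t x)
{
    unsigned int r = 0;

    assert(x != 0);
    while (x >>= 1)
        r++;
    return r;
}

/* Longest time (ns) the clock source may go unread before converting
 * the elapsed cycles becomes unsafe. */
static uint64_t max_deferment_ns(uint32_t mult, uint32_t shift, uint64_t mask)
{
    /* Limit 1: keep cycles * mult < 2^63. Since mult < 2^(ilog2(mult) + 1),
     * using 2^(62 - ilog2(mult)) cycles guarantees this. */
    uint64_t max_cycles = 1ULL << (63 - (ilog2_u64(mult) + 1));

    /* Limit 2: the hardware counter must not wrap (cycles <= mask). */
    if (max_cycles > mask)
        max_cycles = mask;

    uint64_t max_nsecs = (max_cycles * mult) >> shift;

    /* Safety margin of 1/8 (assumed for this sketch). */
    return max_nsecs - (max_nsecs >> 3);
}
```

For a 32-bit, 1 MHz counter with mult = 1000 * 2^22 and shift = 22, the multiplication bound (2^31 cycles) is tighter than the wrap bound and the result is about 1879 seconds. For a 24-bit counter at the same rate, the wrap bound wins and the limit shrinks to about 14.7 seconds.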

2.1.1.5 cycle_last

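The cycle_last field records the cycle count obtained the last time the read callback was called. With it, we can determine how much time has elapsed between the current moment and that most recent read.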
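Putting read, cycle_last, mask, mult and shift together, one time update can be sketched as below. demo_accumulate() and the simulated counter sim_counter are hypothetical; the point is the pattern: read the counter, subtract cycle_last, mask the difference so that a single wraparound of a narrow counter still yields the right delta, convert it to nanoseconds, and store the new reading in cycle_last.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t cycle_t;

/* Minimal stand-in for struct clocksource: only the fields used here. */
struct demo_clocksource {
    cycle_t (*read)(struct demo_clocksource *cs);
    cycle_t cycle_last;   /* counter value at the previous update */
    cycle_t mask;         /* e.g. 0xFFFFFFFF for a 32-bit counter */
    uint32_t mult;
    uint32_t shift;
};

/* Hypothetical 32-bit hardware counter, advanced by the caller. */
static uint32_t sim_counter;

static cycle_t sim_read(struct demo_clocksource *cs)
{
    (void)cs;
    return sim_counter;
}

/* Add the time elapsed since cycle_last to *ns_total, then remember
 * the new counter value in cycle_last. */
static void demo_accumulate(struct demo_clocksource *cs, uint64_t *ns_total)
{
    cycle_t now = cs->read(cs);
    cycle_t delta = (now - cs->cycle_last) & cs->mask;   /* wrap-safe */

    assert(cs->mult != 0);
    *ns_total += (delta * cs->mult) >> cs->shift;
    cs->cycle_last = now;
}
```

If a 32-bit, 1 MHz counter wraps from 0xFFFFFF00 to 0x100 between two reads, the masked difference is still 0x200 cycles, i.e. 512,000 ns. A counter that wraps more than once between two reads cannot be detected this way, which is exactly what max_idle_ns guards against.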

(To be continued...)