In-depth understanding of the RCU mechanism in the Linux kernel

RCU (Read-Copy Update) is an important synchronization mechanism in Linux. As the name implies, it works by "read, copy, then update": readers may read at will, but a writer that wants to update the data must first make a copy, complete the modification on the copy, and then replace the old data with the copy in a single step. It is a synchronization mechanism the Linux kernel implements for shared data that is read often and written rarely.

Unlike other synchronization mechanisms, RCU allows multiple readers to access the shared data at the same time with no performance penalty on the read side ("read at will"), and no synchronization is needed between readers and writers (writers must "copy, then write" instead). However, if there are multiple writers, they must use some other synchronization mechanism among themselves when replacing the old data with their updated copies.

A typical application scenario for RCU is a linked list. The Linux kernel provides a header file (include/linux/rculist.h) with interfaces for adding, deleting, traversing and updating a linked list under the RCU mechanism. This article uses the interfaces in rculist.h to add, delete, access and update list items in order to explain the principle of RCU and introduce the relevant APIs in the Linux kernel (based on the source code of Linux v3.4.0).
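The examples that follow all operate on the same kind of list item. For concreteness, assume a hypothetical element type and list head like the following sketch (the names foo, field, foo_head and foo_lock are illustrative, not taken from the kernel source):

#include <linux/list.h>
#include <linux/spinlock.h>

/* Hypothetical element type used by the examples in this article. */
struct foo {
        int field;              /* payload that readers consume          */
        struct list_head list;  /* linkage into the RCU-protected list   */
};

static LIST_HEAD(foo_head);        /* head of the RCU-protected list     */
static DEFINE_SPINLOCK(foo_lock);  /* serializes writers with each other */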

Add list item

The source code for adding items to the linked list using RCU in the Linux kernel is as follows:

#define list_next_rcu(list)     (*((struct list_head __rcu **)(&(list)->next)))

static inline void __list_add_rcu(struct list_head *new,
                struct list_head *prev, struct list_head *next)
{
        new->next = next;
        new->prev = prev;
        rcu_assign_pointer(list_next_rcu(prev), new);
        next->prev = new;
}

The __rcu in list_next_rcu() is an annotation checked by the static analysis tool Sparse. The convention is that a pointer marked __rcu must not be dereferenced directly; it must first be fetched through rcu_dereference(), which returns a pointer that can safely be used under RCU protection. The rcu_dereference() interface will be introduced later; this section focuses on rcu_assign_pointer(). First look at the source code of rcu_assign_pointer():

#define __rcu_assign_pointer(p, v, space) \
    ({ \
        smp_wmb(); \
        (p) = (typeof(*v) __force space *)(v); \
    })

The final effect of the above code is to assign the value of v to p. The key point is the smp_wmb() memory barrier before the assignment. What is a memory barrier? When a CPU uses pipelining to execute instructions, it only guarantees the ordering of instructions that have a memory dependency. For example, in p = v; a = *p; the second instruction accesses the memory pointed to by p, which depends on the first instruction, so the CPU ensures the first instruction completes before the second. For instructions without such a dependency the order is not guaranteed. In __list_add_rcu() above, if the final step were written as a plain prev->next = new;, that store does not access the memory new points to, so the CPU could consider it independent of the assignments new->next = next; and new->prev = prev; and execute prev->next = new; first. The new list item (the newly added node) would then become visible in the list before its initialization had completed, and a reader traversing the list at that moment (remember, an important feature of RCU is that readers may run at any time) would access an uninitialized list item! The memory barrier solves this problem: it guarantees that the stores before the barrier take effect before the stores after it, so an item added to the list is guaranteed to have been fully initialized first.
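The same "initialize first, publish last" pattern applies to any RCU-protected pointer, not only to list links. A minimal sketch, assuming a hypothetical global pointer gp to the struct foo introduced earlier:

#include <linux/slab.h>
#include <linux/rcupdate.h>

struct foo __rcu *gp;   /* hypothetical RCU-protected global pointer */

/* Writer: fully initialize the new object, then publish it.          */
/* rcu_assign_pointer() contains the write barrier, so a reader that  */
/* sees the new pointer is guaranteed to see the initialized fields.  */
void publish_foo(int value)
{
        struct foo *p = kmalloc(sizeof(*p), GFP_KERNEL);

        if (!p)
                return;
        p->field = value;               /* initialization first */
        rcu_assign_pointer(gp, p);      /* publication last     */
}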

Finally, note that if multiple threads may add list items at the same time, the add operation itself must be protected by some other synchronization mechanism (such as a spin_lock), as in the sketch below.
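For example, a writer adding an element to the hypothetical foo list could look like the following sketch; list_add_rcu() is the rculist.h wrapper built on __list_add_rcu(), and the spinlock only serializes writers against each other, readers never take it:

#include <linux/rculist.h>
#include <linux/slab.h>

/* Writer: initialize the new item before publishing it via list_add_rcu(). */
int add_foo(int value)
{
        struct foo *new = kmalloc(sizeof(*new), GFP_KERNEL);

        if (!new)
                return -ENOMEM;
        new->field = value;

        spin_lock(&foo_lock);                   /* writer vs. writer only */
        list_add_rcu(&new->list, &foo_head);
        spin_unlock(&foo_lock);
        return 0;
}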


Access list item

The common code pattern for accessing RCU list items in the Linux kernel is:

rcu_read_lock();
list_for_each_entry_rcu(pos, head, member) {
    // do something with `pos`
}
rcu_read_unlock();

The rcu_read_lock() and rcu_read_unlock() used here are the key to RCU's "read at will" property: they delimit a read-side critical section. Before discussing the read-side critical section, let's look at list_for_each_entry_rcu(), the macro that traverses the list items. Tracing through its source, the pointer to each list item is ultimately obtained through a macro named rcu_dereference(), whose main implementation is as follows:

#define __rcu_dereference_check(p, c, space) \
    ({ \
        typeof(*p) *_________p1 = (typeof(*p)*__force )ACCESS_ONCE(p); \
        rcu_lockdep_assert(c, "suspicious rcu_dereference_check()" \
                      " usage"); \
        rcu_dereference_sparse(p, space); \
        smp_read_barrier_depends(); \
        ((typeof(*p) __force __kernel *)(_________p1)); \
    })
Stripped of its debugging checks, the macro does three things: it declares a local pointer _________p1 that takes the value of p (through ACCESS_ONCE()), issues smp_read_barrier_depends(), and returns _________p1. The two pieces of code above can therefore be regarded as the following pattern:

rcu_read_lock();
p1 = rcu_dereference(p);
if (p1 != NULL) {
    // do something with p1, such as:
    printk("%d\n", p1->field);
}
rcu_read_unlock();

From the implementation of rcu_dereference(), its final effect is simply to assign one pointer to another. So what would happen if p1 = rcu_dereference(p); above were written directly as p1 = p;? On most processor architectures, nothing at all. But on Alpha, the compiler's value-speculation optimization may "guess" the value of p1 and reorder the instructions so that p1->field is loaded before the real value of p has been read. For this reason the implementation of smp_read_barrier_depends() is architecture-dependent: on ARM, x86 and most other architectures it is empty, while on Alpha it inserts a memory barrier that guarantees the real value of p is obtained before it is dereferenced. The __rcu annotation mentioned in the previous section "Add list item", which makes Sparse check that RCU-protected data is only accessed through rcu_dereference(), thus also helps keep the code portable.
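Putting the traversal pattern and rcu_dereference() together, a reader over the hypothetical foo list from the beginning of this article might look like this sketch (show_foos() is an illustrative name):

#include <linux/kernel.h>
#include <linux/rculist.h>

/* Reader: the whole traversal sits inside one read-side critical      */
/* section; list_for_each_entry_rcu() uses rcu_dereference() under the */
/* hood, so no lock is taken and writers are never blocked.            */
void show_foos(void)
{
        struct foo *pos;

        rcu_read_lock();
        list_for_each_entry_rcu(pos, &foo_head, list)
                printk("%d\n", pos->field);
        rcu_read_unlock();
}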

Now let's come back to the read-side critical section. Read-side critical sections are not mutually exclusive: multiple readers may be inside their read-side critical sections at the same time. However, once a block of memory can be referenced by a pointer inside a read-side critical section, freeing that memory must wait until the read-side critical sections have ended. The Linux kernel API that waits for read-side critical sections to end is synchronize_rcu(). The check is global: if any code in the system is inside a read-side critical section, synchronize_rcu() blocks, and it returns only after all read-side critical sections that were in progress have ended. To make this concrete, consider the following code:

/* `p` points to a block of RCU-protected shared data */

/* reader */
rcu_read_lock();
p1 = rcu_dereference(p);
if (p1 != NULL) {
    printk("%d\n", p1->field);
}
rcu_read_unlock();

/* free the memory */
p2 = p;
if (p2 != NULL) {
    p = NULL;
    synchronize_rcu();
    kfree(p2);
}

The timing relationship between multiple readers and the thread that frees the memory is illustrated by the following diagram:

[Timing diagram: read-side critical sections of Readers 1-6 relative to t1 (p = NULL), t2 (synchronize_rcu() starts) and t3 (synchronize_rcu() returns)]

In the figure, each reader's box represents the period from obtaining a reference to p (the rcu_dereference(p) call) to the end of its read-side critical section; t1 is the time when p = NULL is executed; t2 is the time when the synchronize_rcu() call starts; t3 is the time when synchronize_rcu() returns. Consider Readers 1, 2 and 3 first. Although their critical sections end at different times, all three obtain a reference to the memory pointed to by p before t1. When synchronize_rcu() is called at t2, Reader 1's critical section has already ended, but Readers 2 and 3 are still inside theirs, so the call must wait until their critical sections end, that is, until t3, before kfree(p2) can run and release the memory. The period during which synchronize_rcu() blocks has a name: the Grace period. As for Readers 4, 5 and 6, regardless of how they overlap the Grace period, they obtain their reference only after t1, so they can no longer see the old value of p and never enter the p1 != NULL branch.

Delete list item

With the Grace period in mind, the deletion of list items is easy to understand. The common code pattern is:

p = search_the_entry_to_delete();
list_del_rcu(&p->list);
synchronize_rcu();
kfree(p);
The source code of list_del_rcu(), which removes an entry from the list, is as follows:

/* list.h */
static inline void __list_del(struct list_head * prev, struct list_head * next)
{
    next->prev = prev;
    prev->next = next;
}

/* rculist.h */
static inline void list_del_rcu(struct list_head *entry)
{
    __list_del(entry->prev, entry->next);
    entry->prev = LIST_POISON2;
}

Following the example in the previous section "Access list item": if a reader is able to obtain from the list the item we are about to delete, it must have entered its read-side critical section before synchronize_rcu() was called, and synchronize_rcu() guarantees that the item's memory is released only after that read-side critical section has ended, so a list item a reader is still accessing will never be freed underneath it.
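A fuller sketch of deleting an element from the hypothetical foo list, with the search and unlink serialized against other writers by foo_lock (the names remain illustrative):

/* Writer: find the element, unlink it, then wait for a grace period   */
/* before freeing it so that in-flight readers can finish safely.      */
void delete_foo(int value)
{
        struct foo *pos, *victim = NULL;

        spin_lock(&foo_lock);
        list_for_each_entry(pos, &foo_head, list) {
                if (pos->field == value) {
                        victim = pos;
                        list_del_rcu(&victim->list);
                        break;
                }
        }
        spin_unlock(&foo_lock);

        if (victim) {
                synchronize_rcu();  /* wait for readers that may still see it */
                kfree(victim);
        }
}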

Update list item

As mentioned earlier, RCU updates work by "copy, then update", and updating an RCU list item follows the same mechanism. The typical code pattern is:

p = search_the_entry_to_update();
q = kmalloc(sizeof(*p), GFP_KERNEL);
*q = *p;
q->field = new_value;
list_replace_rcu(&p->list, &q->list);
synchronize_rcu();
kfree(p);

The kmalloc(), *q = *p and q->field = new_value steps make a copy and complete the update on the copy; list_replace_rcu() then replaces the old node with the new one, and after synchronize_rcu() returns, the old node's memory is freed. The source code of list_replace_rcu() is as follows:

static inline void list_replace_rcu(struct list_head *old,
                struct list_head *new)
{
    new->next = old->next;
    new->prev = old->prev;
    rcu_assign_pointer(list_next_rcu(new->prev), new);
    new->next->prev = new;
    old->prev = LIST_POISON2;
}
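Combining the pattern above with writer-side locking, a complete update of the hypothetical foo list might be sketched as follows (same illustrative names as before):

/* Writer: copy the old element, modify the copy, atomically swap it   */
/* into the list, then free the old element after a grace period.      */
int update_foo(int old_value, int new_value)
{
        struct foo *pos, *old = NULL, *new;

        new = kmalloc(sizeof(*new), GFP_KERNEL);
        if (!new)
                return -ENOMEM;

        spin_lock(&foo_lock);
        list_for_each_entry(pos, &foo_head, list) {
                if (pos->field == old_value) {
                        old = pos;
                        *new = *old;                    /* copy        */
                        new->field = new_value;         /* update copy */
                        list_replace_rcu(&old->list, &new->list);
                        break;
                }
        }
        spin_unlock(&foo_lock);

        if (!old) {
                kfree(new);
                return -ENOENT;
        }
        synchronize_rcu();      /* readers may still hold the old node */
        kfree(old);
        return 0;
}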

