Background
Read the fucking source code! -- By Lu Xun
A picture is worth a thousand words. -- By Gorky
Description:
- Kernel version: 4.14
- ARM64 processor, Cortex-A53, dual-core
- Tools used: Source Insight 3.5, Visio
1. Overview
RCU (Read-Copy-Update) is a synchronization mechanism in the Linux kernel.
RCU is often described as a replacement for read-write locks. Its defining characteristic is that readers do not need to synchronize directly with writers, so readers and writers can execute concurrently. The goal of RCU is to minimize reader-side overhead, so it is commonly used in scenarios that demand high read performance.
- Advantages:
  - The reader side has very little overhead: no locks to acquire, and no atomic instructions or memory barriers to execute;
  - No deadlock problems;
  - No priority-inversion problems;
  - No danger of memory leaks;
  - Very good real-time latency;
- Disadvantages:
  - The synchronization overhead of the writer is relatively large, and writers must be mutually exclusive;
  - It is more complicated to use than other synchronization mechanisms;
Let's use a picture to describe the general operation:
- Multiple readers can access the critical resource concurrently, using rcu_read_lock/rcu_read_unlock to mark their critical sections;
- When updating the critical resource, the writer (updater) makes a copy and modifies the copy, then switches the pointer from the old resource to the updated copy. Once all readers that may still be using the old resource have left their critical sections, the old resource is reclaimed;
- Only one writer is shown in the figure. When there are multiple writers, mutual exclusion is required between them;
The description above is fairly simple, whereas the implementation of RCU is very complicated. This article gives a first impression of RCU and works through an example using its interfaces. Later articles will go deeper into the underlying implementation. Let's start!
2. RCU basics
2.1 Basic elements of RCU
The basic idea of RCU is to split an update into two parts: 1) Removal; 2) Reclamation.
Put plainly, the critical resource is read by multiple readers. When the writer publishes its modified copy, the first step is to remove the old data (by redirecting the pointer), and the second step is to reclaim the old data (for example, with kfree).
Functionally, therefore, there are three basic roles: Reader, Updater and Reclaimer. They interact as follows:
- Reader
  - Use rcu_read_lock and rcu_read_unlock to delimit the reader's critical section. RCU-protected data must always be accessed inside the critical section;
  - Before accessing the protected data, use rcu_dereference to obtain the RCU-protected pointer;
  - With non-preemptible RCU, code between rcu_read_lock/rcu_read_unlock must not sleep;
- Updater
  - When multiple Updaters update the data, they need a mutual exclusion mechanism for protection;
  - The Updater uses rcu_assign_pointer to replace the old pointer so that it points to the updated critical resource;
  - The Updater uses synchronize_rcu or call_rcu to start the Reclaimer and reclaim the old critical resource. synchronize_rcu waits synchronously before reclaiming, whereas call_rcu reclaims asynchronously through a callback;
- Reclaimer
  - The Reclaimer reclaims the old critical resource;
  - To ensure that no reader is still accessing the resource about to be reclaimed, the Reclaimer must wait for all pre-existing readers to exit their critical sections. This waiting time is called the grace period (Grace Period);
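The division of labour above can be sketched in userspace C. This is a single-threaded toy under loud assumptions: toy_head, toy_call_rcu and toy_grace_period_elapsed are names invented for this example, C11 atomics stand in for rcu_assign_pointer/rcu_dereference, and the grace period is simply declared over by the caller rather than detected the way the kernel does it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical miniature of the kernel's struct rcu_head. */
struct toy_head {
	struct toy_head *next;
	void (*func)(struct toy_head *head);
};

struct foo {
	int a;
	struct toy_head rcu;
};

static _Atomic(struct foo *) g_foo;	/* the protected pointer */
static struct toy_head *pending;	/* callbacks awaiting a grace period */
static int reclaimed;			/* number of objects freed so far */

/* Reader: read one field of the current version (-1 if none). */
static int reader_get_a(void)
{
	struct foo *p = atomic_load_explicit(&g_foo, memory_order_acquire);

	return p ? p->a : -1;
}

/* Reclaimer callback: recover the enclosing object and free it. */
static void reclaim_foo(struct toy_head *head)
{
	struct foo *p = (struct foo *)((char *)head - offsetof(struct foo, rcu));

	free(p);
	reclaimed++;
}

/* Stand-in for call_rcu(): queue the callback and return at once. */
static void toy_call_rcu(struct toy_head *head,
			 void (*func)(struct toy_head *head))
{
	head->func = func;
	head->next = pending;
	pending = head;
}

/* Updater: copy, modify the copy, publish it, defer reclaiming the old one. */
static void updater_set_a(int value)
{
	struct foo *old = atomic_load_explicit(&g_foo, memory_order_relaxed);
	struct foo *new = calloc(1, sizeof(*new));

	assert(new != NULL);	/* sketch only: no real error handling */
	if (old)
		*new = *old;					/* Copy */
	new->a = value;						/* Update */
	atomic_store_explicit(&g_foo, new, memory_order_release); /* Removal */
	if (old)
		toy_call_rcu(&old->rcu, reclaim_foo);		/* Reclamation, deferred */
}

/* Assumption: the caller knows every pre-existing reader has finished. */
static void toy_grace_period_elapsed(void)
{
	while (pending) {
		struct toy_head *head = pending;

		pending = head->next;	/* read before the callback frees head */
		head->func(head);
	}
}
```

Calling updater_set_a() twice and then toy_grace_period_elapsed() frees exactly one object, the first version. The Updater never frees anything itself, which is the point of splitting Removal from Reclamation.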
2.2 Three basic mechanisms of RCU
To provide the functions described above, RCU relies on three mechanisms.
2.2.1 Publish-Subscribe Mechanism
What is the publish-subscribe mechanism? Here is a picture:
- The Updater and the Reader relate to each other like a Publisher and a Subscriber;
- After updating the content, the Updater calls an interface to publish it, and the Reader calls an interface to read the published content;
So what is needed to make this mechanism work? Let's look at some pseudocode:
/* Definition of global structure */
struct foo {
	int a;
	int b;
	int c;
};
struct foo *gp = NULL;

/* . . . */
/* ========= Updater ======== */
p = kmalloc(sizeof(*p), GFP_KERNEL);
p->a = 1;
p->b = 2;
p->c = 3;
gp = p;		/* plain store: nothing orders it after the three stores above */

/* ========= Reader ======== */
p = gp;		/* plain load: nothing orders it before the reads below */
if (p != NULL) {
	do_something_with(p->a, p->b, p->c);
}
At first glance there seems to be no real problem: the Updater assigns and publishes, and the Reader reads and processes. However, compilers and CPUs may reorder operations, so the code does not necessarily execute in program order. On the Updater side, the stores to p->a, p->b and p->c may become visible after the store to gp. On the Reader side, on some architectures (DEC Alpha), do_something_with() may even operate on field values fetched before the pointer p itself was loaded, so the Reader can see the new pointer together with stale fields.
To solve this problem, Linux provides the rcu_assign_pointer/rcu_dereference macros to guarantee the ordering. On top of these macros the kernel also builds higher-level wrappers, such as those for list and hlist. There are therefore three RCU-protected scenarios in the kernel: 1) pointers; 2) list linked lists; 3) hlist hash linked lists.
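For these three scenarios, the main Publish-Subscribe interfaces are as follows (publish / subscribe):
- Pointer: rcu_assign_pointer / rcu_dereference
- list: list_add_rcu, list_add_tail_rcu, list_replace_rcu / list_for_each_entry_rcu
- hlist: hlist_add_head_rcu, hlist_add_before_rcu, hlist_replace_rcu / hlist_for_each_entry_rcu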
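As a rough userspace illustration of the pointer case, the sketch below replaces the plain store and load from the pseudocode with a C11 release store and an acquire load. toy_publish and toy_subscribe are names made up for this example. The kernel macros are implemented differently (rcu_dereference relies on dependency ordering rather than a full acquire), so only the ordering idea carries over.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct foo {
	int a;
	int b;
	int c;
};

/* Shared pointer; NULL until something has been published. */
static _Atomic(struct foo *) gp;

/* Hypothetical stand-in for rcu_assign_pointer(): the release store keeps
 * the field initialisation ordered before the pointer becomes visible. */
static void toy_publish(struct foo *p)
{
	atomic_store_explicit(&gp, p, memory_order_release);
}

/* Hypothetical stand-in for rcu_dereference(): the acquire load keeps the
 * later reads of p->a, p->b and p->c ordered after the pointer load. */
static struct foo *toy_subscribe(void)
{
	return atomic_load_explicit(&gp, memory_order_acquire);
}

/* Updater: initialise the object completely, then publish it. */
static void updater(void)
{
	struct foo *p = malloc(sizeof(*p));

	assert(p != NULL);	/* sketch only: no real error handling */
	p->a = 1;
	p->b = 2;
	p->c = 3;
	toy_publish(p);
}

/* Reader: returns a + b + c of the published object, or -1 if none. */
static int reader_sum(void)
{
	struct foo *p = toy_subscribe();

	return p ? p->a + p->b + p->c : -1;
}
```

With these two helpers a reader that observes a non-NULL pointer is guaranteed to observe the fully initialised fields as well, which is exactly the guarantee the plain assignments in the pseudocode lack.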
2.2.2 Wait For Pre-Existing RCU Readers to Complete
The Reclaimer needs to reclaim the old critical resource, so the question is: when? RCU therefore has to provide a mechanism which guarantees that all pre-existing RCU readers have completed, that is, have left the critical sections marked by rcu_read_lock/rcu_read_unlock. Only then can the old resource be reclaimed.
- The Readers and the Updater in the figure execute concurrently;
- After the Updater has performed the Removal operation, it calls synchronize_rcu, which marks the end of the update and the start of the reclamation phase;
- After synchronize_rcu is called, new readers may still arrive and read the critical resource (the updated content), but the grace period only waits for pre-existing readers, which are Reader-4 and Reader-5 in the figure. As soon as these pre-existing RCU readers have left their critical sections, the grace period ends and reclamation proceeds;
- synchronize_rcu does not necessarily return the moment the last pre-existing RCU reader leaves its critical section; there may be a scheduling delay;
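The rule that a grace period waits only for pre-existing readers can be modelled with a small userspace toy. Everything here is an assumption made for illustration: the names toy_read_lock, toy_gp_start and toy_gp_done are invented, the model is single-threaded, and readers are tracked explicitly, whereas the kernel infers the end of a grace period from quiescent states instead of tracking readers like this.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_READERS 8

struct toy_reader {
	bool in_cs;		/* inside a read-side critical section? */
	unsigned long gp_seen;	/* grace-period number observed on entry */
};

static struct toy_reader readers[TOY_MAX_READERS];
static unsigned long gp_current;	/* number of the grace period now open */

/* Reader enters its critical section and notes the current period. */
static void toy_read_lock(int id)
{
	assert(id >= 0 && id < TOY_MAX_READERS);
	readers[id].gp_seen = gp_current;
	readers[id].in_cs = true;
}

/* Reader leaves its critical section. */
static void toy_read_unlock(int id)
{
	assert(id >= 0 && id < TOY_MAX_READERS);
	readers[id].in_cs = false;
}

/* Start of a synchronize_rcu()-like wait: returns the number of the grace
 * period that has to end. Readers arriving later get the next number. */
static unsigned long toy_gp_start(void)
{
	return gp_current++;
}

/* Has grace period gp ended? Only readers that entered at or before gp
 * can block it; later readers are ignored. */
static bool toy_gp_done(unsigned long gp)
{
	for (int i = 0; i < TOY_MAX_READERS; i++)
		if (readers[i].in_cs && readers[i].gp_seen <= gp)
			return false;
	return true;
}
```

If readers 4 and 5 are inside their critical sections when toy_gp_start() is called and reader 6 enters afterwards, toy_gp_done() becomes true once 4 and 5 have both called toy_read_unlock(), even though reader 6 is still running.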
2.2.3 Maintain Multiple Versions of Recently Updated Objects
As section 2.2.2 shows, after the Updater has updated and before the Reclaimer has reclaimed, two versions of the critical resource exist, the old one and the new one. Only after synchronize_rcu returns does the Reclaimer reclaim the old resource, leaving just the latest version. Obviously, when there are multiple Updaters there can be even more versions of the critical resource.
Here is a picture, taking pointers and linked lists as examples:
- The call to synchronize_rcu is the critical point: from then until the grace period ends, different versions of the critical resource are maintained;
- After the Reclaimer reclaims the old version, only one version remains again;
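A tiny userspace sketch of the coexisting versions for the pointer case follows. It is single-threaded, make_foo, replace_foo and current_foo are hypothetical helpers, and a C11 atomic exchange stands in for the kernel's publish step. The caller decides when the old version is no longer in use, which is the part a real grace period would provide.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct foo {
	int a;
};

static _Atomic(struct foo *) g_foo;	/* pointer to the current version */

/* Allocate a version holding the given value. */
static struct foo *make_foo(int a)
{
	struct foo *p = calloc(1, sizeof(*p));

	assert(p != NULL);	/* sketch only: no real error handling */
	p->a = a;
	return p;
}

/* Publish a new version and hand back the old one, which stays valid
 * until the caller frees it. */
static struct foo *replace_foo(struct foo *new)
{
	return atomic_exchange_explicit(&g_foo, new, memory_order_acq_rel);
}

/* What a reader starting now would see. */
static struct foo *current_foo(void)
{
	return atomic_load_explicit(&g_foo, memory_order_acquire);
}
```

A reader that fetched the pointer before replace_foo() keeps seeing the old value through its own copy of the pointer, while current_foo() already returns the new version. Both exist until the old one is freed.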
3. RCU example analysis
It's time for a round of fucking sample code.
- The overall code logic:
  - Four kernel threads are created: two test RCU protection of a pointer, and two test RCU protection of a linked list;
  - For reclamation, both mechanisms are used: synchronous reclamation with synchronize_rcu and asynchronous reclamation with call_rcu;
  - To keep the code simple, basic error handling is omitted;
  - Multiple Updaters are not considered, so mutual exclusion between Updaters is omitted as well;
#include <linux/module.h>
#include <linux/init.h>
#include <linux/slab.h>
#include <linux/kthread.h>
#include <linux/rcupdate.h>
#include <linux/rculist.h>
#include <linux/delay.h>

struct foo {
	int a;
	int b;
	int c;
	struct rcu_head rcu;
	struct list_head list;
};

static struct foo __rcu *g_pfoo;
static LIST_HEAD(g_rcu_list);

static struct task_struct *rcu_reader_t;
static struct task_struct *rcu_updater_t;
static struct task_struct *rcu_reader_list_t;
static struct task_struct *rcu_updater_list_t;

/* Park until kthread_stop() is called, so the task is still valid at exit */
static void wait_for_stop(void)
{
	while (!kthread_should_stop())
		msleep(100);
}

/* Reader for the RCU-protected pointer */
static int rcu_reader(void *data)
{
	struct foo *p;
	int cnt = 100;

	while (cnt-- && !kthread_should_stop()) {
		msleep(100);
		rcu_read_lock();
		p = rcu_dereference(g_pfoo);
		if (p)
			pr_info("%s: a = %d, b = %d, c = %d\n",
				__func__, p->a, p->b, p->c);
		rcu_read_unlock();
	}
	wait_for_stop();
	return 0;
}

/* Reclaim callback, invoked after a grace period */
static void rcu_reclaimer(struct rcu_head *rh)
{
	struct foo *p = container_of(rh, struct foo, rcu);

	pr_info("%s: a = %d, b = %d, c = %d\n",
		__func__, p->a, p->b, p->c);
	kfree(p);
}

/* Updater for the RCU-protected pointer */
static int rcu_updater(void *data)
{
	int value = 1;
	int cnt = 100;

	while (cnt-- && !kthread_should_stop()) {
		struct foo *old;
		struct foo *new = kzalloc(sizeof(*new), GFP_KERNEL);

		msleep(200);
		/* Only one updater exists, so no update-side lock is held */
		old = rcu_dereference_protected(g_pfoo, 1);
		*new = *old;
		new->a = value;
		new->b = value + 1;
		new->c = value + 2;
		rcu_assign_pointer(g_pfoo, new);
		pr_info("%s: a = %d, b = %d, c = %d\n",
			__func__, new->a, new->b, new->c);
		/* Asynchronous reclamation of the old version */
		call_rcu(&old->rcu, rcu_reclaimer);
		value++;
	}
	wait_for_stop();
	return 0;
}

/* Reader for the RCU-protected list */
static int rcu_reader_list(void *data)
{
	struct foo *p;
	int cnt = 100;

	while (cnt-- && !kthread_should_stop()) {
		msleep(100);
		rcu_read_lock();
		list_for_each_entry_rcu(p, &g_rcu_list, list) {
			pr_info("%s: a = %d, b = %d, c = %d\n",
				__func__, p->a, p->b, p->c);
		}
		rcu_read_unlock();
	}
	wait_for_stop();
	return 0;
}

/* Updater for the RCU-protected list */
static int rcu_updater_list(void *data)
{
	int cnt = 100;
	int value = 1000;

	while (cnt-- && !kthread_should_stop()) {
		struct foo *p, *q;

		msleep(100);
		p = list_first_or_null_rcu(&g_rcu_list, struct foo, list);
		q = kzalloc(sizeof(*q), GFP_KERNEL);
		*q = *p;
		q->a = value;
		q->b = value + 1;
		q->c = value + 2;
		list_replace_rcu(&p->list, &q->list);
		pr_info("%s: a = %d, b = %d, c = %d\n",
			__func__, q->a, q->b, q->c);
		/* Synchronous reclamation of the old entry */
		synchronize_rcu();
		kfree(p);
		value++;
	}
	wait_for_stop();
	return 0;
}

/* Module initialization */
static int __init rcu_test_init(void)
{
	struct foo *p;

	/* Publish the initial objects before any thread can look at them */
	p = kzalloc(sizeof(*p), GFP_KERNEL);
	rcu_assign_pointer(g_pfoo, p);
	p = kzalloc(sizeof(*p), GFP_KERNEL);
	list_add_rcu(&p->list, &g_rcu_list);

	rcu_reader_t = kthread_run(rcu_reader, NULL, "rcu_reader");
	rcu_updater_t = kthread_run(rcu_updater, NULL, "rcu_updater");
	rcu_reader_list_t = kthread_run(rcu_reader_list, NULL, "rcu_reader_list");
	rcu_updater_list_t = kthread_run(rcu_updater_list, NULL, "rcu_updater_list");
	return 0;
}

/* Module cleanup */
static void __exit rcu_test_exit(void)
{
	/* Stop the threads first so that nothing touches the data any more */
	kthread_stop(rcu_reader_t);
	kthread_stop(rcu_updater_t);
	kthread_stop(rcu_reader_list_t);
	kthread_stop(rcu_updater_list_t);
	/* Wait for outstanding call_rcu() callbacks before freeing and unloading */
	rcu_barrier();
	kfree(rcu_dereference_protected(g_pfoo, 1));
	kfree(list_first_or_null_rcu(&g_rcu_list, struct foo, list));
}

module_init(rcu_test_init);
module_exit(rcu_test_exit);
MODULE_AUTHOR("Loyen");
MODULE_LICENSE("GPL");
To prove that there is no trickery, here is the log output from running on the development board:
4. API introduction
4.1 Core API
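The following interfaces are as core as it gets.
a. rcu_read_lock() // marks the start of the reader's critical section
b. rcu_read_unlock() // marks the end of the reader's critical section
c. synchronize_rcu() / call_rcu() // reclaim resources after the grace period has ended
d. rcu_assign_pointer() // used by the Updater to assign to an RCU-protected pointer
e. rcu_dereference() // used by the Reader to obtain an RCU-protected pointer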
4.2 Other related APIs
Many other related APIs are built on top of the core API. They are listed below without further detail:
RCU list traversal::
list_entry_rcu
list_entry_lockless
list_first_entry_rcu
list_next_rcu
list_for_each_entry_rcu
list_for_each_entry_continue_rcu
list_for_each_entry_from_rcu
list_first_or_null_rcu
list_next_or_null_rcu
hlist_first_rcu
hlist_next_rcu
hlist_pprev_rcu
hlist_for_each_entry_rcu
hlist_for_each_entry_rcu_bh
hlist_for_each_entry_from_rcu
hlist_for_each_entry_continue_rcu
hlist_for_each_entry_continue_rcu_bh
hlist_nulls_first_rcu
hlist_nulls_for_each_entry_rcu
hlist_bl_first_rcu
hlist_bl_for_each_entry_rcu
RCU pointer/list update::
rcu_assign_pointer
list_add_rcu
list_add_tail_rcu
list_del_rcu
list_replace_rcu
hlist_add_behind_rcu
hlist_add_before_rcu
hlist_add_head_rcu
hlist_add_tail_rcu
hlist_del_rcu
hlist_del_init_rcu
hlist_replace_rcu
list_splice_init_rcu
list_splice_tail_init_rcu
hlist_nulls_del_init_rcu
hlist_nulls_del_rcu
hlist_nulls_add_head_rcu
hlist_bl_add_head_rcu
hlist_bl_del_init_rcu
hlist_bl_del_rcu
hlist_bl_set_first_rcu
RCU::
Critical sections                Grace period               Barrier
rcu_read_lock                    synchronize_net            rcu_barrier
rcu_read_unlock                  synchronize_rcu
rcu_dereference                  synchronize_rcu_expedited
rcu_read_lock_held               call_rcu
rcu_dereference_check            kfree_rcu
rcu_dereference_protected
bh::
Critical sections                Grace period               Barrier
rcu_read_lock_bh                 call_rcu                   rcu_barrier
rcu_read_unlock_bh               synchronize_rcu
[local_bh_disable]               synchronize_rcu_expedited
[and friends]
rcu_dereference_bh
rcu_dereference_bh_check
rcu_dereference_bh_protected
rcu_read_lock_bh_held
sched::
Critical sections                Grace period               Barrier
rcu_read_lock_sched              call_rcu                   rcu_barrier
rcu_read_unlock_sched            synchronize_rcu
[preempt_disable]                synchronize_rcu_expedited
[and friends]
rcu_read_lock_sched_notrace
rcu_read_unlock_sched_notrace
rcu_dereference_sched
rcu_dereference_sched_check
rcu_dereference_sched_protected
rcu_read_lock_sched_held
SRCU::
Critical sections                Grace period               Barrier
srcu_read_lock                   call_srcu                  srcu_barrier
srcu_read_unlock                 synchronize_srcu
srcu_dereference                 synchronize_srcu_expedited
srcu_dereference_check
srcu_read_lock_held
SRCU: Initialization/cleanup::
DEFINE_SRCU
DEFINE_STATIC_SRCU
init_srcu_struct
cleanup_srcu_struct
All: lockdep-checked RCU-protected pointer access::
rcu_access_pointer
rcu_dereference_raw
RCU_LOCKDEP_WARN
rcu_sleep_check
RCU_NONIDLE
Okay, listing all these APIs is a bit dizzying.
RCU's veil of mystery has now been lifted a little. Going further will be harder, because the implementation mechanism behind RCU really is difficult. So the question is: do you want to see what lies underneath? Stay tuned.
References
Documentation/RCU
What is RCU, Fundamentally?
What is RCU? Part 2: Usage
RCU part 3: the RCU API
Introduction to RCU