Let’s talk about threads in the kernel: operation functions, process states, and task_struct, with an example

Preface

Kernel threads are processes launched directly by the kernel itself.

Kernel threads delegate kernel functions to independent processes, which execute "in parallel" with the other processes in the system.

Kernel threads are often called kernel daemons. A kernel thread is a schedulable entity: it is added to the scheduler's data structures, and the scheduler runs it according to the current situation. Kernel threads play a role similar to user-mode threads and are usually used for periodic tasks or for background work that requires a large amount of computation.

This article mainly introduces the use of APIs related to kernel thread operations and the basic principles of kernel thread implementation. More in-depth content will be introduced in subsequent articles.

Kernel thread operation functions

The functions (APIs) involved in kernel thread operations mainly cover creation, scheduling, and stopping. They are relatively simple to use.

The definitions of these interfaces are introduced below.

create thread

The function to create a thread is kthread_create. Its prototype is shown below. It is actually a macro wrapping the function kthread_create_on_node, which creates the thread with its task structures allocated on a given NUMA node (not on a particular CPU); passing NUMA_NO_NODE means no node is preferred.

The first two parameters are the pointer to the thread's main function and the argument passed to that function. The remaining parameters give the thread its name through a printf-style format string and its variadic arguments.

#define kthread_create(threadfn, data, namefmt, arg...) \
       kthread_create_on_node(threadfn, data, NUMA_NO_NODE, namefmt, ##arg)
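The printf-style naming can be tried out in user space. The following is a minimal sketch, not kernel code: make_thread_name and make_name are hypothetical stand-ins that only build the name, and NAME_LEN is an assumption that mirrors the 16-byte comm[] field shown later in this article.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#define NAME_LEN 16  /* assumed limit, mirroring the 16-byte comm[] field */

/* Hypothetical stand-in for the naming part of kthread_create_on_node():
 * formats the name into buf and returns the untruncated length. */
static int make_thread_name(char *buf, size_t len, const char *namefmt, ...)
{
        va_list args;
        int n;

        assert(buf != NULL && len > 0);
        va_start(args, namefmt);
        n = vsnprintf(buf, len, namefmt, args);  /* keeps at most len - 1 characters */
        va_end(args);
        return n;
}

/* Same forwarding shape as the kthread_create() macro: the fixed
 * arguments are passed through and the variadic ones follow namefmt. */
#define make_name(buf, namefmt, ...) \
        make_thread_name(buf, NAME_LEN, namefmt, ##__VA_ARGS__)
```

With this sketch, make_name(buf, "worker/%d", 3) produces "worker/3", while a 16-character name such as "multhread_server" is cut to 15 characters plus the terminating NUL.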

wake up thread

The thread created by kthread_create is not running yet. It must be woken up by calling wake_up_process before it can run on a CPU.

int wake_up_process(struct task_struct *p)

Create and run threads

The kernel API also provides an interface that creates a thread and starts it running right away, defined as follows. It simply calls the two functions described above.

#define kthread_run(threadfn, data, namefmt, ...)                          \
({                                                                         \
    struct task_struct *__k                                            \
            = kthread_create(threadfn, data, namefmt, ## __VA_ARGS__); \
    if (!IS_ERR(__k))                                                  \
            wake_up_process(__k);                                      \
    __k;                                                               \
})
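The macro relies on IS_ERR to tell a valid task pointer from an error code packed into the pointer value. The following user-space sketch re-implements that convention for illustration only; the helpers are written from scratch here (they are not the kernel headers), and create_object is a hypothetical function that fails on demand.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095  /* error codes occupy the top 4095 pointer values */

/* Pack a negative errno value into a pointer. */
static inline void *ERR_PTR(long error)
{
        assert(error < 0 && error >= -MAX_ERRNO);
        return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
static inline long PTR_ERR(const void *ptr)
{
        return (long)(intptr_t)ptr;
}

/* A pointer is an error if it lies in the last MAX_ERRNO addresses. */
static inline int IS_ERR(const void *ptr)
{
        return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical creator: returns a valid object or an encoded error. */
static void *create_object(int fail)
{
        static int object = 42;

        return fail ? ERR_PTR(-ENOMEM) : (void *)&object;
}
```

A caller checks the result the same way the example module in this article does: test IS_ERR first, and only then either use the pointer or extract the error with PTR_ERR.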

Stop thread

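A thread can also be stopped, at which point its main function is expected to return. kthread_stop does not terminate the thread by force: it sets a stop flag, wakes the thread, and waits for it to exit, so the main function has to check kthread_should_stop() and return on its own. The function interface to stop a thread is as follows.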

int kthread_stop(struct task_struct *k)

Thread Scheduling

Once a kernel thread is running, it keeps running until it blocks, is preempted, or schedules itself out. A thread can schedule itself out through the following function. Scheduling out means giving up the CPU to other threads.

asmlinkage __visible void __sched schedule(void)

A complete simple example

The basic principles and related APIs of kernel threads have been introduced above. Next we develop a basic example of a kernel thread.

This example starts a kernel thread from a kernel module. The thread does something very simple: it writes a string to the system log once per second. The main purpose of the example is to show how to create, use, and destroy a kernel thread.

#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/err.h>
#include <linux/kthread.h>
#include <linux/sched.h>

static struct task_struct *main_task;

/* Put the kernel thread to sleep, that is, schedule it off the
 * run queue, for the given number of seconds. */
static inline void sleep(unsigned sec)
{
        __set_current_state(TASK_INTERRUPTIBLE);
        schedule_timeout(sec * HZ);
}

/* Thread function: the main body executed by the thread */
static int multhread_server(void *data)
{
        int index = 0;

        /* Until the thread is asked to stop, keep writing a line to
         * the system log and then sleep for one second. */
        while (!kthread_should_stop()) {
                printk(KERN_NOTICE "thread run %d\n", index);
                index++;
                sleep(1);
        }

        return 0;
}


static int __init multhread_init(void)
{
        printk(KERN_INFO "Hello, thread!\n");
        /* Create and start a kernel thread. The arguments are the
         * thread function, its argument (NULL), and the thread name. */
        main_task = kthread_run(multhread_server,
                                  NULL,
                                  "multhread_server");
        if (IS_ERR(main_task))
                return PTR_ERR(main_task);

        return 0;
}

static void __exit multhread_exit(void)
{
        printk(KERN_INFO "Bye thread!\n");
        /* Stop the thread and wait for it to exit */
        kthread_stop(main_task);
}

module_init(multhread_init);
module_exit(multhread_exit);

MODULE_LICENSE("GPL");
MODULE_AUTHOR("SunnyZhang<[email protected]>");

Basic implementation principles

create thread

Whether it is a user-mode process or a kernel thread, the kernel represents both with the same structure, task_struct, and treats them as schedulable tasks.

In Linux, creating a thread is essentially cloning the parent process (thread). In versions 3.x and later, kernel threads are created by a background thread named kthreadd.

The thread-creation interface only builds a creation request, adds it to a request list, and waits for the background thread to process it.

The function that kthread_create and kthread_run from the previous sections end up calling is __kthread_create_on_node, which creates a thread for a given NUMA node.

This function actually only submits a request to create a thread. The trimmed code below shows its core:

struct task_struct *__kthread_create_on_node(int (*threadfn)(void *data),
                                                    void *data, int node,
                                                    const char namefmt[],
                                                    va_list args)
{
        DECLARE_COMPLETION_ONSTACK(done);
        struct task_struct *task;
        struct kthread_create_info *create = kmalloc(sizeof(*create),
                                                     GFP_KERNEL);

        if (!create)
                return ERR_PTR(-ENOMEM);
        create->threadfn = threadfn;
        create->data = data;
        create->node = node;
        create->done = &done;

        spin_lock(&kthread_create_lock);
        /* Add the creation request to the list */
        list_add_tail(&create->list, &kthread_create_list);
        spin_unlock(&kthread_create_lock);

        wake_up_process(kthreadd_task);
        ... ...
}

The actual creation work is performed by the background thread named kthreadd, which takes the creation requests off the list and creates the threads one by one.

The interface it calls to create a thread is kernel_thread. This function clones a child thread from the parent thread and establishes the relationship between the two.
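The request-then-create pattern can be modeled in user space. The sketch below is a simplified, single-threaded illustration under stated assumptions: all names are hypothetical, there is no locking, and the worker calls each function directly, whereas the real kthreadd spawns a new thread for each request.

```c
#include <assert.h>
#include <stdlib.h>

/* One pending creation request (hypothetical simplified layout). */
struct create_request {
        int (*threadfn)(void *data);
        void *data;
        struct create_request *next;
};

/* FIFO list of pending requests. */
struct create_queue {
        struct create_request *head, *tail;
};

/* Caller side: only queue the request; nothing runs yet.
 * Returns 0 on success, -1 if memory is exhausted. */
static int queue_create_request(struct create_queue *q,
                                int (*threadfn)(void *), void *data)
{
        struct create_request *req = malloc(sizeof(*req));

        if (!req)
                return -1;
        req->threadfn = threadfn;
        req->data = data;
        req->next = NULL;
        if (q->tail)
                q->tail->next = req;
        else
                q->head = req;
        q->tail = req;
        return 0;
}

/* Worker side: handle requests one by one in FIFO order.
 * Returns the number of requests handled. */
static int drain_create_requests(struct create_queue *q)
{
        int handled = 0;

        while (q->head) {
                struct create_request *req = q->head;

                q->head = req->next;
                if (!q->head)
                        q->tail = NULL;
                req->threadfn(req->data);  /* simplification: call directly */
                free(req);
                handled++;
        }
        return handled;
}

/* Example thread function: records the order in which requests run. */
static int order_log[8];
static int order_len;

static int record_id(void *data)
{
        assert(order_len < 8);
        order_log[order_len++] = *(int *)data;
        return 0;
}
```

Queuing three requests and then draining the queue runs record_id three times in the order the requests were submitted, which is the behavior the list_add_tail call in the kernel code is there to provide.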

thread scheduling

Thread management and scheduling in Linux is a very complex topic that is difficult to explain clearly in one article. Here we just introduce the basic principles.

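For ordinary (non-real-time) tasks, Linux has long used the CFS (Completely Fair Scheduler) algorithm by default. Rather than handing out fixed time slices, CFS tracks how long each task has run, weights that time by the task's priority, and tries to give every task a fair share of the CPU. The algorithm consists of four parts:

    • Time accounting
    • Process selection
    • Scheduler entry
    • Sleep and wake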

Time accounting records the virtual runtime of each process, while process selection decides, based on the policy, which process should run on the CPU next. The data structure used for process selection is a red-black tree. A red-black tree is a self-balancing binary search tree, so the data in it stays ordered and the target (the process with the smallest virtual runtime) can be found quickly.

The Linux kernel uses another trick in its implementation: it caches the next process to be scheduled (the leftmost node of the tree), so that this process can be found directly, reducing the lookup time.
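The idea of time accounting and process selection can be sketched in a few lines of user-space C. This is an illustration only, under loud assumptions: the names are hypothetical, the weights 1024 and 2048 are made up for the example, and a linear scan stands in for the red-black tree and its cached leftmost node.

```c
#include <assert.h>
#include <stddef.h>

#define BASE_WEIGHT 1024ULL  /* illustrative weight of a default-priority task */

struct sched_task {
        const char *name;
        unsigned long long weight;     /* higher weight = higher priority */
        unsigned long long vruntime;   /* weighted (virtual) runtime */
        unsigned long long ran_ticks;  /* how many ticks the task actually ran */
};

/* Process selection: pick the task with the smallest vruntime.
 * On a tie the task with the lowest index wins. */
static struct sched_task *pick_next(struct sched_task *tasks, size_t n)
{
        struct sched_task *best = NULL;

        for (size_t i = 0; i < n; i++)
                if (!best || tasks[i].vruntime < best->vruntime)
                        best = &tasks[i];
        return best;
}

/* Time accounting: charge delta units of real time, scaled by weight,
 * so heavier tasks accumulate vruntime more slowly. */
static void account(struct sched_task *t, unsigned long long delta)
{
        assert(t->weight > 0);
        t->vruntime += delta * BASE_WEIGHT / t->weight;
        t->ran_ticks++;
}

/* Run the scheduler for a number of ticks of 1000 time units each. */
static void simulate(struct sched_task *tasks, size_t n, int ticks)
{
        if (n == 0)
                return;
        for (int i = 0; i < ticks; i++)
                account(pick_next(tasks, n), 1000);
}
```

With one task of weight 1024 and one of weight 2048, the heavier task ends up running twice as often while both virtual runtimes stay level, which is the fairness property the accounting is meant to provide.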

The scheduling entry point of the Linux kernel is the schedule function. When a thread calls it, scheduling is triggered. The function itself is very simple: internally, it first obtains the next process to run through the scheduling class and then calls context_switch to perform the actual switch.

static __always_inline struct rq * 
context_switch(struct rq *rq, struct task_struct *prev,
               struct task_struct *next, struct rq_flags *rf)

In this way, context_switch switches out the current process and switches in the new one.

context_switch eventually calls an architecture-specific function implemented in assembly language. It mainly saves and restores the registers and switches the stack, which completes the process switch.

process status

1. R

Running or runnable: the process is either executing on a CPU or waiting in the run queue. Only processes in this state can run on a CPU, and several processes may be in this state at the same time.

(Note: many textbooks call a process that is executing on the CPU Running and a process that is executable but not yet scheduled Ready. Linux merges these two states into the R state.)

2. S

Interruptible sleep: the process is suspended because it is waiting for an event (for a condition to become true, for a signal, for a socket connection, for a semaphore, and so on). When the event occurs, one or more processes in the corresponding wait queue are woken up. Under normal circumstances, the vast majority of processes in the process list are in this state.

3. D

Uninterruptible sleep. Uninterruptible does not mean that the CPU ignores hardware interrupts; it means that the process does not respond to asynchronous signals, so the kill command has no effect while it sleeps. The process stays in this state until the event it is waiting for (typically the completion of an I/O operation) occurs.

4. T

Stopped or traced. A process enters the stopped state after receiving SIGSTOP, SIGTSTP, SIGTTIN, or SIGTTOU (unless it is in uninterruptible sleep); when it is then sent SIGCONT, it returns from the stopped state to the runnable state.

A process that is being traced is in the traced state. "Traced" means that the process is paused and waiting for the tracing process to operate on it. For example, when GDB sets a breakpoint in the traced process, the process is in the traced state while it is stopped at the breakpoint.

The two states differ: the traced state is the stopped state with an extra layer of protection. A traced process cannot be woken by SIGCONT. It returns to the R state only when the debugger issues a request such as PTRACE_CONT or PTRACE_DETACH (specified through the arguments of the ptrace system call), or when the debugger exits.

5. Z

Zombie, also called the exit state. The process has terminated, has released almost all of its memory, has no executable code, and cannot be scheduled. It keeps only an entry in the process list (its task_struct, which holds the exit code and other information) so that its parent can collect its exit status.

6. X

Dead. A process may not need to keep its task_struct when it exits, for example when it is a detached thread in a multi-threaded program, or when its parent explicitly ignores SIGCHLD by setting the handler of that signal to SIG_IGN.

In that case the process is put in the EXIT_DEAD state, which means that the code that follows will release it completely right away. The EXIT_DEAD state is therefore very short-lived and almost impossible to catch with the ps command.
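These state letters are what /proc/<pid>/stat reports in its third field. The following user-space sketch extracts that letter from a stat line and maps it to a description. The function names are hypothetical, and the lowercase 't' entry assumes a kernel that reports a tracing stop separately from a plain stop.

```c
#include <assert.h>
#include <string.h>

/* Return the state letter from a /proc/<pid>/stat line, or '?' if the
 * line is malformed. The line has the form "pid (comm) state ...". */
static char stat_state(const char *line)
{
        const char *close;

        assert(line != NULL);
        /* comm may itself contain spaces and parentheses, so search for
         * the last ')' rather than splitting on spaces. */
        close = strrchr(line, ')');
        if (!close || close[1] != ' ' || close[2] == '\0')
                return '?';
        return close[2];
}

/* Map a state letter to the descriptions used in this article. */
static const char *state_description(char state)
{
        switch (state) {
        case 'R': return "running or runnable";
        case 'S': return "interruptible sleep";
        case 'D': return "uninterruptible sleep";
        case 'T': return "stopped";
        case 't': return "tracing stop";
        case 'Z': return "zombie";
        case 'X': return "dead";
        default:  return "unknown";
        }
}
```

Reading the line itself is left out of the sketch; on a Linux system it could come from fgets on /proc/self/stat.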

task_struct

We have seen the task_struct structure appear repeatedly above. Each process has a process control block (PCB) in the kernel that holds its information. The process control block of the Linux kernel is the task_struct structure. The definition below appears to come from an older kernel (the 2.4 era); the structure in current kernels differs considerably.

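struct task_struct {
volatile long state;         // whether the process is runnable, in interruptible sleep, and so on
unsigned long flags;         // per-process flags (PF_*)
int sigpending;              // whether the process has pending signals
mm_segment_t addr_limit;     // address-space limit; differs between kernel threads and ordinary processes
                           // 0-0xBFFFFFFF for user-thread
                           // 0-0xFFFFFFFF for kernel-thread
// Rescheduling flag: if nonzero, the scheduler runs on return from kernel mode to user mode
volatile long need_resched;
int lock_depth;  // lock depth
long nice;       // nice value, from which the base time slice is derived
// Scheduling policy. Real-time: SCHED_FIFO, SCHED_RR; time-sharing: SCHED_OTHER
unsigned long policy;
struct mm_struct *mm; // memory-management information
int processor;
// cpus_runnable is ~0 if the process is not running on any CPU, otherwise (1 << cpu);
// it is updated while the run queue is locked
unsigned long cpus_runnable, cpus_allowed;
struct list_head run_list; // links the process into the run queue
unsigned long sleep_time;  // sleep time of the process
// Links all processes in the system into a circular doubly linked list rooted at init_task
struct task_struct *next_task, *prev_task;
struct mm_struct *active_mm;
struct list_head local_pages;       // local pages
unsigned int allocation_order, nr_local_pages;
struct linux_binfmt *binfmt;  // format of the executable the process is running
int exit_code, exit_signal;
int pdeath_signal;     // signal sent to this process when its parent dies
// personality lets Linux run iBCS2-compliant programs built for other UNIX systems
unsigned long personality;
int did_exec:1;
pid_t pid;    // process identifier
pid_t pgrp;   // process group the process belongs to
pid_t tty_old_pgrp;  // process group of the controlling terminal
pid_t session;  // session identifier
pid_t tgid;
int leader;     // whether the process is a session leader
struct task_struct *p_opptr,*p_pptr,*p_cptr,*p_ysptr,*p_osptr;
struct list_head thread_group;   // thread list
struct task_struct *pidhash_next; // links the process into the PID hash table
struct task_struct **pidhash_pprev;
wait_queue_head_t wait_chldexit;  // used by wait4()
struct completion *vfork_done;  // used by vfork()
unsigned long rt_priority; // real-time priority, used to compute the weight of a real-time process

long per_cpu_utime[NR_CPUS], per_cpu_stime[NR_CPUS];
// Page-fault and swap information:
// min_flt and maj_flt count minor faults (copy-on-write and anonymous pages) and major faults
// (pages read in from a mapped file or swap device); nswap counts pages written to the swap device.
// cmin_flt, cmaj_flt, and cnswap hold the same totals accumulated over all descendants of this process.
// When a parent reaps a terminated child, it adds the child's counters to these fields.
unsigned long min_flt, maj_flt, nswap, cmin_flt, cmaj_flt, cnswap;
int swappable:1; // whether the process's virtual address space may be swapped out
// Process credentials
// uid, gid: user and group IDs of the user running the process, usually those of its creator
// euid, egid: effective uid and gid
// fsuid, fsgid: file-system uid and gid, usually equal to the effective ones; used when
// checking file-system access permissions
// suid, sgid: saved uid and gid
uid_t uid,euid,suid,fsuid;
gid_t gid,egid,sgid,fsgid;
int ngroups; // number of groups the process belongs to
gid_t groups[NGROUPS]; // the groups the process belongs to
// Capabilities: effective, inheritable, and permitted sets
kernel_cap_t cap_effective, cap_inheritable, cap_permitted;
int keep_capabilities:1;
struct user_struct *user;
struct rlimit rlim[RLIM_NLIMITS];  // resource limits of the process
unsigned short used_math;   // whether the FPU has been used
char comm[16];   // name of the executable the process is running
 // File-system information
int link_count, total_link_count;
// Controlling terminal of the process; NULL if no tty
struct tty_struct *tty;
unsigned int locks;
// Inter-process communication information
struct sem_undo *semundo;  // all undo operations of the process on semaphores
struct sem_queue *semsleeping; // the operation the process waits on when a semaphore operation blocks it
// CPU state of the process; saved into the task_struct of the outgoing process on a switch
struct thread_struct thread;
  // File-system information
struct fs_struct *fs;
  // Open-file information
struct files_struct *files;
  // Signal handlers
spinlock_t sigmask_lock;
struct signal_struct *sig; // signal handlers
sigset_t blocked;  // signals currently blocked, one bit per signal
struct sigpending pending;  // pending signals of the process
unsigned long sas_ss_sp;
size_t sas_ss_size;
int (*notifier)(void *priv);
void *notifier_data;
sigset_t *notifier_mask;
u32 parent_exec_id;
u32 self_exec_id;

spinlock_t alloc_lock;
void *journal_info;
};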

task_struct is the kernel data structure that describes a process. The rest of this article analyzes its fields group by group.

(1) Process identifier (PID):

  • pid_t pid; // The unique identifier of the process
  • pid_t tgid; // The pid of the thread group leader

The PID identifies each process. It is a signed integer (pid_t), and by default the largest value handed out is 32767 (the limit can be changed through /proc/sys/kernel/pid_max). It is also the interface that the kernel provides to user programs, which refer to processes through their PIDs. Because Unix programmers expect all threads of one program to share a PID, Linux introduced the concept of thread groups together with the tgid field: every thread in a thread group stores the PID of the group's first lightweight process (the leader) in tgid. When a process has a single thread, tgid equals pid; when it has several threads, tgid is the ID of the main thread and pid is each thread's own ID.
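The relationship between pid and tgid is visible from user space on Linux: getpid() returns the kernel's tgid, while the gettid system call returns the kernel's pid of the calling thread. The sketch below is Linux-specific, and the helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* The kernel's "pid" of the calling thread, via the gettid system call. */
static pid_t thread_id(void)
{
        return (pid_t)syscall(SYS_gettid);
}

/* The kernel's "tgid": what user space calls the process ID. */
static pid_t thread_group_id(void)
{
        return getpid();
}

/* True when the caller is the thread group leader (pid == tgid). */
static int is_group_leader(void)
{
        return thread_id() == thread_group_id();
}
```

Called from the main thread of a program, is_group_leader() returns 1. Called from any additional thread, it returns 0, because that thread has its own pid but shares the leader's tgid.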

(2) Process state: volatile long state

Possible values for state are:

  • #define TASK_RUNNING 0 // The process is either executing or ready to execute

  • #define TASK_INTERRUPTIBLE 1 // Interruptible sleep; can be woken up by a signal

  • #define TASK_UNINTERRUPTIBLE 2 // Uninterruptible sleep; cannot be woken up by a signal

  • #define __TASK_STOPPED 4 // The process has stopped executing

  • #define __TASK_TRACED 8 // The process is being traced

/* in tsk->exit_state */

  • #define EXIT_ZOMBIE 16 // Zombie: the process has terminated, but its parent has not yet collected its termination information

  • #define EXIT_DEAD 32 // Final state: the process is dead and is being released

/* in tsk->state again */

  • #define TASK_DEAD 64 // Dead

  • #define TASK_WAKEKILL 128 // The process can be woken up by fatal signals

  • #define TASK_WAKING 256 // The process is being woken up
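Because each value is a single bit, states can be combined with a bitwise OR. The sketch below copies the values listed above, adds the two kernel combinations TASK_KILLABLE and TASK_STOPPED, and maps a state value to the letters from the process status section. The state_letter function is a simplified illustration written for this article, not the kernel's reporting code.

```c
#include <assert.h>

/* Values as listed above. */
#define TASK_RUNNING            0
#define TASK_INTERRUPTIBLE      1
#define TASK_UNINTERRUPTIBLE    2
#define __TASK_STOPPED          4
#define __TASK_TRACED           8
#define EXIT_ZOMBIE             16
#define EXIT_DEAD               32
#define TASK_DEAD               64
#define TASK_WAKEKILL           128

/* Combined states are built by OR-ing single bits. */
#define TASK_KILLABLE   (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
#define TASK_STOPPED    (TASK_WAKEKILL | __TASK_STOPPED)

/* Simplified sketch: map a state value to a one-letter code.
 * Modifier bits such as TASK_WAKEKILL are masked off first, and
 * TASK_DEAD is reported as 'X' here for simplicity. */
static char state_letter(long state)
{
        static const char letters[] = "SDTtZXX";  /* bits 1, 2, 4, ..., 64 */

        assert(state >= 0);
        state &= 0x7f;
        if (state == 0)
                return 'R';
        for (int bit = 0; bit < 7; bit++)
                if (state & (1L << bit))
                        return letters[bit];
        return '?';  /* not reached: a nonzero masked state has a bit set */
}
```

For example, TASK_KILLABLE has the value 130 and still maps to 'D': it is an uninterruptible sleep that fatal signals are allowed to end.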

(3) Process priority: long priority

In old kernels that have this field, priority gives the amount of time (in jiffies) that the process may use each time it gets the CPU. The priority can be changed through the system call sys_setpriority (in kernel/sys.c).

Besides the identifier, state, and priority, a process control block generally records the following:

  • Program counter: the address of the next instruction to be executed in the program.
  • Memory pointers: pointers to the program code and process-related data, as well as pointers to memory blocks shared with other processes.
  • Context data: the data held in the processor's registers while the process is executing.
  • I/O status information: outstanding I/O requests, I/O devices assigned to the process (such as tape drives), and the list of files used by the process.
  • Accounting information: may include total processor time, total clock time used, time limits, account numbers, and so on.

(4) Process scheduling information

This information determines how long the current process is allowed to run. When the time slice of the process is used up, the scheduler picks another process from the run queue to run.

  • need_resched: rescheduling flag
  • nice: static priority
  • counter: dynamic priority; when rescheduling, the process with the largest counter value in the run queue is selected. It also represents the remaining time slice of the process and decreases while the process runs.
  • policy: scheduling policy (SCHED_OTHER, SCHED_FIFO, or SCHED_RR)
  • rt_priority: real-time priority

(5) Inter-process communication (IPC) information

  • unsigned long signal: signals received by the process. Each bit represents one signal, 32 in total; a set bit means the signal is pending.
  • unsigned long blocked: bit mask of the signals that the process blocks. A set bit means the signal is blocked; a cleared bit means it is not.
  • spinlock_t sigmask_lock: spin lock protecting the signal mask
  • struct sem_undo *semundo: undo operations registered on semaphores, used to avoid deadlock
  • struct sem_queue *semsleeping: wait queue related to semaphore operations
  • struct signal_struct *sig: signal handlers

(6) Process relationship information

Linux runs many processes, and two processes may be related as parent and child or as siblings. Every process except the ancestor process has a parent, and child processes are created through fork to execute programs. Apart from its own pid, a child process copies most of its information from its parent. The parent also holds the power of life and death over the child: it creates the child and can send a signal to kill it.

(7) Time information

  • start_time: process creation time
  • per_cpu_utime: time the process has spent executing in user mode
  • per_cpu_stime: time the process has spent executing in kernel (system) mode
  • ITIMER_REAL: real-time timer; runs in real time whether or not the process is running
  • ITIMER_VIRTUAL: virtual timer; runs only while the process executes in user mode
  • ITIMER_PROF: profiling timer; runs while the process executes in user mode or kernel mode

(8) File information

Opening and closing files are operations on resources. task_struct has two structures that store this information.

  • struct fs_struct *fs: file-system information of the process, with two pointers called root and pwd that point to its root directory and current directory respectively.

  • struct files_struct *files: files opened by the process

(9) Address space/virtual memory information

Each process has its own virtual memory space, represented by mm_struct. Within it, each region of the virtual address space is described by a pair of start and end addresses, and the region is finally mapped to real physical memory through the page tables.

(10) Page management information

  • int swappable: whether the memory pages occupied by the process may be swapped out.
  • unsigned long min_flt, maj_flt, nswap: cumulative numbers of minor page faults, major page faults, and pages swapped out by the process.
  • unsigned long cmin_flt, cmaj_flt, cnswap: the same totals accumulated over all descendant processes of this process.

(11) Symmetric multiprocessor (SMP) information

  • int has_cpu: whether the process currently owns a CPU
  • int processor: the CPU currently used by the process
  • int lock_depth: depth of the kernel lock at context switch

(12) Context information:

  • struct desc_struct *ldt: pointer to the local descriptor table of the process, used for the CPU's segmented memory management.
  • struct thread_struct tss: task state segment, which corresponds to Intel's TSS. On a context switch, the state of the running process is saved from the CPU's TSS into the tss of its PCB, and the tss of the newly selected process is loaded into the CPU's TSS.

(13) Semaphore data members

  • struct sem_undo *semundo: each time the process operates on a semaphore, an undo operation is recorded in a sem_undo structure. Its semadj member points to an array whose entries hold the adjustment recorded for each semaphore, so that the operations can be undone if the process terminates abnormally.
  • struct sem_queue *semsleeping: when a semaphore operation blocks the process, semsleeping points to the sem_queue entry on which the process is waiting.

(14) Process queue pointers

  • struct task_struct *next_task, *prev_task: every process has its own PCB, and all PCBs are linked into a doubly linked list. next_task and prev_task point to the next and the previous PCB. The head and tail of the process list are both process 0.

  • struct task_struct *next_run, *prev_run: link the process into the run queue and point to the next and the previous runnable process. The head and tail of this list are also process 0.

  • struct task_struct *p_opptr: original parent process

  • struct task_struct *p_pptr: parent process

  • struct task_struct *p_cptr: youngest child process

  • struct task_struct *p_ysptr: younger sibling process

  • struct task_struct *p_osptr: older sibling process

  • current: pointer to the currently running process.

  • struct task_struct init_task: PCB of process 0; the root of the process list is always init_task.

  • char comm[16]: name of the executable file being run by the process.

  • int errno: error number of the last error in the process; 0 means no error.
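How the family pointers fit together is easiest to see with a small model. The sketch below is a user-space illustration under stated assumptions: struct family_task and both helper functions are hypothetical, and only the pointer names are taken from the list above.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal model holding only the family pointers (hypothetical struct). */
struct family_task {
        int pid;
        struct family_task *p_pptr;   /* parent */
        struct family_task *p_cptr;   /* youngest child */
        struct family_task *p_ysptr;  /* younger sibling */
        struct family_task *p_osptr;  /* older sibling */
};

/* Link a new child under parent. The newest child becomes the parent's
 * youngest child, and the previous youngest becomes its older sibling. */
static void add_child(struct family_task *parent, struct family_task *child)
{
        assert(parent != NULL && child != NULL);
        child->p_pptr = parent;
        child->p_ysptr = NULL;
        child->p_osptr = parent->p_cptr;
        if (parent->p_cptr)
                parent->p_cptr->p_ysptr = child;
        parent->p_cptr = child;
}

/* Walk from the youngest child towards the oldest and count them. */
static int count_children(const struct family_task *parent)
{
        int n = 0;

        for (const struct family_task *c = parent->p_cptr; c; c = c->p_osptr)
                n++;
        return n;
}
```

After three children are added in the order a, b, c, the parent's p_cptr points to c, following p_osptr from c visits b and then a, and following p_ysptr from a visits b and then c.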

