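This section writes a kernel module that prints the PID and name of every process. To do so we need to understand how the Linux kernel implements linked lists and how it defines the PCB (process control block). Writing the code is not the goal in itself; the goal is a better understanding of the relevant kernel code. For how to write and load a module, see: 在Ubuntu 18.04环境下编写一个简单的内核模块 (Writing a Simple Kernel Module on Ubuntu 18.04).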

Module source for traversing processes

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/list.h>
#include <linux/sched.h>
#include <linux/sched/task.h>	/* declares init_task on kernels 4.11 and later */

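static int print_pid(void)
{
	struct task_struct *task, *p;
	struct list_head *pos;
	int count = 0;

	printk(KERN_INFO "Hello World enter begin:\n");
	task = &init_task;	/* PCB of process 0, the head of the process list */
	/* Note: real code should hold rcu_read_lock() while walking this list. */
	list_for_each(pos, &task->tasks)
	{
		/* Convert the embedded list node back into its task_struct. */
		p = list_entry(pos, struct task_struct, tasks);
		count++;
		printk(KERN_INFO "%d---------->%s\n", p->pid, p->comm);
	}
	printk(KERN_INFO "the number of process is: %d\n", count);
	return 0;
}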

static int __init lkp_init(void)
{
	printk(KERN_ALERT "Hello, World! from the kernel space...\n");
	print_pid();	/* walk the process list once at load time */
	return 0;
}

static void __exit lkp_cleanup(void)
{
	printk(KERN_ALERT "Good Bye, World! leaving kernel space..\n");
}

module_init(lkp_init);
module_exit(lkp_cleanup);
MODULE_LICENSE("GPL");

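print_pid() prints the PID and name of each process. Some readers may not fully follow the code above yet. It is shown first on purpose, so that you can spot the parts you do not understand and fill in the gaps with the sections below.

1. Definition of linked lists in the kernel

A doubly linked list, used as a basic data structure, can be built up into more complex structures such as stacks and queues. The Linux kernel works the same way. As the kernel has been refined over the years, the design of its list implementation has become notably elegant.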

struct list_head
{
	struct list_head *next, *prev;
};

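This is an excerpt from the kernel's list source. Its location on Ubuntu is shown below:
[Figure: location of the list definition]
We will not analyse all of list.h here, only outline the idea behind the Linux list design.
In the kernel, the structure above is a list node with no data field. It can be embedded in any structure to build lists or other data structures. For example, the following code embeds the kernel list node to form a doubly linked list: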

struct test_list
{
	void *my_data;
	struct list_head list;
};

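Here one structure is nested inside another: the list node is a member of the data structure, rather than the data being a member of the node. A detailed analysis of list.h will follow in a later article.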
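To make the embedding idea concrete, here is a minimal user-space sketch. It is not the kernel's list.h: list_init, list_add_tail and list_length are simplified stand-ins written for this illustration, and only struct list_head and struct test_list come from the text.

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* The payload type from the text: the list node is embedded in the data. */
struct test_list {
	void *my_data;
	struct list_head list;
};

/* An empty list is a head that points to itself in both directions. */
void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Link new_node in just before head, i.e. at the tail of the list. */
void list_add_tail(struct list_head *new_node, struct list_head *head)
{
	new_node->prev = head->prev;
	new_node->next = head;
	head->prev->next = new_node;
	head->prev = new_node;
}

/* Count the nodes by following next pointers until we are back at head. */
int list_length(const struct list_head *head)
{
	const struct list_head *pos;
	int n = 0;

	assert(head != NULL);
	for (pos = head->next; pos != head; pos = pos->next)
		n++;
	return n;
}
```

The list functions never touch my_data, which is why the same node type can be reused inside any structure.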

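2. Process control block (PCB)

What is a PCB?

To manage processes, the operating system has to describe every stage of a process's life cycle in detail. The Linux kernel uses a single data structure to describe the state of a process; this structure is called the process control block, PCB (Process Control Block).

The kernel's PCB is a very large structure. It mainly holds state information, link information, various identifiers, inter-process communication, time and timer information, scheduling information, file system information, virtual memory information, and processor context. Its location in the kernel:
[Figure: location of the PCB]
A detailed analysis of the PCB will come later.

3. Process lists

To search efficiently for processes of a given type, the kernel maintains several process lists. Each list is made up of pointers to process PCBs. In sched.h, the task_struct structure contains the following code, which implements the process list.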

struct task_struct
{
	/* ... */
	struct list_head tasks;
	/* ... */
	char comm[TASK_COMM_LEN];	// executable name, excluding the path
	/* ... */
};

In effect, a doubly linked circular list ties all processes together¹.

[Figure: doubly linked circular list]

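Both the head and the tail of the list are init_task. init_task is the PCB of process 0. Process 0 is never destroyed, and its PCB is statically allocated in the kernel data segment. In other words, the PCB of init_task is laid out in advance at build time and stays in place for the whole run. All other PCBs are allocated and freed dynamically at run time, as processes are created and exit.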
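The contrast between a static head and dynamically managed entries can be sketched in user space as follows. Everything here is hypothetical and for illustration only: toy_task, toy_init_task, toy_spawn, toy_exit and toy_count are invented names, and malloc/free merely stand in for the kernel's own allocation of task_struct.

```c
#include <assert.h>
#include <stdlib.h>

struct list_head { struct list_head *next, *prev; };

/* Toy process descriptor; only the fields this sketch needs. */
struct toy_task {
	int pid;
	struct list_head tasks;
};

/* Statically allocated head, standing in for init_task (PID 0). */
struct toy_task toy_init_task = {
	.pid = 0,
	.tasks = { &toy_init_task.tasks, &toy_init_task.tasks },
};

/* "Create a process": allocate a descriptor and link it at the tail. */
struct toy_task *toy_spawn(int pid)
{
	struct list_head *head = &toy_init_task.tasks;
	struct toy_task *t = malloc(sizeof(*t));

	if (!t)
		return NULL;
	t->pid = pid;
	t->tasks.prev = head->prev;
	t->tasks.next = head;
	head->prev->next = &t->tasks;
	head->prev = &t->tasks;
	return t;
}

/* "Exit a process": unlink the descriptor and free it. */
void toy_exit(struct toy_task *t)
{
	assert(t != &toy_init_task);	/* the static head is never removed */
	t->tasks.prev->next = t->tasks.next;
	t->tasks.next->prev = t->tasks.prev;
	free(t);
}

/* Number of descriptors linked after the head. */
int toy_count(void)
{
	struct list_head *pos;
	int n = 0;

	for (pos = toy_init_task.tasks.next; pos != &toy_init_task.tasks; pos = pos->next)
		n++;
	return n;
}
```

Because the list is circular, the last entry's next pointer leads back to the static head, so the head serves as both the start and the end of a traversal.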

With that background in place, going back to the traversal source is straightforward.

4. Analysing the process-traversal module

static int print_pid(void)
{
	struct task_struct *task, *p;
	struct list_head *pos;
	int count = 0;

	printk(KERN_INFO "Hello World enter begin:\n");
	task = &init_task;
	list_for_each(pos, &task->tasks)
	{
		p = list_entry(pos, struct task_struct, tasks);
		count++;
		printk(KERN_INFO "%d---------->%s\n", p->pid, p->comm);
	}
	printk(KERN_INFO "the number of process is: %d\n", count);
	return 0;
}

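list_for_each(pos, &task->tasks)
@pos is a cursor of type struct list_head *, so it can point at any list_head embedded in a process descriptor (tasks, children or sibling); in this loop it visits the tasks members.
@task points to init_task, whose tasks member serves as the list head. tasks is the list_head that links every process into the global process list; it is not a list of child processes.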
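The same loop shape can be tried outside the kernel. The sketch below is a user-space imitation, not kernel code: toy_task, toy_head_init, toy_link and toy_print_pid are invented for illustration, and the two macros are simplified versions of the kernel ones. It also makes one detail visible: list_for_each starts at head->next and stops when it returns to the head, so the head entry itself is not visited or counted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct list_head { struct list_head *next, *prev; };

/* Simplified user-space versions of the kernel macros. */
#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define list_for_each(pos, head) \
	for (pos = (head)->next; pos != (head); pos = pos->next)

/* Toy process descriptor with the three fields print_pid() touches. */
struct toy_task {
	int pid;
	char comm[16];
	struct list_head tasks;
};

/* Make head an empty circular list. */
void toy_head_init(struct toy_task *head)
{
	head->tasks.next = head->tasks.prev = &head->tasks;
}

/* Link t at the tail of the list headed by head. */
void toy_link(struct toy_task *t, struct toy_task *head)
{
	t->tasks.prev = head->tasks.prev;
	t->tasks.next = &head->tasks;
	head->tasks.prev->next = &t->tasks;
	head->tasks.prev = &t->tasks;
}

/* Same loop as print_pid(): prints and counts the tasks after head. */
int toy_print_pid(struct toy_task *head)
{
	struct toy_task *p;
	struct list_head *pos;
	int count = 0;

	assert(head != NULL);
	list_for_each(pos, &head->tasks) {
		p = list_entry(pos, struct toy_task, tasks);
		count++;
		printf("%d---------->%s\n", p->pid, p->comm);
	}
	return count;
}
```

With a head and two linked tasks, toy_print_pid() prints two lines and returns 2; the head's own pid never appears.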

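Every process in a Linux system has a parent (except the init process), and every process has zero or more children. In the process descriptor, the parent pointer points to the parent process, and a list named children holds the child processes (the children member in the parent's task_struct acts as the list head).

We can use list_for_each (include/linux/list.h) to visit the children one by one: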

struct task_struct *task;
struct list_head *list;
list_for_each(list, &current->children)
{
	task = list_entry(list, struct task_struct, sibling);
}

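Here task is the address of one child process's task_struct.

The children member in a parent's task_struct does not point directly at a child's task_struct. Its next pointer holds the address of the sibling member embedded in one child's task_struct. The child list therefore stores only the addresses of those embedded list_head members, which is why list_entry is called with sibling.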
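The head-versus-node distinction is easy to check in a user-space sketch. The names below (toy_task, toy_task_init, toy_add_child, toy_sum_child_pids) are hypothetical, and the macros are simplified versions of the kernel ones; the point is only that the parent's children head links to each child's sibling member, so list_entry must be given sibling.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define list_for_each(pos, head) \
	for (pos = (head)->next; pos != (head); pos = pos->next)

struct toy_task {
	int pid;
	struct list_head children;	/* head of this task's child list */
	struct list_head sibling;	/* this task's node in its parent's child list */
};

/* Start with an empty child list and an unlinked sibling node. */
void toy_task_init(struct toy_task *t, int pid)
{
	t->pid = pid;
	t->children.next = t->children.prev = &t->children;
	t->sibling.next = t->sibling.prev = &t->sibling;
}

/* Link child->sibling at the tail of parent->children. */
void toy_add_child(struct toy_task *parent, struct toy_task *child)
{
	struct list_head *head = &parent->children;
	struct list_head *node = &child->sibling;

	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Walk the child list and add up the children's PIDs. */
int toy_sum_child_pids(struct toy_task *parent)
{
	struct list_head *pos;
	int sum = 0;

	assert(parent != NULL);
	list_for_each(pos, &parent->children)
		sum += list_entry(pos, struct toy_task, sibling)->pid;
	return sum;
}
```

Passing children instead of sibling to list_entry here would subtract the wrong offset and yield a pointer that does not address any toy_task.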

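Looking in the source, we find the definition of list_for_each (this is the form used by older kernels; newer versions drop the prefetch() hint):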

#define list_for_each(pos, head) \
    for (pos = (head)->next; prefetch(pos->next), pos != (head); pos = pos->next)

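As shown, list_for_each is just a for loop, and here it walks the children list. How do we get from the address of a list node to the address of the task_struct that contains it? The list_entry macro does this. The goal is to obtain the address of a child's task_struct starting from the parent's children list. First, here is the source of the macros involved: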

//list_entry(/include/linux/list.h)
//task = list_entry(list, struct task_struct, sibling); shown for comparison

#define list_entry(ptr, type, member) \
    container_of(ptr, type, member)
//------------------------------------
//container_of(include/linux/kernel.h)

#define container_of(ptr, type, member) ({          \
    const typeof( ((type *)0)->member ) *__mptr = (ptr);    \
    (type *)( (char *)__mptr - offsetof(type,member) ); \
 })

//------------------------------------
//offsetof(/include/linux/stddef.h)
#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)

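For the list_entry macro, ptr is a pointer to a node in the children list, type is the task_struct structure type, and member is the name of the list_head member that links the entry into that list. In the call above that is sibling (the list head is children, but each child is linked in through its sibling member).

container_of() first works out the offset of member (here sibling) inside the structure (here task_struct), then subtracts that offset from the address of the member (ptr) to get the address of the structure (task_struct).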
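The subtraction can be verified directly in user space. In this sketch the container_of macro follows the form quoted above, except that it uses the standard offsetof from <stddef.h> and the GCC/Clang spelling __typeof__; toy_task and task_of are hypothetical names used only for illustration. It relies on GNU C statement expressions, so it assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as the kernel macro; requires GNU C extensions. */
#define container_of(ptr, type, member) ({ \
	const __typeof__(((type *)0)->member) *__mptr = (ptr); \
	(type *)((char *)__mptr - offsetof(type, member)); })

struct list_head { struct list_head *next, *prev; };

struct toy_task {
	int pid;
	char comm[16];
	struct list_head tasks;
};

/* Recover the enclosing toy_task from a pointer to its tasks member. */
struct toy_task *task_of(struct list_head *node)
{
	assert(node != NULL);
	return container_of(node, struct toy_task, tasks);
}
```

Given a toy_task t, task_of(&t.tasks) returns &t: the member's address minus the member's offset is the address of the structure.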

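((type *)0)->member casts address 0 to a pointer to type and then selects member. Because the structure is treated as starting at address 0, the address of ((type *)0)->member equals the offset of member within the structure. typeof() is a GNU C extension that yields the type of its operand (unlike sizeof, which yields a size); it is used to declare __mptr with the member's type, so the compiler can check that ptr has a compatible type. The (char *)__mptr cast makes the subtraction count in bytes. size_t is an unsigned integer type: unsigned int on 32-bit platforms and typically unsigned long on 64-bit platforms.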
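A small user-space sketch can confirm both points. toy_offsetof reproduces the null-pointer formulation quoted above (it works with GCC and Clang, although strictly portable code should use the standard offsetof), and typeof_demo shows __typeof__ producing a type that can declare a variable. struct sample and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* The null-pointer formulation from the text, under a different name. */
#define toy_offsetof(TYPE, MEMBER) ((size_t)&((TYPE *)0)->MEMBER)

struct sample {
	char tag;
	int value;
	double weight;
};

/* Byte offset of value inside struct sample. */
size_t value_offset(void)
{
	return toy_offsetof(struct sample, value);
}

/* Byte offset of weight inside struct sample. */
size_t weight_offset(void)
{
	return toy_offsetof(struct sample, weight);
}

/* __typeof__ yields a type, so it can declare a variable; sizeof yields a number. */
size_t typeof_demo(void)
{
	struct sample s = { 'x', 1, 2.0 };
	__typeof__(s.value) copy = s.value;	/* copy has type int */

	assert(copy == 1);
	return sizeof(copy);
}
```

The hand-written macro agrees with the standard offsetof for every member, and the variable declared through __typeof__ has the size of an int.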

¹ 《Linux操作系统原理与应用》 (Linux Operating System Principles and Applications).

姑苏城外的江枫
