Overview

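Applications usually do not invoke the process-management system calls directly; they go through an intermediate layer such as the C standard library.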
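To make the "intermediate layer" concrete, here is a minimal sketch (not taken from the book) that obtains the process ID twice: once through the C library wrapper getpid() and once by passing the system call number to syscall(). The helper names pid_via_libc and pid_via_raw_syscall are made up for this illustration, and the sketch assumes Linux with glibc.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>   /* SYS_getpid */
#include <sys/types.h>     /* pid_t */
#include <unistd.h>        /* getpid(), syscall() */

/* Hypothetical helper: ask for the PID through the C library wrapper. */
static pid_t pid_via_libc(void)
{
    return getpid();
}

/* Hypothetical helper: ask for the PID by system call number,
 * bypassing the dedicated wrapper function. */
static pid_t pid_via_raw_syscall(void)
{
    return (pid_t)syscall(SYS_getpid);
}
```

Both helpers should return the same value when called from the same process; the wrapper is simply the more convenient and portable route.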

The core of process creation is copying the parent process into a child process.

Process duplication

Linux implements three system calls for duplicating a process.

fork: the heavyweight call; it creates a complete copy of the parent process;

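vfork: similar to fork, but it does not copy the parent's data; the child shares the data with the parent instead. To make this safe, the kernel keeps the parent blocked until the child exits or starts a new program;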

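clone: creates threads and gives precise control over what is shared and what is copied between parent and child. Its fine-grained resource handling extends the usual notion of a thread and, to some extent, allows a continuous transition between threads and processes. In Linux the distinction between threads and processes is in fact not rigid, and the two terms are often used interchangeably.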
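The control that clone offers over sharing can be observed from user space. The following is a minimal sketch, assuming Linux with glibc and its clone() wrapper; the names shared_counter, child_fn, run_clone and CHILD_STACK_SIZE are invented for this illustration. With CLONE_VM the child runs in the parent's address space, so its write is visible to the parent; without it the child gets its own copy, as with fork.

```c
#define _GNU_SOURCE
#include <sched.h>      /* clone(), CLONE_VM */
#include <signal.h>     /* SIGCHLD */
#include <stdlib.h>     /* malloc(), free() */
#include <sys/types.h>  /* pid_t */
#include <sys/wait.h>   /* waitpid() */

#define CHILD_STACK_SIZE (64 * 1024)   /* arbitrary size for this sketch */

static volatile int shared_counter = 0;

/* Runs in the child: store the value passed by the parent. */
static int child_fn(void *arg)
{
    shared_counter = *(int *)arg;
    return 0;
}

/* Hypothetical helper: start child_fn with the given extra clone flags and
 * return the value of shared_counter that the parent sees afterwards
 * (-1 on error). */
static int run_clone(int extra_flags, int value)
{
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;

    shared_counter = 0;
    /* The stack grows downwards on x86, so pass the top of the buffer.
     * SIGCHLD makes the child waitable with a plain waitpid(). */
    pid_t pid = clone(child_fn, stack + CHILD_STACK_SIZE,
                      extra_flags | SIGCHLD, &value);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    int status;
    int rc = waitpid(pid, &status, 0);
    free(stack);
    return rc == -1 ? -1 : shared_counter;
}
```

Under these assumptions, run_clone(CLONE_VM, 42) should report 42 in the parent, while run_clone(0, 42) should report 0, because in that case only the child's private copy was modified.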

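Most importantly, Linux uses copy-on-write (COW): the parent's data is not copied to the child right away. Instead, the address spaces of parent and child point to the same physical memory, and those pages are marked read-only. When either process tries to write to such a page, the processor reports a page fault to the kernel, and the kernel creates a private copy of that page for the writing process, which then performs the write on its own copy.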
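Copy-on-write itself is invisible to user space, but the semantics it has to preserve are easy to check. The sketch below (the function name fork_keeps_parent_data_private is made up for this illustration) forks a child, lets it overwrite a buffer, and verifies that the parent still sees the original contents.

```c
#include <string.h>     /* strcpy(), strcmp() */
#include <sys/types.h>  /* pid_t */
#include <sys/wait.h>   /* waitpid(), WIFEXITED, WEXITSTATUS */
#include <unistd.h>     /* fork(), _exit() */

/* Hypothetical helper: returns 1 if the child's write stayed private to the
 * child, 0 if not, and -1 on error. */
static int fork_keeps_parent_data_private(void)
{
    char buf[16] = "parent";

    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* This write cannot affect the parent: whichever process writes
         * to a shared page gets its own copy of that page. */
        strcpy(buf, "child");
        _exit(strcmp(buf, "child") == 0 ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return 0;

    /* The parent's buffer must be untouched. */
    return strcmp(buf, "parent") == 0;
}
```

The function only demonstrates the observable result (separate data after a write); when exactly the kernel copies a page is an internal detail that this test cannot see.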

The entry points of these system calls are sys_fork, sys_vfork and sys_clone respectively. They are platform-specific; taking x86 as an example (in arch/x86/kernel/process_64.c):

asmlinkage long sys_fork(struct pt_regs *regs)
{
	return do_fork(SIGCHLD, regs->rsp, regs, 0, NULL, NULL);
}

asmlinkage long
sys_clone(unsigned long clone_flags, unsigned long newsp,
	  void __user *parent_tid, void __user *child_tid, struct pt_regs *regs)
{
	if (!newsp)
		newsp = regs->rsp;
	return do_fork(clone_flags, newsp, regs, 0, parent_tid, child_tid);
}

/*
 * This is trivial, and on the face of it looks like it
 * could equally well be done in user mode.
 *
 * Not so, for quite unobvious reasons - register pressure.
 * In user mode vfork() cannot have a stack frame, and if
 * done by calling the "clone()" system call directly, you
 * do not have enough call-clobbered registers to hold all
 * the information you need.
 */
asmlinkage long sys_vfork(struct pt_regs *regs)
{
	return do_fork(CLONE_VFORK | CLONE_VM | SIGCHLD, regs->rsp, regs, 0,
		    NULL, NULL);
}

In practice, all three end up in the platform-independent function do_fork():

/*
 *  Ok, this is the main fork-routine.
 *
 * It copies the process, and if successful kick-starts
 * it and waits for it to finish using the VM if required.
 */
long do_fork(unsigned long clone_flags,
	      unsigned long stack_start,
	      struct pt_regs *regs,
	      unsigned long stack_size,
	      int __user *parent_tidptr,
	      int __user *child_tidptr)

For the implementation of this function, refer to the book directly.
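The listings above show that sys_fork is just do_fork with SIGCHLD as its only flag. As a rough user-space analogue, the sketch below issues the raw clone system call with the same flag and no new stack, which should behave like fork. It assumes an architecture on which the flags are the first argument of the raw clone system call (true for x86 and x86-64, not for all platforms); the helper names fork_via_clone and exit_code_of_fork_via_clone_child are invented here.

```c
#define _GNU_SOURCE
#include <signal.h>        /* SIGCHLD */
#include <sys/syscall.h>   /* SYS_clone */
#include <sys/types.h>     /* pid_t */
#include <sys/wait.h>      /* waitpid(), WIFEXITED, WEXITSTATUS */
#include <unistd.h>        /* syscall(), _exit() */

/* Hypothetical helper: fork-like call built on the raw clone system call.
 * Only SIGCHLD is passed as a flag and the stack pointer is 0, which
 * mirrors do_fork(SIGCHLD, ...) in sys_fork. Assumes the x86/x86-64
 * argument order (flags first). */
static pid_t fork_via_clone(void)
{
    return (pid_t)syscall(SYS_clone, (unsigned long)SIGCHLD,
                          0UL, 0UL, 0UL, 0UL);
}

/* Hypothetical helper: create a child with fork_via_clone(), let it exit
 * with `code`, and return the exit code the parent collects (-1 on error). */
static int exit_code_of_fork_via_clone_child(int code)
{
    pid_t pid = fork_via_clone();
    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(code);            /* child: terminate immediately */

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A vfork-like variant is deliberately left out: as the kernel comment above sys_vfork notes, doing that correctly from user mode is not straightforward.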

Kernel threads

Kernel threads are processes started directly by the kernel itself. They are created through the following interface:

/*
 * create a kernel thread without removing it from tasklists
 */
extern long kernel_thread(int (*fn)(void *), void * arg, unsigned long flags);

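Its implementation also ends up calling do_fork (the version shown here is the IA-64 one, as the ia64_getreg calls indicate):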

pid_t
kernel_thread (int (*fn)(void *), void *arg, unsigned long flags)
{
	extern void start_kernel_thread (void);
	unsigned long *helper_fptr = (unsigned long *) &start_kernel_thread;
	struct {
		struct switch_stack sw;
		struct pt_regs pt;
	} regs;

	memset(&regs, 0, sizeof(regs));
	regs.pt.cr_iip = helper_fptr[0];	/* set entry point (IP) */
	regs.pt.r1 = helper_fptr[1];		/* set GP */
	regs.pt.r9 = (unsigned long) fn;	/* 1st argument */
	regs.pt.r11 = (unsigned long) arg;	/* 2nd argument */
	/* Preserve PSR bits, except for bits 32-34 and 37-45, which we can't read.  */
	regs.pt.cr_ipsr = ia64_getreg(_IA64_REG_PSR) | IA64_PSR_BN;
	regs.pt.cr_ifs = 1UL << 63;		/* mark as valid, empty frame */
	regs.sw.ar_fpsr = regs.pt.ar_fpsr = ia64_getreg(_IA64_REG_AR_FPSR);
	regs.sw.ar_bspstore = (unsigned long) current + IA64_RBS_OFFSET;
	regs.sw.pr = (1 << PRED_KERNEL_STACK);
	return do_fork(flags | CLONE_VM | CLONE_UNTRACED, 0, &regs.pt, 0, NULL, NULL);
}

Another way to create a kernel thread is kthread_create:

/**
 * kthread_create - create a kthread.
 * @threadfn: the function to run until signal_pending(current).
 * @data: data ptr for @threadfn.
 * @namefmt: printf-style name for the thread.
 *
 * Description: This helper function creates and names a kernel
 * thread.  The thread will be stopped: use wake_up_process() to start
 * it.  See also kthread_run(), kthread_create_on_cpu().
 *
 * When woken, the thread will run @threadfn() with @data as its
 * argument. @threadfn() can either call do_exit() directly if it is a
 * standalone thread for which noone will call kthread_stop(), or
 * return when 'kthread_should_stop()' is true (which means
 * kthread_stop() has been called).  The return value should be zero
 * or a negative error number; it will be passed to kthread_stop().
 *
 * Returns a task_struct or ERR_PTR(-ENOMEM).
 */
struct task_struct *kthread_create(int (*threadfn)(void *data),
				   void *data,
				   const char namefmt[],
				   ...)

Starting a new program

After a process has been duplicated, a new program is started by replacing the existing program with new code.

Linux uses the execve system call for this.
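The duplicate-then-replace pattern looks roughly as follows in user space. This is a minimal sketch, assuming that /bin/sh exists on the system; the helper name run_shell_command and the PATH value are choices made for this illustration only.

```c
#include <sys/types.h>  /* pid_t */
#include <sys/wait.h>   /* waitpid(), WIFEXITED, WEXITSTATUS */
#include <unistd.h>     /* fork(), execve(), _exit() */

/* Hypothetical helper: run `/bin/sh -c command` in a child process and
 * return its exit code (-1 on error or abnormal termination). */
static int run_shell_command(const char *command)
{
    pid_t pid = fork();                 /* step 1: duplicate the process */
    if (pid == -1)
        return -1;

    if (pid == 0) {
        char *const argv[] = { "sh", "-c", (char *)command, NULL };
        char *const envp[] = { "PATH=/bin:/usr/bin", NULL };

        /* step 2: replace the child's program image with /bin/sh */
        execve("/bin/sh", argv, envp);
        _exit(127);                     /* reached only if execve failed */
    }

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, run_shell_command("exit 7") should return 7, since the shell started by execve exits with that code.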

As before, the entry point of execve is the sys_execve function:

long
sys_execve (char __user *filename, char __user * __user *argv, char __user * __user *envp,
	    struct pt_regs *regs)
{
	char *fname;
	int error;

	fname = getname(filename);
	error = PTR_ERR(fname);
	if (IS_ERR(fname))
		goto out;
	error = do_execve(fname, argv, envp, regs);
	putname(fname);
out:
	return error;
}

This function is platform-specific, whereas the do_execve it calls is platform-independent.

For the implementation of do_execve(), again refer to the book.

Exiting a process

A process terminates through the exit system call, whose entry point is sys_exit:

asmlinkage long sys_exit(int error_code)
{
	do_exit((error_code&0xff)<<8);
}

It is platform-independent.
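The expression (error_code&0xff)<<8 in sys_exit explains what a parent sees when it waits for a child: only the low 8 bits of the exit code survive, and they are stored in bits 8 to 15 of the wait status. The sketch below makes this visible; the helper name raw_wait_status_for_exit_code is made up for this illustration, and comparing the raw status value directly is Linux-specific (portable code should use WEXITSTATUS).

```c
#include <sys/types.h>  /* pid_t */
#include <sys/wait.h>   /* waitpid() */
#include <unistd.h>     /* fork(), _exit() */

/* Hypothetical helper: fork a child that exits with `code` and return the
 * raw wait status collected by the parent (-1 on error). */
static int raw_wait_status_for_exit_code(int code)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(code);            /* child: exit immediately with `code` */

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return status;              /* encoded status, not the plain exit code */
}
```

With an exit code of 300, the low 8 bits are 44 (300 & 0xff), so the returned status should be 44 << 8 on Linux and WEXITSTATUS() applied to it should yield 44.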

Reading notes 006 on 《深入Linux内核架构》 (Professional Linux Kernel Architecture): system calls related to process management
