An introduction to processes and threads

This article introduces the basics of processes and threads.

0 why

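0.1 To group resources

At first, a CPU had to finish executing one complete task before it could start the next. Later, by dividing CPU time into slices, multiple tasks could be made to run "seemingly" at the same time.

To better distinguish these "simultaneously" running tasks and to group the resources belonging to each, the concept of a process was introduced: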

A process is basically a program in execution… It is fundamentally a container that holds all the information needed to run a program.

Each process has its own independent set of the following:

  • address space: a list of memory locations from 0 to some maximum, which the process can read and write.
  • resources: commonly including registers (including the program counter and stack pointer), a list of related processes, and all the other information needed to run the program.

Processes communicate with one another through IPC (inter-process communication).

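0.2 To improve efficiency

Originally, a process had only a single "thread of control" to carry out its work. People later realized that if a process could have "multiple threads of control" that share the process's resources and cooperate with one another, efficiency would improve considerably. This led to the concept of a thread.

Each thread has its own stack, which records its execution history.

As noted above, for the sake of efficiency, the threads of a process share its address space and resources. Because the address space is shared, thread A can modify data on thread B's stack with almost no obstacle.

Why is no protection set up between threads? (When designing concurrent programs, synchronization and deadlock still have to be considered; "protection" here refers to other kinds of safeguards.)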

Unlike different processes, which may be from different users and which may be hostile to one another, a process is always owned by a single user, who has presumably created multiple threads so that they can cooperate, not fight.

Why cooperate through multiple threads (multi-threading) rather than multiple processes (multi-processing)?

…they are lighter weight than processes, they are easier (i.e., faster) to create and destroy than processes. In many systems, creating a thread goes 10-100 times faster than creating a process.

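In addition, sharing resources and passing information between processes (via IPC) is less efficient than between threads, which share an address space and resources.

0.3 Summary

The process model is built on two independent concepts: resource grouping and execution.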

resource grouping: One way of looking at a process is that it is a way to group related resources together. A process has an address space containing program text and data, as well as other resources. These resources may include open files, child processes, pending alarms, signal handlers, accounting information, and more. By putting them together in the form of a process, they can be managed more easily.

execution: The other concept a process has is a thread of control, usually shortened to just thread. The thread has a program counter that keeps track of which instruction to execute next. It has registers, which hold its current working variables. It has a stack, which contains the execution history, with one frame for each procedure called but not yet returned from.

Although processes and threads are closely related, conceptually they can be distinguished as follows:

processes are used to group resources together; threads are the entities scheduled for execution on the CPU.

1 program

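Some articles introduce the program before turning to processes and threads, so this article also describes it briefly.

Some Chinese-language books translate program as 程式, and this article borrows that term as well.

A program is the code that a software engineer writes in an IDE, an editor, or a similar tool; that is, code that has not yet been loaded into memory.

If we compare the software engineer to an architect who is designing a factory, the blueprint describing how the factory is to be built and laid out is the program.

Note:

  • A single program can have multiple processes at the same time.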
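
The note above can be tried out with a minimal sketch. It assumes a Python interpreter is available as `sys.executable`; the one-line `PROGRAM` and the helper name `run_program_twice` are made up for illustration. The same program text is started twice, and each start yields a separate process with its own PID.

```python
import subprocess
import sys

# A tiny illustrative "program": it only prints the PID of the process running it.
PROGRAM = "import os; print(os.getpid())"


def run_program_twice():
    """Start the same program as two processes and return the PIDs they report."""
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", PROGRAM],
            stdout=subprocess.PIPE,
            text=True,
        )
        for _ in range(2)
    ]
    # communicate() waits for the process to exit and returns (stdout, stderr).
    return [int(p.communicate()[0]) for p in procs]


if __name__ == "__main__":
    print("PIDs:", run_program_twice())
```

Both processes run identical code, yet the operating system tracks them as two distinct processes.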

2 process

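A process (rendered in Chinese as 程序 or 进程) is a program that has been loaded into memory and is being executed. Any line of the program's code in a process may be executed by the CPU at any time.

In practice, opening an application turns a program into a live process, which is why a PID (process ID), the identifier of a running process, can be looked up for it.

Continuing the program analogy, the process is the physical factory, built according to the blueprint that is the program.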

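A few notes about processes:

  • A process is the running instance of a program on the computer;
  • Processes are independent of one another;
  • A process is not itself the basic unit of execution; it is a container for threads, which are the basic unit of execution;
  • A process needs resources (such as CPU time, memory, files, and I/O devices) to do its work;
  • A multitasking operating system can run multiple processes at once, but a single CPU can execute only one process at any instant (which is why multi-core processors appeared). In a real operating system the number of running processes exceeds the number of CPUs, and every process also occupies memory, so the operating system has to decide how to schedule these processes (scheduling).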
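
The independence of processes can be illustrated with a small sketch. It is an assumption-laden example, not a definitive demonstration: the environment variable `DEMO_VALUE`, the `CHILD` snippet, and the helper `run_child` are all hypothetical names. The child process inherits a copy of the parent's environment, changes its own copy, and the parent's value stays untouched.

```python
import os
import subprocess
import sys

# Hypothetical child program: show the inherited value, then overwrite the
# child's own copy and show it again.
CHILD = (
    "import os\n"
    "print(os.environ['DEMO_VALUE'])\n"
    "os.environ['DEMO_VALUE'] = 'changed by child'\n"
    "print(os.environ['DEMO_VALUE'])\n"
)


def run_child():
    """Run the child and return (lines printed by the child, parent's value afterwards)."""
    os.environ["DEMO_VALUE"] = "set by parent"
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines(), os.environ["DEMO_VALUE"]


if __name__ == "__main__":
    child_lines, parent_value = run_child()
    print("child saw:", child_lines)
    print("parent still has:", parent_value)
```

The child's write lands in the child's own memory, so nothing it does there is visible to the parent.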

3 thread

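Thread is translated into Chinese as 执行绪 or 线程 (the latter is the usual translation). As mentioned earlier, a process is a container for threads: a single process can hold many threads, each responsible for one piece of functionality.

Take a chat-room process as an example: with multiple threads, it can receive messages from the other party while sending its own messages at the same time.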
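
A rough sketch of the chat-room idea, under loud assumptions: there is no real network here, two `queue.Queue` objects stand in for the connection, and the names `receive_loop`, `send_loop`, `chat`, and `STOP` are invented for this example. One thread receives while another sends, both inside the same process.

```python
import queue
import threading

STOP = object()  # sentinel telling the receiver that no more messages will arrive


def receive_loop(inbox, received):
    """Block on the inbox and collect messages until STOP arrives."""
    while True:
        message = inbox.get()
        if message is STOP:
            break
        received.append(message)


def send_loop(outbox, messages):
    """Push our own messages onto the outgoing queue."""
    for message in messages:
        outbox.put(message)


def chat(incoming, outgoing):
    """Run a receiver thread and a sender thread side by side."""
    inbox, outbox = queue.Queue(), queue.Queue()
    for message in incoming:  # pretend the other party already sent these
        inbox.put(message)
    inbox.put(STOP)

    received = []
    receiver = threading.Thread(target=receive_loop, args=(inbox, received))
    sender = threading.Thread(target=send_loop, args=(outbox, outgoing))
    receiver.start()
    sender.start()
    receiver.join()
    sender.join()

    sent = []
    while not outbox.empty():  # safe here: both threads have finished
        sent.append(outbox.get())
    return received, sent


if __name__ == "__main__":
    print(chat(["hi", "how are you?"], ["hello", "fine, thanks"]))
```

In a real chat program the queues would be replaced by a socket, but the division of labor between the two threads would look much the same.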

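Continuing the program and process analogy, threads are the workers inside the factory: they keep each of the factory's functions running, and every worker shares all the resources in the factory.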

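A few notes about threads:

  • A single process can have multiple threads at the same time;
  • Threads in the same process share that process's resources, such as memory and variables; threads in different processes do not;
  • In multi-threading, if two threads read or modify a global variable at the same time, a synchronization problem may occur. If threads compete with one another for resources, a deadlock may result. Both situations need attention when writing multithreaded programs.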
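
The sharing and synchronization points above can be sketched as follows. This is a minimal example assuming Python's standard `threading` module; the names `counter`, `add`, and `run` are illustrative. All threads update the same global variable, and a lock makes each update atomic so that no increment is lost.

```python
import threading

counter = 0               # global variable shared by every thread in this process
lock = threading.Lock()   # guards every update of `counter`


def add(n):
    """Increment the shared counter n times, holding the lock for each update."""
    global counter
    for _ in range(n):
        with lock:        # without the lock, concurrent updates could be lost
            counter += 1


def run(num_threads=4, n=10_000):
    """Start several threads that all update the same global and return the total."""
    global counter
    counter = 0
    threads = [threading.Thread(target=add, args=(n,)) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(run())  # 4 threads x 10,000 increments each
```

When a program needs more than one lock, acquiring them in the same order in every thread is a common way to avoid the deadlock described above.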

4 Differences and connections

[Figure: the relationship between processes and threads]


4.1 Key difference

Thread and Process are two closely related terms in multi-threading. The main difference between the two terms is that the threads are a part of a process, i.e. a process may contain one or more threads, but a thread cannot contain a process.

4.2 Process in detail

A process is an instance of a program that is being executed. It contains the program code and its current activity. Depending on the operating system, a process may be made up of multiple threads of execution that execute instructions concurrently. A program is a collection of instructions; a process is the actual execution of those instructions.

Note: compare this description with the concept of a program introduced earlier.

A process has a self-contained execution environment. It has a complete set of private basic run-time resources; in particular, each process has its own memory space. 

Processes are often seen as synonymous with programs or applications. However, what the user sees as a single application may in fact be a set of cooperating processes.

To facilitate communication between the processes, most operating systems use Inter Process Communication (IPC) resources, such as pipes and sockets. The IPC resources can also be used for communication between processes on different systems. 
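
Pipe-based IPC can be sketched with Python's standard `subprocess` module. The child snippet `CHILD` and the helper `ask_child` are hypothetical names chosen for this example. The parent writes a message into a pipe connected to the child's standard input, and reads the reply from a second pipe connected to the child's standard output.

```python
import subprocess
import sys

# Hypothetical child program: read everything from stdin, reply in upper case.
CHILD = "import sys; print(sys.stdin.read().upper(), end='')"


def ask_child(message):
    """Send `message` to a child process over a pipe and return its reply."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        input=message,        # written to the pipe feeding the child's stdin
        capture_output=True,  # the child's stdout and stderr are read through pipes
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    print(ask_child("hello from the parent"))
```

The two processes never touch each other's memory; every byte they exchange travels through the pipes.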

Most applications in a virtual machine run as a single process. However, an application can create additional processes using a process builder object.

4.3 thread具体描述

In computing, a thread is the smallest sequence of programmed instructions that can be managed independently by an operating system.

The implementation of threads and processes differs from one operating system to another. However, a thread is a component of a process and exists within it; every process has at least one thread.

Multiple threads can also exist in a process and share resources, which helps in efficient communication between threads.

On a single processor, multitasking takes place as the processor switches between different threads; it is known as multi-threading. The switching happens so frequently that the threads or tasks are perceived to be running at the same time. 

Threads can truly be concurrent on a multi-processor or multi-core system, with every processor or core executing the separate threads simultaneously.

4.4 Summary

In summary, threads may be considered lightweight processes, as they contain simple sets of instructions and can run within a larger process. Computers can run multiple threads and processes at the same time.

4.5 Process vs. thread comparison

The table below compares processes and threads in several respects.

 

| | process | thread |
| --- | --- | --- |
| Definition | An executing instance of a program is called a process. | A thread is a subset of the process. |
| Data segment | It has its own copy of the data segment of the parent process. | It has direct access to the data segment of its process. |
| Communication | Processes must use inter-process communication to communicate with sibling processes. | Threads can directly communicate with other threads of its process. |
| Overheads | Processes have considerable overhead. | Threads have almost no overhead. |
| Creation | New processes require duplication of the parent process. | New threads are easily created. |
| Control | Processes can only exercise control over child processes. | Threads can exercise considerable control over threads of the same process. |
| Changes | Any change in the parent process does not affect child processes. | Any change in the main thread may affect the behavior of the other threads of the process. |
| Memory | Run in separate memory spaces. | Run in shared memory spaces. |
| File descriptors | Most file descriptors are not shared. | It shares file descriptors. |
| File system | There is no sharing of file system context. | It shares file system context. |
| Signal | It does not share signal handling. | It shares signal handling. |
| Controlled by | Process is controlled by the operating system. | Threads are controlled by programmer in a program. |
| Dependence | Processes are independent. | Threads are dependent. |


Reposted from blog.csdn.net/liitdar/article/details/80991865