The connection between multi-core CPUs, multi-process, and multi-thread programming

Foreword

        Before introducing multithreading, let us first analyze processes and the multiprogramming model. The process is one of the most important abstractions in an operating system: it enables (pseudo) concurrency by turning a single CPU into multiple virtual CPUs, even on machines with only one CPU.

(In other words: the multi-process model arises from the need for CPU concurrency.)

Multiprogramming

     During its execution, a process is often blocked or interrupted by resource requests or I/O (some requests or interrupts take a long time to handle), and the CPU would otherwise sit idle. As we all know, the CPU is one of the most valuable resources in a computer, so to improve its utilization the operating system switches processes, handing the CPU to a process in the ready queue. When the previously blocked or interrupted process becomes runnable again (usually because its requested resources were granted or its I/O completed), the operating system gives it the CPU again through the scheduling algorithm. Operating systems use a variety of process scheduling algorithms (for example, interactive systems such as Windows use time-slice round-robin scheduling, priority scheduling, and the familiar multilevel queue scheduling; these will not be elaborated here), and the specific choice depends on the operating system. From the user's point of view, it then feels as though multiple processes are executing at the same time: through process scheduling the operating system easily turns one CPU into multiple virtual CPUs and achieves pseudo-concurrency among processes. This switching between processes is called "multiprogramming" in "Modern Operating Systems".

(Note: each process behaves differently during execution, and the operating system must shape its CPU scheduling algorithm around these execution characteristics to improve CPU utilization.)

      Process switching: because the tasks processed by different processes are heterogeneous, their input, output, processing logic, and processing state all differ. Must these be taken into account when switching processes? Yes. An executing process includes a program counter, registers, the current values of variables, and so on, and these data live in the CPU's registers, which belong only to the process currently using the CPU. So during a process switch, the data of the outgoing process must first be saved, so that the next time it gets the CPU it can continue sequentially from where it was interrupted instead of restarting from the beginning (otherwise each time the process regained the CPU it would merely repeat its previous work and might never reach the end, since a process can almost never complete all of its work before releasing the CPU). Then the saved data of the incoming process is loaded into the CPU's registers, and that process resumes from its own breakpoint. To manage its processes, the operating system creates a process table entry for each one, as shown in Table 1.

(Note: the CPU's need for concurrency gives rise to processes; the characteristics of process execution give rise to CPU scheduling algorithms; and scheduling must preserve each process's private data and address space, which gives rise to the process table entry maintained for each process.)

 

Table 1: Typical fields of a process table entry

Process management        Memory management        File management
Registers                 Text segment pointer     Root directory
Program counter           Data segment pointer     Working directory
Program status word       Stack segment pointer    File descriptors
Stack pointer                                      User ID
Process state                                      Group ID
Priority
Scheduling parameters
Process ID
Parent process
Process group
Signals
Process start time
CPU time used
Children's CPU time
Time of next alarm
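To make the table and the save/restore step above concrete, here is a minimal sketch in Python. The field names are illustrative only (a real kernel keeps this in C structures), and `context_switch` is a toy stand-in for what the dispatcher does: save the outgoing process's CPU state into its table entry, then load the incoming one's.

```python
from dataclasses import dataclass, field

@dataclass
class ProcessTableEntry:
    # Process-management fields saved and restored on a context switch
    pid: int
    parent_pid: int
    state: str = "ready"               # ready, running, or blocked
    priority: int = 0
    program_counter: int = 0
    registers: dict = field(default_factory=dict)
    # Memory-management fields: pointers to the process's segments
    text_segment: int = 0
    data_segment: int = 0
    stack_segment: int = 0
    # File-management fields
    root_dir: str = "/"
    working_dir: str = "/"
    open_files: list = field(default_factory=list)

def context_switch(old, new, cpu_pc, cpu_regs):
    """Save the outgoing process's CPU state, load the incoming one's."""
    old.program_counter, old.registers = cpu_pc, dict(cpu_regs)
    old.state, new.state = "ready", "running"
    return new.program_counter, dict(new.registers)

p1 = ProcessTableEntry(pid=1, parent_pid=0, program_counter=100)
p2 = ProcessTableEntry(pid=2, parent_pid=0, program_counter=200)
# p1 was running at pc=150 with ax=7 when it blocked; switch to p2
pc, regs = context_switch(p1, p2, cpu_pc=150, cpu_regs={"ax": 7})
print(pc, p1.program_counter)  # 200 150
```

Because p1's counter and registers were saved, it can later resume from instruction 150 rather than restarting from the beginning, exactly as the paragraph above describes.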

That almost completes the review of the basics of multiprogramming; let us now discuss the multiprogramming model.

From the above analysis we know that multiprogramming can improve CPU utilization. Strictly speaking, though: if a process computes on average for 20% of the time it resides in memory, then with 5 processes in memory the CPU would appear to be fully loaded at all times. This model is too optimistic in practice, because it assumes the 5 processes are never all waiting for I/O at the same time.

Next, we analyze CPU utilization from a probabilistic point of view. Suppose the fraction of its resident time that a process spends waiting for I/O is p. With n processes in memory, the probability that all n processes are waiting for I/O at the same time is p^n, so the CPU utilization is:

CPU utilization = 1 - p^n
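The formula is easy to check numerically; a quick sketch:

```python
def cpu_utilization(p: float, n: int) -> float:
    """Probability that at least one of n processes is not waiting for I/O,
    assuming each waits independently a fraction p of the time."""
    return 1 - p ** n

# I/O-intensive processes (80% I/O wait): utilization climbs slowly with n
print(round(cpu_utilization(0.8, 1), 2))   # 0.2
print(round(cpu_utilization(0.8, 10), 2))  # 0.89
# CPU-intensive processes (20% I/O wait): two processes already reach 96%
print(round(cpu_utilization(0.2, 2), 2))   # 0.96
```

These numbers match the figure discussed below: with p = 0.8 you need around ten resident processes to keep the CPU busy, while with p = 0.2 two are enough.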


Figure 1 shows CPU utilization as a function of n, where n is the degree of multiprogramming.

    As the figure shows, when a process spends 80% of its time on I/O (I/O-intensive), roughly 10 concurrent processes are needed to make full use of the CPU; when a process spends only 20% of its time on I/O (CPU-intensive), just 2 processes bring the CPU's idle fraction below 10%. In practice, 80% or more I/O time is common for interactive processes waiting for user input from a terminal, or for server processes doing heavy disk reads and writes; raising the CPU utilization of such processes indirectly raises the throughput of the whole system. For CPU-intensive processes, however, concurrency and CPU utilization are not necessarily proportional. This analysis applies to multithreading as well, and we will return to it later.

    That concludes the review of multiprogramming. Its core idea, as we have seen, is to improve CPU utilization through process scheduling, virtualizing one CPU into many and achieving concurrent execution of multiple processes. How processes are created and destroyed, their states and state transitions, process hierarchies, and how processes are implemented will not be elaborated here; if you cannot answer these questions readily, go back and read "Modern Operating Systems".

 (Note: processes can be divided into I/O-intensive and CPU-intensive types. The scheduling strategy should be set according to each process's characteristics to improve CPU utilization; no single method fits all cases.)

    Now let's talk about threads and their relationship to processes. Reading this far, some readers will surely ask: since multiprogramming already improves CPU utilization and achieves concurrent execution of multiple processes, why introduce threads at all? If you are asking this question, it means you are thinking. If the question had not occurred to you, I suggest you stop and consider it using the knowledge covered so far, and see whether you can come up with an answer yourself.

Thread

     Let's start from Wikipedia's definition of a thread: a thread is the smallest unit that an operating system can schedule. It is contained within a process and is the actual unit of execution in the process. A thread is a single sequential flow of control within a process; a process can run multiple threads in parallel, each executing a different task. In Unix System V and SunOS threads are also called lightweight processes, though "lightweight process" more often refers to kernel threads, while user-level threads are simply called threads. From the perspective of resource allocation, the process is the basic unit to which all resources are allocated, while the thread is the basic unit of CPU scheduling, even in a single-threaded process.
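As a concrete illustration (a minimal Python sketch; the same idea applies to POSIX threads in C), each thread below is its own sequential control flow, yet all of them live in the same address space and can write to the same data:

```python
import threading

results = {}  # one dictionary, shared by all threads of this process

def worker(name: str, n: int):
    # Each thread runs its own sequential control flow...
    total = sum(range(n))
    # ...but they all write into the process's shared memory.
    results[name] = total

threads = [threading.Thread(target=worker, args=(f"t{i}", 10 * (i + 1)))
           for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # {'t0': 45, 't1': 190, 't2': 435} (key order may vary)
```

Three processes doing the same work would each get a private copy of `results`; here the sharing is free because threads share the process's address space.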

 

Reasons for introducing threads:

1) An application often contains multiple tasks at once, some of which block from time to time while others do not. For example, in a word processor, the foreground part reads input from the terminal and handles output, while a background thread handles the text processing. Consequently, multithreading may not improve CPU-intensive processes much, but it can greatly improve I/O-intensive ones.

2) Threads are lighter-weight than processes; they are cheap to create and destroy. In many systems, creating a thread is 10 to 100 times faster than creating a process.

3) On multi-core CPUs, true parallelism is possible: in a multithreaded design, some threads can handle foreground tasks while others handle background tasks, genuinely in parallel.

4) Switching between threads costs less than switching between processes.


Reasons for introducing multithreading:

1) An operation may fall into a long wait, during which the waiting thread sleeps and cannot continue. Multithreading makes efficient use of this waiting time; waiting for a network response, for instance, may take several seconds.

2) An operation (often a computation) may take a long time; with only one thread, interaction between the program and the user would stall. Multithreading lets one thread handle interaction while another thread computes.

3) A multi-CPU or multi-core computer can execute multiple threads at the same time, so a single thread cannot fully exploit its computing power.

4) Compared with multi-process designs, multithreading is far more efficient at sharing data.

5) The program's own logic may require concurrent operations.
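Point 1, making efficient use of wait time, is easy to demonstrate. In this Python sketch, `time.sleep` stands in for a blocking network or disk call; five 0.2-second waits overlap across threads instead of running back to back:

```python
import threading
import time

def slow_io(delay: float):
    time.sleep(delay)  # stand-in for a blocking network/disk operation

start = time.perf_counter()
threads = [threading.Thread(target=slow_io, args=(0.2,)) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start
# Five 0.2 s waits overlap: total is roughly 0.2 s, not 1.0 s
print(f"{elapsed:.2f}s")
```

A single thread would need about one full second for the same five waits; the overlap is exactly the latency-hiding that motivates multithreading for I/O-intensive work.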


Now we can see the benefits of multithreading more clearly by examining an example.

    Suppose a user is editing a book. For the editor, the simplest approach is to treat the whole book as one file, which makes editing easy; for the computer, it is faster to treat each chapter as a separate file, but that is troublesome for the editor, because some modifications concern not just one chapter but the entire book, such as replacing a certain word or phrase throughout. With the whole book in one file, such a change is handled in one place; otherwise every chapter's file would have to be processed.

     Now suppose a word is deleted from a line on the first page of a 1000-page document. To keep the formatting correct, the word processor must reformat the document. But if at this moment the user jumps to page 800 to make another change, the word processor is forced to reformat the first 800 pages first, because until all the preceding pages are laid out it cannot know where the first line of page 800 should fall. Before page 800 can be displayed on the screen, the computer may stall for a long time, leaving the user dissatisfied.

This is where multithreading comes into play. Suppose the word processor is written as a program with two threads: one handles user interaction, and the other does formatting in the background. As soon as the first page is modified, the interactive thread tells the background formatting thread to reformat the entire book. Meanwhile, the interactive thread keeps monitoring the mouse and keyboard and responding to simple commands such as scrolling the first page, while the background thread computes furiously. With luck, the formatting finishes before the user asks to view page 800, so the user never notices the delay.

Similarly, to make sure the user's edits are saved promptly, a third thread can be added to periodically write the file to disk as a backup, without interfering with the other two threads. The three-thread arrangement is shown in Figure 2.

Figure 2 Word processing software with three threads

Just imagine the single-threaded case: during a disk backup, commands from the keyboard or mouse would be ignored until the backup completed. Some readers will suggest introducing an interrupt mechanism to suspend the backup and respond to mouse and keyboard commands, but the resulting complexity is easy to imagine. With three threads the design is much simpler: one thread interacts with the user, a second formats the document in the background when notified by the first, and a third periodically writes the document's contents to disk.

Clearly, three separate processes would not work here, because all three threads need to operate on the same file. With three threads, since all threads in a process share a common memory, they can all work on the same document. Other interactive programs can use the same design.
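A toy version of this three-thread structure can be sketched in Python. The "document", "edits", and "backup" below are illustrative stand-ins, not real word-processor code: one thread plays the interactive role and queues edits, one reformats the shared document as edits arrive, and one periodically snapshots it (standing in for disk writes). All three operate on the same in-memory document, which is exactly why threads fit and separate processes would not.

```python
import queue
import threading
import time

document = ["line"] * 5           # shared by all three threads
edits = queue.Queue()             # interactive thread -> formatting thread
backups = []                      # snapshots taken by the backup thread
stop = threading.Event()

def interactive():
    # Stands in for handling keyboard/mouse input: queue three edits
    for i in range(3):
        edits.put((i, f"edited line {i}"))
    stop.set()

def formatter():
    # Applies queued edits to the shared document until all are drained
    while not (stop.is_set() and edits.empty()):
        try:
            idx, text = edits.get(timeout=0.05)
            document[idx] = text
        except queue.Empty:
            pass

def backup():
    # Periodically snapshots the shared document (stand-in for disk backup)
    while not stop.is_set():
        backups.append(list(document))
        time.sleep(0.01)

threads = [threading.Thread(target=f) for f in (interactive, formatter, backup)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(document[:3])  # ['edited line 0', 'edited line 1', 'edited line 2']
```

No interrupt machinery is needed: each thread blocks or sleeps independently, and the shared `document` keeps them working on one copy of the data.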

      After this example, readers partial to multiprogramming may ask: the three threads above cooperate mainly because they share the process's memory space, but couldn't interprocess communication achieve the same cooperation? Indeed it could, but consider a few questions carefully: 1. Which costs more, interprocess communication or inter-thread communication? 2. What about the cost of a process switch versus a thread switch? 3. How would you keep the content processed by three separate processes consistent? In the multithreaded scheme, consistency is much simpler to guarantee because all threads work on the same copy of the document. If you can answer these questions accurately, the reason not to choose a multi-process scheme for this job becomes self-evident.

We have now largely reviewed threads. Let us look at the relationship and differences between processes and threads.

Processes and threads:

     A process is a unit of management for the operating system, and a thread is a unit of management within a process; a process contains at least one thread of execution. Whether single-threaded or multithreaded, each thread has its own program counter (recording the next instruction to execute), its own set of registers (holding the thread's current working variables), and its own stack (recording the execution history, with one frame for each procedure called but not yet returned). Although a thread lives inside a process, the two are distinct concepts and are handled separately: the process is the basic unit of resource allocation, and the thread is the basic unit of CPU scheduling.

     Multithreading is, in a sense, an emulation of multiple processes: multiple threads share one address space and the process's other resources, while multiple processes share the machine's physical memory, disks, I/O devices, and so on. This is why threads are called "lightweight processes". When multiple threads run on a single-CPU system, they take turns on the CPU just as in multiprogramming, creating the illusion that the threads run in parallel; in a process with three compute-bound threads on one CPU, each thread actually gets one third of the real CPU speed. With the development of hardware, however, mainstream CPUs now support multithreading directly in hardware and allow threads to switch within a few nanoseconds. The relationship between multi-process, multi-thread, and multi-core will be summarized later.

      Threads are far less independent of one another than processes are. The threads of a process share many of its resources; one thread can even write to another thread's stack, whereas a process cannot write into another process's address space at all. Therefore, multithreaded programs must be designed with a sound synchronization and communication mechanism to avoid data conflicts. Figure 3 lists the items belonging to a process and those belonging to each thread; the per-process items are shared by all of the process's threads.

Figure 3: Per-process items (shared by all threads) versus per-thread items

Items per process              Items per thread
Address space                  Program counter
Global variables               Registers
Open files                     Stack
Child processes                State
Pending alarms
Signals and signal handlers
Accounting information

 This section ends here; it has covered multi-process and multi-threaded execution.
