Linux high performance and high concurrency: going down to the computer's lowest levels to understand threads and thread pools

1. Overview of this article

This article is the first in a series. It explains the principles of multithreading and thread pools from the CPU level up, avoiding a parade of complex technical concepts and aiming to stay easy to understand for everyone.

2. Everything starts with the CPU

You may wonder: why start with the CPU when the topic is multithreading? The reason is simple: there are no fashionable concepts at this level, and the essence of the problem is easier to see.
The reality is that the CPU knows nothing about concepts such as threads and processes.
The CPU only knows two things:
1) fetch an instruction from memory;
2) execute the instruction, then go back to 1).
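The two-step loop above can be sketched as a toy model: an array stands in for memory, and an integer `pc` tracks the position of the next instruction. The tiny instruction set here is invented purely for illustration.

```cpp
#include <vector>

// A toy model of the fetch-execute cycle. The Op values are an
// invented instruction set, not any real CPU's.
enum class Op { ADD, JUMP_IF_ZERO, HALT };

struct Instr {
    Op op;
    int operand;
};

// Runs the toy program and returns the accumulator when HALT is reached.
int run_toy_cpu(const std::vector<Instr>& memory) {
    int acc = 0;
    int pc = 0; // position of the NEXT instruction to execute
    while (true) {
        Instr instr = memory[pc]; // 1) fetch the instruction pc points at
        ++pc;                     // by default, advance to the next instruction
        switch (instr.op) {       // 2) execute it, then go back to 1)
            case Op::ADD:
                acc += instr.operand;
                break;
            case Op::JUMP_IF_ZERO:
                if (acc == 0) pc = instr.operand; // a branch overwrites pc
                break;
            case Op::HALT:
                return acc;
        }
    }
}
```

The `JUMP_IF_ZERO` case is the toy version of what an if/else compiles down to: the branch result decides what gets written into `pc`.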


As you can see, the CPU has no notion of processes or threads.
The next question: where does the CPU fetch instructions from? The answer is a register called the Program Counter (PC for short), also known as the program counter. Don't think of registers as anything mysterious: you can simply regard a register as memory, only with much faster access.
What is stored in the PC register? The address of an instruction in memory. Which instruction? The next one the CPU will execute.


So who sets the instruction address in the PC register?
By default, the address in the PC register is simply incremented, which of course makes sense, because most of the time the CPU executes instructions one after another in sequence. When it encounters an if/else or similar branch, this sequential execution is broken: while executing such an instruction, the CPU dynamically updates the value in the PC register according to the result of the computation, so that it jumps to the correct next instruction.
Smart as you are, you must be asking: how is the initial value of the PC set?
Before answering, we need to know where the instructions executed by the CPU come from. They come from memory, obviously, and the instructions in memory are loaded from an executable program saved on disk. The executable on disk is generated by the compiler. And from what does the compiler generate these machine instructions? The answer: from the functions we define.


Note: functions. The instructions the CPU executes are produced by compiling functions. So, naturally, how do we make the CPU execute a function? Obviously, we only need to find the first instruction produced when the function was compiled; that first instruction is the function's entry point.
By now you should see it: to make the CPU execute a function, we only need to write the address of the function's first machine instruction into the PC register, and the function we wrote starts being executed by the CPU.
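In C and C++ this is easy to observe: a function's name decays to the address of its first machine instruction, and that address can be stored and called through, which is conceptually what loading an entry address into the PC register does. A minimal sketch (the function names are made up for the example):

```cpp
// `square` is an ordinary function; its name is the address of its
// first machine instruction.
int square(int x) { return x * x; }

// `entry` holds a function's entry address; the indirect call
// effectively loads that address into the PC register.
int call_through_address(int (*entry)(int), int arg) {
    return entry(arg);
}
```

Calling `call_through_address(square, 7)` jumps to `square`'s first instruction just as a direct call would.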
You may ask: what does all this have to do with threads?


3. From CPU to operating system

In the previous section we learned how the CPU works: to have the CPU execute a certain function, we only need to load the address of the function's first machine instruction into the PC register. Even without an operating system we could get the CPU to execute a program this way. Feasible, yes, but a very cumbersome process.
We need to:
1) find a suitable area in the memory to load the program;
2) find the function entry, set the PC register and let the CPU start executing the program.

These two steps are by no means easy. If programmers had to carry them out by hand every time a program was run, they would go crazy, so smart programmers write a program to complete the two steps automatically.


Machine instructions must be loaded into memory to be executed, so we need to record the starting address and length of that memory region; at the same time we need to find the function's entry address and write it into the PC register. Think about it: doesn't this call for a data structure to record this information?
The data structure is roughly as follows:

struct *** {
    void* start_addr;  // where in memory the program is loaded
    int len;           // length of the loaded region

    void* start_point; // entry point: the address to write into the PC
    ...
};

Next comes the naming moment.
This data structure needs a name. What does it record? It records a program's run-time state once it has been loaded into memory. So what should we call a program that has been loaded from disk into memory? Let's just call it a Process. Our guiding principle: the name must sound mysterious; in short, the harder it is for everyone to understand, the better. Call it the "hard-to-understand principle".
And so the process was born.
The first function the CPU executes also gets a name. The first function to be executed sounds rather important, so let's just call it the main function.
The program that completes the above two steps should be named as well. Following the "hard-to-understand principle", this "simple" program is called the Operating System.
And so the operating system was born; programmers who want to run a program no longer need to load it by hand.
Now that processes and the operating system exist, everything looks perfect.

4. From single core to multi-core, how to make full use of multi-core

One characteristic of human beings is endless tinkering, and so we went from single core to multi-core.


Now, what if we want to write a program that uses multiple cores?
Some students may say: don't we have processes? Isn't it enough to start a few more processes?
It sounds reasonable, but there are several problems:
1) A process takes up memory space (as we saw in the previous section). If multiple processes are started from the same executable program, the contents of their memory regions are almost identical, which is obviously a waste of memory;
2) The tasks a computer handles can be complicated enough to require inter-process communication, and since every process lives in a different memory address space, that communication has to go through the operating system, which raises both the system overhead and the difficulty of programming.

What can we do about it?

5. From process to thread

Let's think about this problem carefully. A so-called process is nothing more than a region of memory, which stores the machine instructions the CPU executes and the stack information of the running functions. To make a process run, we write the address of the main function's first machine instruction into the PC register, and the process is running.


The disadvantage of a process is that it has only one entry function, the main function, so the machine instructions in a process can only be executed by one CPU. Is there a way to let multiple CPUs execute machine instructions in the same process?
Smart as you are, you should be able to see it: if we can write the address of the main function's first instruction into the PC register, what makes other functions any different from main?
The answer: nothing. The only special thing about main is that it is the first function the CPU executes; otherwise there is nothing special about it. We can point the PC register at main, and we can equally well point it at any other function.
The moment we point the PC register at a non-main function, a thread is born.


At this point our minds are liberated: a process can have multiple entry functions, which means machine instructions belonging to the same process can be executed by several CPUs at the same time.
Note that this is different from creating a process: to create a process we must find a suitable region in memory to load it, then point the CPU's PC register at the main function, which means a process has only one flow of execution.


But now things are different: multiple CPUs can simultaneously execute, under the same roof (the memory region occupied by the process), several entry functions belonging to that process, which means a process can now contain multiple flows of execution.

"Execution flow" sounds a bit too easy to understand, so once again we invoke the "hard-to-understand principle" and give it a more obscure name: let's call it a thread.
This is the origin of threads.
The operating system maintains a bundle of information for each process, recording among other things the memory space the process occupies; call it data set A.
Similarly, the operating system maintains a bundle of information for each thread, recording things like the thread's entry function and stack; call it data set B.
Clearly data set B is smaller than data set A. And unlike creating a process, creating a thread does not require finding a region of memory, because a thread runs inside the address space of the process it belongs to. That address space was created when the program started, while threads are created while the program is running (after the process has started), so by the time a thread runs its address space already exists and can be used directly. This is why, as textbooks say, creating a thread is faster than creating a process (among other reasons).
It is worth noting that with threads we only need to create enough of them after the process starts to keep all the CPUs busy. This is the root of so-called high performance and high concurrency.

It's very simple, just create the right number of threads.
Another point worth noting: because threads share the process's memory address space, communication between threads does not need to go through the operating system. This brings programmers great convenience, and endless trouble as well. Most of the problems we run into with multithreading stem from the fact that inter-thread communication is so convenient that it is very easy to get wrong. The root cause is that the CPU has no concept of threads when executing instructions; the mutual exclusion and synchronization problems of multithreaded programming must be solved by the programmer. For space reasons, mutual exclusion and synchronization are not covered in detail here; most operating systems references explain them thoroughly.
One last reminder: although the earlier illustrations of threads show multiple CPUs, having multiple cores is not a prerequisite for using multiple threads. Threads are an operating-system-level construct and have nothing to do with how many cores there are; the CPU does not know which thread the instructions it is executing belong to. Even with only one CPU, the operating system can advance every thread "simultaneously" through thread scheduling: it hands the CPU's time slices back and forth among the threads, so multiple threads appear to run at once, even though at any given moment only one thread is actually running.

6. Threads and memory

From the discussion so far we know the relationship between a thread and the CPU: point the CPU's PC register at the thread's entry function, and the thread runs. This is why you must specify an entry function when creating a thread.
Regardless of the programming language, creating a thread looks roughly the same:

// set DoSomething as the thread's entry function
thread = CreateThread(DoSomething);

// start the thread
thread.Run();
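As a concrete version of the pseudocode above, here is a minimal sketch using C++'s std::thread (DoSomething is just a stand-in for real work). Note one difference from the pseudocode: a std::thread begins running as soon as it is constructed, so there is no separate Run() step.

```cpp
#include <thread>

// the thread's entry function; writes its result through a pointer
void DoSomething(int* out) {
    *out = 42; // stands in for real work
}

int run_one_thread() {
    int result = 0;
    // the entry function is handed over at creation, and the thread
    // starts immediately (no separate Run() call)
    std::thread t(DoSomething, &result);
    t.join(); // wait for the thread to finish; join() also makes the
              // thread's write to `result` visible here
    return result;
}
```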

So what is the relationship between threads and memory?
We know that the data produced while a function executes includes the function's parameters, local variables, the return address, and so on. All of this is stored on the stack. Before the concept of a thread appeared, a process had only one flow of execution and therefore only one stack, whose bottom is the process's entry function, namely main.
Suppose main calls funcA, and funcA in turn calls funcB; each call pushes a new frame onto this single stack.


What about threads?
With threads, a process has multiple execution entries, that is, multiple flows of execution at the same time. If a process with a single flow of execution needs one stack to hold its run-time information, then obviously multiple flows of execution need multiple stacks, one per flow. In other words, the operating system must allocate a stack for each thread within the process's address space: every thread has its own stack. Being aware of this is extremely important.
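This can be seen directly in code. In the sketch below (all names invented for the example), several threads run the same function, but each thread's local variables live on its own stack, so the partial sums never interfere with one another:

```cpp
#include <thread>
#include <vector>

// The code is shared, but `local` lives on the stack of whichever
// thread is running this function, so each thread has its own copy.
long sum_range(long from, long to) {
    long local = 0; // on THIS thread's private stack
    for (long i = from; i < to; ++i) local += i;
    return local;
}

// Sum 0..n-1 by splitting the range across nthreads workers.
long parallel_sum(long n, int nthreads) {
    std::vector<long> partial(nthreads, 0);
    std::vector<std::thread> workers;
    long chunk = n / nthreads;
    for (int t = 0; t < nthreads; ++t) {
        long from = t * chunk;
        long to = (t == nthreads - 1) ? n : from + chunk;
        workers.emplace_back([&partial, t, from, to] {
            partial[t] = sum_range(from, to); // independent stack frames
        });
    }
    for (auto& w : workers) w.join();
    long total = 0;
    for (long p : partial) total += p;
    return total;
}
```

Each worker writes only its own slot of `partial`, so no locking is needed here; the shared-memory pitfalls mentioned earlier appear only when threads write to the same data.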


At the same time, we can also see that creating threads consumes process memory space, which is also worth noting.

7. Using threads

Now that we have the concept of threads, how do we use threads as programmers next?
From the perspective of life cycle, the tasks threads handle fall into two types: long-lived tasks and short-lived tasks.
1) Long-lived tasks:
As the name suggests, these are tasks that live for a long time. Take the familiar Word as an example: the text we edit in Word needs to be saved to disk, and writing data to disk is a task. A good approach here is to create a dedicated thread for disk writes whose life cycle matches the Word process: the write thread is created when Word is opened and destroyed when the user closes Word. That is a long-lived task.

This scenario is very suitable for creating dedicated threads to handle certain tasks. This situation is relatively simple.
There are long tasks, and corresponding short tasks.
2) Short-lived tasks:
The concept is equally simple: the task's processing time is very short, such as one network request or one database query, which can be completed quickly. Short-lived tasks are therefore very common on all kinds of servers: web servers, database servers, file servers, mail servers, and so on. This is also the most familiar scenario for people in the Internet industry, and the one we want to focus on.
This scenario has two characteristics: one is the short time required for task processing; the other is the huge number of tasks.
What if you are asked to handle this type of task?
You might think this is easy: when the server receives a request, create a thread to process the task, and destroy the thread when the processing is done. So easy.
This method is usually called thread-per-request: one thread is created for each request:


For long-lived tasks this method works very well, but for large numbers of short-lived tasks, although simple to implement, it has drawbacks.
The specific drawbacks are as follows:
1) As the previous sections showed, a thread is an operating-system concept (we leave aside user-mode threads, coroutines, and the like), so creating a thread has to go through the operating system, and creating and destroying threads costs the operating system time;
2) Each thread needs its own independent stack, so creating a large number of threads consumes excessive memory and other system resources.

It is as if you were a factory owner (enjoy the thought) with plenty of orders, but for every batch of orders you hired a new group of workers. The products are simple and the workers finish them quickly; once the batch is done, you dismiss these hard-won workers, and when a new order arrives you go through the laborious hiring all over again: five minutes of work for ten hours of setup. Unless you were determined to run your business into the ground, you probably wouldn't do it this way.
A better strategy: hire a group of workers and keep them on hand, processing orders when there are orders and staying idle when there are none.
This is the origin of the thread pool.

8. From multithreading to thread pool

The concept of a thread pool is very simple: create a batch of threads and don't release them. Tasks are submitted to these threads for processing, so there is no need to frequently create and destroy threads; and because the number of threads in the pool is usually fixed, they won't consume too much memory. The ideas here are reuse and control.

9. How does the thread pool work?

Some students may ask, how to submit tasks to the thread pool? How are these tasks given to threads in the thread pool?
Obviously, the queue from data structures is the natural fit for this scenario: those who submit tasks are producers, and the threads that consume tasks are consumers. This is, in fact, the classic producer-consumer problem.
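A minimal sketch of such a queue in C++, using a mutex plus a condition variable (the class and method names are made up for the example): consumers block in Pop() until a producer Push()es something.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>

// A minimal blocking queue: producers Push() items, consumers block
// in Pop() until an item is available.
template <typename T>
class BlockingQueue {
public:
    void Push(T item) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            queue_.push(std::move(item));
        }
        cv_.notify_one(); // wake up one blocked consumer
    }

    T Pop() {
        std::unique_lock<std::mutex> lock(mu_);
        // block until the queue is non-empty (handles spurious wakeups)
        cv_.wait(lock, [this] { return !queue_.empty(); });
        T item = std::move(queue_.front());
        queue_.pop();
        return item;
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<T> queue_;
};
```

The mutex protects the queue from concurrent access, and the condition variable is what lets consumers sleep instead of spinning; this pair is the standard textbook solution to the producer-consumer problem.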


Now you know why this problem comes up in operating systems courses and in interviews: if you don't understand the producer-consumer problem, you cannot write a correct thread pool.
For space reasons I won't explain the producer-consumer problem in detail here; operating systems references will give you the answer. Instead, let's talk about what a task submitted to a thread pool generally looks like.
Generally speaking, the task submitted to the thread pool consists of two parts:

  1. The data that needs to be processed;
  2. Functions for processing data.

Pseudo-code description:

struct task {
    void* data;     // the data the task carries
    handler handle; // the function that processes the data
};

(Note: you can also think of the struct in the code as a class, i.e. an object.)
Threads in the pool block on the queue. When a producer writes data into the queue, one thread in the pool is woken up; it takes the above structure (or object) out of the queue and calls the handler function with the structure's data as the argument.
The pseudo code is as follows:

while (true) {
    struct task* t = GetFromQueue(); // take a task from the queue
    t->handle(t->data);              // process its data
}

The above is the core part of the thread pool.
Understand this and you understand how a thread pool works.
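Putting the task queue and the worker loop together, a minimal thread-pool sketch in C++ might look like the following. This is an illustration under simplifying assumptions (fixed thread count, no task results, no error handling), not a production implementation:

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// A minimal thread pool: a fixed set of workers all block on one task
// queue; Submit() is the producer, the workers are the consumers.
class ThreadPool {
public:
    explicit ThreadPool(size_t nthreads) {
        for (size_t i = 0; i < nthreads; ++i) {
            workers_.emplace_back([this] {
                while (true) { // the worker loop from the pseudocode
                    std::function<void()> task;
                    {
                        std::unique_lock<std::mutex> lock(mu_);
                        cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
                        if (stop_ && tasks_.empty()) return; // drain, then exit
                        task = std::move(tasks_.front()); // take a task
                        tasks_.pop();
                    }
                    task(); // process it outside the lock
                }
            });
        }
    }

    void Submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one(); // wake one sleeping worker
    }

    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stop_ = true;
        }
        cv_.notify_all();
        for (auto& w : workers_) w.join(); // wait for remaining tasks
    }

private:
    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> tasks_;
    std::mutex mu_;
    std::condition_variable cv_;
    bool stop_ = false;
};
```

The task here is simply a `std::function<void()>`, which plays the role of the `data` plus `handle` pair from the struct above: the captured state is the data, the function body is the handler.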

10. The number of threads in the thread pool

Now that we have thread pools, how many threads should a pool contain?
Think about this for yourself before going further. If you can see it here it means you are not asleep yet.
Keep in mind that too few threads in the pool fail to fully utilize the CPU, while too many cause performance degradation, excessive memory usage, overhead from thread switching, and so on. So the number of threads should be neither too large nor too small. What should it be, then?
To answer this, you need to know what kinds of tasks the thread pool handles. Some students may say: didn't we already cover two kinds, long-lived and short-lived? That was from the life-cycle perspective. From the perspective of the resources a task needs, there are also two kinds: CPU-intensive and I/O-intensive.
1) CPU-intensive:
So-called CPU-intensive tasks do not depend on external I/O; scientific computing and matrix operations are examples. In this case, as long as the number of threads roughly matches the number of cores, CPU resources can be fully utilized.


2) I/O intensive:
Such tasks spend little of their time on computation; most of it goes to disk I/O, network I/O, and the like.


This case is a bit more complicated. You need performance-measurement tools to estimate the time spent waiting on I/O, denoted WT (wait time), and the time spent computing, denoted CT (compute time). For an N-core system, the appropriate number of threads is then roughly N * (1 + WT/CT). If the I/O wait time equals the compute time, you need about 2N threads to fully utilize the CPU. Note that this is only a theoretical value; concrete settings must be tested against real business scenarios.
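The rule of thumb can be captured in a one-line helper (the function name is invented; WT and CT come from your own measurements, and CT must be non-zero):

```cpp
#include <cmath>

// Rule-of-thumb sizing for an I/O-bound pool: N * (1 + WT/CT),
// rounded to the nearest integer. Only a starting point for tuning,
// not a guarantee.
int suggested_thread_count(int cores, double wait_time, double compute_time) {
    return static_cast<int>(std::lround(cores * (1.0 + wait_time / compute_time)));
}
```

For example, with 8 cores and WT equal to CT this suggests 16 threads, matching the 2N figure in the text; with WT of zero (pure CPU work) it degenerates to one thread per core.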
Of course, fully utilizing the CPU is not the only consideration. As the number of threads grows, memory usage, system scheduling, the number of open files, the number of open sockets, and the number of open database connections all need to be taken into account.
So there is no one-size-fits-all formula here; each case needs its own analysis.

11. Thread pool is not a panacea

A thread pool is just one way of using multiple threads, so it cannot escape the problems multithreading faces, such as deadlocks and race conditions. For this part, too, you can consult operating systems references for the answers. That's why the fundamentals matter, folks.

12. Best practices for thread pool usage

The thread pool is a powerful weapon in the hands of programmers, and thread pools can be seen on almost every server of Internet companies.
But before using the thread pool, you need to consider:
1) Fully understand your tasks: are they long-lived or short-lived, CPU-intensive or I/O-intensive? If you have both kinds, a better approach is to put the two kinds of task in separate thread pools, which also makes it easier to size each pool;
2) If tasks in the pool perform I/O, be sure to give them a timeout, otherwise the thread processing a task may block forever;
3) It is best not to have tasks in the pool wait synchronously for the results of other tasks in the same pool.

13. Summary of this article

In this article we went from the CPU all the way to the everyday thread pool, from bottom to top, from hardware to software.
Note that nothing in this article is tied to a particular programming language. Threads are not a language-level concept (again leaving user-mode threads aside), and once you truly understand threads, you can use multithreading in any language. What you need to grasp is the Way; technique comes after.
I hope this article will help you understand threads and thread pools.


Origin blog.csdn.net/lingshengxueyuan/article/details/111560739