A detailed explanation of threads and thread pools for high-concurrency programming

Everything starts with the CPU

You may wonder: why start with the CPU when the topic is multithreading? The reason is simple: with no fashionable concepts in the way, you can see the essence of the problem more clearly. The CPU knows nothing about threads or processes. The CPU only knows how to do two things: 1. fetch an instruction from memory; 2. execute the instruction, then go back to step 1.

You see, the CPU really has no notion of processes or threads. The next question is: where does the CPU fetch instructions from? The answer is a register called the Program Counter (PC for short), the well-known program counter. Don't think of registers as anything mysterious; you can simply think of a register as memory, just with much faster access. What is stored in the PC register? The memory address of an instruction. Which instruction? The next instruction the CPU will execute.

So who sets the instruction address in the PC register? By default, the address in the PC register is advanced automatically to the next instruction, which of course makes sense, because in most cases the CPU executes instructions one after another, sequentially. When the CPU encounters an if/else branch, this sequential execution is broken: executing this kind of instruction makes the CPU change the value of the PC register dynamically, according to the result of the computation, so that it jumps correctly to the instruction that needs to run next. Now you will surely ask: how is the initial value of the PC set? Before answering that, we need to know where the instructions executed by the CPU come from. They come from memory; the instructions in memory were loaded from an executable program saved on disk; the executable on disk was produced by a compiler; and what does the compiler generate machine instructions from? The answer: the functions we define.

Note: it is functions. After compilation, a function becomes the instructions the CPU executes. So, naturally, how do we make the CPU execute a function? Obviously, we only need to find the first instruction produced when the function was compiled; that first instruction is the function's entry point. Now you know: to make the CPU execute a function, we only need to write the address of the function's first machine instruction into the PC register, and the function we wrote starts being executed by the CPU. You may still ask: what does any of this have to do with threads?

From CPU to OS

In the previous section we saw how the CPU works: if we want the CPU to execute a function, we only need to load the address of the function's first machine instruction into the PC register. That way we could let the CPU execute a program even without an operating system. Feasible, yes, but a very cumbersome process. We would need to:

  • Find a suitably sized region in memory and load the program into it
  • Find the function's entry point, set the PC register, and let the CPU start executing the program

These two steps are by no means easy. If programmers had to carry them out by hand every time they ran a program, they would go crazy. So a smart programmer will naturally want to write a program that completes these two steps automatically.

Machine instructions must be loaded into memory to be executed, so the starting address and length of that memory region need to be recorded; at the same time, the function's entry address must be found and written into the PC register. Clearly a data structure is needed to record this information:

struct *** {
    void* start_addr;  // where the program is loaded in memory
    int len;           // length of the loaded region

    void* start_point; // entry point: the first instruction to execute
    ...
};

Then comes naming time: this data structure needs a name. What does it record? It records the running state of a program once it has been loaded into memory. So what should we call a program that has been loaded from disk into memory? Let's just call it a Process. Our guiding principle is that the name must sound mysterious, in short, not easy for everyone to understand; I call this the "hard-to-understand principle". And so the process was born.

The first function executed by the CPU also deserves a name. The first function to run sounds important, so let's just call it the main function.

The program that completes the two steps above should be given a name too. Following the "hard-to-understand principle", this "simple" program is called the Operating System. And so the operating system was born, and programmers no longer need to load programs by hand in order to run them.

Now that processes and the operating system are in place, everything looks perfect.

From single-core to multi-core: how to make full use of multiple cores

One characteristic of human beings is that we never stop tinkering, and so CPUs went from single-core to multi-core.

Now, if we want to write a program that uses multiple cores, what should we do? Some students may say: don't we have processes? Isn't it enough to launch a few more of them? That sounds reasonable, but there are several problems:

  • Processes occupy memory (as we saw in the previous section). If multiple processes are launched from the same executable program, the contents of their memory regions are almost identical, which obviously wastes memory
  • The tasks a computer handles may be complicated, which involves inter-process communication. Since each process lives in a different address space, inter-process communication necessarily requires the operating system's help, which increases both the difficulty of programming and the system overhead

What can we do about it?

From process to thread

Let's think this problem through carefully once more. A so-called process is nothing more than a region of memory, which stores the machine instructions the CPU executes and the stack information of the functions as they run. To make a process run, write the address of the main function's first machine instruction into the PC register, and the process starts running.

The disadvantage of a process is that it has only one entry function, the main function, so the machine instructions in a process can only be executed by one CPU. Is there a way to let multiple CPUs execute the machine instructions of the same process? If you are smart you will think: since we can write the address of the main function's first instruction into the PC register, what makes other functions different from main? The answer is: nothing. The only special thing about main is that it is the first function the CPU executes; beyond that there is nothing special about it. We can point the PC register at main, and we can just as well point the PC register at any other function. When we point the PC register at a function other than main, a thread is born.

So far we have liberated our thinking: a process can have multiple entry functions, which means machine instructions belonging to the same process can be executed by multiple CPUs at the same time. Note how this differs from a process: when creating a process we must find a suitable region of memory, load the process into it, and then point the CPU's PC register at the main function, which means there is only one flow of execution in the process.

But now it is different: multiple CPUs can simultaneously execute multiple entry functions belonging to the same process, all under the same roof (the memory region occupied by the process). In other words, a process can now contain multiple flows of execution.

Always calling it a "flow of execution" sounds a bit too easy to understand, so we invoke the "hard-to-understand principle" once more and give it an obscure name: let's call it a thread. This is where threads come from.

The operating system maintains a bundle of information for each process, recording things like the process's memory region; call this data set A. Similarly, the operating system maintains a bundle of information for each thread, recording things like the thread's entry function and stack; call this data set B. Obviously data set B is smaller than data set A. Also, unlike a process, creating a thread does not require finding a region of memory, because a thread runs inside the address space of the process it belongs to. That address space was created when the program started, while threads are created while the program is running (after the process has started), so by the time a thread starts running the address space already exists and the thread can simply use it. This is why the textbooks all say creating a thread is faster than creating a process (there are other reasons too).

It is worth noting that with the concept of threads in hand, once a process has started we only need to create the right number of threads to keep all the CPUs busy. This is the root of so-called high performance and high concurrency.

It is very simple: you only need to create an appropriate number of threads.

Another point worth noting: because every thread shares the process's memory address space, communication between threads does not need to rely on the operating system. This brings programmers great convenience, and also endless trouble. Most of the problems you see arise precisely because inter-thread communication is so convenient that it is very error-prone. The root of the errors is that the CPU has no concept of threads when executing instructions; the mutual exclusion and synchronization problems faced by multithreaded programs must be solved by the programmers themselves.

One last reminder: although the earlier figure explaining the use of threads showed multiple CPUs, using threads does not require multiple cores. Even on a single core you can create many threads, because threads are implemented at the operating-system level and have nothing to do with how many cores there are. When the CPU executes machine instructions it is not aware of which thread those instructions belong to. Even with only one CPU, the operating system can use thread scheduling to make every thread appear to advance "simultaneously", although in fact only one thread is running at any given moment.



Threads and memory

From the discussion so far we know the relationship between threads and the CPU: point the CPU's PC register at a thread's entry function, and the thread runs. This is why you must specify an entry function when creating a thread. Regardless of the programming language you use, creating a thread looks largely the same:

// set the thread's entry function to DoSomething
thread = CreateThread(DoSomething);

// start the thread running
thread.Run();

So what is the relationship between threads and memory? We know that the data produced while a function executes, such as function parameters, local variables and the return address, is stored on the stack. Before the concept of threads appeared, a process had only one flow of execution and therefore only one stack, and the bottom of that stack is the process's entry function, i.e. the main function. Suppose main calls funcA, and funcA calls funcB, as shown in the figure:

What about after threads exist? With threads, a process has multiple execution entries, that is, multiple flows of execution at the same time. If a process with a single flow of execution needs one stack to save its runtime information, then obviously with multiple flows of execution we need multiple stacks, one for each flow. In other words, the operating system must allocate a stack for each thread inside the process's address space: every thread has its own stack. Being aware of this is extremely important.

At the same time, we can see that creating threads consumes the process's memory space, which is also worth keeping in mind.

Using threads

Now that we have the concept of threads, how should we, as programmers, use them? From the perspective of life cycle, the tasks a thread handles fall into two types: long tasks and short tasks.

1. Long tasks. As the name suggests, these are tasks that stay alive for a long time. Take the familiar Word as an example: the text we edit in Word needs to be saved to disk, and writing data to disk is one such task. A good approach here is to create a dedicated thread for writing to disk, whose life cycle matches that of the Word process: the writer thread is created as soon as Word opens, and destroyed when the user closes Word. This is a long task.

This scenario is well suited to creating dedicated threads for specific tasks, and it is relatively simple. Alongside long tasks there are, correspondingly, short tasks.

2. Short tasks. The concept is equally simple: the task takes very little time to process, for example a network request or a database query, and can be completed quickly. Short tasks are therefore common in all kinds of servers: web servers, database servers, file servers, mail servers and so on. This is also the scenario engineers in the Internet industry meet most often, and it is the one we want to focus on.

This scenario has two characteristics: the time required to process each task is short, and the number of tasks is huge. If you were given this kind of workload, how would you handle it? You might think: that's easy, whenever the server receives a request, create a thread to process it, and destroy the thread when it finishes. So easy. This approach is usually called thread-per-request, meaning one thread is created for each request:

For long tasks this approach works very well, but for a large number of short tasks, although it is simple to implement, it has several disadvantages:

1. As we saw in the earlier sections, threads are an operating-system concept (we leave user-mode threads, coroutines and the like aside here), so creating a thread requires the operating system's help, and having the operating system create and destroy threads takes time.
2. Each thread needs its own independent stack, so creating a large number of threads consumes too much memory and other system resources.

It is as if you were a factory owner (enjoy the thought) with lots of orders coming in: every time a batch of orders arrives you recruit a batch of workers, even though the products are simple and the workers finish them quickly; when the batch is done, you dismiss the workers you went to such trouble to hire, and when new orders arrive you painstakingly recruit all over again, working for 5 minutes and hiring for 10 hours. Unless you are determined to drive the company out of business, you probably won't do that. A better strategy is to hire a group of workers and keep them on site: process orders when there are orders, and let everyone stay idle when there are none.

This is the origin of the thread pool.

From multithreading to thread pool

The concept of a thread pool is very simple: create a batch of threads and never release them; tasks are submitted to these threads for processing, so threads no longer need to be frequently created and destroyed. At the same time, because the number of threads in the pool is usually fixed, the pool will not consume too much memory. The idea here is reuse, kept under control.

How the thread pool works

Some students may ask: how are tasks submitted to the thread pool, and how are these tasks handed to the threads in the pool? Obviously, the queue data structure is naturally suited to this scenario: the party that submits tasks is the producer, and the threads that consume tasks are the consumers. This is in fact the classic producer-consumer problem.

Now you know why operating-system courses teach this problem and why interviewers ask about it: if you don't understand the producer-consumer problem, you essentially cannot write a thread pool correctly. For reasons of space I will not explain the producer-consumer problem in detail here; any operating-systems reference will give you the answer. Instead, let's look at what a task submitted to a thread pool typically looks like. Generally speaking, it consists of two parts: 1) the data to be processed; 2) the function that processes the data:

struct task {
    void* data;     // the data carried by the task
    handler handle; // the function that processes the data
};

(Note: you can also think of the struct in the code as a class, i.e. an object.) The threads in the pool block on the queue. When a producer writes data into the queue, one of the pool's threads is woken up; that thread takes the structure (or object) above out of the queue and calls the processing function with the structure's data as the argument:

while(true) {
    struct task* t = GetFromQueue(); // take a task from the queue
    t->handle(t->data);              // process the data
}

The above is the core of a thread pool. Understand this, and you understand how thread pools work.

The number of threads in the thread pool

Now that we have a thread pool, how many threads should it contain? Think about this yourself before reading on, if you have made it this far without falling asleep.

You must realize that too few threads in the pool cannot make full use of the CPU, while creating too many threads degrades system performance: excessive memory usage, the overhead of thread switching, and so on. So the number of threads can be neither too large nor too small. What should it be, then? To answer that, you need to know what kind of tasks the thread pool handles. Some students may say: didn't you say there are two kinds, long tasks and short tasks? That classification was by life cycle; looked at from the perspective of the resources a task needs, there are also two kinds, and this time we will skip the mystery-mongering: CPU-intensive and I/O-intensive.

1. CPU-intensive. These are tasks that do not depend on external I/O, such as scientific computing or matrix operations. In this case, as long as the number of threads roughly matches the number of cores, CPU resources can be fully utilized.

2. I/O-intensive. These tasks spend little time on computation; most of their time goes to disk I/O, network I/O and the like.

This case is a bit more complicated. You need a performance-measurement tool to estimate the time spent waiting on I/O, written here as WT (wait time), and the time needed for CPU computation, written as CT (computing time). Then for an N-core system, a suitable number of threads is roughly N * (1 + WT/CT). If the I/O wait time equals the computing time, you need about 2N threads to make full use of the CPU. Note that this is only a theoretical value; the actual setting should be tuned against your real workload.

Of course, making full use of the CPU is not the only consideration. As the number of threads grows, memory usage, system scheduling, the number of open files, the number of open sockets and the number of open database connections all need to be taken into account. There is no universal formula here; each case must be analysed on its own.

The thread pool is not a panacea

A thread pool is only one form of multithreading, so the problems multithreading faces, such as deadlocks and race conditions, are unavoidable with thread pools as well. For this part too, the relevant operating-systems material will give you the answers, which is why the fundamentals matter so much.

Best Practices for Thread Pool Usage

The thread pool is a powerful weapon in a programmer's hands; you can find thread pools on almost every server at an Internet company. Before using one, however, you need to consider:

  • Fully understand your tasks: are they long tasks or short tasks, CPU-intensive or I/O-intensive? If you have both kinds, a possibly better approach is to put the two kinds of tasks into different thread pools, which also makes it easier to choose the right number of threads for each
  • If a task in the thread pool performs I/O, be sure to set a timeout on it, otherwise the thread processing it may block forever
  • It is best not to synchronously wait inside the thread pool for the results of other tasks

Summary

In this article we started from the CPU and worked our way up to the everyday thread pool, from the bottom layer to the top, from hardware to software. Note that nothing here depends on a specific programming language: threads are not a language-level concept (again leaving user-mode threads aside). Once you truly understand threads, you can use them in any language. What you need to grasp first is the Tao, the underlying principles; the techniques follow from it. I hope this article has helped you understand threads and thread pools.

Original author: Code Farmer's Deserted Island Survival

 

Origin blog.csdn.net/youzhangjing_/article/details/132190325