Linux multithreading

thread
Linux process vs thread
Linux thread control

thread

A thread is an execution branch of a process, an execution flow that runs inside the process (the thread essentially runs in the address space of the process).

The principle of Linux threads: suppose that when we create a "process", we do not create a separate address space or user-level page table, and we do not perform any I/O to load the program's code and data into memory. Instead, we create only a task_struct and make the new PCB point to the same mm_struct as the old PCB. Then, by distributing the current process's resources sensibly, each task_struct gets to use part of the process's resources. When each of these PCBs is scheduled by the CPU, the "granularity" of execution is finer than that of the original process: each such execution flow is a thread.

What is a process? From the OS's perspective, it is the basic unit to which system resources are allocated. After a process is created, it may later contain multiple execution flows (threads).

How should we view the processes we have been studying and using so far? In essence, each one is still the basic entity that owns system resources; it simply contains only one execution flow.

In essence, a process is the basic entity to which system resources are allocated.
A thread is the basic unit of OS scheduling.

【Summary】

A line of execution in a program is called a thread. A more precise definition is: a thread is "a sequence of control within a process".
All processes have at least one thread of execution.
Threads run inside a process, essentially running in the process address space.
In Linux, the PCBs that the CPU sees are lighter weight than a traditional process.
Through the process's virtual address space, a thread can see most of the process's resources. Distributing those resources sensibly among execution flows is what forms thread execution flows.

Advantages of threads

Creating a new thread is much less expensive than creating a new process
Switching between threads requires much less work from the operating system than switching between processes
Threads consume much less resources than processes
Can make full use of multiple processors that can run in parallel
While waiting for a slow I/O operation to finish, the program can perform other computations
Compute-intensive applications can split their computation across multiple threads in order to run on multiprocessor systems
I/O-intensive applications can overlap I/O operations to improve performance, because threads can wait on different I/O operations at the same time

Disadvantages of threads

performance loss

A compute-intensive thread that is rarely blocked by external events usually cannot share a processor efficiently with other threads. If there are more compute-intensive threads than available processors, performance may suffer considerably. The penalty comes from the extra synchronization and scheduling overhead that is added while the available resources stay the same.

reduced robustness

Writing multithreaded programs requires more thorough and careful consideration. In a multithreaded program, subtle differences in timing, or sharing variables that should not be shared, can easily cause harmful effects. In other words, threads lack protection from one another.

lack of access control

The process is the basic granularity of access control. Calling certain OS functions in one thread (for example exit or chdir) affects the entire process.

Difficulty in programming

Writing and debugging a multithreaded program is much more difficult than a single-threaded program

thread exception

If a single thread divides by zero or dereferences a wild pointer, that thread crashes, and the whole process crashes with it.
A thread is an execution branch of its process, so an exception in a thread is treated like an exception in the process. It triggers the signal mechanism, which terminates the process, and when the process terminates, all threads in it exit immediately.

thread usage

Reasonable use of multithreading can improve the execution efficiency of CPU-intensive programs
Reasonable use of multithreading can improve the user experience of I/O-intensive programs (for example, we can keep writing code while development tools download in the background, which reflects the same idea of concurrent execution flows)

Linux process vs thread

Process is the basic unit of resource allocation
A thread is the basic unit of scheduling
Threads share process data, but also have some of their own:

Thread ID (LWP)
A set of registers (context data)
stack
errno
Signal mask
scheduling priority

Multiple threads of a process share the same address space, so the Text Segment and Data Segment are shared. A function defined in the program can be called from every thread, and a global variable can be accessed from every thread. In addition, threads share the following process resources and environment:

file descriptor table
How each signal is handled (SIG_IGN, SIG_DFL or a custom signal handler)
current working directory
user id and group id

Linux thread control

POSIX thread library

Thread-related functions form a complete family, and most of their names start with "pthread_"
To use them, include the header file <pthread.h>
When linking against the thread library, pass the "-lpthread" option to the compiler command

create thread


【Error check】:

Some traditional functions return 0 on success, -1 on failure, and assign a value to the global variable errno to indicate an error.
Unlike most other POSIX functions, pthreads functions do not set the global variable errno on error. Instead, they return the error code as the return value.
pthreads also provides a per-thread errno variable to support other code that uses errno. For errors from pthreads functions, checking the return value is recommended, because reading the return value is cheaper than reading the per-thread errno variable.

Code example:

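#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <pthread.h>

/* New thread's routine: prints a message once per second, forever */
void *rout(void *arg) {
	(void)arg; /* unused */
	for ( ; ; ) {
		printf("I'm thread 1\n");
		sleep(1);
	}
	return NULL; /* not reached */
}

int main(void)
{
	pthread_t tid;
	int ret;
	/* pthread_create reports failure through its return value, not errno */
	if ((ret = pthread_create(&tid, NULL, rout, NULL)) != 0) {
		fprintf(stderr, "pthread_create : %s\n", strerror(ret));
		exit(EXIT_FAILURE);
	}
	/* The main thread keeps running concurrently with the new thread */
	for ( ; ; ) {
		printf("I'm main thread\n");
		sleep(1);
	}
}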

thread terminated

If you need to terminate only a certain thread without terminating the entire process, there are three ways:

Return from the thread function. This does not work for the main thread: returning from main is equivalent to calling exit, which terminates the whole process.
A thread can call pthread_exit to terminate itself.
A thread can call pthread_cancel to terminate another thread in the same process.

Note that the memory pointed to by the pointer passed to pthread_exit, or returned with return, must be global or allocated with malloc. It must not be allocated on the thread function's stack, because the thread function has already exited by the time other threads obtain the pointer.

thread waiting

When a thread exits, its space is not released: it stays in the process's address space until the thread is joined.
Creating a new thread does not reuse the space of an exited thread that has not been joined.

The thread that calls pthread_join(pthread_t thread, void **value_ptr) blocks until the thread identified by thread terminates. The termination status that pthread_join obtains depends on how that thread terminated:

If the thread terminated by returning, the location pointed to by value_ptr holds the thread function's return value.
If the thread was terminated abnormally because another thread called pthread_cancel, the location pointed to by value_ptr holds the constant PTHREAD_CANCELED.
If the thread terminated itself by calling pthread_exit, the location pointed to by value_ptr holds the argument passed to pthread_exit.
If you are not interested in the thread's termination status, you can pass NULL as value_ptr.

detach thread

By default, a newly created thread is joinable. After it exits, pthread_join must be called on it; otherwise its resources cannot be released, resulting in a resource leak.
If you don't care about the thread's return value, joining is a burden. In that case, you can detach the thread with pthread_detach, telling the system to release its resources automatically when it exits.

Detaching essentially means that the main thread no longer joins the new thread, so the new thread's resources are reclaimed automatically when it exits. A thread can be detached by another thread (pthread_detach(tid)) or can detach itself (pthread_detach(pthread_self())).

Thread ID and process address space layout

The pthread_create function generates a thread ID and stores it at the address pointed to by its first parameter. This thread ID is not the same thing as the thread ID (LWP) mentioned earlier.
The thread ID mentioned earlier belongs to process scheduling. Because a thread is a lightweight process and the smallest unit the OS scheduler works with, a numeric value is needed to uniquely identify it.
The first parameter of pthread_create points to a pthread_t variable. The value stored there is the address of a block of virtual memory that the thread library uses to manage the new thread, and that address is the new thread's ID. This ID belongs to the NPTL thread library, and the library's subsequent operations identify the thread by it.
The NPTL thread library provides the pthread_self function, which returns the calling thread's own ID:

pthread_t pthread_self(void);

What is the type of pthread_t? It depends on the implementation. In NPTL, the implementation Linux currently uses, a pthread_t value is essentially an address in the process's address space.
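The main thread does not use a stack allocated by the thread library; it uses the process stack in the address space directly. The stacks of other threads are allocated by the library in the memory-mapping region between the heap and the stack.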
