Thread synchronization and mutual exclusion [Linux]


1. Introduction

Thread safety means ensuring the correctness and consistency of shared resources when multiple threads access them at the same time. It is an important concept in concurrent programming, because ignoring it can lead to problems such as data loss, wrong results, or deadlocks.

A good example for multi-threaded programming is several people grabbing a fixed number of tickets at the same time. Each person can be regarded as a thread, and the number of tickets as a shared resource, so it is defined as a global variable shared by all threads. Here is a simple implementation:

#include <iostream>
#include <unistd.h>
#include <pthread.h>

using namespace std;

int tickets = 10000; // number of tickets

// thread function
void* getTickets(void* args)
{
	(void)args;
	while(1)
	{
		if(tickets > 0)
		{
			usleep(1000);
			printf("[%p] thread: ticket %d\n", (void*)pthread_self(), tickets--);
		}
		else break;
	}
	return nullptr;
}
int main()
{
	pthread_t t1, t2, t3;
	// several threads grab tickets concurrently
	pthread_create(&t1, nullptr, getTickets, nullptr);
	pthread_create(&t2, nullptr, getTickets, nullptr);
	pthread_create(&t3, nullptr, getTickets, nullptr);

	pthread_join(t1, nullptr);
	pthread_join(t2, nullptr);
	pthread_join(t3, nullptr);
	return 0;
}

output:
(screenshot: program output; the ticket numbers eventually go negative)

But some of the results are negative. This is caused by a race condition in the code: in a multi-threaded environment, a race condition may arise when multiple threads access shared data (the global tickets variable) at the same time. While one thread is checking tickets > 0, another thread may modify the value of tickets. Multiple threads can therefore enter the critical section at the same time and perform tickets--, driving the value of tickets negative.

In fact, although tickets-- is only one line of C/C++ code, it is three instructions for the CPU (and its registers), which can be verified from the assembly output. The screenshot is from https://godbolt.org/

(screenshot: assembly generated for tickets--, from godbolt.org)

Therefore, the tickets-- (self-decrement, and likewise self-increment) operation is not atomic. It actually consists of three steps: read the value of tickets from memory into a CPU register, have the CPU decrement it by 1, and write the result back to memory. All of these steps go through registers.

Since these steps are not atomic, race conditions may arise in a multi-threaded environment. Why? The importance of atomic operations can be understood through the following discussion.

First of all, it must be clear that thread scheduling is nondeterministic: a thread may be switched out at any time, including between any of the three steps of the tickets-- operation.

For example, suppose tickets is 1 before thread A and thread B execute, and both have already passed the tickets > 0 check. Thread A is switched out by the scheduler before it reads the value of tickets from memory; thread B is then scheduled and, without being interrupted, completes all three steps, so the value of tickets in memory is updated to 0. At some later moment thread A is scheduled back and resumes from where it was interrupted (note that it has already passed the if check); the value of tickets it now reads from memory is the 0 written by thread B, so the final value of tickets becomes -1.

Therefore, the result of -1 is not certain. If the initial value of tickets is set relatively small, you may end up with 0 (it depends on the scheduler), and that result is also wrong, because in real life 0 is generally not used as a ticket number. Similarly, suppose thread A is switched out right after reading the value 1 of tickets from memory into a register, and thread B then performs the whole tickets-- and updates tickets to 0, so the condition for selling another ticket no longer holds. But the operating system saved thread A's context when it was switched out (the data loaded into the registers is called the context), so when thread A is switched back it still sees the old value 1, decrements it, and writes 0 back, selling the same ticket once more.

Since multiple threads access shared resources at the same time, mutexes or other synchronization mechanisms are required to ensure data consistency. Access to shared data can be protected using mechanisms such as mutexes or condition variables. This article describes some of these synchronization mechanisms.

Supplement: if you use the g++ compiler under Linux to compile a C++ source file that uses thread library functions, you must add the -lpthread option (or the broader -pthread), even if you only use <thread>, the built-in C++ thread library.

reason:

The built-in C++ thread library is a wrapper over pthread: it provides a higher level of abstraction and interface, making multi-threaded programs more convenient and safer to write. It includes classes and functions such as std::thread, std::mutex and std::condition_variable, all of which encapsulate or extend pthread functionality.

The pthread library used on the Linux platform implements a cross-platform threading standard that defines a series of functions and data types for creating and managing threads. pthread is part of the POSIX standard, so it is available on operating systems that support POSIX. In other words, if the same code is compiled and run on Windows, the built-in C++ thread library is linked against the native Windows threading API instead.

When compiling C++ source files that call pthread library functions with g++, the -lpthread option needs to be added because pthread is not part of the C++ standard library but an independent library. At the linking stage you must tell the compiler to find the pthread library and link it into the executable: the -lpthread option makes the linker search for libpthread.so or libpthread.a and link it in, e.g. g++ main.cc -o main -lpthread.

2. Preliminary concepts

In multi-threaded programming, a common problem is how to deal with the situation that multiple threads access and modify the same global variable at the same time. If the code is not written in a standardized manner, thread safety problems are prone to occur.

In order to solve the problem of thread safety, a common method is to use synchronization mechanisms, such as locks, semaphores, mutexes, etc. The synchronization mechanism can ensure that only one thread can access and modify shared data at any time, thereby avoiding data inconsistency and errors.

2.1 Synchronous and asynchronous

Before studying synchronization mechanisms, it is necessary to clarify the concept of synchronization between threads. Synchronization and asynchrony are relative and best understood together. Take a lesson as an example, and suppose Xiao Ming has to step out during class:

  • Synchronization: the whole class pauses, and the lesson does not resume until Xiao Ming comes back;
  • Asynchrony: the lesson continues, and everyone goes about their own business independently.

Synchronous and asynchronous are often used to describe the relationship between two or more events. Synchronization means that two or more events occur in a certain order, and the occurrence of one event depends on the completion of another event. Asynchrony means that there is no fixed sequence between two or more events, and they can occur independently.

2.2 Mutual exclusion and concurrency

Mutual exclusion is the opposite of concurrency. Mutual exclusion means that only one visitor can access a given resource at a time; other visitors must wait for the previous visitor to finish before they can begin accessing it. Concurrency means that, in an operating system, multiple programs make progress on the same processor during the same period.

For example, suppose a movie is being shown in a movie theater, and the seats of this movie are resources.

  • Mutual exclusion: If the movie is sold out, no one will be able to buy tickets for the movie.
  • Concurrency: if multiple movies are shown in the theater at the same time, the audience can choose to buy tickets for another movie.

Here we first focus on mutual exclusion; concurrency will become clearer with further study. We can look at this relationship from the perspective of sets: everyone watching movies in the cinema forms the complete set C; those who bought tickets for this movie form set A, those who did not form set B, and set A + set B = the complete set C. This "either-or" relationship is mutual exclusion (either you or me).

2.3 Atomic operations

An atomic operation is an operation (or series of operations) that cannot be interrupted. One thread must finish it completely before another thread can start it, which means the operation is indivisible and threads cannot interleave within it. For example, the tickets-- operation in the example above is not atomic, because it requires three steps.

Therefore, to avoid errors like the one in the example above in multi-threaded programming, atomic operations must be used. An atomic operation has only two observable states, not started and completed; there is no intermediate "in progress" state.

Assembly instructions correspond directly to operations on CPU registers and hardware, so from the assembly point of view, if an operation corresponds to a single assembly instruction, that operation is atomic. For the CPU, an atomic instruction is an operation it performs directly and indivisibly.
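As a concrete illustration (this uses standard C++ atomics, not the pthread approach the rest of this article follows), std::atomic exposes exactly such single-step operations; a minimal sketch:

#include <atomic>

std::atomic<int> tickets(10000);

// fetch_sub performs the load-decrement-store sequence as one indivisible
// step, so no other thread can interleave with a half-finished decrement.
int grabOne()
{
    return tickets.fetch_sub(1); // returns the value held before the decrement
}

With this, the decrement itself can never be torn by a thread switch, although a separate tickets > 0 check would still need its own care.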

2.4 Critical Resources and Critical Sections

Critical resources and critical sections are two important concepts in operating systems, and they are closely related to process synchronization and mutual exclusion.

critical resource

  • Critical resources refer to resources that cannot be used or accessed by multiple processes at the same time in a multi-process environment.

For example printers, tape drives, files, etc. If multiple processes use or access critical resources simultaneously, data inconsistencies or errors may result. Therefore, for critical resources, mutually exclusive access between processes must be implemented, that is, at any time, only one process can use or access the resource, and other processes that need to use or access the resource must wait.

critical section

  • A critical section refers to the piece of code that accesses critical resources in a multi-process environment.

Because the critical section involves the operation of critical resources, it must be guaranteed that at any time, only one process can execute the code in the critical section, and other processes that need to execute the code in the critical section must wait. If multiple processes execute the code in the critical section at the same time, data inconsistency or error may result.

how to manage

In order to protect critical resources and manage critical sections, the operating system provides mechanisms such as semaphores, mutexes, condition variables, and monitors. Their basic idea is: before entering the critical section, a process must first acquire a flag or lock, indicating that it has access to the critical resource; after exiting the critical section, it must release the flag or lock, relinquishing that access; and if a process tries to acquire a flag or lock already held by another process, it is blocked until the holder releases it.

Through these mechanisms, critical resources and critical sections can be protected and managed effectively, ensuring data consistency and correctness in a multi-process environment.

3. Mutex

3.1 Introduction

Continuing with the ticket-grabbing program above: evaluating tickets > 0 is itself a kind of computation. Before the CPU computes, the data in memory must be loaded into the CPU's registers; this flow of data from memory to registers is pure data transfer, and it is precisely what makes the wrong result hard to understand. From the point of view of execution flows, whichever execution flow the CPU is currently running, the registers hold that flow's data. When multiple threads access the same global variable (shared resource), the value in memory may already have reached its limit while a thread still sees, in its saved context, the value from before the modification. The wrong result comes from switching at unlucky moments.

This is similar to (non-)reentrant functions: operating on variables with -- in C/C++ is risky when threads can be switched at any time. And this is just one isolated example; real situations are much more complicated, and when a thread gets scheduled (switched) is also nondeterministic.

How to avoid such problems?

  • Protect global variables (shared resources).

The reason for the negative number is that the tickets-- operation was interrupted. Thread A was switched out before the CPU finished the computation, while thread B still went ahead with the original value; after thread B's update, the global variable could not legally be decremented again, but thread A was then switched back, its context restored (the context still held the old value, which had already passed the if check), and it decremented once more.

That is to say, the root cause is that the tickets-- operation was interrupted. If there is a mechanism that prevents other threads from interleaving with such non-atomic operations, it can ensure that the shared resource ends up in a legal state.

How is this mechanism implemented?

In the ticket-grabbing example, mutual exclusion can be implemented with a mark that is unique to the shared resource: before the threads access the shared resource, the operating system allows only the one thread holding the mark to access it.

3.2 Concept

A mutual exclusion lock (mutex) is a tool for implementing synchronization between multiple threads. It ensures that only one thread can access a shared resource or code segment at any one time, avoiding problems such as data races in multi-threaded programs and improving their correctness and stability.

The basic usage of a mutex is:

  1. Create a mutex object, and call the mutex's lock() function before the code that accesses the critical section, to acquire ownership of the lock.
  2. After accessing the critical section, call the mutex's unlock() function to release ownership of the lock.

After the thread executes the task and releases the lock, the lock will be passed to other threads waiting to acquire the lock, and they will repeat the above operation to complete the task safely.

Supplement:

The C++ standard library provides the std::mutex class, together with two helper classes, std::lock_guard and std::unique_lock, which simplify lock management and exception safety. However, this article does not cover the C++ mutex; it uses the lock in the pthread library as the example. As mentioned above, the built-in C++ thread library is itself implemented on top of pthread.
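For reference only, a minimal sketch of the RAII style those helper classes enable (standard C++; the function and variable names here are illustrative):

#include <mutex>

std::mutex mtx;
int tickets = 10000;

void grab()
{
    std::lock_guard<std::mutex> guard(mtx); // the constructor locks mtx
    if (tickets > 0)
        --tickets;
}   // guard's destructor unlocks, even if an exception is thrown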

The pthread library provides mutex-related functions for creating, initializing, locking, unlocking, and destroying mutexes. Mutexes can be divided into global locks and local locks. Their usage differs, but both must be initialized, locked, and unlocked.

  • A global lock is a mutex defined in the program's global variable area; any thread in the program can use it. Its advantage is convenience: no parameters to pass and no dynamic memory to allocate. Its disadvantage is possible waste: different threads may need to protect different shared resources, yet they all share one mutex, causing unnecessary waiting and blocking. A global lock is also bad for modular programming, because it breaks data encapsulation.

  • A local lock is a mutex defined in the program's local variable area or on the heap; it can only be used by threads that can see the function or structure in which it is defined. Its advantage is that multiple mutexes can be created as needed, each protecting a single shared resource, which improves concurrency and efficiency; it also favors modular programming by preserving data encapsulation. Its disadvantage is that it requires passing parameters or dynamically allocating memory, adding complexity and overhead.

What is locking and unlocking?

Locking and unlocking are how mutual exclusion in critical sections is achieved. Locking means that before entering the critical section, a thread must acquire a lock object; if the lock object is already held by another thread, it waits or blocks until the lock is released. Unlocking means that after exiting the critical section, the thread releases the lock object so that other waiting threads have a chance to acquire it and enter the critical section. The most important point: a thread that has not acquired the lock blocks and waits.

3.3 Examples

pthread_mutex function family

The pthread_mutex function family is a set of functions in the POSIX threads library for manipulating mutexes. It includes:

  • pthread_mutex_init: initialize the mutex. It accepts two parameters: the first is a pointer to a pthread_mutex_t variable, and the second is a pointer to a pthread_mutexattr_t variable used to set the mutex's attributes. To use the default attributes, pass NULL as the second parameter.
  • pthread_mutex_destroy: destroy the mutex. It takes a pointer to a pthread_mutex_t variable. After the mutex is no longer needed, call this function to release its resources.
  • pthread_mutex_lock: lock the mutex. It takes a pointer to a pthread_mutex_t variable. If the mutex is already locked, the calling thread blocks until the mutex is unlocked.
  • pthread_mutex_trylock: attempt to lock the mutex. It takes a pointer to a pthread_mutex_t variable. If the mutex is already locked, the function returns immediately without blocking.
  • pthread_mutex_unlock: unlock the mutex. It takes a pointer to a pthread_mutex_t variable. After using the shared resource, call this function to unlock the mutex so that other threads can access the resource.

These are the commonly used functions of the pthread_mutex family. They all take a pointer to a pthread_mutex_t variable as a parameter, return 0 on success, and return an error code on failure.
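A minimal sketch that exercises this family in one place (the error handling is simplified). Note that pthread_mutex_trylock reports EBUSY when the lock is already held:

#include <pthread.h>
#include <cstdio>
#include <cerrno>

int main()
{
    pthread_mutex_t mtx;
    pthread_mutex_init(&mtx, NULL);      // NULL: default attributes

    pthread_mutex_lock(&mtx);            // would block if another thread held it

    // trylock never blocks: with a default (non-recursive) mutex that is
    // already locked, it returns EBUSY immediately
    if (pthread_mutex_trylock(&mtx) == EBUSY)
        printf("lock is busy, do something else\n");

    pthread_mutex_unlock(&mtx);
    pthread_mutex_destroy(&mtx);         // release its resources
    return 0;
}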

usage

The mutex in pthread (pthread_mutex_t) is a structure type containing internal variables that record the state and attributes of the mutex. We do not need to care about their exact meaning for now; we only need to know that it is used to achieve mutual exclusion between threads. Proceed as follows:

To use a variable of type pthread_mutex_t, it must first be initialized. There are two ways to initialize:

  • Static initialization: assign a constant value to the mutex at compile time, producing a mutex with default attributes (we do not need to care what the defaults are for now). This method can only be used for global or static variables. For example:

    pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER; // PTHREAD_MUTEX_INITIALIZER is a macro
    

    This creates a mutex variable mutex with default attributes. The advantage of static initialization is that it is simple and requires no function call; the disadvantage is that only the default attributes can be used, and attributes such as whether the mutex is recursive or robust cannot be specified.

  • Dynamic initialization: call a function to initialize variables at runtime, this method can be used for global, static and local variables. For example:

    pthread_mutex_t mutex;
    pthread_mutex_init(&mutex, NULL);
    

    This also creates a mutex variable mutex with default attributes.


    Note: regarding the attribute parameter, don't worry about it for now; it is generally set to nullptr/NULL.

    But unlike static initialization, dynamic initialization can pass a pthread_mutexattr_t variable as the second parameter to set the mutex's attributes, for example:

    pthread_mutex_t mutex;
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&mutex, &attr);
    

This creates a mutex variable mutex with the recursive attribute, which means the same thread can lock the same mutex multiple times without deadlocking. The advantage of dynamic initialization is that the mutex's attributes can be set flexibly; the disadvantage is that several functions must be called, and you must remember to release the mutex and attribute variables, for example:

pthread_mutex_destroy(&mutex);
pthread_mutexattr_destroy(&attr);

In short, the variable of type pthread_mutex_t is an important thread synchronization mechanism, which can be used to protect shared resources from being modified by multiple threads at the same time. According to different needs, you can choose static initialization or dynamic initialization to create mutex variables, and pay attention to using and releasing them correctly.

global lock

  1. Define a global lock with pthread_mutex_t;
  2. Lock with pthread_mutex_lock() before operating on the global variable;
  3. Unlock with pthread_mutex_unlock() after operating on the global variable.
#include <iostream>
#include <unistd.h>
#include <pthread.h>

using namespace std;

pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER; // define a global lock
int tickets = 10000; // number of tickets

// thread function
void* getTickets(void* args)
{
	(void)args;
	while(1)
	{
		pthread_mutex_lock(&mtx); // lock
		if(tickets > 0)
		{
			usleep(1000);
			printf("thread[%p]: ticket %d\n", (void*)pthread_self(), tickets--);
			pthread_mutex_unlock(&mtx); // unlock
		}
		else
		{
			pthread_mutex_unlock(&mtx); // unlock
			break;
		}
	}
	return nullptr;
}
int main()
{
	pthread_t t1, t2, t3;
	// several threads grab tickets concurrently
	pthread_create(&t1, nullptr, getTickets, nullptr);
	pthread_create(&t2, nullptr, getTickets, nullptr);
	pthread_create(&t3, nullptr, getTickets, nullptr);

	pthread_join(t1, nullptr);
	pthread_join(t2, nullptr);
	pthread_join(t3, nullptr);
	return 0;
}

output

(screenshot: program output with the mutex in place)

From the output, the printed ticket numbers no longer reach 0 or go negative, still with 3 threads and 10,000 tickets. After adding the mutex, the run takes longer. Although the screenshots show mostly the same thread, other threads can also be seen running; a thread that appears very often may simply have a relatively high priority and be scheduled first. That is the scheduler's behavior, not something written in the user's code.

Before locking, because thread scheduling is nondeterministic, the threads accessed the critical resource independently of one another; after locking, only one thread at a time may access the critical resource, which guarantees that the global variable ends up legal. At the same time, it brings a certain performance cost.

local lock

If the lock is defined locally, the corresponding initialization function must be called to initialize it, and the lock must also be destroyed in the corresponding place.

int main()
{
	pthread_mutex_t mtx; // define a local lock
	pthread_mutex_init(&mtx, NULL); // initialize the lock

	pthread_t t1, t2, t3;
	// several threads grab tickets concurrently
	pthread_create(&t1, nullptr, getTickets, nullptr);
	pthread_create(&t2, nullptr, getTickets, nullptr);
	pthread_create(&t3, nullptr, getTickets, nullptr);

	pthread_join(t1, nullptr);
	pthread_join(t2, nullptr);
	pthread_join(t3, nullptr);

	pthread_mutex_destroy(&mtx); // destroy the lock
	return 0;
}

But in this case, the thread function getTickets() cannot see the local lock defined in main. This can be solved by passing it as a parameter: since the thread function's parameter is of type void*, it can receive a pointer to anything, even an array or an object. Pass the argument as (void*), then cast it back inside the function to get at its contents. Data types are just different views of the same memory that restrict how the data is accessed; the data itself is unchanged. It is like network disks that reject certain kinds of uploads: we can change the file's suffix before uploading and change it back before use, and the content itself is untouched.

The thread information (such as the thread alias) and the address of the lock can be packaged into an object and passed to the thread function. The type of this object can be defined as ThreadData:

#include <iostream>
#include <string>
#include <unistd.h>
#include <pthread.h>
#include <ctime>
#include <chrono>
using namespace std;

#define THREAD_NUM 5

int tickets = 10000; 		// number of tickets

class ThreadData
{
public:
	// constructor
	ThreadData(const string& tname, pthread_mutex_t* pmtx)
	: _tname(tname)
	, _pmtx(pmtx)
	{}
public:
	string _tname;			// thread name
	pthread_mutex_t* _pmtx;	// address of the lock
};

// thread function
void* getTickets(void* args)
{
	ThreadData* td = (ThreadData*)args; 	// retrieve the data passed as parameter
	while(1)
	{
		pthread_mutex_lock(td->_pmtx); 		// lock
		if(tickets > 0)
		{
			usleep(1000);
			printf("thread[%p]: ticket %d\n", (void*)pthread_self(), tickets--);
			pthread_mutex_unlock(td->_pmtx); // unlock
		}
		else
		{
			pthread_mutex_unlock(td->_pmtx); // unlock
			break;
		}
		usleep(rand() % 1500);				// follow-up work after grabbing a ticket, replaced by sleep
	}
	delete td; 								// free the data object
	return nullptr;
}
int main()
{
	auto start = std::chrono::high_resolution_clock::now(); // start timing
	pthread_mutex_t mtx; 					// define a local lock
	pthread_mutex_init(&mtx, NULL); 		// initialize the lock
	srand((unsigned long)time(nullptr) ^ 0x3f3f3f3f ^ getpid());

	pthread_t t[THREAD_NUM];
	for(int i = 0; i < THREAD_NUM; i++)		// several threads grab tickets
	{
		string tname = "thread["; 			// thread name
		tname += to_string(i + 1); tname += "]";
		ThreadData* td = new ThreadData(tname, &mtx); 			// create the object holding the data
		pthread_create(t + i, nullptr, getTickets, (void*)td); 	// pass the name and data object while creating the thread
	}
	for(int i = 0; i < THREAD_NUM; i++)		// wait for the threads
	{
		pthread_join(t[i], nullptr);
	}

	pthread_mutex_destroy(&mtx); 			// destroy the lock
	auto end = std::chrono::high_resolution_clock::now();    // stop timing
	cout << "THREAD_NUM = " << THREAD_NUM << endl;
	cout << "total time: " << chrono::duration_cast<std::chrono::milliseconds>(end - start).count() << "ms" << endl;
	return 0;
}

Added logic:

  1. The members of the ThreadData class hold the thread's information (for convenience only the thread name; real threads have more) and the address of the local lock defined in main, for the thread function getTickets() to use;
  2. In getTickets(), after locking, accessing the critical resource, and unlocking, the thread needs to do other work, such as processing data; here a usleep of random length stands in for it. The random seed in main is XORed with a few arbitrarily chosen numbers to make the random numbers more random;
  3. Threads are created in a loop; the thread's name and number, together with the address of the lock, are packed into a ThreadData object. Note that this object is allocated with new, so it must be deleted at the end of getTickets(); inside the function, the object's member variables provide the thread information and the lock;
  4. To support the later performance discussion, timing logic is added at the beginning and end of main, using the high_resolution_clock class from the <chrono> header for millisecond precision; its usage does not matter here.

output:

(screenshot: program output)

3.4 Performance Loss

In the example above, if THREAD_NUM is changed to 100, will the running time shrink (with the usleep in the thread function commented out and a print statement added)?

(screenshot: program output with THREAD_NUM = 100)

From the results, even without letting the threads sleep, it is not much faster; the time is still on the order of milliseconds.

Increasing the number of threads may shorten the execution time of the program, but this is not absolute. The execution time of a program depends on many factors, including hardware performance, operating system scheduling policy, program structure and algorithm complexity, etc. In a multi-core processor system, increasing the number of threads can make full use of the parallel computing capability of the multi-core processor, thereby shortening the execution time of the program. However, if the number of threads is too large, the scheduling and synchronization overhead between threads will also increase, which will affect the execution efficiency of the program (that is to say, thread scheduling also takes time).

Also, if your program has a lot of serial computation or I/O operations, increasing the number of threads may not significantly improve your program's execution time.

Although mutexes can protect the security of shared resources, they also bring some performance overhead, mainly in the following aspects:

  • The creation and destruction of mutexes need to call the API of the operating system, which consumes a certain amount of time and memory resources.
  • The locking and unlocking of the mutex requires atomic operations, which will increase the number of CPU instructions and memory access times.
  • The waiting and waking up of the mutex requires a context switch (context switch), which will cause the invalidation of the CPU cache (cache) and the delay of thread scheduling (scheduling).
  • Mutex competition will cause thread blocking (blocking) or busy waiting (busy waiting), which will reduce thread utilization and concurrency.

Therefore, mutexes reduce the efficiency of multithreaded programs to some extent, especially when the code segment or resource protected by the mutex is:

  • accessed very frequently, causing intense lock contention;
  • very time-consuming to execute, causing long lock-holding times;
  • very simple to execute, so the locking overhead accounts for a high proportion of the work.

So, how can the impact of mutexes on the efficiency of multithreaded programs be reduced? In general, there are several suggestions (a sketch follows the list):

  • Minimize the number and scope of mutexes, only protect necessary shared data or critical sections, and avoid oversynchronization.
  • Try to shorten the holding time of the mutex, release the lock as soon as possible, and avoid I/O operations or other time-consuming operations while holding the lock.
  • Try to use a more efficient synchronization mechanism, such as read-write lock, spin lock, condition variable, etc., and choose the appropriate tool according to different scenarios.
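A minimal sketch of the first two suggestions (the function names are hypothetical): do the slow work outside the lock, and hold the lock only for the cheap update of shared state:

#include <pthread.h>

pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
long total = 0;                                      // shared resource

long expensiveCompute(int x) { return (long)x * x; } // stands in for slow work

void addResult(int x)
{
    long r = expensiveCompute(x); // slow part: done WITHOUT holding the lock

    pthread_mutex_lock(&mtx);     // hold the lock only for the brief update
    total += r;
    pthread_mutex_unlock(&mtx);
}

This keeps the critical section short, so both the contention and the lock-holding time drop.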

In short, mutex is a synchronization mechanism with advantages and disadvantages. It can ensure the correctness and stability of multi-threaded programs, but it will also reduce the efficiency of programs. Therefore, when using mutexes, you need to weigh the pros and cons, and design and optimize the code reasonably to achieve the best performance.

3.5 Serial Execution

In a multithreaded program, if multiple threads need to access shared resources, a synchronization mechanism (such as a mutex) is usually used to protect them. When one thread acquires the lock and enters the critical section, other threads trying to enter are blocked until the lock is released. In this way, multiple threads execute serially within the critical section.

Serial execution can be used to describe the execution order of statements in a single thread, or it can be used to describe the execution order among multiple threads. In a program, it means that the instructions are executed sequentially, and the execution of each instruction must be executed after the previous instruction is completed. Here is a simple C++ program that demonstrates serial execution:

#include <iostream>
int main() 
{
    std::cout << "Step 1" << std::endl;
    std::cout << "Step 2" << std::endl;
    std::cout << "Step 3" << std::endl;

    return 0;
}

In this program, the three std::cout statements execute one after another. The output of the program is as follows:

Step 1
Step 2
Step 3

As you can see, the instructions in the program execute one after another; this is serial execution. Here the unit of serial execution is the statement, but it can just as well be a thread.

Serial execution is one way to achieve thread safety: multiple threads execute one after another in some order rather than simultaneously. This avoids competition between threads for the same resource, ensuring the integrity and correctness of the resource. To put it bluntly, the threads queue up and execute their tasks one by one, so the efficiency is naturally worse than multiple threads executing at the same time.

The advantage of serial execution is that it is simple and easy to understand, does not require additional synchronization mechanisms, and does not cause problems such as deadlocks. The disadvantage of serial execution is that it is inefficient, cannot fully utilize the performance of multi-core processors, and cannot achieve true parallelism. The deadlock-related content is in Section VI of this article.

Serial execution can be achieved in several ways:

  • Use a single thread: If only one thread executes all the tasks, then there is no problem of multi-thread safety, that is, serial execution. This is the easiest way, but also the least efficient.
  • Use a mutex: A mutex is a synchronization mechanism that ensures that only one thread can access a shared resource at any time. Other threads that want to access the resource must wait for the lock to be released before proceeding. This approach can achieve partial parallelism, but it also increases overhead and complexity.
  • Using queues: a queue is a data structure that stores and processes data first-in-first-out (FIFO). If all tasks that need to access the shared resource are put into one queue, and a dedicated thread executes them in queue order, serial execution is achieved (see the sketch after this list). This approach reduces lock usage, but adds latency and memory consumption.
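A minimal pthread-based sketch of the queue approach (all names here are our own): producers only enqueue requests, and a single worker thread executes them, so the shared work is performed serially:

#include <pthread.h>
#include <queue>
#include <cstdio>

std::queue<int> tasks;                 // pending requests
pthread_mutex_t qmtx = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t qcond = PTHREAD_COND_INITIALIZER;
bool done = false;

void* worker(void*)                    // the only thread that runs tasks
{
    while (true)
    {
        pthread_mutex_lock(&qmtx);
        while (tasks.empty() && !done)
            pthread_cond_wait(&qcond, &qmtx); // sleep until work arrives
        if (tasks.empty() && done) { pthread_mutex_unlock(&qmtx); break; }
        int id = tasks.front(); tasks.pop();
        pthread_mutex_unlock(&qmtx);
        printf("handling request %d\n", id);  // runs strictly one at a time
    }
    return nullptr;
}

int main()
{
    pthread_t w;
    pthread_create(&w, nullptr, worker, nullptr);
    for (int i = 0; i < 5; i++)        // a producer enqueuing requests
    {
        pthread_mutex_lock(&qmtx);
        tasks.push(i);
        pthread_cond_signal(&qcond);
        pthread_mutex_unlock(&qmtx);
    }
    pthread_mutex_lock(&qmtx);
    done = true;
    pthread_cond_signal(&qcond);
    pthread_mutex_unlock(&qmtx);
    pthread_join(w, nullptr);
    return 0;
}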

In short, in multi-thread safety, serial execution is a simple but inefficient method, which is suitable for scenarios with low performance requirements and high correctness requirements.

Is locking a serial execution?

Locking causes multiple threads to execute serially within the critical section. But this does not mean that the entire program is executed serially. Outside a critical section, multiple threads can still execute in parallel. Locking is just a synchronization mechanism, it does not change the parallel nature of the program. It just ensures that multiple threads do not conflict when accessing shared resources.

3.6 Supplement

After locking, will the thread be switched when executing the code in the critical section?

The answer is yes. After locking, a thread may still be switched out inside the critical section; this is decided by the operating system's scheduling. Locking only guarantees that no other thread can enter the critical section; it does not stop the lock holder from being switched out for all sorts of reasons, such as its time slice being exhausted, an interrupt occurring, or the thread voluntarily yielding the CPU. When switched out, the thread still holds the lock object, and it will not release it until it is scheduled again and finishes executing the critical-section code.

Re-examining the code above: while the thread holding the lock is switched out, no other thread can acquire the lock, so none of them can execute the critical-section code, which preserves the data consistency of the critical resource. (Brother is away from the jianghu, yet the jianghu still tells his legend~) In fact there is no impact on safety; it just makes the other threads wait for a while, which lowers efficiency (bad enough in itself, though there are ways to mitigate it).

So, does it have any effect if the thread is switched in the critical section? (Note that "the thread is in the critical section" is equivalent to "the thread is executing the code in the critical section")

There are two main impacts:

On the one hand, a thread being switched out inside the critical section prevents the other waiting threads from entering the critical section in time; that is, the switched-out thread takes the lock with it, and there is only one lock for a given critical resource. This reduces the program's concurrency and responsiveness. Therefore, when designing a critical section, keep it as short and simple as possible, and avoid time-consuming operations or calls to functions that may block inside it.

On the other hand, being switched inside the critical section may also lead to logic errors or deadlock. For example, if a thread that has already acquired one lock tries to acquire another lock inside the critical section, and that lock happens to be held by another thread which is in turn waiting for the first thread's lock, then a circular-wait deadlock forms. Therefore, follow some norms and principles when designing critical sections: avoid nesting multiple lock objects, acquire and release locks in a fixed order, use timeout mechanisms or deadlock detection, and so on. In fact, an important rule is not to use a lock where you can do without one, because lock-related errors are very troublesome to track down.

Will there be a problem if the critical section has many statements?

Even if there is a lot of code in the critical section, the mutex guarantees that only one thread at a time executes it, so there is no problem as long as the code itself is correct.

It depends on whether the code in the critical section satisfies the following principles:

  • Atomicity: The code in the critical section should be indivisible, that is, either all of it is executed, or none of it is executed. If the code in the critical section may throw an exception or be interrupted, then you need to use an exception handling or signal processing mechanism to ensure that the code in the critical section can exit correctly and release the lock under any circumstances.
  • Mutual exclusion: The code in the critical section should only be executed by one thread, that is, no other thread can enter the critical section at the same time. This requires the use of synchronization mechanisms, such as mutexes, semaphores, condition variables, etc., to ensure that only one thread can gain access to the shared resource.
  • Orderliness: The code in the critical section should be executed in the expected order, that is, there should be no instruction rearrangement or memory visibility problems. This requires the use of memory barriers or atomic operations to ensure that the code in the critical section can be executed correctly under different processors or memory models.

What is the correct way of multi-threaded coding?

We cannot control the scheduler's strategy for scheduling threads; we can only restrict access to shared resources within a time period by locking and unlocking, and this must be implemented manually by the programmer. Shared resources are exposed to all threads in the same process address space, which can access them directly. Locking only exploits the fact that statements execute from top to bottom: if you acquire the lock inside or after the critical section, the lock is useless.

And it is wrong multi-threaded programming for a thread to access critical resources without first applying for the lock; the reasons have been explained more than once above. To prevent this, a synchronization mechanism such as a mutex should protect the shared resource: when a thread needs to access it, the thread first applies for the lock and then enters the critical section; after using the shared resource, it releases the lock so that other threads can access the resource.

To access critical resources, each thread must apply for a lock, which means that the lock must be shared by all threads, so the lock itself is also a shared resource. The lock guarantees the security of critical resources, so who will guarantee the security of the lock itself?

The security of the lock itself is guaranteed by the operating system's atomic operations. Atomic operations can ensure that only one thread can access the lock at any time in a multi-threaded environment, and the operations of applying for and releasing the lock are also atomic, thus ensuring the security of the lock itself.

4. Implementation principle of mutual exclusion lock

The implementation principle of the mutex can be discussed at the hardware level and at the software level. Here we only discuss some operations of the CPU and its registers, taking the software level as the example.

The essence of a mutex is to serve as a mark, and that mark is a number in memory.

4.1 Thread execution and blocking

A thread that has been assigned a task but does not hold the lock hangs, waiting for the lock to be granted.

The software-level implementation relies mainly on the scheduling mechanism provided by the operating system, that is, the OS can control the execution and blocking of threads or processes. The OS maintains the mutex's state and a waiting queue. When a thread or process wants to access a shared resource, it first checks the state of the mutex: if the mutex is unoccupied, it continues its access and sets the state to occupied; if the mutex is already occupied, it adds itself to the waiting queue and blocks itself. After accessing the shared resource, it sets the mutex's state back to unoccupied and wakes up a thread or process from the waiting queue. This kind of mutex is also called a sleep lock (sleeplock), because the waiting thread or process sleeps until it is woken up.

4.2 Spin locks and mutexes

concept

In the Linux kernel, locking and unlocking are implemented using atomic instructions. Atomic instructions are operations performed directly by the CPU, and they are guaranteed to be atomic.

Mutual exclusion locks (mutexes) and spin locks (spinlocks) are two common synchronization mechanisms for protecting access to critical sections. The difference between them: when a thread tries to acquire a lock that is already held, a mutex puts the thread to sleep until the lock is released, while a spin lock makes the thread repeatedly check the lock's state until it acquires the lock. A mutex therefore avoids wasting CPU resources but adds context-switch overhead; a spin lock avoids the context-switch overhead but occupies the CPU while waiting.

In Linux, we can use the implementation principle of spin locks to understand mutexes from an assembly point of view. In fact, mutexes in the Linux kernel are implemented based on spin locks.

Specifically, the Linux kernel defines a structure, mutex, that contains a spin lock and a wait queue. A spin lock prevents multiple threads from accessing a shared resource simultaneously by having waiters repeatedly check the lock's state: if the lock is occupied, the thread keeps waiting until it is released. When a thread tries to acquire a mutex, it first tries to acquire the spin lock inside the mutex. If that succeeds, the mutex is not occupied and the thread may enter the critical section; if it fails, the mutex is already occupied, so the thread adds itself to the wait queue, releases the spin lock, and goes to sleep. When a thread releases a mutex, it first checks whether the wait queue is empty. If it is empty, no other thread is waiting for the mutex, and the thread can simply release the spin lock; if it is not empty, the thread takes one thread off the wait queue, wakes it up, and hands the lock over to it.

From an assembly point of view, the Linux kernel uses some special instructions to implement spin locks and mutexes. For example, on the x86 architecture, the kernel uses the lock prefix to guarantee the atomicity of instructions; the xchg instruction exchanges the values of two operands (the most important instruction for this section); the cmpxchg instruction compares and exchanges the values of two operands; test_and_set_bit tests and sets a bit; test_and_clear_bit tests and clears a bit; the pause instruction optimizes spin loops; and so on. These instructions take advantage of the CPU's hardware support to implement atomic operations and memory barriers.

A lock is a common synchronization mechanism for guaranteeing mutually exclusive access by multiple threads to shared resources. Implementing a lock is not simple, however; it needs low-level atomic operations such as the xchgb/xchg instruction. xchgb/xchg exchanges the values of two operands atomically: it cannot be interrupted by other threads or interrupts while it executes. The xchgb instruction can be used to implement a simple lock called a spin lock (spinlock).
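A user-space C++ sketch of the same idea (this is not the kernel's implementation): std::atomic_flag::test_and_set is an atomic exchange in the spirit of xchg, so it can model the spin lock described here:

#include <atomic>

std::atomic_flag lk = ATOMIC_FLAG_INIT; // clear = free, set = held

void spinLock()
{
    // atomically write "set" and read back the old state; the old state is
    // clear only if the lock was free, otherwise keep spinning
    while (lk.test_and_set(std::memory_order_acquire))
        ; // busy-wait: another thread holds the lock
}

void spinUnlock()
{
    lk.clear(std::memory_order_release); // write "clear" back: release the lock
}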

xchg instruction

In x86 processors there is an instruction called xchgb (exchange byte) that atomically exchanges two byte-sized values. One operand must be a register; the other can be a register or a memory address. The execution of xchgb cannot be interrupted: while it executes, no other thread or process can access the memory address involved. This guarantees that operations on the mutex are atomic, i.e. free of race conditions. Atomicity here means the instruction cannot be interrupted by other instructions during execution, nor disturbed by other processors or the bus; this is supported by the hardware. The format of the xchgb instruction is as follows:

xchgb %al, (%ebx)

The meaning of this instruction is to exchange the value in register al with the value at the memory address held in ebx, storing each value in the other's place. For example, if al holds 0x01 and the memory location holds 0x00, then after executing this instruction al holds 0x00 and the memory location holds 0x01.

Both xchgb and xchg exchange the values of two operands; their main difference is the operand size (xchgb operates on bytes).

The xchgb instruction can be used to lock and unlock a mutex. Take a mutex variable mutex: a byte-sized memory location with initial value 0, where 0 means the mutex is free and 1 means it is occupied.

Locking and unlocking a spin lock

Below is a simple x86 assembly version of a spin lock, implemented with the xchgb instruction:

spin_lock:
movb $1, %al        # put 1 into register al
xchgb %al, (lock)   # swap al with the lock variable in memory; the old value ends up in al
testb %al, %al      # test whether al is 0
jnz spin_lock       # if not 0, the lock is occupied: jump back to spin_lock and keep waiting
ret                 # if 0, the lock has been acquired: return

spin_unlock:
movb $0, (lock)     # store 0 into the lock variable in memory, releasing the lock
ret                 # return

In this way, spin_lock and spin_unlock can protect critical sections, i.e. shared resources that require mutually exclusive access. For example:

spin_lock           # call spin_lock to acquire the lock
# ... critical section code ...
spin_unlock         # call spin_unlock to release the lock

Here is an explanation of each instruction in the example above:

  • The spin_lock routine tries to acquire the lock. It uses xchgb to swap the lock's value with 1 and checks the value swapped out. If that value is 0, the thread acquired the lock; otherwise it keeps waiting and retries.

    • movb $1, %al: move 1 into the al register.
    • xchgb %al, (lock): exchange the value in al with the lock value in memory.
    • testb %al, %al: test whether the value in al is 0.
    • jnz spin_lock: if the value in al is not 0, jump back to the spin_lock label, i.e. before the first movb instruction (retry).
    • ret: return from the routine.
  • The spin_unlock routine releases the lock. It sets the lock's value to 0 so that other threads can acquire it.

    • movb $0, (lock): set the lock value in memory to 0.

    • ret: return from the routine.

In other versions the register may be named eax; it makes no difference.

Locking and unlocking a mutex

The following code snippet uses the xchgb instruction to implement the mutex lock and unlock operations; it shows how xchgb moves data between the CPU and memory to implement a mutex. Suppose we have a byte variable mutex used as the mutex, with initial value 0, meaning the lock is not occupied. When a thread tries to acquire the lock, it does the following:

; locking
mov al, 1
lock xchgb [mutex], al
test al, al
jnz try_again

; unlocking
mov [mutex], 0

This fragment of assembly code means:

  1. Move 1 into the al register.

  2. Use the xchgb instruction to swap the value in al with the lock value in memory:

    a. Read the lock value from memory into the CPU.

    b. Write the value in al to the lock's location in memory.

    c. Write the lock value that was read into the al register.

  3. If the value in al after the exchange is 0, locking succeeded; otherwise locking failed and must be retried. Unlocking is simple: just set the lock value in memory back to 0.

Note: try_again is not an assembly keyword; it is a label that marks a location in the code. In the sample above, jnz try_again means: if the result of the preceding test al, al instruction is non-zero, jump to the try_again label and continue from there. This implements the loop that keeps retrying the lock.

These operations are atomic, that is, during the entire process, other threads cannot access or modify the lock value in memory. Therefore, when the thread checks the exchanged lock value, it can determine whether it has successfully acquired the lock.

Applying for a mutex

Suppose there is a mutex variable lock whose initial value is 0, indicating that the lock is free. When a thread wants to apply for this lock, it can execute the following assembly code:

movl $1, %eax    # put 1 into register eax
xchgb %al, lock  # swap the low byte of eax with the value of lock, storing the result in lock
testb %al, %al   # test whether the low byte of eax is 0
jnz busy         # if not 0, the lock is occupied: jump to the busy label and the thread suspends/blocks
                 # if 0, the lock has been applied for successfully: continue into the critical section

The effect of this code: if the value of lock is 0, it is exchanged with 1, so 1 is stored into lock, meaning the lock has been claimed; if the value of lock is 1, the exchange leaves 1 in the low byte of eax, meaning the lock is already occupied. Testing whether the low byte of eax is 0 then tells whether the application succeeded. On success, the thread can enter the critical section; on failure, it must wait or retry. Generally, after failing to apply for a lock, the thread suspends and blocks (i.e. sleeps).

Supplement:

  1. $1 denotes the immediate value (a constant operand) 1, and %eax denotes the register eax; the instruction moves the immediate 1 into eax. Different assembly syntaxes mark immediates and registers differently; for example, Intel-syntax assembly usually uses no special symbols for them.

  2. In the x86 architecture, eax is a 32-bit register whose low 8 bits can be accessed as the al register. In this case we only care whether the lock value is 0, not about its other bits.

thread switch

When a thread is switched out while applying for a mutex, its context (including register values and the program counter) is saved to memory. When the thread is switched back in, its context is restored, and it continues executing from where it was switched out.

If a thread is switched out right after executing the xchg instruction, its context (including the value of al) is saved to memory. When the thread is switched back, the context is restored, including the value in al, so the thread can continue with the test al, al instruction, check whether the lock value it obtained was 0, and so on.

Each thread has its own set of register values, but these values ​​do not exist independently in the CPU, but are implemented through context switching.

In the eyes of an execution flow, the CPU's registers are the "tools" for saving and switching between different thread contexts. The limited set of registers is shared by all execution flows, but the context data they hold at any moment is private to one thread; so from a thread's point of view, the registers hold the context of the current execution flow.

4.3 The nature of mutex locks

The essence of a mutex is a number that is unique to a shared resource and marks whether a thread may access that resource; only the thread holding this mark can operate on the shared resource. Atomic instructions make the hand-off of the mutex safe, so the mutex can also guarantee the consistency of the shared resource's data.

Atomic operations are supported at the hardware level. For threads, the two states of an atomic operation correspond to the only two situations that are meaningful to them (say we have thread A and thread B):

  1. Nothing has been done yet: thread A has not taken the lock, meaning the other thread's application failed, and thread A can still apply for the lock itself;
  2. Completely done: thread A has released the lock, and thread B can apply for it.

For the thread holding the lock, no other thread can compete with it for the lock; that is up to the scheduler. For a thread applying for the lock, a failed application means it is still competing with other threads for the lock and the scheduler has not yet let it take it. Both situations are atomic as seen by other threads.

5. Reentrant and thread safe

Distinguishing reentrancy from thread safety is a common question in multithreaded programming. Simply put, a reentrant function is one that can be interrupted during execution and called again before the interrupted call completes, without corrupting the original call's state. A thread-safe function is one that can be called by multiple threads at the same time without causing data races or logic errors.

5.1 Reentrant functions


Reentrancy means a function can be safely called again (by another task, another thread, or an interrupt handler) even while a previous call is still in progress, without affecting its correctness and consistency. Reentrant functions generally follow these principles:

  • Do not use global or static variables; use only local variables or parameters passed in;
  • Do not call malloc(), free(), or other functions that operate on shared heap state;
  • Do not call printf(), scanf(), or other functions that operate on shared standard I/O state;
  • Do not call other non-reentrant functions, such as rand(), localtime(), etc.;
  • If shared resources such as hardware devices or files must be accessed, protect them with a mutex or by disabling interrupts.

Reentrant functions matter most in multitasking or multithreading environments, and especially in interrupt handlers, because an interrupt can arrive at any time; if an interrupt handler calls a non-reentrant function, it may corrupt data or crash the system. Reentrant functions also improve the modularity and reusability of programs.

Reentrancy is a property of a function: a function is reentrant if it can safely be re-entered while an earlier call is still in progress. In the ticket-grabbing example, the thread function getTickets() is not reentrant, because it manipulates a global variable.
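To make the contrast concrete, here is a small hypothetical pair of functions (the names are invented for illustration): the first keeps hidden static state and is not reentrant; the second takes its storage from the caller and is.

#include <cstdio>
#include <cstddef>

// Not reentrant: the static buffer is hidden shared state, so a second
// concurrent or interrupting call overwrites the result of the first.
const char* format_id_bad(int id)
{
    static char buf[32];
    snprintf(buf, sizeof(buf), "id=%d", id);
    return buf;
}

// Reentrant: the caller supplies the storage; there is no hidden state.
void format_id_ok(int id, char* buf, size_t len)
{
    snprintf(buf, len, "id=%d", id);
}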

This is also why, when we test multithreaded code without any control, the characters printed by thread functions sometimes jump onto the previous line, which looks confusing. The cause is not only the scheduler's uncertain scheduling policy, but also that output from cout and printf in different threads can interleave: for threads, the display is a shared resource. We could lock the output operations, but generally we don't bother, because print statements only display content rather than manipulate data; the safety problem we care about is protecting data from concurrent modification.

5.2 Thread safety

In Linux, if multiple threads access the same piece of code concurrently, and this code operates on global variables or static variables, there may be thread safety issues without lock protection.

Thread safety issues are usually caused by multiple threads accessing the same piece of data concurrently. If these threads modify the data, they may interfere with each other, causing data inconsistencies or other errors.

To avoid this, locks can be used to protect critical sections. The lock can ensure that only one thread can access the data in the critical section at the same time, thereby avoiding thread safety problems.

5.3 Common thread unsafe situations

  1. Operations on global or static variables: If multiple threads concurrently access the same global or static variable and modify it, thread safety issues may arise.
  2. Using non-thread-safe functions: some functions (such as strtok and gmtime) are not safe in a multi-threaded environment. They usually have thread-safe alternatives (such as strtok_r and gmtime_r), which should be used whenever possible (see the sketch after this list).
  3. Not using locks correctly: If multiple threads need to access the same piece of data concurrently, then locks should be used to protect the piece of data. Thread safety issues can arise if locks are not used correctly, or if the locks are not granular enough.
  4. Not handling signals correctly: In a multithreaded program, signal handlers should be as simple as possible and avoid manipulating global or static variables. Thread safety issues can arise if signal handlers do not properly handle these issues.
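As a sketch of item 2, compare strtok with its reentrant counterpart strtok_r, which keeps the scan position in a caller-supplied variable instead of hidden static state (tokenize is an invented example):

#include <cstdio>
#include <cstring>

// strtok keeps its position in hidden static state, so two threads
// tokenizing at once corrupt each other; strtok_r makes it explicit.
void tokenize(char* line)
{
    char* save = nullptr; // per-call (and therefore per-thread) state
    for (char* tok = strtok_r(line, " ", &save); tok != nullptr;
         tok = strtok_r(nullptr, " ", &save))
    {
        printf("%s\n", tok);
    }
}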

5.4 Common thread-safe situations

These are essentially the converse of the thread-unsafe situations above.

  1. Operate on local variables: Local variables are unique to each thread, so there will be no thread safety issues when multiple threads concurrently access local variables in the same function.
  2. Use thread-safe functions: some functions (such as strtok_r and gmtime_r) are thread-safe and can be used safely in a multi-threaded environment.
  3. Use locks correctly: If multiple threads need to access the same piece of data concurrently, then locks should be used to protect this piece of data. A program is thread-safe if locks are used correctly and the locks are fine-grained enough.
  4. Handle signals correctly: In a multithreaded program, a program is thread-safe if the signal handler function handles the signal correctly and avoids manipulating global or static variables.

In summary, thread safety is usually achieved by avoiding shared data, using thread-safe functions, using locks correctly, and handling signals correctly.

5.5 Common non-reentrant situations

  1. Use global or static variables: If a function uses global or static variables to hold state, then it is generally not reentrant. This is because global and static variables retain state across calls, possibly affecting the function's result.
  2. Calling non-reentrant functions: If a function calls non-reentrant functions, it is usually not reentrant either. This is because non-reentrant functions may affect global state and thus the results of other functions.

5.6 Common reentrant situations

  1. Use local variables: If a function only uses local variables to hold state, then it is usually reentrant. This is because local variables do not retain state across calls, each call creates a new local variable.
  2. Does not call non-reentrant functions: If a function only calls reentrant functions, then it is usually reentrant as well. This is because reentrant functions do not affect global state and therefore cannot affect the results of other functions.

In summary, reentrancy is usually achieved by avoiding the use of global or static variables, calling only reentrant functions, etc.

5.7 The relationship between reentrancy and thread safety

The difference between reentrancy and thread safety is that reentrancy concerns the behavior of a single execution flow being re-entered, whereas thread safety concerns the interaction between multiple threads. Reentrant functions are a subset of thread-safe functions.

Reentrant functions must be thread-safe, but not vice versa: a function that locks correctly is thread-safe, yet not necessarily reentrant. This is because a reentrant function uses no shared data or global variables, so it cannot be interfered with by other callers, while a thread-safe function may use shared data or global variables, relying on synchronization mechanisms (locks, semaphores, etc.) to keep the data consistent and correct.

For example, malloc is thread-safe but not reentrant. It uses global state to manage memory allocation, so when multiple threads call it simultaneously it takes a lock to avoid data races. But if a thread is interrupted inside malloc and the interrupt handler also calls malloc, a deadlock can occur: the same thread tries to acquire the lock it already holds. So malloc is not reentrant.

Another example is printf, which is not reentrant (and, on implementations that do not lock the stream internally, not thread-safe either). It writes through a shared buffer, so without internal locking, output from multiple threads can interleave or get lost. And if a thread is interrupted in the middle of printf and the interrupt handler calls printf again, the buffer state (or the stream's internal lock) can be corrupted, producing garbled output or a deadlock. So printf is not reentrant.

As another example, a function that uses a static variable to hold state may be thread-safe (if it protects the static variable with a lock), but it is not reentrant (because the static variable keeps state across calls). Conversely, a function that uses local variables to hold state may be reentrant (local variables do not keep state across calls), yet not necessarily thread-safe (if it accesses other shared data without proper synchronization).
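A minimal sketch of the "thread-safe but not reentrant" case just described (next_id is an invented example): the lock keeps other threads out, but a signal handler on the same thread that called next_id() again would deadlock on the lock the thread already holds.

#include <pthread.h>

static pthread_mutex_t g_mtx = PTHREAD_MUTEX_INITIALIZER;

int next_id()
{
    static int counter = 0;      // state that persists across calls
    pthread_mutex_lock(&g_mtx);  // safe against other threads...
    int id = ++counter;          // ...but re-entry on this thread deadlocks
    pthread_mutex_unlock(&g_mtx);
    return id;
}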

Writing reentrant and thread-safe functions is a good programming practice that improves program stability and efficiency. In order to implement reentrant and thread-safe functions, we need to follow the following principles:

  • Try to avoid shared data and global variables; prefer local variables or parameter passing.
  • If shared data or global variables are unavoidable, protect them with synchronization mechanisms and keep critical sections as short as possible.
  • If other functions must be called from an interrupt handler, make sure they are reentrant and cannot deadlock or recurse with the main program.
  • If information must be written to the screen or a file, use atomic operations or buffering to avoid garbled or lost output.

6. Deadlock

6.1 Concept

Deadlock refers to a phenomenon in which two or more threads wait for each other due to competition for resources during the execution process. If there is no external force, they will not be able to continue to execute. Deadlock usually occurs when multiple threads request multiple resources at the same time. Due to improper resource allocation, the threads wait for each other and cannot continue to execute.

For example, thread A holds lock a and thread B holds lock b, and each then applies for the other's lock. Since the lock each one wants is already occupied, both block, and the code can no longer advance.
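A minimal sketch of that situation (timing-dependent: the deadlock appears when each thread takes its first lock before the other takes its second):

#include <pthread.h>

pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

void* threadA(void*)
{
    pthread_mutex_lock(&lock_a);
    pthread_mutex_lock(&lock_b);  // if B already holds b, A blocks here forever
    pthread_mutex_unlock(&lock_b);
    pthread_mutex_unlock(&lock_a);
    return nullptr;
}

void* threadB(void*)
{
    pthread_mutex_lock(&lock_b);
    pthread_mutex_lock(&lock_a);  // ...while B blocks here waiting for A
    pthread_mutex_unlock(&lock_a);
    pthread_mutex_unlock(&lock_b);
    return nullptr;
}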


Note: the number of threads involved is not limited to 2; in practice many locks may be involved, eventually forming a cycle. A deadlock can even arise from a single lock, i.e. a thread applying again for the lock it already holds. This is rare and usually just a coding mistake; it is enough to know it can happen.

6.2 "This is code I wrote myself, could that really happen?"

  1. There may be more than one lock in the code;
  2. The code that takes lock a and the code that takes lock b may be far apart, and while writing you may forget that a lock was already taken somewhere else.

6.3 Examples

For example, in the earlier ticket-grabbing thread function, if the unlock operation is accidentally written as another lock operation, we get a deadlock caused by a single lock: the thread applies for the lock it already holds, so it can never release it, and every thread in the waiting queue hangs forever. From the terminal, all you see is the cursor blinking.

// Thread function
void* getTickets(void* args)
{
	ThreadData* td = (ThreadData*)args; // data passed in as the argument
	// ...
	pthread_mutex_lock(td->_pmtx);      // lock
	// ...
	// pthread_mutex_unlock(td->_pmtx); // should unlock here...
	pthread_mutex_lock(td->_pmtx);      // ...but applies for the lock again by mistake
	// ...
}


Use the ps command to view the status of the process:


In the STAT column, S means the process is in an interruptible sleep, l means it is multi-threaded, and + means it is in the foreground. The process sits in Sl+ because every one of its threads is blocked waiting for the lock.

6.4 Blocking, Suspending and Waiting

In multithreaded programming, blocking, suspending, and waiting all mean that a thread temporarily stops executing. Their difference is:

  • Blocking: a thread blocks while waiting for a condition to be met, such as waiting for an I/O operation to complete or waiting to acquire a lock. When the condition is met, the thread automatically resumes execution.
  • Suspended: a suspended thread does not automatically resume execution; another thread (or the system) must explicitly wake it.
  • Waiting: a thread enters the waiting state while waiting for some condition to hold, such as calling pthread_cond_wait to wait on a condition variable. When the condition holds, the thread wakes up and continues execution.

In the implementation of the lock, if a thread tries to acquire a lock that is already occupied, the thread will be blocked and added to the lock waiting queue. When the lock is released, the operating system removes one or more threads from the waiting queue, wakes them up, and lets them continue executing.

In Linux, threads are lightweight processes: both threads and processes are represented by a task_struct, so they can use the same waiting-queue mechanism, and the implementation and usage are essentially the same. Depending on how the kernel manages them, however, they may be placed on different wait queues.

The CPU is what executes tasks, so every thread or process with work to do is, in effect, waiting for CPU time. Beyond the run queue, the system has many other waiting queues, one per kind of resource: locks, disks, network cards, and so on.

For example, suppose a running process needs a lock that is currently held by another process. Its state changes from R to a blocking state such as S, it is removed from the run queue, and it is linked onto the waiting queue of that lock, while the CPU goes on to schedule the next process in the run queue. If further processes also need this lock, they too are removed from the run queue and linked, in turn, onto the lock's waiting queue.

When the holder finishes with the lock, i.e. the lock resource becomes ready, a process is woken from the lock's waiting queue, its state changes from S back to R, and it is linked back into the run queue. The next time the CPU schedules it, it can use the lock.

Summary

  • From the perspective of the operating system, blocking, suspending, and waiting all mean that a thread temporarily stops executing. The operating system removes these threads from the CPU scheduling queue to free up CPU time for other ready threads.

  • From the user's perspective, blocking, suspending, and waiting all make a thread temporarily stop responding; the program may feel slower or stutter. These states are usually temporary, and the thread resumes automatically once its condition is met.

"Resources" are not limited to hardware resources and software resources. The essence of a lock is a software resource. When we apply for a lock, the lock may be occupied by other threads. At this time, when other threads apply for the lock again, it will fail, and then it will be placed in the resource waiting queue of this lock.

A natural question: since locking and unlocking inside the thread function can go wrong, why not simply lock before the thread function runs and unlock after it returns? Wouldn't that reduce the chance of mistakes?

Holding the lock for the entire execution of the thread function causes performance problems. The point of locking is to protect shared resources from simultaneous access and modification; if one thread holds the lock for its whole run, no other thread can touch those resources, even while the holder is not actually using them.

An overlong critical section hurts efficiency: if a thread spends a long time inside it, other threads trying to enter will block, degrading performance and response times. Even in the ticket-grabbing example, where the critical section is short, serializing it already has a cost; the longer the critical section, the worse it gets. It is therefore generally recommended to keep the critical section as short as possible and perform only the necessary operations inside it.

Therefore, it is generally recommended to lock the mutex only when it needs to access the shared resource, and unlock it immediately after the access is completed, strictly limiting the length of the critical section. This minimizes locking time and improves the concurrency performance of your program.
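As a sketch (mtx and long_private_computation are stand-ins, with tickets from the earlier example), compare holding the lock across private work with locking only the shared access:

#include <pthread.h>

extern int tickets;              // the shared resource from the example
extern pthread_mutex_t mtx;      // assumed global lock
void long_private_computation(); // stand-in for work on private data

void coarse() // coarse: lock held across private work too
{
    pthread_mutex_lock(&mtx);
    long_private_computation();  // does not touch shared data
    tickets--;                   // the only shared access
    pthread_mutex_unlock(&mtx);
}

void fine() // better: critical section covers only the shared access
{
    long_private_computation();
    pthread_mutex_lock(&mtx);
    tickets--;
    pthread_mutex_unlock(&mtx);
}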

6.5 Necessary conditions for deadlock

We know that a deadlock refers to a situation in which a group of processes or threads cannot continue to execute because they are waiting for each other's resources. Deadlock is a serious problem because it can cause the system to degrade or even become unresponsive. Therefore, it is very important to understand the causes and solutions of deadlocks.

In Linux, the occurrence of deadlock needs to meet the following four necessary conditions:

  1. Mutual exclusion: each resource is either assigned to exactly one process or thread or is available; it cannot be held by several at once.
  2. Hold and wait (request and hold): a process or thread that already holds resources may request new ones without releasing what it already holds.
  3. No preemption: a resource already allocated to a process or thread cannot be forcibly taken away; only its holder can release it voluntarily.
  4. Circular wait: there is a set of processes or threads in which each one waits for a resource held by the next, forming a cycle.

Avoiding deadlock

Breaking the necessary conditions for deadlock

If any one of these four conditions fails to hold, deadlock cannot occur. Ways to prevent or avoid deadlock therefore include:

  • Break one or more of the four necessary conditions:

    • Use semaphores or mutexes to serialize access to resources, preventing multiple processes or threads from scrambling for the same resource at once.

    • Use the banker's algorithm or pre-allocation to assign resources, so that a process or thread does not request new resources while holding others and leave the system short.

    • Use a priority or timeout mechanism to allow preemption: assign different priorities to different types of locks and acquire them in priority order, so that low-priority processes or threads cannot hold resources for a long time and block high-priority ones.

    • Use topological sorting or ordered allocation to assign resources (see the sketch after this list), so that no circular waiting chain can form between processes or threads; allocate resources in one go where possible, and release each lock promptly once the critical resource has been accessed.

  • Set lock timeout: Set a timeout period for each lock. If the lock cannot be acquired within the timeout period, the acquisition will be abandoned and the acquired lock will be released to avoid the situation that the lock is not released.

  • Use the deadlock detection algorithm: Run the deadlock detection algorithm regularly to detect whether there is a deadlock in the system. If a deadlock is detected, appropriate action is taken to resolve it.
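As a sketch of the ordered-allocation idea (lock_pair is an invented helper): if every thread acquires any pair of locks in one agreed global order, here by address, a circular waiting chain can never form.

#include <pthread.h>
#include <utility>

void lock_pair(pthread_mutex_t* m1, pthread_mutex_t* m2)
{
    if (m1 > m2)
        std::swap(m1, m2);  // impose one total order on the two locks
    pthread_mutex_lock(m1); // everyone locks the "smaller" one first
    pthread_mutex_lock(m2);
}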

For now it is enough to understand the four necessary conditions and the idea of breaking them; the other methods come with practice.

Use the trylock function

In Linux, trylock is the non-blocking way to attempt to lock a mutex. If the mutex is not currently locked by any thread, the calling thread locks it; if it is already locked by another thread, the call fails and returns immediately instead of blocking. A thread can use this to avoid hold-and-wait: when it fails to get the next lock, it releases the locks it already holds, effectively yielding them, lets other threads finish with them, and applies again later. This breaks the second necessary condition of deadlock: trylock lets a thread step back politely and let the other side use the lock first.

For example, the pthread library provides the pthread_mutex_trylock function to attempt to lock a mutex. It returns 0 on success; otherwise it returns an error code.

You can view its description through man pthread_mutex_trylock:


The man page describes the behavior of pthread_mutex_trylock. It is similar to pthread_mutex_lock, with one important difference: if the mutex is currently locked (by any thread, including the calling thread), the function returns immediately instead of blocking.

Additionally, if the mutex is of type PTHREAD_MUTEX_RECURSIVE and is currently owned by the calling thread, the mutex's lock count is incremented by 1 and pthread_mutex_trylock immediately returns success.

In short, pthread_mutex_trylock attempts to lock the mutex and, if the mutex is already locked, returns immediately without blocking; for a PTHREAD_MUTEX_RECURSIVE mutex already owned by the calling thread, it increments the lock count and returns success immediately.
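A sketch of the back-off pattern that breaks "hold and wait" (lock_both and the lock names are invented for illustration): if the second lock is busy, release the first and retry instead of blocking while still holding a resource.

#include <pthread.h>
#include <unistd.h>

pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

void lock_both()
{
    while (true)
    {
        pthread_mutex_lock(&lock_a);
        if (pthread_mutex_trylock(&lock_b) == 0)
            return;                    // got both locks, enter the critical section
        pthread_mutex_unlock(&lock_a); // back off so others can make progress
        usleep(100);                   // brief pause before retrying
    }
}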

7. Thread synchronization

7.1 Leading concepts

Synchronization

As mentioned earlier, synchronization means that multiple events occur in a certain order. Thread synchronization, then, means coordinating multiple threads to execute in a certain order so that they access shared resources correctly. This usually requires synchronization mechanisms such as mutexes, semaphores, and condition variables to control the order of execution between threads.

Race condition

A race condition means that in a multi-threaded program, multiple threads access and modify shared resources at the same time, causing the execution result of the program to depend on the scheduling order of the threads. This may cause the program to behave indeterminately, or even produce erroneous results.

To avoid race conditions, a synchronization mechanism is needed to coordinate the order of execution among multiple threads. For example, a mutex can be used to protect a shared resource so that at any moment only one thread can access it, thereby avoiding the race.

7.2 Import

Back to the ticket-grabbing example: between one lock and the next, it is possible for a single thread to grab all 10,000 tickets in a row, for instance because its priority is relatively high. This can happen, it is allowed, and it is correct, but it is unreasonable. Why correct yet unreasonable?

An analogy: Xiao Ming goes to a phone shop to ask about a phone, and the clerk tells him it launches next month. Xiao Ming asks again the next day, and the day after... Each answer is correct, but making the clerk spend time on Xiao Ming every day is plainly pointless. The repeated asking is "correct" because it conforms to the mutual-exclusion rules (see the concept above): the shared resource and its lock are busy, so all he can do is keep asking. This example helps show what a thread synchronization mechanism is for.

Likewise, a thread that applies for the lock on a critical resource every chance it gets is told by the operating system each time: "someone else is busy inside, go wait in line." But if the thread keeps applying anyway, all that polling is meaningless work. This is where plain locking leads to unreasonable scheduling:

  • If an individual thread has a high priority, it may win the lock every time yet find nothing to do after acquiring it, endlessly and pointlessly acquiring and releasing the lock, while other threads may be unable to compete for the lock for a long time, leading to starvation.

Locking guarantees that only one thread at a time executes the critical-section code and accesses the critical resource, but it does not guarantee that every thread eventually gets access. We therefore need a synchronization mechanism to make locking more meaningful and achieve efficient thread synchronization.

Because the purpose of applying for a lock is to access the critical resource, and there is no access without the lock, "applying for the lock" and "applying to access the critical resource" are equivalent in this description; in the code, they happen in sequence.

7.3 Thread Synchronization

Thread synchronization usually relies on synchronization mechanisms such as the mutex in our ticket-grabbing example, plus semaphores and condition variables, to control the execution order among multiple threads; how a thread behaves depends on which mechanism is used. Taking the mutex as an example:

  • A thread fails to apply for the lock: a thread that calls pthread_mutex_lock and fails is blocked until another thread releases the lock. With pthread_mutex_trylock, a failed application instead returns an error code immediately.

  • A thread releases the lock: this wakes up threads waiting for the lock. When a thread calls pthread_mutex_unlock, the threads waiting on the lock wake up and compete to acquire it.

The operation and principle of the mutex have been introduced, and the condition variable will be introduced below.

7.4 Condition variables

To access a critical resource under a lock, the resource must first be there, so the thread has to check whether it is ready; that check is itself an access to the critical resource, so it must happen between locking and unlocking (inside the critical section) to be safe. In the conventional approach the thread locks, checks, finds the resource not ready, unlocks, and tries again; without any restriction on this behavior, the thread keeps acquiring and releasing the lock in a meaningless loop. So we attach a flag to the condition that records whether it is ready and use it to limit the thread's behavior. This flag is the idea behind a condition variable.

How do we limit the behavior of threads? Connecting back to the phone-shop example:

  1. Don't let the thread repeatedly check whether the critical resource is ready; let it wait;
  2. When the condition is ready, notify the waiting thread so it can apply for the lock and access the critical resource.

A condition variable is a synchronization mechanism built around state shared between threads: a description of whether some resource is ready. It allows one or more threads to wait for a change in shared state while releasing the mutex they hold, giving other threads the opportunity to modify that state. When the shared state changes, one or more waiting threads can be woken up, reacquire the mutex, and continue execution.

There are two main operations on condition variables:

  • Wait: a thread waits for the condition variable's "condition" to hold and suspends itself. It takes a mutex and a condition variable as parameters, and internally it:
    1. Releases the mutex, allowing other threads to access the shared resource.
    2. Blocks the current thread and adds it to the condition variable's waiting queue.
    3. When a signal arrives, wakes the current thread and reacquires the mutex.
    4. Checks whether the condition actually holds; if not, repeats the steps above.
  • Wakeup: after the "condition" comes to hold, another thread wakes the thread(s) waiting on it.

The wakeup operation means a thread notifies others that some condition now holds; it takes a condition variable as a parameter. Wakeups come in two kinds: single and broadcast. A single signal wakes only one waiting thread, while a broadcast wakes all of them. A wakeup does not have to be performed while holding the mutex, but it is usually done right after modifying the shared state.

The wakeup operation is also called the signal operation.

Principles of use

The use of condition variables needs to follow the following principles:

  • Condition variables must be used in conjunction with mutexes to protect the consistency of shared state.
  • Wait operations must be performed while holding the mutex to avoid race conditions.
  • Signal operations can be performed at any time, but are best performed while holding a mutex to avoid false wakeups or missed wakeups.
  • Waiting operations must use a while loop to check conditions in case of spurious wakeups or multiple wakeups.
  • Condition variables must be initialized with the pthread_cond_init function and destroyed with the pthread_cond_destroy function.

Condition variables are a powerful and flexible synchronization tool that can be used to implement various complex scenarios, such as producer-consumer model, reader-writer model, thread pool, etc. When using condition variables, care needs to be taken to set and check conditions correctly, and to properly distribute the responsibilities of signaling and waiting.
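A minimal sketch of the pattern these principles describe (mtx, cond, and the predicate ready are stand-in names, separate from the examples below):

#include <pthread.h>

pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
bool ready = false; // the predicate guarding the shared resource

void waiter()
{
    pthread_mutex_lock(&mtx);
    while (!ready)                      // re-check after every wakeup
        pthread_cond_wait(&cond, &mtx); // atomically releases mtx and blocks
    // ... consume the shared resource ...
    pthread_mutex_unlock(&mtx);
}

void signaler()
{
    pthread_mutex_lock(&mtx);
    ready = true;               // change the state first
    pthread_cond_signal(&cond); // or pthread_cond_broadcast(&cond)
    pthread_mutex_unlock(&mtx);
}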

cond family of functions

The pthread_cond family of functions is a set of functions for thread synchronization under Linux. They include:

  • pthread_cond_init: Initialize condition variables.
  • pthread_cond_wait: Blocking waits for a condition variable to be met.
  • pthread_cond_signal: Wake up a thread waiting for a condition variable.
  • pthread_cond_broadcast: Wake up all threads waiting for the condition variable.
  • pthread_cond_timedwait: Block and wait for the condition variable to be satisfied until the specified time.
  • pthread_cond_destroy: Destroy condition variables.

The return values of these functions follow the same convention: 0 on success, and any other value indicates an error.

pthread_cond_init

prototype:

int pthread_cond_init(pthread_cond_t *restrict cond, const pthread_condattr_t *restrict attr);

parameter:

  • cond: The condition variable that needs to be initialized.
  • attr: Initialize the attribute of the condition variable, generally set to NULL/nullptr to indicate the default attribute.

Similar to a mutex, initializing a condition variable by calling pthread_cond_init is called dynamic allocation. A condition variable can also be statically allocated (typically for globals):

pthread_cond_t cond = PTHREAD_COND_INITIALIZER; // PTHREAD_COND_INITIALIZER is a macro

Note: statically allocated condition variables do not need to be destroyed manually by calling the destroy function.

pthread_cond_destroy

prototype:

int pthread_cond_destroy(pthread_cond_t *cond);

parameter:

  • cond: The condition variable that needs to be destroyed.

pthread_cond_wait

prototype:

int pthread_cond_wait(pthread_cond_t *restrict cond, pthread_mutex_t *restrict mutex);

parameter:

  • cond: The condition variable to wait for.
  • mutex: The mutex corresponding to the critical section where the current thread is located.

pthread_cond_broadcast and pthread_cond_signal

prototype:

int pthread_cond_broadcast(pthread_cond_t *cond);
int pthread_cond_signal(pthread_cond_t *cond);

parameter:

  • cond: the condition variable on which waiting threads are to be woken.

the difference:

  • The pthread_cond_signal function is used to wake up the first thread in the waiting queue.
  • The pthread_cond_broadcast function is used to wake up all threads in the waiting queue.

Example

Framework

The following example has multiple threads each performing a different task while the others wait. An array of function pointers stores the thread functions, and the information passed to each thread is stored in an object. Before adding any locking, let's write the framework first: it creates three threads, each performing a different task, and passes a function pointer as part of the data handed to the thread function:

#include <iostream>
#include <string>
#include <unistd.h>
#include <pthread.h>
using namespace std;

#define THREAD_NUM 3                        // number of threads
typedef void (*func_t)(const string& name); // function pointer type

class ThreadData
{
public:
	// Constructor
	ThreadData(const string& tname, func_t func)
	: _tname(tname)
	, _func(func)
	{}
public:
	string _tname; // thread name
	func_t _func;  // thread function pointer
};
// Thread function 1
void tFunc1(const string& tname)
{
	while(1)
	{
		cout << tname << " running task A..." << endl;
		sleep(1);
	}
}
// Thread function 2
void tFunc2(const string& tname)
{
	while(1)
	{
		cout << tname << " running task B..." << endl;
		sleep(1);
	}
}
// Thread function 3
void tFunc3(const string& tname)
{
	while(1)
	{
		cout << tname << " running task C..." << endl;
		sleep(1);
	}
}
// Jump function
void* Entry(void* args)
{
	ThreadData* td = (ThreadData*)args; // cast to recover the data passed in
	td->_func(td->_tname);              // call the thread function
	delete td;                          // free the data
	return nullptr;
}
int main()
{
	pthread_t t[THREAD_NUM];                         // thread IDs
	func_t f[THREAD_NUM] = {tFunc1, tFunc2, tFunc3}; // thread function addresses
	for(int i = 0; i < THREAD_NUM; i++)
	{
		string tname = "thread["; // thread name
		tname += to_string(i + 1); tname += "]";
		ThreadData* td = new ThreadData(tname, f[i]);     // object holding the data
		pthread_create(t + i, nullptr, Entry, (void*)td); // pass name and data on creation
	}
	for(int i = 0; i < THREAD_NUM; i++) // wait for the threads
	{
		pthread_join(t[i], nullptr);
		cout << "thread[" << t[i] << "] has exited..." << endl;
	}

	return 0;
}

Steps:

  1. A function pointer type func_t is defined, which accepts a parameter of type const string& and returns void. This way, we can pass different functions as parameters to the thread function.

  2. A class ThreadData is defined, which is used to encapsulate thread data, including thread name and thread function pointer. It has a constructor that initializes these two member variables.

  3. Three thread functions tFunc1, tFunc2 and tFunc3 are defined, which execute tasks A, B and C respectively, and print out the thread name and task information. Here we use the sleep(1) function to pause each thread for one second to observe the output.

  4. Next, we define a jump function Entry, which is passed as the third argument of pthread_create to start each thread. It recovers the data passed in, calls the thread function stored in the object with the thread name as the argument, and finally deletes the td object (which was allocated with new in main).

    A function address followed by the () operator calls the function at that address.

  5. In main, arrays hold the thread IDs and the thread function addresses. The loop creates the three threads with pthread_create, passing each one its information; this is how the name and data object reach the jump function Entry. If thread creation fails, print an error message and exit. Finally, a second loop waits for the three threads.

It should be noted that when using the pthread_create function, the parameters need to be cast to void* type, and then converted back to the original type in the jump function. This has been emphasized before.


This program is not yet complete: the thread functions have no real tasks to perform, and the program can only be terminated manually. It is just a framework.

Mutex, condition variable

When releasing mutexes and condition variables, release in the reverse order of acquisition: if the mutex is acquired first and the condition variable second, release the condition variable first and the mutex second. In other words, resources acquired first should be released last.

This convention helps avoid deadlocks. A deadlock is a situation where two or more threads each wait for the other to release resources, so none can proceed; if all threads acquire and release resources in the same order, deadlocks can be avoided. For example:

int main()
{
    pthread_mutex_t mtx;
    pthread_cond_t cond;
    pthread_mutex_init(&mtx, nullptr);
    pthread_cond_init(&cond, nullptr);
    // ...
    pthread_cond_destroy(&cond); // destroy in the reverse order of initialization
    pthread_mutex_destroy(&mtx);

    return 0;
}

Note: Condition variables are often used with mutexes for thread safety.

Extended information

The ThreadData class in the framework no longer suffices, because we have now defined a condition variable and a mutex. For every thread to be constrained by them, the threads must be able to see both, so the information passed to each thread has to be extended.

typedef void (*func_t)(const string& name, // function pointer type
					   pthread_mutex_t* pmtx,
					   pthread_cond_t* pcond);

class ThreadData
{
public:
	// Constructor
	ThreadData(const string& tname, func_t func, pthread_mutex_t* pmtx, pthread_cond_t* pcond)
	: _tname(tname)
	, _func(func)
	, _pmtx(pmtx)
	, _pcond(pcond)
	{}
public:
	string _tname;          // thread name
	func_t _func;           // thread function pointer
	pthread_mutex_t* _pmtx; // mutex pointer
	pthread_cond_t* _pcond; // condition variable pointer
};

Entry, the software layer between main and the thread functions, needs to pass the extra parameters through; the thread functions in turn use the addresses of the mutex and the condition variable.

// Jump function
void* Entry(void* args)
{
    // ...
	td->_func(td->_tname, td->_pmtx, td->_pcond); // call the thread function
    // ...
}

In this way every thread obtains the same lock in memory while calling a different thread function. The lock could instead be made global, saving the trouble of passing parameters, but a poorly controlled global variable brings its own safety issues.

Take one of the thread functions as an example:

void tFunc1(const string& tname, pthread_mutex_t* pmtx, pthread_cond_t* pcond)
{
    while(1)
    {
        pthread_mutex_lock(pmtx);       // lock
        pthread_cond_wait(pcond, pmtx); // wait for the condition (enter the waiting queue)
        cout << tname << " running task A..." << endl;
        pthread_mutex_unlock(pmtx);     // unlock
        sleep(1);
    }
}
int main()
{
	// ...
    ThreadData* td = new ThreadData(tname, f[i], &mtx, &cond); // object holding the data
    // ...
    return 0;
}

A thread that calls pthread_cond_wait blocks immediately, like a process going from R to S, and is placed in the waiting queue. In the code above, every thread blocks on the same condition variable right from the start, without restriction. Although the scheduler's policy is uncertain, once all threads sit in the waiting queue their wakeup order is determined: a queue is FIFO, so they are woken in queue order (say, abc). As long as the task is not finished, the subsequent order stays fixed, because the wakeup always takes the thread at the head of the queue; the order is determined by the queue data structure, not by the scheduler's scheduling policy.

Waking up threads

Condition variable wakeup

Between the creation loop and the join loop in main we can add thread-control logic, for example calling pthread_cond_signal to wake a waiting thread. Its parameter is the address of a condition variable, specifying which condition variable the signal is sent on.

pthread_cond_signal returns success (0) even if no thread is currently blocked waiting.

pthread_cond_signal literally means "signal the condition variable", and the effect of the signal is to wake a waiter, so I habitually call the signal operation a wakeup.

int main()
{
	// create the threads
	sleep(5);
	while(1)
	{
		cout << "waking a thread..." << endl;
		pthread_cond_signal(&cond); // wake a waiting thread
		sleep(1);
	}
	// join the threads
    return 0;
}

The sleep(5) gives the threads enough time after creation to reach pthread_cond_wait, ensuring all of them are waiting before the first signal.

The sleep(1) paces the wakeups so the behavior is easier to observe.


The output is much neater than before locking: the printed lines no longer interleave, and the threads are scheduled in a fixed order.

Why are the first three rounds ABC, followed by CBA?

At the start, every thread waits on the condition variable cond, and main uses pthread_cond_signal to wake the first thread in the waiting queue in FIFO order. But the woken thread is not necessarily the first to run: the code paces wakeups with sleep(1), yet nothing guarantees the woken thread immediately acquires the mtx lock. If another thread holds mtx at that moment, the woken thread must wait again. So even though the threads in the waiting queue are woken in order, the order in which they actually execute is still indeterminate.

How do we ensure that the thread at the head of the waiting queue is not only woken first but also runs in order?

To guarantee the printed order is always ABCABCABC, a counter can control the execution order of the threads. Define a global int turn initialized to 0; each thread function then checks the value of turn to decide whether it is currently its turn to run.

For example, in tFunc1, you can add a while statement inside the while loop, and only when turn == 0 will it exit the loop and perform the print operation. Similarly, in tFunc2 and tFunc3, add similar while statements that check turn == 1 and turn == 2, respectively.

Access to turn is protected by the mutex; after printing, each thread advances turn by 1 and calls pthread_cond_broadcast to wake all threads waiting on the condition variable. Each thread thus executes in the predetermined order.

Here is an example of a modified tFunc1 function:

void tFunc1(const string& tname, pthread_mutex_t* pmtx, pthread_cond_t* pcond)
{
    while(1)
    {
        pthread_mutex_lock(pmtx);
        while(turn != 0)               // not my turn: keep waiting
        {
            pthread_cond_wait(pcond, pmtx);
        }
        cout << tname << " running task A..." << endl;
        turn = (turn + 1) % 3;         // pass the turn to the next thread
        pthread_cond_broadcast(pcond); // wake all waiters to re-check turn
        pthread_mutex_unlock(pmtx);
        sleep(1);
    }
}


In this example, each thread checks the value of turn before printing. If the value of turn is not equal to the predetermined value, then the thread will wait for the condition variable. When a thread finishes printing, it updates the value of turn and wakes up all threads waiting on the condition variable. This allows other threads to continue executing.

This is one way to control thread scheduling. Alternatively, to schedule threads in a fixed order you can use semaphores, a tool for synchronizing multiple threads or processes that can likewise make them execute in a predetermined order.

Semaphores will be studied in the next section.

For example, define a global sem_t sem and initialize it in main with sem_init(&sem, 0, 1). Each thread function calls sem_wait(&sem), which lets execution continue only when the semaphore's value is greater than 0; after printing, it calls sem_post(&sem) to release the semaphore so other threads can proceed.

Here is an example of a modified tFunc1 function:

void tFunc1(const string& tname)
{
    while(1)
    {
        sem_wait(&sem); // P: wait until the semaphore is available
        cout << tname << " running task A..." << endl;
        sem_post(&sem); // V: release the semaphore
        sleep(1);
    }
}

In this example, each thread waits on the semaphore before printing. Since the semaphore starts at 1, only one thread at a time can hold it; the others block until the current thread finishes printing and releases it.

Used this way, a single binary semaphore provides mutual exclusion, much like a mutex; by itself it does not determine which blocked thread runs next. To truly guarantee a fixed order, a common trick is to chain semaphores, one per thread, as sketched below.
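A sketch of that chaining idea, assuming three threads and the semaphore API introduced in the next section (names are illustrative):

#include <semaphore.h>

sem_t s[3]; // in main: sem_init(&s[0], 0, 1); sem_init(&s[1], 0, 0); sem_init(&s[2], 0, 0);

void run_task(int i) // thread i performs task i
{
    while (true)
    {
        sem_wait(&s[i]);           // block until it is this thread's turn
        // ... perform task i (e.g. print) ...
        sem_post(&s[(i + 1) % 3]); // hand the turn to the next thread
    }
}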

Condition variable broadcast

Below is a modified example that uses condition-variable broadcast. A global bool flag quit, defaulting to false, is set to true when the wakeup logic in main finishes, telling the threads that the task is over and they should exit.

The while condition in the thread functions accordingly becomes while(!quit), so each thread keeps running its logic until told to quit.

// For readability, unmodified parts are omitted,
// as are tFunc2 and tFunc3, which are analogous.
volatile bool quit = false;

// Thread function 1
void tFunc1(const string& tname, pthread_mutex_t* pmtx, pthread_cond_t* pcond)
{
    while(!quit)
    {
        pthread_mutex_lock(pmtx);       // lock
        pthread_cond_wait(pcond, pmtx); // wait for the condition (enter the waiting queue)
        cout << tname << " running task A..." << endl;
        pthread_mutex_unlock(pmtx);     // unlock
        sleep(1);
    }
}
// ...
int main()
{
	// ...
    for(int i = 0; i < THREAD_NUM; i++)
    {
        string tname = "thread[";
        tname += to_string(i + 1); tname += "]";
        ThreadData* td = new ThreadData(tname, f[i], &mtx, &cond);
        pthread_create(t + i, nullptr, Entry, (void*)td);
    }

    sleep(5);

    cout << "---- thread control logic begins ----" << endl;
    int count = 5;
    while(count)
    {
        cout << "waking threads... " << count-- << endl;
        pthread_cond_broadcast(&cond);
        sleep(1);
    }

    cout << "---- thread control logic ends ----" << endl;
    quit = true;

    for(int i = 0; i < THREAD_NUM; i++)
    {
        pthread_join(t[i], nullptr);
        cout << "thread[" << t[i] << "] has exited..." << endl;
    }
	// ...
    return 0;
}

In this example, every thread function calls pthread_cond_wait and joins the waiting queue; main sleeps briefly so that all threads are waiting, then calls pthread_cond_broadcast in a loop to wake all of them at once.


But after a while the output stalls, and the order in which the threads are scheduled differs from round to round; running it several times may even give different results, depending on the scheduler.

If the sleep inside the thread functions is deleted, the output appears ordered. Replacing pthread_cond_broadcast with pthread_cond_signal changes the picture again:

  1. After switching to pthread_cond_signal, only one line is printed per wakeup, confirming that pthread_cond_broadcast wakes all threads in the waiting queue at once.
  2. When pthread_cond_broadcast is called, all threads waiting on the condition variable are woken, but they do not necessarily run in a predetermined order: a woken thread must still acquire the mutex before continuing, and if another thread holds it, the woken thread waits.

To run the threads in a predetermined order, use the methods mentioned above. The biggest problem with this code, however, is that no matter how the threads are woken, the program cannot exit even after the control logic finishes: once the broadcasts stop, each thread loops back into pthread_cond_wait and sleeps forever, so pthread_join never returns. The cause is a flaw in the thread function.

For this code:

while(!quit)
{
    pthread_mutex_lock(pmtx);       // lock
    pthread_cond_wait(pcond, pmtx); // wait for the condition (enter the waiting queue)
    cout << tname << " running task A..." << endl;
    pthread_mutex_unlock(pmtx);     // unlock
    sleep(1);
}

Before calling pthread_cond_wait, the thread should first check whether the critical resource is ready; the check itself is an access to the critical resource. If the resource is not ready, pthread_cond_wait is called and the thread blocks in the waiting queue until it is woken. In other words, pthread_cond_wait must sit between lock and unlock: we keep the critical section as short as possible while still containing all code that touches the critical resource, and since the readiness check touches the resource, the wait call necessarily happens inside the critical section.

Before applying for a critical resource, a thread cannot know the resource's state; it only finds out by entering and checking. If the check finds the resource not ready, the thread waits instead of pointlessly acquiring and releasing the lock over and over, which would hurt efficiency.

We can therefore decide, before calling pthread_cond_wait, whether the critical resource is ready, according to the specific needs. Since this toy example has no real resource whose readiness could be tested, we substitute a global flag ready, initially false, and call pthread_cond_wait only while ready is false.

void tFunc1(const string& tname, pthread_mutex_t* pmtx, pthread_cond_t* pcond)
{
    while(!quit)
    {
        pthread_mutex_lock(pmtx);       // lock
        if(!ready)                      // condition not ready: wait in the queue
        	pthread_cond_wait(pcond, pmtx);
        cout << tname << " running task A..." << endl;
        pthread_mutex_unlock(pmtx);     // unlock
    }
    sleep(1);
}

In main, the counter count starts at 3; when count == 1, ready is set to true. The loop keeps broadcasting so we can observe how the threads behave before and after ready flips:

Code for this program:

#include <iostream>
#include <string>
#include <pthread.h>
#include <unistd.h>

using namespace std;

#define THREAD_NUM 3                       // number of threads
typedef void (*func_t)(const string& name, // function pointer type
					   pthread_mutex_t* pmtx,
					   pthread_cond_t* pcond);

volatile bool quit = false;
volatile bool ready = false;

class ThreadData
{
public:
	// Constructor
	ThreadData(const string& tname, func_t func, pthread_mutex_t* pmtx, pthread_cond_t* pcond)
	: _tname(tname)
	, _func(func)
	, _pmtx(pmtx)
	, _pcond(pcond)
	{}
public:
	string _tname;          // thread name
	func_t _func;           // thread function pointer
	pthread_mutex_t* _pmtx; // mutex pointer
	pthread_cond_t* _pcond; // condition variable pointer
};
// Thread function 1
void tFunc1(const string& tname, pthread_mutex_t* pmtx, pthread_cond_t* pcond)
{
    while(!quit)
    {
        pthread_mutex_lock(pmtx);   // lock
        if(!ready)                  // condition not ready: wait in the queue
        	pthread_cond_wait(pcond, pmtx);
        cout << tname << " running task A..." << endl;
        pthread_mutex_unlock(pmtx); // unlock
    }
    sleep(1);
}
// Thread function 2
void tFunc2(const string& tname, pthread_mutex_t* pmtx, pthread_cond_t* pcond)
{
    while(!quit)
    {
        pthread_mutex_lock(pmtx);
        if(!ready) pthread_cond_wait(pcond, pmtx);
        cout << tname << " running task B..." << endl;
        pthread_mutex_unlock(pmtx);
    }
    sleep(1);
}
// Thread function 3
void tFunc3(const string& tname, pthread_mutex_t* pmtx, pthread_cond_t* pcond)
{
    while(!quit)
    {
        pthread_mutex_lock(pmtx);
        if(!ready) pthread_cond_wait(pcond, pmtx);
        cout << tname << " running task C..." << endl;
        pthread_mutex_unlock(pmtx);
    }
    sleep(1);
}
void* Entry(void* args)
{
    ThreadData* td = (ThreadData*)args;
    td->_func(td->_tname, td->_pmtx, td->_pcond);
    delete td;
    return nullptr;
}

int main()
{
    pthread_mutex_t mtx;
    pthread_cond_t cond;
    pthread_mutex_init(&mtx, nullptr);
    pthread_cond_init(&cond, nullptr);

    pthread_t t[THREAD_NUM];
    func_t f[THREAD_NUM] = {tFunc1, tFunc2, tFunc3};
    for(int i = 0; i < THREAD_NUM; i++)
    {
        string tname = "thread[";
        tname += to_string(i + 1); tname += "]";
        ThreadData* td = new ThreadData(tname, f[i], &mtx, &cond);
        pthread_create(t + i, nullptr, Entry, (void*)td);
    }

    sleep(3);
    cout << "---- thread control logic begins ----" << endl;
    int count = 3;
    while(count)
    {
    	if(count == 1) ready = true; // last round: mark the resource ready
        cout << "waking threads... " << count-- << endl;
        pthread_cond_broadcast(&cond);
        sleep(1);
    }
    cout << "---- thread control logic ends ----" << endl;

    // pthread_cond_broadcast(&cond);
    quit = true;

    for(int i = 0; i < THREAD_NUM; i++)
    {
        pthread_join(t[i], nullptr);
        cout << "thread[" << t[i] << "] has exited..." << endl;
        sleep(1);
    }

    pthread_cond_destroy(&cond); // destroy in the reverse order of initialization
    pthread_mutex_destroy(&mtx);

    return 0;
}

Errata: in the code shown in the recorded demo, pthread_cond_broadcast was also called outside the loop, which does not affect the result. The broadcast inside the loop exists to observe how threads behave once the resource check succeeds and they no longer call pthread_cond_wait.

Of course, pthread_cond_broadcast can also be replaced with pthread_cond_signal in this experiment; then only one waiting thread is woken per round.

