Understanding Linux multithreaded programming from an interview question


Preface

What is a thread? Why do we need threads when we already have processes, and how do the two differ? What are the advantages of using threads? This article also covers some details of multithreaded programming, such as how threads achieve synchronization and mutual exclusion.

Interview question: Are you familiar with POSIX multithreaded programming? If so, write a program that does the following:

1) There is a global int variable g_Flag with an initial value of 0;

2) Start thread 1 in the main thread; it prints "this is thread1" and sets g_Flag to 1

3) Start thread 2 in the main thread; it prints "this is thread2" and sets g_Flag to 2

4) Thread 1 may exit only after thread 2 has exited

5) The main thread exits when it detects that g_Flag changes from 1 to 2, or from 2 to 1

This article starts from this question. By the end, you will be able to solve it yourself. The outline is as follows:

1. Processes and threads
2. Reasons for using threads
3. Functions related to thread operations
4. Mutual exclusion between threads
5. Synchronization between threads
6. Final code for the interview question

1. Processes and threads

A process is an instance of a program in execution, that is, the collection of data structures describing how far the program has progressed. From the kernel's point of view, a process is the basic unit to which system resources (CPU time, memory, etc.) are allocated.

A thread is an execution flow within a process and the basic unit of CPU scheduling and dispatch. It is smaller than a process and can run independently. A process consists of one or more threads (a user program with several relatively independent execution streams that share most of the application's data structures), and each thread shares all of the process's resources with the other threads of that process.

"Process: the smallest unit of resource allocation; thread: the smallest unit of program execution."

A process has its own independent address space, so in protected mode a crashing process does not affect other processes. A thread, by contrast, is just a different execution path within a process. A thread has its own stack and local variables but no separate address space, so a thread that crashes takes down the entire process. Multi-process programs are therefore more robust than multithreaded ones, but switching between processes consumes more resources and is less efficient. For concurrent operations that must run simultaneously and share certain variables, threads should be used rather than processes.

2. Reasons for using threads

From the above, we know how processes and threads differ. These differences are exactly why we use threads. In short: a process has an independent address space, while a thread does not (threads in the same process share that process's address space). (The content below is taken from "Multithreaded Programming under Linux".)

One reason for using multithreading is that, compared with processes, it is a very "frugal" way to multitask. Under Linux, starting a new process requires allocating it an independent address space and building many data tables to maintain its code segment, stack segment and data segment. This is an "expensive" way of multitasking. Multiple threads running in one process, however, use the same address space and share most of their data. Starting a thread therefore takes far less space than starting a process, and switching between threads takes far less time than switching between processes. According to statistics, a process generally costs about 30 times as much as a thread, although this figure can vary considerably from system to system.

The second reason for using multithreading is the convenient communication between threads. Different processes have independent data spaces, so data can only be passed between them through inter-process communication, which is both time-consuming and inconvenient. Threads do not have this problem. Because threads in the same process share the data space, data produced by one thread can be used directly by the others, which is fast and convenient. Data sharing also brings problems of its own. Some variables must not be modified by two threads at the same time, and data declared static in some subroutines can easily cause catastrophic failures in multithreaded programs. Handling this correctly is the most important point when writing multithreaded programs.

Besides the advantages above, multithreaded programs, as a way of multitasking and working concurrently, also have the following advantages:

  • Improve application responsiveness. This is particularly meaningful for programs with graphical interfaces. When an operation takes a long time, the whole program waits for it and does not respond to keyboard, mouse or menu input. Multithreading avoids this awkward situation by moving the time-consuming operation into a new thread.
  • Make multi-CPU systems more effective. When the number of threads is not greater than the number of CPUs, the operating system can run different threads on different CPUs at the same time.
  • Improve program structure. A long and complex process can be divided into multiple threads and become several independent or semi-independent running parts. Such a program will facilitate understanding and modification.

=============================

In terms of system calls, a process is created with fork(), while on Linux a thread is created with clone(). W. Richard Stevens put it this way:

  • fork is expensive. Memory is copied from the parent to the child, all descriptors are duplicated in the child, and so on. Current implementations use a technique called copy-on-write, which avoids a copy of the parent’s data space to the child until the child needs its own copy. But, regardless of this optimization, fork is expensive.
  • IPC is required to pass information between the parent and child after the fork. Passing information from the parent to the child before the fork is easy, since the child starts with a copy of the parent’s data space and with a copy of all the parent’s descriptors. But, returning information from the child to the parent takes more work.

Threads help with both problems. Threads are sometimes called lightweight processes since a thread is “lighter weight” than a process. That is, thread creation can be 10–100 times faster than process creation.

All threads within a process share the same global memory. This makes the sharing of information easy between the threads, but along with this simplicity comes the problem of synchronization.

============================

3. Functions related to thread operations

#include <pthread.h>

int pthread_create(pthread_t *tid, const pthread_attr_t *attr, void *(*func) (void *), void *arg);
int pthread_join (pthread_t tid, void ** status);
pthread_t pthread_self (void);
int pthread_detach (pthread_t tid);
void pthread_exit (void *status);

pthread_create creates a thread. It returns 0 on success, or a positive Exxx error code on failure.

  • pthread_t *tid: thread IDs have type pthread_t. With glibc on Linux this is an unsigned long, but POSIX treats it as an opaque type. When pthread_create succeeds, the new thread's ID is stored through the tid pointer.
  • const pthread_attr_t *attr: the attributes of the new thread, such as its priority, its initial stack size, and whether it should be detached. Pass NULL to use the defaults, which is what we usually do.
  • void *(*func) (void *): the function pointer func specifies the function that the new thread executes.
  • void *arg: the argument passed to that function. To pass several parameters, wrap them in a structure and pass a pointer to it.
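The last point is shown in the sketch below. It packs several parameters into a struct, passes the struct's address as arg, and reads the result back after pthread_join. The names struct sum_args, sum_worker and sum_in_thread are hypothetical and used only for illustration. The struct must stay alive until the thread has finished, which is why the caller joins before returning.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical argument bundle: pthread_create passes exactly one void *,
 * so several parameters are packed into one struct. */
struct sum_args {
    const int *values;  /* array to sum */
    size_t count;       /* number of elements */
    long result;        /* written by the worker thread */
};

/* Thread start routine: convert the void * back to the struct type. */
static void *sum_worker(void *arg)
{
    struct sum_args *a = arg;
    long total = 0;
    for (size_t i = 0; i < a->count; i++)
        total += a->values[i];
    a->result = total;
    return NULL;
}

/* Runs the sum in a new thread and returns it, or -1 if the thread
 * could not be created. */
long sum_in_thread(const int *values, size_t count)
{
    struct sum_args args = { values, count, 0 };
    pthread_t tid;
    int rc = pthread_create(&tid, NULL, sum_worker, &args);
    if (rc != 0) {
        fprintf(stderr, "pthread_create: %s\n", strerror(rc));
        return -1;
    }
    pthread_join(tid, NULL);  /* args lives on our stack: wait before returning */
    return args.result;
}
```

For example, summing {1, 2, 3, 4} with sum_in_thread returns 10.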

pthread_join waits for a thread to exit. It returns 0 on success, or a positive Exxx error code on failure.

  • pthread_t tid: the ID of the thread to wait for.
  • void **status: if it is not NULL, the thread's return value is stored in the location that status points to. This is why status is a pointer to a pointer; such a parameter is also called a "value-result" parameter.

pthread_self is used to return the ID of the current thread.

pthread_detach puts a thread into the detached state, much like a process that leaves its terminal and becomes a background process. It returns 0 on success, or a positive Exxx error code on failure. When a detached thread exits, all of its resources are released automatically. A thread that is not detached keeps its thread ID and exit status after it terminates, until another thread calls pthread_join on it.

Processes behave similarly. A child that has terminated but has not yet been waited for by its parent remains a zombie, which is why a process list can show many zombie (defunct) processes. This is why the zombie state has to exist.
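Below is a minimal sketch of detached threads, using hypothetical names (detached_worker, run_detached). A detached thread cannot be joined, so the caller needs another way to learn that the workers have run. Here they report to a counter protected by a mutex and a condition variable, both covered in sections 4 and 5.

```c
#include <pthread.h>

static pthread_mutex_t done_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t done_cond = PTHREAD_COND_INITIALIZER;
static int done_count = 0;

/* Worker: reports that it ran, then returns. Because it is detached,
 * its ID and exit status are discarded as soon as it terminates. */
static void *detached_worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&done_lock);
    done_count++;
    pthread_cond_signal(&done_cond);
    pthread_mutex_unlock(&done_lock);
    return NULL;
}

/* Starts n detached workers and waits until all have reported in.
 * Returns the number that reported, or -1 if a thread could not be created. */
int run_detached(int n)
{
    pthread_mutex_lock(&done_lock);
    done_count = 0;
    pthread_mutex_unlock(&done_lock);

    for (int i = 0; i < n; i++) {
        pthread_t tid;
        if (pthread_create(&tid, NULL, detached_worker, NULL) != 0)
            return -1;
        pthread_detach(tid);  /* no pthread_join on tid afterwards */
    }

    pthread_mutex_lock(&done_lock);
    while (done_count < n)
        pthread_cond_wait(&done_cond, &done_lock);
    int result = done_count;
    pthread_mutex_unlock(&done_lock);
    return result;
}
```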

pthread_exit is used to terminate a thread, and you can specify the return value so that other threads can obtain the return value of the thread through the pthread_join function.

void *status: the thread's return value, a pointer that is handed to whichever thread joins this one.
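The hypothetical helper below (square_worker, square_in_thread) shows pthread_exit together with the value-result parameter of pthread_join. The worker returns a small integer encoded in its void * status. This assumes the value fits in an intptr_t. That cast is a common idiom, not something POSIX requires, and larger results would normally be returned through heap memory or a caller-provided struct.

```c
#include <pthread.h>
#include <stdint.h>

/* Encodes n*n in the void * exit status. Returning the same pointer
 * from the start function would have the same effect. */
static void *square_worker(void *arg)
{
    intptr_t n = (intptr_t)arg;
    pthread_exit((void *)(n * n));
}

/* Returns n*n computed in another thread, or -1 on failure. */
long square_in_thread(long n)
{
    pthread_t tid;
    void *status = NULL;  /* pthread_join stores the exit value here */
    if (pthread_create(&tid, NULL, square_worker, (void *)(intptr_t)n) != 0)
        return -1;
    if (pthread_join(tid, &status) != 0)  /* &status: the value-result argument */
        return -1;
    return (long)(intptr_t)status;
}
```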
Now that we know these functions, let's try to solve the question from the beginning of this article:

1) There is a global int variable g_Flag with an initial value of 0;

2) Start thread 1 in the main thread; it prints "this is thread1" and sets g_Flag to 1

3) Start thread 2 in the main thread; it prints "this is thread2" and sets g_Flag to 2

These three points are simple: all we need to do is call pthread_create to create the threads. The code is as follows:

/*
 * 1) There is a global int variable g_Flag with an initial value of 0;
 *
 * 2) Start thread 1 in the main thread; it prints "this is thread1" and sets g_Flag to 1
 *
 * 3) Start thread 2 in the main thread; it prints "this is thread2" and sets g_Flag to 2
 *
 */
#include<stdio.h>
#include<stdlib.h>
#include<string.h>	/* strerror */
#include<pthread.h>
#include<errno.h>
#include<unistd.h>

int g_Flag=0;

void* thread1(void*);
void* thread2(void*);

/*
 * When a program is started, a single thread is created, called the initial thread or main thread.
 * Additional threads are created by pthread_create.
 * So we just need to create two threads in main().
 */
int main(int argc, char** argv)
{
	printf("enter main\n");
	pthread_t tid1, tid2;
	int rc1=0, rc2=0;
	rc2 = pthread_create(&tid2, NULL, thread2, NULL);
	if(rc2 != 0)
		printf("%s: %s\n",__func__, strerror(rc2));

	rc1 = pthread_create(&tid1, NULL, thread1, &tid2);
	if(rc1 != 0)
		printf("%s: %s\n",__func__, strerror(rc1));
	printf("leave main\n");
	exit(0);	/* terminates the whole process, including threads that are still running */
}
/*
 * thread1() is executed by thread 1 after pthread_create();
 * it sets g_Flag = 1.
 * pthread_t is an unsigned long with glibc, so printing it with %lu is not portable.
 */
void* thread1(void* arg)
{
	printf("enter thread1\n");
	printf("this is thread1, g_Flag: %d, thread id is %lu\n",g_Flag, (unsigned long)pthread_self());
	g_Flag = 1;
	printf("this is thread1, g_Flag: %d, thread id is %lu\n",g_Flag, (unsigned long)pthread_self());
	printf("leave thread1\n");
	pthread_exit(NULL);
}

/*
 * thread2() is executed by thread 2 after pthread_create();
 * it sets g_Flag = 2.
 */
void* thread2(void* arg)
{
	printf("enter thread2\n");
	printf("this is thread2, g_Flag: %d, thread id is %lu\n",g_Flag, (unsigned long)pthread_self());
	g_Flag = 2;
	printf("this is thread2, g_Flag: %d, thread id is %lu\n",g_Flag, (unsigned long)pthread_self());
	printf("leave thread2\n");
	pthread_exit(NULL);
}

This completes requirements 1), 2) and 3). Compile and run it as follows:

netsky@ubuntu:~/workspace/pthead_test$ gcc test.c -lpthread

If a program uses functions from the pthread library, it must be linked against that library in addition to including <pthread.h>. Put -lpthread after the source file on the command line, or use gcc's -pthread option. Since glibc 2.34 these functions live in libc itself, but the option is still harmless and portable.

netsky@ubuntu:~/workspace/pthead_test$ ./a.out
enter main
enter thread2
this is thread2, g_Flag: 0, thread id is 3079588720
this is thread2, g_Flag: 2, thread id is 3079588720
leave thread2
leave main
enter thread1
this is thread1, g_Flag: 2, thread id is 3071196016
this is thread1, g_Flag: 1, thread id is 3071196016
leave thread1

However, the output is not always the one shown above. It may instead be:

netsky@ubuntu:~/workspace/pthead_test$ ./a.out 
enter main 
leave main 
enter thread1 
this is thread1, g_Flag: 0, thread id is 3069176688 
this is thread1, g_Flag: 1, thread id is 3069176688 
leave thread1

or it could be:

netsky@ubuntu:~/workspace/pthead_test$ ./a.out 
enter main 
leave main 

Other outputs are possible as well. The output depends on when main() terminates and whether thread1 and thread2 get to run their functions before it does. Calling exit() in main ends the whole process, including any threads that are still running, so one thread can affect every other thread in the process. Watch out for this in multithreaded programming. Calling sleep() for a while before main exits usually gives thread1 and thread2 time to run, but this is not guaranteed. Waiting for them with pthread_join is the reliable way.
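The sketch below shows the reliable alternative to sleep(): joining both threads before returning. It is a hypothetical variant of the test program. Each thread writes only to its own struct job, so no lock is needed. The order of the two printed lines is still up to the scheduler, but both are always printed before start_and_join returns.

```c
#include <pthread.h>
#include <stdio.h>

/* Hypothetical per-thread record; each thread touches only its own. */
struct job {
    const char *msg;
    int done;
};

static void *run_job(void *arg)
{
    struct job *j = arg;
    printf("%s\n", j->msg);
    j->done = 1;
    return NULL;
}

/* Returns how many jobs had finished when it returns: always 2, because
 * pthread_join does not return until the target thread has terminated. */
int start_and_join(void)
{
    struct job j1 = { "this is thread1", 0 };
    struct job j2 = { "this is thread2", 0 };
    pthread_t t1, t2;
    if (pthread_create(&t1, NULL, run_job, &j1) != 0)
        return -1;
    if (pthread_create(&t2, NULL, run_job, &j2) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return j1.done + j2.done;
}
```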

Note: you may have noticed that the thread functions thread1() and thread2() call pthread_exit at the end. What happens if they call exit() or return instead? Try it yourself!

pthread_exit() terminates the calling thread. You can specify a return value, which other threads can obtain through pthread_join().
return from the thread's start function also terminates the thread, and the returned value becomes the thread's exit status, just as if pthread_exit had been called with it.
exit() terminates the whole process. If you call exit in a thread function, every thread in the process exits!

"4) Thread 1 may exit only after thread 2 has exited." This requirement is also easy to meet: thread1 just calls pthread_join on thread 2 before it exits. This is why main() passes &tid2 as thread1's argument, since thread1 needs thread 2's ID in order to join it.
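Here is a small sketch of requirement 4, with hypothetical names (worker1, worker2, run_ordered_exit). worker1 receives thread 2's ID through its argument and joins it. Each thread records its exit in a shared buffer guarded by a mutex (introduced in section 4), so the order can be checked afterwards.

```c
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t order_lock = PTHREAD_MUTEX_INITIALIZER;
static char exit_order[3];  /* e.g. "21": thread 2 finished first */
static int exit_pos;

static void record_exit(char who)
{
    pthread_mutex_lock(&order_lock);
    exit_order[exit_pos++] = who;
    pthread_mutex_unlock(&order_lock);
}

static void *worker2(void *arg)
{
    (void)arg;
    record_exit('2');
    return NULL;
}

/* worker1 joins thread 2 first, so it can never finish ahead of it. */
static void *worker1(void *arg)
{
    pthread_t other = *(pthread_t *)arg;
    pthread_join(other, NULL);
    record_exit('1');
    return NULL;
}

/* Returns the recorded exit order ("21"), or NULL on error. */
const char *run_ordered_exit(void)
{
    pthread_t t1, t2;
    exit_pos = 0;
    if (pthread_create(&t2, NULL, worker2, NULL) != 0)
        return NULL;
    /* t2 already holds thread 2's ID when thread 1 starts */
    if (pthread_create(&t1, NULL, worker1, &t2) != 0) {
        pthread_join(t2, NULL);
        return NULL;
    }
    pthread_join(t1, NULL);  /* thread 2 is joined by thread 1, not here */
    exit_order[exit_pos] = '\0';
    return exit_order;
}
```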

4. Mutual exclusion between threads

With the pthread_join call added, the code above seems to satisfy the first four requirements, but it does not! g_Flag is a global variable that thread1 and thread2 can access at the same time, so it must be protected by a lock, and thread1 and thread2 need mutually exclusive access to it. Below we introduce how to protect it with a mutex.

Mutual exclusion locks:
A mutual exclusion lock (mutex) makes threads execute a critical section one after another. Generally, mutexes synchronize multiple threads by ensuring that only one thread at a time executes a critical section of code. Mutexes can also protect single-threaded code.

The related operation functions of the mutex lock are as follows:

#include <pthread.h> 

int pthread_mutex_lock(pthread_mutex_t * mptr); 
int pthread_mutex_unlock(pthread_mutex_t * mptr); 
//Both return: 0 if OK, positive Exxx value on error

Before operating on a critical resource, lock the mutex with pthread_mutex_lock, and unlock it with pthread_mutex_unlock when the operation is finished. Both functions take a pointer to a variable of type pthread_mutex_t, which must be declared beforehand and initialized, for example with PTHREAD_MUTEX_INITIALIZER. See section 6 for the concrete code.
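Here is a minimal sketch of the lock/unlock pattern. It uses a hypothetical shared counter rather than g_Flag. Two threads each increment the counter n times. Because every read-modify-write happens inside the critical section, no increments are lost.

```c
#include <pthread.h>

static long counter = 0;  /* shared data */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

static void *add_many(void *arg)
{
    long n = *(const long *)arg;
    for (long i = 0; i < n; i++) {
        pthread_mutex_lock(&counter_lock);    /* enter critical section */
        counter++;                            /* read-modify-write */
        pthread_mutex_unlock(&counter_lock);  /* leave critical section */
    }
    return NULL;
}

/* Two threads each add n. With the mutex the result is exactly 2*n. */
long locked_count(long n)
{
    pthread_t a, b;
    counter = 0;
    if (pthread_create(&a, NULL, add_many, &n) != 0)
        return -1;
    if (pthread_create(&b, NULL, add_many, &n) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return counter;
}
```

Without the lock and unlock calls, the two threads could interleave their increments and the total could come out below 2*n.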

5. Synchronization between threads

Requirement 5 says that the main thread exits when it detects that g_Flag changes from 1 to 2, or from 2 to 1. This calls for thread synchronization, and condition variables are the tool for it.

Condition variables:
Condition variables let a thread block atomically until a certain condition becomes true. A condition variable is always used together with a mutex, and the condition is tested while that mutex is held.

If the condition is false, the thread usually blocks on the condition variable, atomically releasing the mutex while it waits for the condition to change. When another thread changes the condition, it can signal the associated condition variable. This causes one or more waiting threads to:

  • Wake up
  • Acquire the mutex again
  • Re-evaluate the condition

Condition variables can also be used to synchronize threads in different processes, provided that:

  • The condition variable is allocated in memory that can be written to.
  • That memory is shared by the cooperating processes.
  • The condition variable is initialized with the PTHREAD_PROCESS_SHARED attribute (see pthread_condattr_setpshared).

"A condition variable lets a thread block atomically until a certain condition becomes true" is exactly what requirement 5 needs: the main function blocks until g_Flag changes from 1 to 2, or from 2 to 1. The related condition variable functions are as follows:

#include <pthread.h>
 
int pthread_cond_wait(pthread_cond_t *cptr, pthread_mutex_t *mptr); 
int pthread_cond_signal(pthread_cond_t *cptr); 
//Both return: 0 if OK, positive Exxx value on error

pthread_cond_wait waits for a particular condition to become true, and pthread_cond_signal notifies a blocked thread that the condition may now be true. Before calling them, declare a variable of type pthread_cond_t to pass as their first argument. pthread_cond_wait also takes a pthread_mutex_t that the caller must already have locked.

Why is a condition variable always used with a mutex, with the condition tested under that mutex's protection? Because the condition is usually a variable shared among several threads, and the mutex allows that variable to be set and tested safely from different threads.
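The usual shape of this pattern is sketched below with hypothetical names. The shared flag ready is the condition, the mutex protects it, and the waiter re-tests it in a while loop. The loop handles spurious wakeups, and it also handles a signal that arrives before the waiter starts waiting.

```c
#include <pthread.h>

static pthread_mutex_t ready_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ready_cond = PTHREAD_COND_INITIALIZER;
static int ready = 0;    /* the condition (predicate) */
static int payload = 0;  /* data published along with it */

static void *producer(void *arg)
{
    pthread_mutex_lock(&ready_lock);
    payload = *(int *)arg;  /* change the shared state ...          */
    ready = 1;              /* ... and the predicate, under the mutex */
    pthread_cond_signal(&ready_cond);
    pthread_mutex_unlock(&ready_lock);
    return NULL;
}

/* Starts a producer and waits for its value. Returns -1 on error. */
int wait_for_value(int value)
{
    pthread_t tid;
    ready = 0;  /* no other thread is running yet */
    if (pthread_create(&tid, NULL, producer, &value) != 0)
        return -1;
    pthread_mutex_lock(&ready_lock);
    while (!ready)  /* re-test after every wakeup */
        pthread_cond_wait(&ready_cond, &ready_lock);  /* unlocks, sleeps, relocks */
    int got = payload;
    pthread_mutex_unlock(&ready_lock);
    pthread_join(tid, NULL);
    return got;
}
```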

pthread_cond_signal wakes up at least one thread waiting on a condition variable. To wake up all threads waiting on it, call:

int pthread_cond_broadcast (pthread_cond_t * cptr);
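Here is a sketch of a case where broadcast is needed. It assumes a hypothetical "gate" that several threads wait on. A single pthread_cond_broadcast releases all of them, whereas pthread_cond_signal is only guaranteed to wake one, which could leave the others asleep forever.

```c
#include <pthread.h>

#define GATE_WAITERS 4  /* hypothetical number of waiting threads */

static pthread_mutex_t gate_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t gate_cond = PTHREAD_COND_INITIALIZER;
static int gate_open = 0;
static int passed = 0;

static void *gate_waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&gate_lock);
    while (!gate_open)
        pthread_cond_wait(&gate_cond, &gate_lock);
    passed++;  /* count the threads that got through */
    pthread_mutex_unlock(&gate_lock);
    return NULL;
}

/* Opens the gate with one broadcast and returns how many waiters passed. */
int open_gate(void)
{
    pthread_t tids[GATE_WAITERS];
    int created = 0;
    gate_open = 0;
    passed = 0;
    for (int i = 0; i < GATE_WAITERS; i++) {
        if (pthread_create(&tids[i], NULL, gate_waiter, NULL) != 0)
            break;
        created++;
    }
    pthread_mutex_lock(&gate_lock);
    gate_open = 1;
    pthread_cond_broadcast(&gate_cond);  /* wake every waiter */
    pthread_mutex_unlock(&gate_lock);
    for (int i = 0; i < created; i++)
        pthread_join(tids[i], NULL);
    return passed;
}
```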

By default, a blocked thread waits for as long as it takes for the condition variable to be signaled. To limit the blocking time, call:

int pthread_cond_timedwait (pthread_cond_t * cptr, pthread_mutex_t *mptr, const struct timespec *abstime);

Note that abstime is an absolute time, not a duration. If that time passes before the condition variable is signaled, the function still returns, with the mutex locked again, and the return value is ETIMEDOUT.
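The sketch below shows how to build the absolute deadline. It assumes the condition variable uses the default clock (CLOCK_REALTIME), and the helper name wait_with_timeout is hypothetical. The helper reads the current time with clock_gettime, adds the timeout, normalizes the nanosecond field, and then waits in the usual predicate loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Waits up to ms milliseconds for *flag to become nonzero.
 * Returns 0 if it did, ETIMEDOUT if the deadline passed first. */
int wait_with_timeout(pthread_mutex_t *m, pthread_cond_t *c,
                      const int *flag, long ms)
{
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);  /* default clock of a condvar */
    deadline.tv_sec += ms / 1000;
    deadline.tv_nsec += (ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {     /* keep tv_nsec below 1 second */
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }
    int rc = 0;
    pthread_mutex_lock(m);
    while (!*flag && rc == 0)
        rc = pthread_cond_timedwait(c, m, &deadline);  /* abstime, not a duration */
    if (*flag)
        rc = 0;  /* the condition may have become true right at the deadline */
    pthread_mutex_unlock(m);
    return rc;
}
```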

6. Final code for the interview question

Through the previous introduction, we can easily write the code, as shown below:

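/*
 Are you familiar with POSIX multithreaded programming? If so, write a program that does the following:
  1) There is a global int variable g_Flag with an initial value of 0;
  2) Start thread 1 in the main thread; it prints "this is thread1" and sets g_Flag to 1
  3) Start thread 2 in the main thread; it prints "this is thread2" and sets g_Flag to 2
  4) Thread 1 may exit only after thread 2 has exited
  5) The main thread exits when it detects that g_Flag changes from 1 to 2, or from 2 to 1
   */
#include<stdio.h>
#include<stdlib.h>
#include<string.h>	/* strerror */
#include<pthread.h>
#include<errno.h>
#include<unistd.h>

int g_Flag=0;
/* Predicate for the condition variable: set to 1 (with the mutex held) once
 * g_Flag has changed from 1 to 2 or from 2 to 1. Because main() tests it in a
 * loop, a signal sent before main() starts waiting is not lost. */
static int g_Changed=0;
static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

void* thread1(void*);
void* thread2(void*);

/*
 *  When a program is started, a single thread is created, called the initial thread or main thread.
 *  Additional threads are created by pthread_create.
 *  So we just need to create two threads in main().
 */

int main(int argc, char** argv)
{
	printf("enter main\n");
	pthread_t tid1, tid2;
	int rc1=0, rc2=0;
	rc2 = pthread_create(&tid2, NULL, thread2, NULL);
	if(rc2 != 0) {
		printf("%s: %s\n",__func__, strerror(rc2));
		exit(EXIT_FAILURE);
	}

	/* thread1 gets thread 2's ID so that it can join it (requirement 4) */
	rc1 = pthread_create(&tid1, NULL, thread1, &tid2);
	if(rc1 != 0) {
		printf("%s: %s\n",__func__, strerror(rc1));
		exit(EXIT_FAILURE);
	}

	/* pthread_cond_wait must be called with the mutex locked; the while
	 * loop re-tests the predicate after every (possibly spurious) wakeup */
	pthread_mutex_lock(&mutex);
	while(!g_Changed)
		pthread_cond_wait(&cond, &mutex);
	pthread_mutex_unlock(&mutex);
	printf("leave main\n");
	exit(0);	/* requirement 5: ends the process, including thread1 if it is still running */
}

/*
 * thread1() is executed by thread 1 after pthread_create();
 * it sets g_Flag = 1.
 */
void* thread1(void* arg)
{
	printf("enter thread1\n");
	pthread_mutex_lock(&mutex);	/* g_Flag is only read or written with the mutex held */
	printf("this is thread1, g_Flag: %d, thread id is %lu\n",g_Flag, (unsigned long)pthread_self());
	if(g_Flag == 2) {		/* g_Flag is about to change from 2 to 1 */
		g_Changed = 1;
		pthread_cond_signal(&cond);
	}
	g_Flag = 1;
	printf("this is thread1, g_Flag: %d, thread id is %lu\n",g_Flag, (unsigned long)pthread_self());
	pthread_mutex_unlock(&mutex);
	pthread_join(*(pthread_t*)arg, NULL);	/* requirement 4: wait for thread 2 to exit */
	printf("leave thread1\n");
	pthread_exit(NULL);
}

/*
 * thread2() is executed by thread 2 after pthread_create();
 * it sets g_Flag = 2.
 */
void* thread2(void* arg)
{
	printf("enter thread2\n");
	pthread_mutex_lock(&mutex);
	printf("this is thread2, g_Flag: %d, thread id is %lu\n",g_Flag, (unsigned long)pthread_self());
	if(g_Flag == 1) {		/* g_Flag is about to change from 1 to 2 */
		g_Changed = 1;
		pthread_cond_signal(&cond);
	}
	g_Flag = 2;
	printf("this is thread2, g_Flag: %d, thread id is %lu\n",g_Flag, (unsigned long)pthread_self());
	pthread_mutex_unlock(&mutex);
	printf("leave thread2\n");
	pthread_exit(NULL);
}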

Compile it (for example with gcc test.c -pthread) and run it. The program now meets all of the requirements!


Source: blog.csdn.net/qq_40989769/article/details/112788563