Linux | Thread

Thread

1. The concept of threads

1.1 What is a thread

  • A thread is a flow of control (an execution sequence) inside a process

  • Every process has at least one thread of execution

 

1.2 Processes and threads

  • The process is the smallest unit of resource allocation

  • The thread is the smallest unit of program execution (scheduling)

  • Threads share all of the process's information, including the executable program text, global memory, heap memory, stacks, and file descriptors, but each thread also has some private data of its own:

    • Thread ID, register set, execution stack, errno, signal mask, scheduling priority

For example, suppose a school wants to hold a sports meeting. It has to take care of many things, such as shopping and planning, and these jobs are done by different people at the same time.

The school is like a process, and the people doing the different jobs are like threads.

【note】

  • The Linux kernel has no separate concept of a thread; every thread is implemented as a process (a task).

  • Creating a thread therefore actually creates a new process whose PCB points to the same virtual address space as the PCB of the creating process. In this way data is shared, while the different PCBs execute different code.

  • Such a thread is called a lightweight process (LWP)

  • A thread is created using the process as a template

 

1.3 Shared by multiple threads of a process

All threads of a process live in the same address space, so the text segment and the data segment are shared.

  • A function you define can be called from every thread

  • A global variable you define can be accessed from every thread

  • In addition, threads also share:

    • The file descriptor table

    • The disposition of each signal

      • SIG_IGN (ignore), SIG_DFL (default action, which usually terminates the process), or a user-defined signal handler

    • The current working directory

    • The user ID and group ID
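The sharing described above can be sketched as follows. This is a minimal illustration, not code from the original article; the names shared_counter, add_to_counter, worker and run_shared_demo are made up for the example.

```cpp
#include <pthread.h>

// Global variable: lives in the data segment and is visible to every thread.
int shared_counter = 0;

// Ordinary function in the text segment: any thread may call it.
void add_to_counter(int n)
{
  shared_counter += n;
}

// Start routine of the new thread: modifies the same global as main.
void* worker(void*)
{
  add_to_counter(5);
  return NULL;
}

// Returns the value of the global after both threads have updated it,
// or -1 if the thread could not be created.
int run_shared_demo()
{
  shared_counter = 1;
  pthread_t tid;
  if (pthread_create(&tid, NULL, worker, NULL) != 0)
  {
    return -1;
  }
  pthread_join(tid, NULL); // wait first, so the two updates never overlap
  add_to_counter(10);      // the main thread calls the same function
  return shared_counter;
}
```

Calling run_shared_demo() from main shows that the update made by the new thread is visible to the main thread, because both work on the same variable.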

 

1.4 Advantages of threads

  • Creating a new thread costs much less than creating a new process

    • Creating a process means setting up a virtual address space, page tables, a PCB, and so on. For a thread, the virtual address space and page tables already exist, so only a new PCB has to be created.

  • Switching between threads requires much less work from the operating system than switching between processes

    • Switching between processes means switching the virtual address space, page tables, and so on, while switching between threads of the same process only requires switching the PCB.

  • Threads occupy far fewer resources than processes

    • Most of a thread's resources are shared with its process, and only a small part is private to the thread, such as the thread ID, a set of registers, the runtime stack, errno, the signal mask, and the scheduling priority

  • Threads can make full use of multiple processors

    • On a multiprocessor machine, a process with a single thread keeps only one processor busy while the others sit idle, which wastes processor resources. If the work of the process is divided into parts and handed to several threads, all processors can be used

  • While waiting for a slow I/O operation to finish, the program can perform other tasks

    • For example, a music player such as NetEase Cloud Music can download one song while playing another. This is possible because it uses multiple threads.

  • Compute-intensive applications can split their calculation across multiple threads in order to run on a multiprocessor system

    • The computation is divided into several parts, each part is handed to a different thread, and the threads are scheduled on different processors so that the parts are computed in parallel, which improves efficiency

  • I/O-intensive applications can overlap I/O operations to improve performance, with different threads waiting for different I/O operations at the same time
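As a rough sketch of the compute-intensive case, the function below splits the summation of an array across several threads. The names SumTask, sum_range and parallel_sum are hypothetical, and the even chunking is just one simple way to divide the work.

```cpp
#include <pthread.h>
#include <algorithm>
#include <cstddef>
#include <vector>

// One slice of the input plus a slot for that slice's partial result.
struct SumTask
{
  const int* data;
  std::size_t begin;
  std::size_t end;
  long long result;
  bool started;
};

// Start routine: sums data[begin, end) into the task's own result slot.
void* sum_range(void* arg)
{
  SumTask* t = static_cast<SumTask*>(arg);
  long long s = 0;
  for (std::size_t i = t->begin; i < t->end; ++i)
  {
    s += t->data[i];
  }
  t->result = s;
  return NULL;
}

// Divides v into nthreads chunks, sums each chunk in its own thread,
// then adds up the partial results after joining.
long long parallel_sum(const std::vector<int>& v, std::size_t nthreads)
{
  if (nthreads == 0) nthreads = 1;
  std::vector<SumTask> tasks(nthreads);
  std::vector<pthread_t> tids(nthreads);
  const std::size_t chunk = (v.size() + nthreads - 1) / nthreads;

  for (std::size_t k = 0; k < nthreads; ++k)
  {
    tasks[k].data = v.data();
    tasks[k].begin = std::min(k * chunk, v.size());
    tasks[k].end = std::min((k + 1) * chunk, v.size());
    tasks[k].result = 0;
    tasks[k].started =
        pthread_create(&tids[k], NULL, sum_range, &tasks[k]) == 0;
    if (!tasks[k].started)
    {
      sum_range(&tasks[k]); // fall back to computing in the calling thread
    }
  }

  long long total = 0;
  for (std::size_t k = 0; k < nthreads; ++k)
  {
    if (tasks[k].started) pthread_join(tids[k], NULL);
    total += tasks[k].result;
  }
  return total;
}
```

Each thread writes only to its own SumTask, so no locking is needed; the partial results are combined after the threads have been joined.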

 

1.5 Disadvantages of threads

  • Performance loss

  • Reduced robustness

    • In a multi-threaded program, small timing differences or the sharing of variables that should not be shared can easily cause harmful effects.

  • Lack of access control

    • The process is the basic granularity of access control. Calling certain OS functions from one thread affects the entire process

  • Increased programming difficulty

    • Writing and debugging a multi-threaded program is much more complicated than writing and debugging a single-threaded one

 

2. Thread control

Thread control in this article uses the POSIX thread library.

  • The names of most functions in the thread library start with "pthread_"

  • To use the thread library, include the header file pthread.h

  • To link against the thread library, pass the -lpthread option to the compiler

 

 

2.1 Process ID and Thread ID

  • On Linux, the current thread implementation is the Native POSIX Thread Library, or NPTL for short. In this implementation a thread is also called a lightweight process (LWP). Each user-mode thread corresponds to one scheduling entity in the kernel and has its own process descriptor (task_struct structure)

  • Before threads existed, one process corresponded to one process descriptor in the kernel (1:1)

  • With threads, one process corresponds to multiple process descriptors in the kernel (1:N)

  • Yet every thread of a process must get the same process ID from getpid. How is this reconciled with the above?

Thread group:

struct task_struct
{
   ...
   pid_t pid;
   pid_t tgid;
   ...
   struct task_struct *group_leader;
   ...
   struct list_head thread_group;
   ...
};

  • A multi-threaded process is also called a thread group, and each thread in the thread group has its own process descriptor (task_struct) in the kernel.

  • The pid field of the process descriptor holds the thread ID

  • The tgid field of the process descriptor holds the process ID

User mode     System call     Field in the kernel process descriptor
Thread ID     gettid(void)    pid_t pid
Process ID    getpid(void)    pid_t tgid

The thread ID discussed here is different from the thread ID of the POSIX thread library, whose type is pthread_t. The kernel thread ID and the process ID uniquely identify a thread or a process in the system.

  • How can the ID of a thread be checked?

ps -L

The -L option adds the following columns:

  • LWP: the thread ID, which is the return value of the gettid() system call

  • NLWP: the number of threads in the thread group

Linux provides the gettid system call to return the thread ID, but older versions of glibc do not wrap this system call in a public function.

In that case the thread ID can be obtained as follows:

#include <sys/syscall.h>
#include <unistd.h>

pid_t tid;
tid = syscall(SYS_gettid);

The first thread in a thread group is called the main thread in user mode and the group leader in the kernel. When the kernel creates the first thread, it sets the thread group ID (tgid) to that thread's ID, and the group_leader pointer points to the thread's own process descriptor, that is, the descriptor of the main thread.

So the thread group contains exactly one thread whose thread ID equals the process ID, and that thread is the main thread of the group.
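A small sketch can make the pid/tgid relationship visible from user space. The helper names current_tid, Ids, report_ids and ids_in_new_thread are assumptions made for this example; only getpid and syscall(SYS_gettid) come from the text above.

```cpp
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

// Kernel thread ID of the calling thread (the pid field of its task_struct).
pid_t current_tid()
{
  return static_cast<pid_t>(syscall(SYS_gettid));
}

// The two IDs as observed from inside one thread.
struct Ids
{
  pid_t pid; // getpid(): the tgid, shared by the whole thread group
  pid_t tid; // gettid(): unique per thread
};

// Start routine: records the IDs seen by the new thread.
void* report_ids(void* arg)
{
  Ids* out = static_cast<Ids*>(arg);
  out->pid = getpid();
  out->tid = current_tid();
  return NULL;
}

// Runs report_ids in a new thread. Returns true on success.
bool ids_in_new_thread(Ids* child)
{
  pthread_t t;
  if (pthread_create(&t, NULL, report_ids, child) != 0)
  {
    return false;
  }
  return pthread_join(t, NULL) == 0;
}
```

The new thread should report the same process ID as the rest of the process, but a thread ID that differs from it, since only the main thread has a thread ID equal to the process ID.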

【note】:

Threads differ from processes in this respect: processes have the concept of a parent process, but inside a thread group all threads are peers

2.2 Thread ID and process address space layout

  • The pthread_create function generates a thread ID and stores it at the address given by its first parameter.

  • This thread ID is different from the thread ID discussed earlier.

    • The earlier thread ID belongs to process scheduling. Because a thread is a lightweight process and the smallest unit handled by the operating system scheduler, the kernel needs a value that uniquely identifies it.

    • The thread ID generated by pthread_create belongs to the NPTL thread library. Other functions of the thread library use it to operate on the thread.

  • The NPTL thread library provides the pthread_self function, which returns the ID of the calling thread

pthread_t pthread_self(void);

  • In the NPTL implementation, a pthread_t thread ID is essentially an address in the process address space

 

  • mmap: maps a file or other object into memory. If the file size is not a multiple of the page size, the unused remainder of the last page is filled with zeros.
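To illustrate the library-level thread ID, the sketch below checks that the ID stored by pthread_create is the same one the new thread gets from pthread_self. The names Shared, compare_with_self and create_id_matches_self are hypothetical; a mutex is used only so that the new thread reads the stored ID after pthread_create has returned.

```cpp
#include <pthread.h>

// State shared between the creating thread and the new thread.
struct Shared
{
  pthread_mutex_t lock;
  pthread_t created; // ID written by pthread_create
};

// Start routine: compares its own ID with the one pthread_create stored.
void* compare_with_self(void* arg)
{
  Shared* s = static_cast<Shared*>(arg);
  pthread_mutex_lock(&s->lock); // blocks until the creator releases the lock
  bool same = pthread_equal(s->created, pthread_self()) != 0;
  pthread_mutex_unlock(&s->lock);
  return same ? arg : NULL;     // non-NULL means the IDs matched
}

// Returns true if both ways of obtaining the ID agree.
bool create_id_matches_self()
{
  Shared s;
  pthread_mutex_init(&s.lock, NULL);
  pthread_mutex_lock(&s.lock);
  int rc = pthread_create(&s.created, NULL, compare_with_self, &s);
  pthread_mutex_unlock(&s.lock); // s.created is now safe to read

  void* result = NULL;
  bool ok = rc == 0 && pthread_join(s.created, &result) == 0 && result != NULL;
  pthread_mutex_destroy(&s.lock);
  return ok;
}
```

pthread_equal is used instead of == because pthread_t is an opaque type whose representation differs between implementations.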

 

2.3 Creating a thread

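Function: create a new thread

int pthread_create(pthread_t *thread, const pthread_attr_t *attr, void *(*start_routine)(void*), void *arg);

Parameters:
    thread: receives the ID of the new thread
    attr: sets the attributes of the thread; NULL means the default attributes are used
    start_routine: the address of the function the thread executes when it starts
    arg: the argument passed to the thread start function

Return value:
    Returns 0 on success and an error code on failure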

Error detection:

Unlike most traditional functions, which set the global errno variable on failure, the pthread functions report an error by returning the error code as their return value.

Each thread does have its own errno variable, to support code that uses errno. But errors from pthread functions should be checked through the return value, because reading the return value is much cheaper than reading the thread's errno variable

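A minimal sketch of this error-handling convention is shown below. The helper names describe_pthread_result and trylock_locked_mutex are made up for the example, and pthread_mutex_trylock is used only because it fails in a predictable way (EBUSY) when the mutex is already locked.

```cpp
#include <pthread.h>
#include <cerrno>
#include <cstring>
#include <string>

// Turns the return code of a pthread call into a readable message.
// The code is passed in directly; errno is not consulted.
std::string describe_pthread_result(const char* what, int rc)
{
  if (rc == 0)
  {
    return std::string(what) + ": ok";
  }
  return std::string(what) + ": " + std::strerror(rc);
}

// Provokes a failure: trying to lock an already locked mutex
// returns the error code EBUSY instead of blocking.
int trylock_locked_mutex()
{
  pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
  pthread_mutex_lock(&m);
  int rc = pthread_mutex_trylock(&m); // the error arrives as the return value
  pthread_mutex_unlock(&m);
  return rc;
}
```

A caller would typically write something like `int rc = pthread_create(...); if (rc != 0) { /* report describe_pthread_result("pthread_create", rc) */ }`, comparing against 0 rather than testing for a positive or negative value.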
Example:

#include <iostream>
#include <pthread.h>
#include <unistd.h>

// Thread start routine: prints its argument ten times.
void* routine(void* arg)
{
  int i = 10;
  while (i > 0)
  {
    std::cout << (char*)arg << '\n';
    i--;
  }
  return NULL; // returning from the start routine terminates the thread
}

int main()
{
  pthread_t tid;
  int ret = pthread_create(&tid, NULL, routine, (void*)("thread1"));
  if (ret != 0) // 0 means success, anything else is an error code
  {
    std::cout << "pthread_create error!" << '\n';
  }

  sleep(1); // give the new thread time to run
  int i = 30;
  while (i > 10)
  {
    std::cout << "I am main thread!" << '\n';
    i--;
  }
  std::cout << "main exit!" << '\n';
  return 0;
}

 

3. Thread termination

To terminate a single thread without terminating the whole process, the following methods are available:

  • Return from the thread function.

    • This does not apply to the main thread, because returning from the main function is equivalent to calling exit

  • A thread can call pthread_cancel to terminate another thread in the same process

  • A thread can call pthread_exit to terminate itself

  • pthread_kill sends a signal to a specific thread. Signal dispositions are shared by the whole process, however, so a signal whose action is to terminate ends the entire process rather than only the target thread

If any thread calls exit, _exit, or _Exit, the entire process terminates.

 

3.1 pthread_exit

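Function: terminate the calling thread

void pthread_exit(void* retval);

Parameters:
    retval: for a joinable thread, this is the value that another thread receives through the output parameter of pthread_join

Return value:
    None. As with a process, a thread that ends cannot return to its caller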

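For retval:

The pointer passed as retval is handed to the thread that calls pthread_join. The memory it points to must remain valid after the thread has terminated, so it should be global or heap memory rather than a local variable on the exiting thread's stack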

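The point about retval can be sketched like this, assuming the result is allocated on the heap so that it outlives the exiting thread. The names produce_answer and collect_answer are hypothetical.

```cpp
#include <pthread.h>

// Start routine: hands a heap-allocated result to whoever joins the thread.
void* produce_answer(void*)
{
  int* result = new int(42); // heap memory stays valid after the thread ends
  pthread_exit(result);      // same effect here as `return result;`
}

// Creates the thread, joins it, and returns the value it produced
// (or -1 if creating or joining the thread failed).
int collect_answer()
{
  pthread_t tid;
  if (pthread_create(&tid, NULL, produce_answer, NULL) != 0)
  {
    return -1;
  }
  void* ret = NULL;
  if (pthread_join(tid, &ret) != 0)
  {
    return -1;
  }
  int* p = static_cast<int*>(ret); // the pointer passed to pthread_exit
  int value = *p;
  delete p;                        // the joining thread owns the result now
  return value;
}
```

Passing the address of a local variable of produce_answer instead would leave the joining thread with a pointer into a stack that no longer exists.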
 

3.2 pthread_cancel

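Function: send a cancellation request to a thread

int pthread_cancel(pthread_t thread);

Parameters:
    thread: the ID of the thread to cancel

Return value:
    Returns 0 on success and an error code on failure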

 

3.3 pthread_kill

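Function: send a signal to a thread

#include <signal.h>

int pthread_kill(pthread_t thread, int sig);

Parameters:
    thread: the thread ID (the output parameter of pthread_create)
    sig: the signal number

Return value:
    Returns 0 on success; on failure returns an error code and no signal is sent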

Example:

#include <iostream>
#include <pthread.h>
#include <unistd.h>

pthread_t main_tid;
int retval;

void* routine(void* arg)
{
 pthread_cancel(main_tid);       // ask the main thread to terminate
 int i = 100000;
 pthread_detach(pthread_self()); // detach itself
 while (1)
 {
   std::cout << (char*)arg << ":" << i << '\n';
   i--;
 }
 pthread_exit((void*)&retval);   // never reached because of the loop above
 //return NULL; // would also terminate the thread
}

int main()
{
 main_tid = pthread_self();
 pthread_t tid;
 int ret = pthread_create(&tid, NULL, routine, (void*)("thread1"));
 if (ret != 0)
 {
   std::cout << "pthread_create error!" << '\n';
 }

 sleep(5); // cancellation point: the main thread is canceled while it sleeps
 int i = 30;
 while (i > 10)
 {
   std::cout << "I am main thread!" << '\n';
   i--;
 }
 std::cout << "main exit!" << '\n';

 while (1)
 {
   std::cout << "asd" << '\n';
 }
}
  • This program shows that the child thread does not have to exit when the main thread terminates: the main thread is canceled, but the process itself does not exit, so the child thread keeps running.

【note】:

Whether the exit of one thread affects the other threads depends on whether the exiting thread causes the process to exit. If the process exits, all of its threads must exit as well,

because threads depend on their process

 

  • Signals and threads:

  • When a signal is sent with the kill command, the kernel by default delivers it to the thread group as a whole

  • To distinguish signals sent to the process from signals sent to a single thread, the kernel keeps two sets of pending signals (signal_pending): one shared by the whole thread group and one private to each thread.

  • A signal sent by kill is placed in the signal_pending set shared by the thread group and can be handled by any thread. A signal sent by pthread_kill is placed in the target thread's private signal_pending set and can only be handled by that thread.
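The thread-directed delivery described above can be sketched with sigwait. This is only an illustration under the assumption that SIGUSR1 is free to use in the program; the names wait_for_usr1 and send_signal_to_thread are made up.

```cpp
#include <pthread.h>
#include <signal.h>

// Start routine: waits synchronously for SIGUSR1 and reports what arrived.
void* wait_for_usr1(void* arg)
{
  sigset_t set;
  sigemptyset(&set);
  sigaddset(&set, SIGUSR1);

  int received = 0;
  if (sigwait(&set, &received) != 0) // consumes a pending signal from the set
  {
    received = 0;
  }
  *static_cast<int*>(arg) = received;
  return NULL;
}

// Sends SIGUSR1 to one specific thread and returns the signal number
// that thread received (0 on failure).
int send_signal_to_thread()
{
  sigset_t set, old;
  sigemptyset(&set);
  sigaddset(&set, SIGUSR1);
  // Block SIGUSR1 here; the new thread inherits the mask, so the signal
  // stays pending for it until it calls sigwait.
  pthread_sigmask(SIG_BLOCK, &set, &old);

  int received = 0;
  pthread_t tid;
  if (pthread_create(&tid, NULL, wait_for_usr1, &received) == 0)
  {
    pthread_kill(tid, SIGUSR1); // queued on that thread only
    pthread_join(tid, NULL);
  }
  pthread_sigmask(SIG_SETMASK, &old, NULL); // restore the original mask
  return received;
}
```

Because the signal is blocked and then collected with sigwait, no handler runs and the default terminating action of SIGUSR1 is never taken.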

4. Thread waiting and separation

4.1 Thread waiting

Why do threads need to be waited for? (The reason is similar to waiting for a process.)

  • A thread that has exited but has not been joined has not released its resources, which remain in the address space of the process

  • A newly created thread does not reuse the address space of a thread that has exited but has not been joined

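Function: wait for a thread to terminate

int pthread_join(pthread_t thread, void** retval);

Parameters:
    thread: the ID of the thread to wait for
    retval: points to a pointer, which in turn receives the thread's return value

Return value:
    Returns 0 on success and an error code on failure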

The thread that calls this function is suspended (blocks) until the thread whose ID is thread terminates. The termination status obtained by pthread_join depends on how that thread terminated.

  • If the thread returned with return, the location pointed to by retval holds the return value of the thread function

  • If the thread was terminated abnormally by another thread calling pthread_cancel, the location pointed to by retval holds the constant PTHREAD_CANCELED

  • If the thread terminated itself by calling pthread_exit, the location pointed to by retval holds the argument that was passed to pthread_exit

  • If you do not care about the termination status of the thread, pass NULL as the retval parameter

  • If a signal sent with pthread_kill terminates the process, all of its threads end together, so pthread_join does not return and no meaningful value is stored through retval
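The first three cases in the list above can be compared side by side with a small sketch. The names g_status, by_return, by_exit, until_canceled and join_result are hypothetical helpers, not part of the pthread API.

```cpp
#include <pthread.h>
#include <unistd.h>

int g_status = 7; // global, so its address stays valid after a thread ends

// Case 1: the thread ends by returning from its start routine.
void* by_return(void*)
{
  return &g_status;
}

// Case 2: the thread ends by calling pthread_exit.
void* by_exit(void*)
{
  pthread_exit(&g_status);
}

// Case 3: the thread runs until another thread cancels it.
void* until_canceled(void*)
{
  while (true)
  {
    sleep(1); // sleep is a cancellation point
  }
}

// Starts fn, optionally cancels it, and returns what pthread_join stored.
void* join_result(void* (*fn)(void*), bool cancel)
{
  pthread_t tid;
  if (pthread_create(&tid, NULL, fn, NULL) != 0)
  {
    return NULL;
  }
  if (cancel)
  {
    pthread_cancel(tid);
  }
  void* ret = NULL;
  pthread_join(tid, &ret);
  return ret;
}
```

join_result(by_return, false) and join_result(by_exit, false) both yield &g_status, while join_result(until_canceled, true) yields PTHREAD_CANCELED.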

 

4.2 Separation of threads

  • By default, newly created threads are joinable. After a joinable thread exits, pthread_join must be called on it; otherwise its resources are not released, which leaks memory

  • If you do not care about the return value of the thread, joining it is a burden. In that case the system can be told to release the thread's resources automatically when the thread exits.

int pthread_detach(pthread_t thread);

pthread_detach(pthread_self());

The target thread can be detached by another thread in the thread group, or a thread can detach itself

【note】:

Joinable and detached are mutually exclusive: a thread cannot be both joinable and detached

#include <iostream>
#include <pthread.h>
#include <unistd.h>

void* routine(void* arg)
{
  pthread_detach(pthread_self()); // the thread detaches itself
  std::cout << (char*)arg << '\n';
  return NULL;
}

int main()
{
  pthread_t tid;
  if (pthread_create(&tid, NULL, routine, (void*)("thread 1")) != 0)
  {
    std::cout << "pthread_create error!" << '\n';
  }
  sleep(1); // let the thread run and detach itself first

  int ret = 0;
  if (pthread_join(tid, NULL) == 0) // joining a detached thread fails
  {
    std::cout << "wait success!" << '\n';
  }
  else
  {
    std::cout << "wait error!" << '\n';
    ret = 1;
  }

  return ret;
}

Output:

thread 1
wait error!

  • This shows that pthread_join and pthread_detach cannot both be applied to the same thread; the two conflict

 

5. Thread synchronization and mutual exclusion

  • In most cases, the variables a thread uses are local variables stored on that thread's own stack. Such a variable belongs to a single thread, and other threads do not normally access it.

  • Sometimes, however, variables need to be shared between threads. Such variables are called shared variables, and threads can interact with each other through the shared data.

  • Concurrent operations on shared variables by multiple threads cause problems: the data can end up being modified incorrectly.

#include <iostream>
#include <pthread.h>
#include <unistd.h>

// Shared by all threads: this is the variable the threads race on.
int ticket = 20;

// All threads run this same function (it is re-entered by several threads).
void* buyTicket(void *arg)
{
  while (ticket > 0)
  {
    usleep(1000); // widens the gap between the check and the update
    ticket--;     // not atomic: load, decrement, store
    std::cout << (char*)(arg) << " buy ticket! Have Ticket:" << ticket << std::endl;
  }

  return NULL; // thread ends
}

int main()
{
  pthread_t tid[6];
  // Do not pass &i of the loop counter as the argument: the thread reads
  // through the address, so it sees the latest value of i (possibly 6, after
  // the loop has finished) instead of the value at creation time.
  const char* names[6] = {"", "1", "2", "3", "4", "5"};

  // Create five child threads (tid[1] to tid[5]).
  for (int i = 1; i <= 5; i++)
  {
    if (pthread_create(tid + i, NULL, buyTicket, (void*)names[i]) != 0)
    {
      std::cout << "pthread_create error!" << std::endl;
    }
  }

  for (int i = 1; i <= 5; i++)
  {
    if (pthread_join(tid[i], NULL) == 0)
    {
      std::cout << i << " thread quit!" << std::endl;
    }
    sleep(1);
  }

  return 0;
}

  • After the while condition has been evaluated as true, the thread can be switched out and another thread can run

  • The usleep call suspends the thread, which gives the other threads plenty of time to enter the critical section, so many threads can be inside it at once

  • ticket-- itself is not an atomic operation

The -- operation is not atomic; it corresponds to three assembly instructions:

  • Step 1: Load the variable ticket from memory into a register

  • Step 2: Update the value in the register by subtracting 1

  • Step 3: Write the new value from the register back to the memory location of the variable ticket

To solve these problems, three things are required:

  • The code must be mutually exclusive: while one thread is executing in the critical section, no other thread is allowed to enter it

  • If several threads ask to execute the critical section at the same time and no thread is currently inside it, exactly one of them is allowed to enter

  • A thread that is not executing in the critical section must not prevent other threads from entering it

 

5.1 Initializing a mutex

There are two ways to initialize a mutex:

  • Method one: static allocation

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

  • Method two: dynamic allocation

int pthread_mutex_init(pthread_mutex_t* restrict mutex, const pthread_mutexattr_t* restrict attr);

Parameters:
    mutex: the mutex to initialize
    attr: the mutex attributes; pass NULL to use the defaults

 

5.2 Destroying a mutex

Points to note when destroying a mutex:

  • A mutex initialized with PTHREAD_MUTEX_INITIALIZER does not need to be destroyed

  • Do not destroy a mutex that is locked

  • Once a mutex has been destroyed, make sure that no thread tries to lock it again

int pthread_mutex_destroy(pthread_mutex_t* mutex);
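The life cycle of a dynamically initialized mutex might look like the following sketch. The function name mutex_lifecycle is made up; the point is only the order of the calls and the checking of each return code.

```cpp
#include <pthread.h>

// Initializes, uses, and destroys a mutex.
// Returns 0 if every step succeeded, otherwise the first error code.
int mutex_lifecycle()
{
  pthread_mutex_t m;
  int rc = pthread_mutex_init(&m, NULL); // NULL selects default attributes
  if (rc != 0)
  {
    return rc;
  }

  rc = pthread_mutex_lock(&m);
  if (rc == 0)
  {
    // ... the critical section would go here ...
    rc = pthread_mutex_unlock(&m); // never destroy a locked mutex
  }

  int destroy_rc = pthread_mutex_destroy(&m);
  return rc != 0 ? rc : destroy_rc;
}
```

After pthread_mutex_destroy returns, the variable m must not be locked again unless it is first re-initialized with pthread_mutex_init.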

 

5.3 Locking and unlocking a mutex

int pthread_mutex_lock(pthread_mutex_t* mutex);
int pthread_mutex_unlock(pthread_mutex_t* mutex);

Return value:
    Returns 0 on success and an error number on failure

A call to pthread_mutex_lock can meet one of the following situations:

  • The mutex is unlocked: the function locks the mutex and returns success

  • Another thread has already locked the mutex when the call is made, or other threads request the mutex at the same time and this thread loses the competition: the pthread_mutex_lock call blocks and waits until the mutex is unlocked
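Applied to the ticket example of this section, locking could look like the sketch below. The names tickets, ticket_lock, buy_tickets and sell_all are assumptions for illustration; the essential part is that both the check and the decrement happen while the mutex is held.

```cpp
#include <pthread.h>
#include <unistd.h>

int tickets = 0;                                         // shared variable
pthread_mutex_t ticket_lock = PTHREAD_MUTEX_INITIALIZER; // protects tickets

// Start routine: buys tickets until none are left and counts its purchases
// in the int that arg points to.
void* buy_tickets(void* arg)
{
  int* mine = static_cast<int*>(arg);
  for (;;)
  {
    pthread_mutex_lock(&ticket_lock);   // enter the critical section
    if (tickets <= 0)
    {
      pthread_mutex_unlock(&ticket_lock);
      break;
    }
    usleep(1000);                       // harmless now: the lock is held
    tickets--;
    (*mine)++;
    pthread_mutex_unlock(&ticket_lock); // leave the critical section
  }
  return NULL;
}

// Sells `initial` tickets with five threads and returns how many were sold.
int sell_all(int initial)
{
  tickets = initial;
  pthread_t tid[5];
  int bought[5] = {0};
  bool started[5] = {false};

  for (int i = 0; i < 5; i++)
  {
    started[i] = pthread_create(&tid[i], NULL, buy_tickets, &bought[i]) == 0;
  }

  int total = 0;
  for (int i = 0; i < 5; i++)
  {
    if (started[i]) pthread_join(tid[i], NULL);
    total += bought[i];
  }
  return total;
}
```

With the lock in place the number of tickets sold equals the initial count and the counter never drops below zero, which the unprotected version cannot guarantee.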

 


Origin blog.csdn.net/qq_40399012/article/details/84206623