Introduction to multi-threading in Linux|Threads|Basic concepts and library functions of processes


Table of contents

1. Thread

1. The concept of thread

Supplementary knowledge points: page table

2. Advantages of threads

3. Disadvantages of threads

4. Thread exception

5. Thread usage

2. The difference and connection between threads and processes

3. Questions about process threads

0.posix thread library

1. Create a thread

About the last two parameters of pthread_create

1. Pass in pointer

2. Pass in the object

2. Thread termination

3. Cancel the thread

4. Thread waiting (waiting for the thread to end)

5. Thread separation

1. Thread library

2. Thread id

3. Thread separation

4. Using threads in C++11 (language-level use)


1. Thread

1. The concept of thread

  • A path of execution within a program is called a thread. More precisely, a thread is a single flow of control inside a process.
  • Every process has at least one thread.
  • A thread runs inside its process, that is, within the process's address space.
  • In Linux, a process as a whole consists of its PCBs, its address space, its page table, and so on. From the CPU's point of view, each thread is just a PCB (task_struct), and a lighter-weight one than a traditional process.
  • Through the process's virtual address space, a thread can see most of the process's resources. Dividing those resources sensibly among the execution flows is what forms the individual threads.

A thread is one branch of execution within a process. Its execution granularity is finer than a process's, and it is cheaper to schedule: switching between threads of the same process does not switch the address space or page table, so cached data (for example in the TLB and the CPU caches) stays largely valid.

From the kernel's perspective, a thread is the basic unit of CPU scheduling, while a process is the basic entity to which system resources are allocated.

  • The CPU itself cannot tell processes and threads apart; it simply runs whichever PCB is scheduled. The CPU contains the arithmetic unit, the control unit, registers, the MMU, and the hardware caches (L1, L2, L3).
  • A process may contain multiple threads, so the OS has to manage them. The usual design is to create a thread control block (TCB) for each thread, with the TCBs belonging to the process's PCB. Windows does exactly this. Linux instead uses the PCB itself to simulate a TCB, reusing the code and data structures already written for processes, which is easier to maintain and efficient. Because processes are the most heavily exercised part of the OS in practice, this reuse is also one reason Linux can run reliably for long periods. Strictly speaking, Linux has no separate thread abstraction in the kernel, only lightweight processes (LWPs).
  • The scheduler identifies a process by its PID and a thread by its LWP ID.
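
The PID/LWP distinction can be observed from user code. The following is a minimal sketch, assuming Linux, where syscall(SYS_gettid) returns the calling thread's LWP ID; the names ThreadIds, current_lwp, record_ids, and ids_of_new_thread are made up for this illustration.

```cpp
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>
#include <cassert>

// Hypothetical helper struct: the IDs observed from inside one thread.
struct ThreadIds {
    pid_t pid;  // process ID, shared by every thread in the process
    pid_t lwp;  // kernel thread (LWP) ID, unique per thread
};

// Ask the kernel for the calling thread's LWP ID (Linux-specific).
pid_t current_lwp() {
    return static_cast<pid_t>(syscall(SYS_gettid));
}

// Thread body: record the PID and LWP ID seen by the new thread.
void* record_ids(void* arg) {
    ThreadIds* out = static_cast<ThreadIds*>(arg);
    out->pid = getpid();
    out->lwp = current_lwp();
    return nullptr;
}

// Start one thread, let it record its IDs, and return them to the caller.
ThreadIds ids_of_new_thread() {
    ThreadIds ids{0, 0};
    pthread_t tid;
    int rc = pthread_create(&tid, nullptr, record_ids, &ids);
    assert(rc == 0);
    rc = pthread_join(tid, nullptr);
    assert(rc == 0);
    return ids;
}
```

Calling ids_of_new_thread() from main should give a pid equal to getpid() and an lwp different from the caller's current_lwp(). The same LWP column can be seen with ps -aL.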

Supplementary knowledge points: page table

As covered in an earlier post, a page table is a key-value structure: it maps virtual addresses to addresses in real physical memory and also stores other attributes.

Physical memory is not managed byte by byte. If data were moved between disk and memory one byte at a time, the frequent I/O would require far too much addressing, and therefore too much mechanical movement on the disk, which is very inefficient. So the OS exchanges data with disk devices in blocks. Physical memory is likewise divided into 4 KB blocks, each called a page. In essence, memory management places specific 4 KB blocks of data from disk into 4 KB slots of physical memory. The OS also has its own bookkeeping for physical memory, which records the attributes of each block.

Why in blocks?

1. File system + compiler: files on disk, including compiled programs, are laid out in 4 KB blocks.

2. OS + memory: the OS also manages physical memory in units of 4 KB.

3. The principle of locality: it lets us load in advance the data adjacent to, or near, the data currently being accessed.

Preloading data near the data being accessed reduces the number of future I/O operations. Loading this extra data is, in essence, data prefetching.

Why use 4 KB as the unit?

1. It is the basic unit of I/O (kernel memory + file system).

2. It improves efficiency: by the principle of locality, preloaded data is likely to be hit by future accesses.

The virtual address space is 32 bits wide. Does the page table therefore need an entry for each of the 2^32 byte addresses? No.

In fact, a 32-bit virtual address is split as 10 + 10 + 12. The high 10 bits index the first-level page table (the page directory), whose entry locates a second-level page table. The middle 10 bits index that second-level table, whose entry gives the starting address of the corresponding page frame in physical memory. The low 12 bits are the offset within the page. Any byte in physical memory is therefore located as base address + offset.
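
The 10 + 10 + 12 split is plain bit arithmetic. The sketch below only illustrates the decomposition and the final base + offset step; it does not model real page-table entries, and the names VirtualAddressParts, split_virtual_address, and physical_address are invented for the example.

```cpp
#include <cstdint>

constexpr std::uint32_t kOffsetBits = 12;  // low 12 bits: offset inside a 4 KB page
constexpr std::uint32_t kTableBits = 10;   // middle 10 bits: second-level index

// Hypothetical struct holding the three fields of a 32-bit virtual address.
struct VirtualAddressParts {
    std::uint32_t directory_index;  // high 10 bits: index into the page directory
    std::uint32_t table_index;      // middle 10 bits: index into the page table
    std::uint32_t page_offset;      // low 12 bits: offset within the page
};

// Split a virtual address into directory index, table index, and offset.
constexpr VirtualAddressParts split_virtual_address(std::uint32_t va) {
    return VirtualAddressParts{
        va >> (kOffsetBits + kTableBits),
        (va >> kOffsetBits) & ((1u << kTableBits) - 1u),
        va & ((1u << kOffsetBits) - 1u)};
}

// Final step of translation: page frame base address + offset.
constexpr std::uint32_t physical_address(std::uint32_t frame_base,
                                         std::uint32_t page_offset) {
    return frame_base + page_offset;
}
```

For example, 0x12345678 splits into directory index 0x48, table index 0x345, and offset 0x678. With 2^10 directory entries, 2^10 entries per second-level table, and 2^12 bytes per page, the scheme covers 2^32 bytes, and second-level tables only need to exist for regions that are actually mapped.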

  • When memory is requested with malloc, the OS only needs to reserve it in the virtual address space. On the first real access, the lookup finds that no mapping exists in the page table and triggers a page fault; the OS then allocates physical memory and fills in the page table.
  • Look at the following piece of code:
char *s  = "hello";
*s = 'h';

s is a pointer to the string-constant area, which is read-only. Writing through it (*s = 'h') crashes the process at run time.

From the kernel's perspective: s holds the virtual starting address of the string. Executing *s = 'h' requires translating that virtual address to a physical address via the MMU and the page table. The lookup finds the address, but the permission check shows that the write is illegal. The MMU raises an exception, the OS recognizes it and converts it into a signal (SIGSEGV), and sends that signal to the target process. When the process returns from kernel mode to user mode it handles the signal, and the default action terminates the process.

2. Advantages of threads

  • Creating a new thread is much less expensive than creating a new process
  • Compared with process switching, switching threads requires much less work.
  • Threads occupy much fewer resources than processes
  • Threads can make full use of the available parallel processors
  • While waiting for the slow I/O to end, the program can perform other computing tasks
  • Computing-intensive applications, such as encryption and decryption, file compression and decompression, etc., can be decomposed into multiple threads for implementation.
  • IO-intensive applications, such as uploading and downloading, need to wait for disk and network bandwidth, and threads can wait for different IO operations at the same time.

3. Disadvantages of threads

  • Performance penalty: a compute-intensive thread that is rarely blocked by external events often cannot share a processor with other threads. If there are more compute-intensive threads than available processors, performance may drop significantly, because synchronization and scheduling overhead is added while the available resources stay the same.
  • Reduced robustness: writing multi-threaded code demands more thorough and careful thought. Small differences in timing, or sharing a variable that should not be shared, can easily cause harm. In other words, threads are not protected from one another.
  • Lack of access control: the process is the basic unit of access control, so calling certain OS functions from one thread affects the entire process.

4. Thread exception

  • If a single thread divides by zero or dereferences a wild pointer, that thread crashes, and the whole process crashes with it.
  • A thread is an execution branch of the process, so an exception in a thread is an exception in the process: it triggers the signal mechanism, which terminates the process. When the process terminates, all of its threads exit immediately.

5. Thread usage

  • Reasonable use of multi-threading can improve the execution efficiency of CPU-intensive programs.
  • Reasonable use of multi-threading can improve the user experience of IO-intensive programs (for example, downloading development tools while writing code is multi-threading at work).

2. The difference and connection between threads and processes

  • Threads share the process's data but also have some data of their own: thread ID, a set of registers, stack, errno, signal mask, and scheduling priority.
  • Threads share the following process resources and environment: the file descriptor table, the handling of each signal (default or custom handler), the current working directory, and the user ID and group ID.
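
Both kinds of data can be seen in one small experiment. This is a sketch under the assumption that comparing the address of errno across threads is a fair way to show per-thread storage; g_counter, Seen, worker, and worker_has_own_errno are names invented for the illustration.

```cpp
#include <pthread.h>
#include <cerrno>

int g_counter = 0;  // hypothetical global: one copy, visible to every thread

// Hypothetical record of what the worker thread observed.
struct Seen {
    int* errno_addr;  // where the worker's errno lives
};

// Thread body: write the shared global and record the address of errno.
void* worker(void* arg) {
    Seen* seen = static_cast<Seen*>(arg);
    g_counter += 10;            // modifies process-wide data
    seen->errno_addr = &errno;  // errno is per-thread storage
    return nullptr;
}

// Runs one worker and reports whether its errno is a different object
// from the caller's errno.
bool worker_has_own_errno() {
    Seen seen{nullptr};
    pthread_t tid;
    if (pthread_create(&tid, nullptr, worker, &seen) != 0) return false;
    pthread_join(tid, nullptr);
    return seen.errno_addr != &errno;
}
```

After worker_has_own_errno() returns, the caller sees the worker's update to g_counter, while the two errno addresses differ because each thread has its own.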

3. Questions about process threads

The single-threaded processes studied earlier can be regarded as processes with exactly one thread of execution.

0. POSIX thread library

  • The thread-related functions form a family; most of their names begin with pthread_.
  • To use these library functions, include the header file <pthread.h>.
  • To link against the thread library, pass the -lpthread option to the compiler command (for example, in the makefile).

1. Create a thread

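int pthread_create(pthread_t *thread, const pthread_attr_t *attr, void *(*start_routine)(void *), void *arg);

Parameters:
thread: receives the ID of the new thread
attr: sets the thread attributes; NULL means the default attributes
start_routine: address of the function the thread executes once started
arg: argument passed to the thread's start function

Return value: 0 on success, an error code on failure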

About the last parameter of pthread_create

This is the argument handed to the function the thread runs. Its type is void *, so anything can be passed through it, and the thread function behaves according to what it receives.

1. Pass in pointer

#include <pthread.h>
#include <unistd.h>
#include <cstdint>
#include <iostream>
using namespace std;

void *threadRun(void *args)
{
    const char *name = (const char *)args;
    int cnt = 5;
    while (cnt)
    {
        cout << name << " is running " << cnt-- << endl;
        sleep(1);
    }
    pthread_exit((void *)11); // the exit value travels in the pointer itself
}

int main()
{
    pthread_t tid;
    pthread_create(&tid, nullptr, threadRun, (void *)"thread 1");
    void *ret = nullptr;
    pthread_join(tid, &ret);

    // A pointer is 8 bytes on 64-bit Linux, hence the cast to int64_t
    cout << "new_thread exit " << (int64_t)ret << endl;
    return 0;
}

2. Pass in the object

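#include <pthread.h>
#include <cstdint>
#include <cstdio>
#include <ctime>
#include <iostream>
#include <string>
using namespace std;

#define NUM 3 // number of threads to create (arbitrary choice)

enum
{
    OK = 0,
    ERROR
};
// First define a class that carries a thread's input and output
class ThreadData
{
    public:
        // Constructor
        ThreadData(const string & name,int id,time_t createtime,int top)
        :_name(name),_id(id),_createTime((uint64_t)createtime),_status(OK),_top(top),_result(0)
        {}

        // Destructor
        ~ThreadData()
        {
        }

    public:
        // Member variables
        string _name;
        int _id;
        uint64_t _createTime;
        // Returned data
        int _status;
        int _top;
        int _result;
 };

void * thread_run(void * args)
{
    ThreadData *td = static_cast<ThreadData*>(args);

    // Do the work: sum 0 .. _top-1
    for(int i = 0; i<td->_top;i++)
    {
        td->_result += i;
    }

    cout<<td->_name<<" cal done"<<endl;
    pthread_exit(td); // hand the same heap object back to the joiner
}

int main()
{
    pthread_t tids[NUM];
    for(int i = 0; i<NUM; i++)
    {
        char tname[64];
        snprintf(tname,64,"thread-%d", i+1);
        ThreadData * td = new ThreadData(tname,i+1,time(nullptr),5*i); // build the object
        pthread_create(tids+i,nullptr,thread_run,td);
    }

    // Wait for all threads
    void *ret = nullptr;
    for(int i = 0; i<NUM;i++)
    {
        int n = pthread_join(tids[i],&ret);
        if(n!= 0) { cerr<<"pthread_join error"<<endl; continue; }
        ThreadData *td = static_cast<ThreadData *>(ret);
        if(td->_status == OK)
            cout<<td->_name<<" result is: "<<td->_result<<endl;
        delete td; // allocated with new in main, released once the result is read
    }

    return 0;
}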

2. Thread termination

There are 3 ways to terminate a thread:

  1. Return from the thread function. This method is not applicable to the main thread. Return from the main function is equivalent to exit.
  2. The thread calls pthread_exit to terminate itself
  3. A thread can call pthread_cancel to terminate another thread in the same process
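void pthread_exit(void *value_ptr);
Parameter:
value_ptr: must not point to a local variable
Return value:
None. As with a process, a thread that ends cannot return to its caller.

Note: the memory that the pointer passed to pthread_exit (or returned with return) points to must be global or allocated with malloc. It must not be allocated on the thread function's stack, because by the time another thread receives the pointer, the thread function has already exited.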
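
A minimal sketch of returning a result safely: the thread places its result on the heap and the joining thread frees it. The helper names sum_to_n and run_sum are invented here, and new/delete stand in for malloc/free.

```cpp
#include <pthread.h>
#include <cassert>

// Thread body: sums 1..n and returns the sum in heap memory, which
// stays valid after this function's stack frame is gone.
void* sum_to_n(void* arg) {
    int n = *static_cast<int*>(arg);
    int* result = new int(0);
    for (int i = 1; i <= n; ++i) *result += i;
    pthread_exit(result);  // a plain "return result;" behaves the same here
}

// Starts the thread, joins it, copies the result out, and frees it.
// Passing &n is safe only because this function joins before returning.
int run_sum(int n) {
    pthread_t tid;
    if (pthread_create(&tid, nullptr, sum_to_n, &n) != 0) return -1;
    void* ret = nullptr;
    pthread_join(tid, &ret);
    assert(ret != nullptr);
    int* result = static_cast<int*>(ret);
    int value = *result;
    delete result;  // the joiner owns the heap result
    return value;
}
```

Returning the address of a local int from sum_to_n instead would hand the joiner a dangling pointer.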

3. Cancel the thread

int pthread_cancel(pthread_t thread);
// thread: ID of the thread to cancel
// Returns 0 on success, an error code on failure

4. Thread waiting (waiting for the thread to end)

  • The space of a thread that has exited is not released automatically; it stays in the process's address space (similar to a zombie process).
  • A newly created thread does not reuse the address space of a thread that has just exited.
int pthread_join(pthread_t thread, void **value_ptr);
// thread: ID of the thread to wait for
// value_ptr: points to a void * that receives the thread's return value
// Returns 0 on success, an error code on failure

The thread calling this function blocks until the thread with the given ID terminates. A thread can terminate in different ways, and the termination status obtained through pthread_join differs accordingly:

  1. If the thread returned with return, the location pointed to by value_ptr holds the return value of the thread function.
  2. If the thread was terminated abnormally by another thread calling pthread_cancel, it holds the constant PTHREAD_CANCELED.
  3. If the thread terminated itself by calling pthread_exit, it holds the argument passed to pthread_exit.
  4. If you are not interested in the termination status, pass NULL as value_ptr.
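
The first three cases can be checked side by side. This is an illustrative sketch: the thread bodies and the helper join_status are invented for the example, and small integers are smuggled through the void * purely for demonstration.

```cpp
#include <pthread.h>
#include <unistd.h>
#include <cassert>

// Case 1: terminate by returning from the thread function.
void* ends_by_return(void*) {
    return reinterpret_cast<void*>(1);
}

// Case 3: terminate by calling pthread_exit.
void* ends_by_exit(void*) {
    pthread_exit(reinterpret_cast<void*>(2));
}

// Case 2: never finishes on its own; it is ended by pthread_cancel.
void* ends_by_cancel(void*) {
    while (true) sleep(1);  // sleep() is a cancellation point
    return nullptr;         // never reached
}

// Start `body`, optionally cancel it, and return what pthread_join stores.
void* join_status(void* (*body)(void*), bool cancel) {
    pthread_t tid;
    if (pthread_create(&tid, nullptr, body, nullptr) != 0) return nullptr;
    if (cancel) pthread_cancel(tid);
    void* status = nullptr;
    int rc = pthread_join(tid, &status);
    assert(rc == 0);
    return status;
}
```

join_status(ends_by_return, false) yields (void *)1, join_status(ends_by_exit, false) yields (void *)2, and join_status(ends_by_cancel, true) yields PTHREAD_CANCELED.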

5. Thread separation

1. Thread library

Looking at the path /lib64/libpthread-2.17.so, you can see that the thread library is a dynamic library. It is therefore loaded into memory and mapped through the page table into the shared region of the process's virtual address space, where the threads of the process can access its code and data at any time. The library has to manage the threads it creates, and it does so with a TCB (thread control block) for each one.

2. Thread id

File descriptors were introduced earlier: an fd is an index into the process's array of struct file pointers. In a similar way, the pthread library defines the attributes of each thread, which may include struct pthread, the thread's stack, its thread-local storage, and so on, all held in the TCB. The thread ID is the starting address of that TCB and is used to locate the thread's attributes.

The stack of a new thread is a private stack provided by the library. The main thread uses the ordinary process stack in the address space, which is managed by the system.
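
To look at a thread ID as an address, it can be printed in hexadecimal. The sketch below assumes Linux, where pthread_t is an unsigned integer type in C++; POSIX itself only promises an opaque type, so the cast is not portable. The names to_hex, report_self, and id_matches_self are invented for the example.

```cpp
#include <pthread.h>
#include <cstdio>
#include <string>

// Format a pthread_t as hex. Treating it as an integer is a Linux
// assumption; POSIX only guarantees an opaque type.
std::string to_hex(pthread_t tid) {
    char buf[32];
    std::snprintf(buf, sizeof(buf), "0x%lx", static_cast<unsigned long>(tid));
    return buf;
}

// Thread body: store this thread's own ID where the creator can read it.
void* report_self(void* arg) {
    *static_cast<pthread_t*>(arg) = pthread_self();
    return nullptr;
}

// Returns true if the ID written by pthread_create is the same ID the
// new thread gets from pthread_self().
bool id_matches_self() {
    pthread_t tid;
    pthread_t seen_inside;
    if (pthread_create(&tid, nullptr, report_self, &seen_inside) != 0) return false;
    pthread_join(tid, nullptr);
    return pthread_equal(tid, seen_inside) != 0 &&
           to_hex(tid) == to_hex(seen_inside);
}
```

On a typical 64-bit Linux system the printed value looks like a large address in the mapped region, quite unlike the small LWP number that the kernel uses for the same thread.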

3. Thread separation

By default, a newly created thread is joinable. After it exits, it must be joined with pthread_join; otherwise its resources are not released, which leaks memory.

If you do not care about a thread's return value, joining is a burden. In that case you can tell the system to release the thread's resources automatically when it exits.

int pthread_detach(pthread_t thread);

A thread can be detached by another thread in the same thread group, or it can detach itself:

pthread_detach(pthread_self());

Joinable and detached are mutually exclusive: a thread cannot be both. Calling pthread_join on a detached thread fails, typically with an invalid-argument error (EINVAL).
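
A minimal sketch of a self-detaching thread follows. Because a detached thread cannot be joined, the sketch uses an atomic flag to learn that the work finished; that flag, and the names g_done, detached_worker, and run_detached, are choices made for this illustration rather than part of the pthread API.

```cpp
#include <pthread.h>
#include <unistd.h>
#include <atomic>

std::atomic<bool> g_done{false};  // hypothetical flag: set when the worker finishes

// Thread body: detach itself, do its (trivial) work, and signal completion.
void* detached_worker(void*) {
    pthread_detach(pthread_self());  // the thread detaches itself
    g_done.store(true);
    return nullptr;                  // resources are reclaimed automatically
}

// Starts a self-detaching thread and polls the flag for up to about
// two seconds. Returns true if the worker reported completion.
bool run_detached() {
    g_done.store(false);
    pthread_t tid;
    if (pthread_create(&tid, nullptr, detached_worker, nullptr) != 0) return false;
    for (int i = 0; i < 2000 && !g_done.load(); ++i) usleep(1000);
    return g_done.load();
}
```

No pthread_join appears anywhere: once detached, the thread's resources are released by the system when it exits.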

4. Using threads in C++11 (language-level use)

#include <iostream>
#include <thread>
#include <unistd.h>
using namespace std;


void run1()
{
    while(true)
    {
        cout<<"thread 1"<<endl;
        sleep(1);
    }

}

void run2()
{
    while(true)
    {
        cout<<"thread 2"<<endl;
        sleep(2);
    }
}


int main()
{
    thread th1(run1);
    thread th2(run2);

    th1.join();
    th2.join();
    
    return 0;
}

Origin blog.csdn.net/jolly0514/article/details/132641697