Today we will look at threads in Linux. There is a lot of detail in this topic, so the post is on the long side, but we will work through each point step by step. Linux threads will be a piece of cake!

The concept of threads:

As we all know, whenever Linux introduces a concept, it first describes it and then organizes it. For example, a PCB (task_struct) describes a process, and struct file describes an open file, which is managed through the file descriptor table. The Linux kernel, however, has no separate notion of a thread. What we call a thread here is essentially a lightweight process (we will talk about it in detail later), so Linux has no system call dedicated to creating a thread, only a system call that creates a lightweight process (clone). Therefore, to use threads in user code on Linux, you call the pthread library, which wraps that system call.

The nature of threads:

When a running process creates a thread, the OS creates a new PCB for it, modeled on the PCB of the process. This PCB points to the same virtual address space as the process, so the thread runs with the same resources as the main thread. It has no virtual address space or page table of its own and shares them with the main thread, which makes it lighter than a process. That is why a thread is essentially a lightweight process. Once a thread is created, most resources are shared between the thread and the process. A thread is a flow of execution within a process.

So when the CPU schedules, all it sees is one PCB after another. A given PCB may belong to a single-threaded process or to one thread of a multi-threaded process, but either way, once it is scheduled, the CPU runs the code and data that the PCB refers to. The thread is therefore the basic unit of CPU scheduling.

This gives us a new understanding of the process. We used to say that process = kernel data structures + code and data. Now we can say that the process is the basic entity to which system resources are allocated. A process is like a big family and the threads are its members: the family members work together to make the family better, and the threads cooperate to complete a task.

Thread scheduling:

Thread creation:

Here is the code to create a batch of threads:

The function is pthread_create. The first parameter is an output parameter that receives the ID of the new thread (a pthread_t, not a pid). The second parameter is the thread attributes, usually nullptr for the defaults. The third parameter is the function the thread runs, and the fourth is the argument passed to that function.

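#include <iostream>
#include <vector>
#include <cstdio>
#include <unistd.h>
#include <pthread.h>
using namespace std;

class ThreadData
{
public:
    int number;
    pthread_t tid;
    char namebuffer[64];
};

void *start_routine(void *args)
{
    // sleep(1);
    // If one thread hits a fatal error, are the other threads affected? Yes (robustness is poor).
    // Why? Signals are delivered to the process as a whole.

    ThreadData *td = static_cast<ThreadData *>(args); // safe explicit conversion from void*
    int cnt = 10;
    while (cnt)
    {
        cout << "cnt: " << cnt << " &cnt: " << &cnt << endl;
        cnt--; // decrement exactly once per iteration
        sleep(1);
        cout << "new thread create success, name: " << td->namebuffer << " cnt: " << cnt << endl;
    }
    return nullptr; // a function returning void* must return a value
}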


int main()
{
    vector<ThreadData *> threads;
#define NUM 10

    for (int i = 0; i < NUM; i++)
    {
        ThreadData *td = new ThreadData(); // each thread gets its own heap object
        td->number = i + 1;
        snprintf(td->namebuffer, sizeof(td->namebuffer), "%s:%d", "thread", i + 1);
        pthread_create(&td->tid, nullptr, start_routine, td);
        threads.push_back(td);

        // pthread_create(&tid, nullptr, start_routine, (void*)"thread one");
        // pthread_create(&tid, nullptr, start_routine, namebuffer);
        // sleep(1);
    }

    // Wait for every thread to finish, then release its ThreadData.
    for (ThreadData *td : threads)
    {
        pthread_join(td->tid, nullptr);
        delete td;
    }
    return 0;
}

At this point some readers will be confused: why not declare namebuffer as a local array inside the loop and pass it to the thread directly, like this:

pthread_create(&tid, nullptr, start_routine, namebuffer);

Because namebuffer would then live in the stack frame of main, and that memory is shared by every thread in the process. pthread_create only passes the pointer to start_routine; it does not copy the string. The order in which threads get to run is not deterministic, and the loop runs very fast, so an earlier thread may not have read its name yet when a later iteration overwrites the buffer. (namebuffer goes out of scope at the end of each iteration, and in the next iteration the new namebuffer occupies the same location in the stack frame, so all the threads end up looking at the same memory.) When we pass a pointer obtained from new instead, each thread sees its own space.

Inside the thread function, the local variables cnt and td are independent for each thread, because each thread has its own stack. When we print the address of cnt, every thread reports a different address.

Thread exits:

There are two ways for a single thread to exit: one is to return a value from its start routine, and the other is to call pthread_exit, a function of the pthread library.

Then why can't you use exit? Because exit terminates the whole process. If any thread calls it, all threads in the process are killed, and the goal of ending a single thread cannot be achieved.

Exiting by return value is very simple. We can return nullptr directly, and other values can be returned in the same way.

If you want to return a group of values rather than a single number, you can return the address of an object. The premise is that the object is allocated with new: an object on the thread's stack is destroyed when it goes out of scope, so a pointer to it would be a dangling pointer.

class ThreadReturn
{
public:
    int exit_code;
    int exit_result;
};


    ThreadReturn *tr = new ThreadReturn();
    tr->exit_code = 1;
    tr->exit_result = 106;
    return tr; // return the pointer itself; &tr would be the address of a local variable

Another way is to use pthread_exit():

The parameter is the value you want the thread to return when it exits.

Thread cancellation:

Threads can also be canceled with pthread_cancel. Remember: you can only cancel a thread that has actually been created and is running! A canceled thread's exit value, as seen by pthread_join, is PTHREAD_CANCELED.

Thread waits:

Threads need to be waited for just like processes: waiting retrieves the thread's exit information and reclaims the resources (such as the PCB) of the corresponding thread.

First, let's learn about the pthread_join interface:

The first parameter is the ID of the thread to wait for. The second parameter receives the return value of the thread: it is an output parameter of type void**, through which we obtain the result.

Why define a variable on the user stack and pass it to pthread_join? The rationale behind it is as follows:

When a thread finishes its task, its return value (a void*) is saved in the pthread library. When we pass the address of ret to pthread_join, the library writes that saved value into *ret, which is why the parameter has to be a void**. In this way the caller's ret ends up holding the thread's return value.

Thread separation:

After a thread is detached with pthread_detach, it does not need to be joined, because its resources are released automatically when it exits. If you join it anyway, pthread_join reports an error.

Now let's talk about the role of thread id:

Processes need to be described and then organized by the operating system, so do threads also need to be like this? The answer is: yes.

When we create a thread, the first parameter of pthread_create is an output parameter of type pthread_t. Behind it, the thread library keeps a structure whose job is to describe the thread so that it can be managed.

We all know that using threads requires a thread library. The thread library is loaded into the shared region of the virtual address space, and it describes each thread we create with a structure and manages those structures.

So the thread ID is the starting address of that thread's structure inside the shared region (this is how glibc implements pthread_t), and each thread can be found through its starting address. The independent stack of each new thread is also located in the shared region, next to the thread's structure, so the private data of the threads do not affect each other. The stack of the main thread, by contrast, is in the stack area of the virtual address space!

So what is thread-local storage? Let's demonstrate with code:

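#include <iostream>
#include <cassert>
#include <unistd.h>
#include <pthread.h>
using namespace std;

int num = 0; // ordinary global variable: one copy shared by all threads

void *task(void *args)
{
    while (1)
    {
        cout << "I am the new thread, num: " << num << " &num: " << &num << endl;
        sleep(1);
        ++num; // only the new thread modifies num
    }
}

int main()
{
    pthread_t tid;
    int n = pthread_create(&tid, nullptr, task, (void *)"new thread");
    assert(n == 0);
    (void)n;

    while (1)
    {
        cout << "I am the main thread, num: " << num << " &num: " << &num << endl;
        sleep(1);
    }

    return 0;
}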

Running the code above, both threads print the same address for num, and the main thread sees the value grow as the new thread increments it. That is no surprise: num is a global variable, so if one thread modifies the value, the other thread sees the changed value. Now make the following modification: add __thread (two underscores in front of thread) before the declaration of num. What will the result be?

This time the two threads print different addresses for num, and the addresses are much larger than before. The reason is that after adding __thread, the variable becomes thread-local storage: each thread gets its own copy, stored in the shared region. An initialized global lives in the data segment, whose addresses are lower than those of the shared region, which is why the address becomes larger. And because the storage is local to each thread, the new thread's modification of the value does not affect the other threads.

That covers thread control. Next, we wrap the interfaces of the thread library in a class so that they are as convenient to use as a C++ object:

#include <iostream>
#include <string>
#include <functional>
#include <pthread.h>
#include <cassert>
#include <cstdio> // snprintf
using namespace std;

class Thread;
class Context
{
public:
    Context()
        : _this(nullptr), _args(nullptr)
    {
    }

    ~Context()
    {
    }

public:
    Thread *_this;
    void *_args;
};

class Thread
{

public:
    typedef function<void *(void *)> func_t;

    static void *task(void *args)
    {
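        // A static member function has no this pointer and cannot access non-static members directly, so the object's context must be passed in.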
        Context *ctx = static_cast<Context *>(args);
        void *ret = ctx->_this->_func(ctx->_args);
        delete ctx;
        return ret;
    }

    Thread(func_t func, void *args = nullptr, int num = 0)
        : _func(func), _args(args), _num(num)
    {
        char namebuffer[64];
        snprintf(namebuffer, sizeof(namebuffer), "Pthread %d", _num);
        _name = namebuffer;

        Context *text = new Context();
        text->_this = this;
        text->_args = _args;

        // A C-style interface cannot accept C++ objects such as _func (a std::function), so this call does not compile:
        // pthread_create(&_tid,nullptr,_func,_args);
        int n = pthread_create(&_tid, nullptr, task, text);
        assert(n == 0);
        (void)n;
    }

    void join()
    {
        int n = pthread_join(_tid, nullptr);
        assert(n == 0);
        (void)n;
    }

    ~Thread()
    {
    }

private:
    pthread_t _tid;
    int _num;
    string _name;
    func_t _func;
    void *_args;
};

So in later studies we can use the Thread class we wrote ourselves, which is more convenient to use.

That concludes the explanation of thread control. Writing this up was not easy, so thank you for your support!

Linux Threads (Part 1)