Linux System Programming (6): Threads

1. Thread concept

1.1 What is a thread

  • Thread
    • LWP (lightweight process): in the Linux environment, a thread is essentially still a process
    • Has its own PCB but no independent address space (the address space is shared)
    • The smallest unit of execution
  • Process
    • Has an independent address space and its own PCB
    • The smallest unit of resource allocation; a process can be regarded as a process with only one thread
  • The difference between a process and a thread
    • Under Linux, the difference is whether the address space is shared: a process has its own address space, while threads share one

1.2 Linux kernel thread implementation principle

  • Early Unix-like systems had no concept of "threads". It was introduced only in the 1980s, and threads were implemented on top of the process mechanism. In this kind of system, processes and threads are therefore closely related

    • A lightweight process also has a PCB. The underlying function used to create a thread is the same as for a process: both use clone
    • From the kernel's perspective, processes and threads look the same. Each has its own PCB, but for threads the three-level page tables that the PCBs point to are the same
    • A process can turn into a thread
    • A thread can be viewed as a set of registers plus a stack
    • Under Linux, the thread is the smallest unit of execution and the process is the smallest unit of resource allocation
  • View LWP number

    $ ps -Lf pid
    
  • Three-level mapping

    • Process PCB --> page directory (can be viewed as an array whose base address is stored in the PCB) --> page table --> physical page --> memory unit
    • For processes, the same virtual address can be used in different processes without conflict. Although the virtual addresses are the same, the page directories, page tables, and physical pages differ, so the same virtual address maps to different physical pages.
    • Threads are different. Two threads have separate PCBs but share the same page directory, and therefore the same page tables and physical pages, so the two PCBs share one address space.
  • Whether fork creates a process or pthread_create creates a thread, the underlying implementation calls the same kernel function, clone.

    • If the caller's address space is copied, a process is created.
    • If the caller's address space is shared, a thread is created.
  • The Linux kernel does not distinguish between processes and threads; the distinction exists only at the user level. All pthread_* thread functions are library functions

1.3 Thread shared/non-shared resources

  • Resources shared between threads
    • File descriptor table
    • Signal dispositions (how each signal is handled)
    • Current working directory
    • User ID and group ID
    • Memory address space (.text/.data/.bss/heap/shared libraries)
  • Resources not shared between threads
    • Thread ID
    • Processor context and stack pointer (kernel stack)
    • Independent stack space (user-space stack)
    • errno variable
    • Signal mask
    • Scheduling priority

1.4 Advantages and Disadvantages of Threads

  • Advantages
    • Improves program concurrency
    • Low resource overhead
    • Data communication and data sharing are convenient
  • Disadvantages
    • Implemented as library functions, which are less stable than system calls
    • Harder to write and debug (gdb support for threads was historically limited)
    • Poor signal support

The advantages are significant, and the shortcomings are not serious flaws. Under Linux, because of how threads are implemented, the difference between processes and threads is small, so threads are usually the first choice.

2. Thread control primitives

2.1 pthread_self function

  • Gets the thread ID; it corresponds to the getpid() function for processes
    #include <pthread.h>
    
    // Return value: the calling thread's ID; this function always succeeds
    pthread_t pthread_self(void);
    // Compile and link with -pthread
    
  • Thread ID: type pthread_t
    • Under Linux it is an unsigned integer (printed with %lu); on other systems it may be a structure
    • A thread ID identifies a thread within its process (threads in two different processes may have the same thread ID)

A child thread should not obtain its own ID by reading a global pthread_t tid variable that pthread_create fills in as an output parameter, because the new thread may run before that variable has been written. Use pthread_self instead.

2.2 pthread_create function

  • Creates a new thread; it corresponds to the fork() function for processes

    #include <pthread.h>
    
    // Return value: 0 on success; the corresponding error number on failure
    int pthread_create(pthread_t *thread, const pthread_attr_t *attr, void *(*start_routine)(void *), void *arg);
    // Compile and link with -pthread
    
    • Parameter 1: output parameter that receives the thread ID assigned by the system
    • Parameter 2: usually NULL, which selects the default thread attributes. To use specific attributes, pass an initialized attribute object instead
    • Parameter 3: function pointer to the thread's main function (the thread body). When this function returns, the thread ends.
    • Parameter 4: argument passed to the thread's main function
  • After a thread calls pthread_create() to create a new thread, the calling thread returns from pthread_create() and continues. The code the new thread executes is determined by the function pointer start_routine passed to pthread_create.

    • start_routine receives one parameter, which is passed through the arg parameter of pthread_create. Its type is void *, and how the pointer is interpreted is up to the caller.
    • The return type of start_routine is also void *, and the meaning of this pointer is likewise up to the caller.
    • When start_routine returns, the thread exits. Other threads can call pthread_join to get the return value of start_routine, similar to a parent process calling wait(2) to get the exit status of a child process.
  • After pthread_create returns successfully, the ID of the newly created thread is stored in the memory pointed to by the thread parameter.

    • The type of a process ID is pid_t. Each process ID is unique in the entire system, and getpid(2) returns the ID of the current process as a positive integer.
    • The type of a thread ID is pthread_t, which is only guaranteed to be unique within the current process. pthread_t is implemented differently on different systems: it may be an integer, a structure, or an address, so it cannot simply be printed as an integer with printf. Call pthread_self(3) to get the ID of the current thread.
Case 1
  • Create a new thread and print the thread ID
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <errno.h>
    #include <pthread.h>
    
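    void sys_err(const char *str) {
        perror(str);
        exit(1);
    }
    
    // Child thread
    void *tfn(void *arg) {
        printf("thread: pid = %d, tid = %lu\n", getpid(), (unsigned long)pthread_self());
    
        return NULL;
    }
    
    // Main thread
    int main(int argc, char *argv[]) {
        pthread_t tid;
    
        // Passing NULL for attr selects the default attributes
        int ret = pthread_create(&tid, NULL, tfn, NULL);
        if (ret != 0) {
            // pthread functions return the error number instead of setting errno,
            // so use strerror(ret) rather than perror
            fprintf(stderr, "pthread_create error: %s\n", strerror(ret));
            exit(1);
        }
        printf("main: pid = %d, tid = %lu\n", getpid(), (unsigned long)pthread_self());
    
        pthread_exit((void *)0);  // Replaces the two lines below; this approach is preferable
    
        //sleep(1);
        //return 0;
    }
    
    # pthread is not a library that is linked by default on Linux, so the linker cannot find
    # the entry addresses of the pthread functions and linking fails.
    # Adding the -pthread option to the gcc command solves this.
    $ gcc pthread_create.c -o pthread_create -pthread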
    $ ./pthread_create
    main: pid = 2986, tid = 140380929427264
    thread: pid = 2986, tid = 140380921087744
    
Case 2
  • Create multiple threads in a loop; each thread prints its position in the creation order
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <errno.h>
    #include <pthread.h>
    
    void *tfn(void *arg) {
        // Using int here triggers: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
        long i = (long)arg;  // Cast the pointer-sized argument back to an integer
        sleep(i);
        printf("I'm %ldth thread: pid = %d, tid = %lu\n", i+1, getpid(), (unsigned long)pthread_self());
    
        return NULL;
    }
    
    int main(int argc, char *argv[]) {
        long i;
        int ret;
        pthread_t tid;
    
        for (i = 0; i < 5; i++) {
            ret = pthread_create(&tid, NULL, tfn, (void *)i);  // i is passed by value through a cast
            if (ret != 0) {
                // pthread functions return the error number instead of setting errno
                fprintf(stderr, "pthread_create error: %s\n", strerror(ret));
                exit(1);
            }
        }
    
        sleep(i);
        printf("I'm main: pid = %d, tid = %lu\n", getpid(), (unsigned long)pthread_self());
    
        return 0;
    }
    
    $ gcc pthread_more.c -o pthread_more -pthread
    $ ./pthread_more 
    I'm 1th thread: pid = 3163, tid = 139852150068992
    I'm 2th thread: pid = 3163, tid = 139852141676288
    I'm 3th thread: pid = 3163, tid = 139852133283584
    I'm 4th thread: pid = 3163, tid = 139851990673152
    I'm 5th thread: pid = 3163, tid = 139852054001408
    I'm main: pid = 3163, tid = 139852158408512
    

2.3 Threads and Sharing

  • Sharing global variables between threads
    • Threads share the data segment, the code segment, and the rest of the address space by default, so global variables are the usual way to share data. Processes do not share global variables and need a mechanism such as mmap to share data
Case
  • Verify that global data is shared between threads
    #include <stdio.h>
    #include <pthread.h>
    #include <stdlib.h>
    #include <unistd.h>
    
    int var = 100;
    
    void *tfn(void *arg) {
        var = 200;
        printf("thread, var = %d\n", var);
        
        return NULL;
    }
    
    int main(void) {
        printf("At first var = %d\n", var);
        
        pthread_t tid;
        pthread_create(&tid, NULL, tfn, NULL);
        sleep(1);
        
        printf("after pthread_create, var = %d\n", var);
        
        return 0;
    }
    
    $ gcc ttt.c -o ttt -pthread
    $ ./ttt 
    At first var = 100
    thread, var = 200
    after pthread_create, var = 200
    

2.4 pthread_exit function

  • Exit a single thread
    #include <pthread.h>
    
    void pthread_exit(void *retval);
    // Compile and link with -pthread
    
    • The retval parameter is the thread's exit status; NULL is usually passed
  • Do not call exit() in a thread: it makes every thread in the process exit.
    • Use pthread_exit instead to exit a single thread
    • Calling exit in any thread terminates the whole process, even if other threads have not finished their work. For the same reason, when the main thread finishes before the others, it should not return or call exit; it should call pthread_exit.

The memory pointed to by the pointer passed to pthread_exit or returned with return must be global or allocated with malloc. It must not live on the stack of the thread function, because the thread function has already exited by the time another thread receives the pointer.

Case
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include <pthread.h>

void func(void) {
    pthread_exit(NULL);         // Exits the current thread

    return ;
}

void *tfn(void *arg) {
    long i = (long)arg;         // Cast the pointer-sized argument back to an integer
    sleep(i);

    if (i == 2) {
        //exit(0);              // Exits the whole process
        //return NULL;          // Returns to the caller
        //func();
        pthread_exit(NULL);     // Exits the current thread
    }
    printf("I'm %ldth thread: pid = %d, tid = %lu\n", i+1, getpid(), (unsigned long)pthread_self());

    return NULL;
}

int main(int argc, char *argv[]) {
    long i;
    int ret;
    pthread_t tid;

    for (i = 0; i < 5; i++) {
        ret = pthread_create(&tid, NULL, tfn, (void *)i);  // i is passed by value through a cast
        if (ret != 0) {
            // pthread functions return the error number instead of setting errno
            fprintf(stderr, "pthread_create error: %s\n", strerror(ret));
            exit(1);
        }
    }

    sleep(i);
    printf("I'm main: pid = %d, tid = %lu\n", getpid(), (unsigned long)pthread_self());

    return 0;
}
$ gcc pthread_exit.c -o pthread_exit -pthread
$ ./pthread_exit
I'm 1th thread: pid = 3389, tid = 140125255145216
I'm 2th thread: pid = 3389, tid = 140125246752512
I'm 4th thread: pid = 3389, tid = 140125238359808
I'm 5th thread: pid = 3389, tid = 140125229967104
I'm main: pid = 3389, tid = 140125263484736

Conclusion

  • exit: exits the process
  • return: returns to the caller
  • pthread_exit(): exits the thread that calls it

2.5 pthread_join function

  • Blocks until the thread exits (for example, the main thread waits for a child thread to terminate) and obtains the thread's exit status. It corresponds to the waitpid() function for processes.

    #include <pthread.h>
    
    // Return value: 0 on success; an error number on failure
    int pthread_join(pthread_t thread, void **retval);
    // Compile and link with -pthread.
    
    • thread: thread ID (not a pointer)
    • retval: receives the thread's exit status
      • For a process: the return value of main or the argument of exit is an int; the wait function that waits for the child takes an int *
      • For a thread: the return value of the thread's main function or the argument of pthread_exit is a void *; pthread_join, which waits for the thread, takes a void **
  • The thread that calls this function blocks until the thread identified by thread terminates. A thread can terminate in different ways, and the status obtained through pthread_join differs accordingly:

    • If the thread returns with return, the location pointed to by retval holds the return value of the thread function.
    • If the thread is terminated abnormally because another thread called pthread_cancel on it, the location pointed to by retval holds the constant PTHREAD_CANCELED.
    • If the thread terminates itself by calling pthread_exit, the location pointed to by retval holds the argument passed to pthread_exit.
    • If you are not interested in the thread's termination status, pass NULL as retval.
Case
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include <pthread.h>

struct thrd {
    int var;
    char str[256];
};

/*
void *tfn(void *arg) {
    struct thrd *tval;

    tval = malloc(sizeof(*tval));  // sizeof(tval) would allocate only the size of a pointer
    tval->var = 100;
    strcpy(tval->str, "hello thread");

    return (void *)tval;
}
*/
/*
void *tfn(void *arg) {
    // tval is a local variable here: it exists only during the function call,
    // and once the call ends its stack space, and tval with it, is gone
    struct thrd tval;

    tval.var = 100;
    strcpy(tval.str, "hello thread");

    // The address of the local variable tval must not be used as the return value
    return (void *)&tval;
}
*/
void *tfn(void *arg) {
    struct thrd *tval = (struct thrd *)arg;

    tval->var = 100;
    strcpy(tval->str, "hello thread");

    return (void *)tval;
}

int main(int argc, char *argv[]) {
    pthread_t tid;

    struct thrd arg;
    struct thrd *retval;

    int ret = pthread_create(&tid, NULL, tfn, (void *)&arg);
    if (ret != 0) {
        // pthread functions return the error number instead of setting errno
        fprintf(stderr, "pthread_create error: %s\n", strerror(ret));
        exit(1);
    }

    // tid is an input parameter, retval is an output parameter
    // Wait for the thread to end and store its return value in retval
    ret = pthread_join(tid, (void **)&retval);
    if (ret != 0) {
        fprintf(stderr, "pthread_join error: %s\n", strerror(ret));
        exit(1);
    }

    printf("child thread exit with var= %d, str= %s\n", retval->var, retval->str);
    
    pthread_exit(NULL);
}

2.5 pthread_detach function

  • Detaches a thread

    #include <pthread.h>
    
    int pthread_detach(pthread_t thread);
    // Compile and link with -pthread
    
  • Detached state

    • In this state the thread disconnects from the main thread. When the thread ends, its exit status is not collected by any other thread; its resources are released automatically. This is commonly used in network and multi-threaded servers.
  • If processes had a mechanism similar to thread detachment, zombie processes would not occur.

    • Cause of zombie processes: after a process dies, most of its resources are released, but some residual resources remain in the system, so the kernel still considers the process to exist.
  • Normally, after a thread terminates, its termination status is kept until another thread calls pthread_join to collect it. A thread can also be put into the detached state: when such a thread terminates, all of its resources are reclaimed immediately and no termination status is kept.

  • pthread_join cannot be called on a thread that is already detached; such a call returns EINVAL. In other words, once pthread_detach has been called on a thread, pthread_join can no longer be called on it.

Case
  • Use the pthread_detach function to implement thread separation
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <errno.h>
    #include <pthread.h>
    
    void *tfn(void *arg) {
        printf("thread: pid = %d, tid = %lu\n", getpid(), (unsigned long)pthread_self());
    
        return NULL;
    }
    
    int main(int argc, char *argv[]) {
        pthread_t tid;
    
        int ret = pthread_create(&tid, NULL, tfn, NULL);
        if (ret != 0) {
            fprintf(stderr, "pthread_create error: %s\n", strerror(ret));
            exit(1);
        }
        // Detach the thread: when it terminates, its PCB is cleaned up automatically and no join is needed
        ret = pthread_detach(tid);
        if (ret != 0) {
            fprintf(stderr, "pthread_detach error: %s\n", strerror(ret));
            exit(1);
        }
    
        sleep(1);
    
        // Joining a detached thread fails; this call is here to show the error
        ret = pthread_join(tid, NULL);
        if (ret != 0) {
            fprintf(stderr, "pthread_join error: %s\n", strerror(ret));
            exit(1);
        }
    
        printf("main: pid = %d, tid = %lu\n", getpid(), (unsigned long)pthread_self());
    
        pthread_exit((void *)0);
    }
    
    $ gcc pthread_detach.c -o pthread_detach -pthread
    $ ./pthread_detach 
    thread: pid = 3684, tid = 139762658100992
    pthread_join error: Invalid argument
    

2.6 pthread_cancel function

  • Kills (cancels) a thread; it corresponds to the kill() function for processes
    #include <pthread.h>
    
    int pthread_cancel(pthread_t thread);
    // Compile and link with -pthread.
    
  • Cancellation does not take effect immediately. There is a delay: the target thread must first reach a cancellation point (checkpoint)
    • This is similar to saving in a game: progress can only be saved at designated places (save points such as an inn, a warehouse, or a city). Likewise, a thread cannot be killed at an arbitrary moment; it must reach a cancellation point.
    • Cancellation point: a position where the thread checks whether it has been canceled and, if so, acts on the request.
    • Roughly speaking, a system call (entering the kernel) can be treated as a cancellation point. If a thread contains no cancellation point, you can add one yourself by calling pthread_testcancel.
Case
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#include <stdlib.h>

void *tfn1(void *arg) {
    printf("thread 1 returning\n");

    return (void *)111;    
}

void *tfn2(void *arg) {
    printf("thread 2 exiting\n");
    pthread_exit((void *)222);
}

void *tfn3(void *arg) {
    while (1) {
        pthread_testcancel();  // Add a cancellation point by hand
    }   

    return (void *)666;
}

int main(void) {
    pthread_t tid;
    void *tret = NULL;

    pthread_create(&tid, NULL, tfn1, NULL);
    pthread_join(tid, &tret);
    printf("thread 1 exit code = %ld\n\n", (long)tret);

    pthread_create(&tid, NULL, tfn2, NULL);
    pthread_join(tid, &tret);
    printf("thread 2 exit code = %ld\n\n", (long)tret);

    pthread_create(&tid, NULL, tfn3, NULL);
    sleep(3);
    pthread_cancel(tid);
    pthread_join(tid, &tret);
    printf("thread 3 exit code = %ld\n", (long)tret);

    return 0;
}
$ gcc pthread_cancel.c -o pthread_cancel -pthread
$ ./pthread_cancel
thread 1 returning
thread 1 exit code = 111

thread 2 exiting
thread 2 exit code = 222

thread 3 exit code = -1
Ways to terminate a thread
  • There are three ways to terminate a thread without terminating the entire process
    • Return from the thread's main function. This does not apply to the main thread: returning from main is equivalent to calling exit.
    • A thread can call pthread_cancel to terminate another thread in the same process
    • A thread can terminate itself by calling pthread_exit

3. Thread attributes

  • The threads discussed so far all use the default thread attributes, which are enough for most problems encountered during development. If you have higher requirements for program performance, you need to set thread attributes.

    • For example, reducing the thread stack size lowers memory usage
    • This in turn raises the maximum number of threads that can be created
  • Attribute values cannot be set directly; they must be set through the related functions. The initialization function is pthread_attr_init. It must be called before pthread_create, and pthread_attr_destroy must be called afterwards to release resources.

3.1 Thread attribute initialization

  • Initialize the thread attributes first, then call pthread_create to create the thread.
    #include <pthread.h>
    
    // Return value: 0 on success; the corresponding error number on failure
    int pthread_attr_init(pthread_attr_t *attr);     // Initialize the thread attributes
    int pthread_attr_destroy(pthread_attr_t *attr);  // Release the resources held by the thread attributes
    // Compile and link with -pthread.
    

3.2 Separation state of threads

  • The detach state of a thread determines how the thread is cleaned up when it terminates
    • Non-detached (joinable) state: the default. The creating thread waits for the created thread to end, and the system resources of the created thread are released only when pthread_join() returns.
    • Detached state: no other thread waits for a detached thread. When it finishes running, the thread terminates and its system resources are released immediately. Choose the detach state that fits your needs
    #include <pthread.h>
    
    // Return value: 0 on success; the corresponding error number on failure
    int pthread_attr_setdetachstate(pthread_attr_t *attr, int detachstate);        // Set the detach state attribute
    int pthread_attr_getdetachstate(const pthread_attr_t *attr, int *detachstate); // Get the detach state attribute
    // Compile and link with -pthread.
    
    • attr
      • An initialized thread attribute object
    • detachstate
      • PTHREAD_CREATE_DETACHED (detached thread)
      • PTHREAD_CREATE_JOINABLE (non-detached thread)

3.3 Example of thread attribute control

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include <pthread.h>

void *tfn(void *arg) {
    printf("thread: pid = %d, tid = %lu\n", getpid(), (unsigned long)pthread_self());

    return NULL;
}

int main(int argc, char *argv[]) {
    pthread_t tid;
    pthread_attr_t attr;

    int ret = pthread_attr_init(&attr);
    if (ret != 0) {
        fprintf(stderr, "attr_init error:%s\n", strerror(ret));
        exit(1);
    }

    // Set the detach state attribute to detached
    ret = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    if (ret != 0) {
        fprintf(stderr, "attr_setdetachstate error:%s\n", strerror(ret));
        exit(1);
    }

    ret = pthread_create(&tid, &attr, tfn, NULL);
    if (ret != 0) {
        // pthread functions return the error number instead of setting errno
        fprintf(stderr, "pthread_create error:%s\n", strerror(ret));
        exit(1);
    }

    ret = pthread_attr_destroy(&attr);
    if (ret != 0) {
        fprintf(stderr, "attr_destroy error:%s\n", strerror(ret));
        exit(1);
    }

    // Joining a thread created as detached fails; this call is here to show the error
    ret = pthread_join(tid, NULL);
    if (ret != 0) {
        fprintf(stderr, "pthread_join error:%s\n", strerror(ret));
        exit(1);
    }

    printf("main: pid = %d, tid = %lu\n", getpid(), (unsigned long)pthread_self());

    pthread_exit((void *)0);
}
$ gcc pthread_attr.c -o pthread_attr -pthread
$ ./pthread_attr 
pthread_join error:Invalid argument

3.4 Things to note when using threads

  • If the main thread should exit while other threads keep running, the main thread should call pthread_exit
  • Avoid zombie threads by using one of
    • pthread_join
    • pthread_detach
    • pthread_create with the detached attribute
    • The memory of a joined thread may already be released by the time the join function returns, so a thread should not return a pointer to a value on its own stack.
  • Memory obtained with malloc or mmap can be released by other threads
  • Avoid calling fork in a multi-threaded program unless exec is called immediately afterwards. Only the thread that called fork exists in the child process; all other threads are gone in the child, as if they had called pthread_exit.
  • The complex semantics of signals coexist poorly with multiple threads, so avoid introducing signal mechanisms into multi-threaded programs.

Origin blog.csdn.net/qq_42994487/article/details/133350868