C++ multi-threaded system programming essentials

Learning multi-threaded system programming requires two changes in thinking:

1. The current thread may be switched out at any time

2. Events in a multithreaded program no longer have a global order

When a thread is switched back in and continues with its next statement, global data may already have been modified by other threads. For example, if the pointer p is not protected by a lock, if(p && p->next){/**/} may cause a segfault: right after the left operand of the logical AND evaluates to true, another thread may set p to NULL or free it, so the right operand accesses an invalid address.
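
A minimal sketch of one way to close that race, assuming a hypothetical Node type and a list head guarded by a std::mutex (the names are illustrative and do not come from the book). The check and the dereference both happen under the same lock that writers take before changing the pointer.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical singly linked node, used only for illustration.
struct Node {
    int value;
    Node* next;
};

std::mutex g_mutex;        // guards g_head
Node* g_head = nullptr;    // shared between threads

// Returns true if the list has at least two nodes.
// The null check and the dereference both run while the lock is held,
// so no other thread can reset g_head in between.
bool hasSecondNode()
{
    std::lock_guard<std::mutex> lock(g_mutex);
    return g_head != nullptr && g_head->next != nullptr;
}

// Writers take the same lock before changing the pointer.
void resetHead(Node* newHead)
{
    std::lock_guard<std::mutex> lock(g_mutex);
    g_head = newHead;
}
```

The lock only helps if every reader and writer of g_head goes through it; freeing a node would need the same protection.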

On a single-CPU system we can, in theory, deduce the actual interleaving of threads from the order of the instructions the CPU executes. On a multi-core system, threads really run in parallel, and there is not even a global clock with which to number each event. Without proper synchronization, the order of events in threads running on different CPUs is unpredictable. Only after proper synchronization is introduced do events have a defined order.

The correctness of a multithreaded program must not depend on the execution speed of any thread. You cannot conclude that an event in another thread has happened just by sleeping for a while; the current thread must use synchronization to see the results of other threads. No matter how fast or slow each thread runs, the program should behave correctly.

Consider this example:

bool running = false;

void threadFunc()
{
    while(running)
    {
        //get task from queue
    }
}

void start()
{
    muduo::Thread t(threadFunc);
    t.start();
    running = true;   // BUG: set too late, threadFunc may already have seen false
}

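Under high system load, the assignment to running may be delayed until after threadFunc has already tested it, so the new thread sees false and exits immediately. The correct approach is to set running to true before the thread is created, that is, before pthread_create is called.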
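
A hedged sketch of the corrected ordering. It swaps muduo::Thread for std::thread and a plain bool for std::atomic<bool>; both are assumptions made to keep the example self-contained. The iterations counter is a hypothetical stand-in for taking tasks from the queue.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<bool> running(false);
std::atomic<int> iterations(0);   // stands in for "tasks taken from the queue"

void threadFunc()
{
    while (running.load()) {
        ++iterations;              // placeholder for real work
        std::this_thread::yield();
    }
}

// Sets the flag first, then creates the thread, so the loop condition is
// already true whenever threadFunc starts to run.
std::thread start()
{
    running.store(true);
    return std::thread(threadFunc);
}

// Asks the worker to leave its loop and waits for it to finish.
void stop(std::thread& t)
{
    running.store(false);
    t.join();
}
```

A caller would write std::thread t = start(); and later stop(t);. The atomic flag also removes the data race on the plain bool in the original listing.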

4.1 Basic thread primitives

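POSIX threads has more than 110 functions, but only a few are really commonly used.

The 11 basic functions are:

2: thread creation and waiting for termination (pthread_create, pthread_join)

4: mutex creation, destruction, locking and unlocking (pthread_mutex_init, pthread_mutex_destroy, pthread_mutex_lock, pthread_mutex_unlock)

5: condition variable creation, destruction, waiting, signalling and broadcasting (pthread_cond_init, pthread_cond_destroy, pthread_cond_wait, pthread_cond_signal, pthread_cond_broadcast)

With threads, mutexes and condition variables, most multithreaded tasks can be completed easily.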
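
To make the list concrete, here is a small sketch that uses ten of the eleven primitives (pthread_cond_signal is left out because broadcast is enough here). The Shared struct and runWorkers function are hypothetical names for illustration: each worker bumps a counter under the mutex, and the caller waits on the condition variable until all workers have reported in.

```cpp
#include <cassert>
#include <pthread.h>

// Shared state for the sketch: a counter guarded by a mutex, plus a
// condition variable that is broadcast when the counter reaches the target.
struct Shared {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    int done;
    int target;
};

static void* worker(void* arg)
{
    Shared* s = static_cast<Shared*>(arg);
    pthread_mutex_lock(&s->mutex);
    ++s->done;
    if (s->done == s->target) {
        pthread_cond_broadcast(&s->cond);   // wake every waiter
    }
    pthread_mutex_unlock(&s->mutex);
    return nullptr;
}

// Starts n workers (1..16), waits until all of them have reported in,
// and returns the final counter value.
int runWorkers(int n)
{
    assert(n > 0 && n <= 16);
    Shared s;
    s.done = 0;
    s.target = n;
    pthread_mutex_init(&s.mutex, nullptr);
    pthread_cond_init(&s.cond, nullptr);

    pthread_t threads[16];
    for (int i = 0; i < n; ++i) {
        int rc = pthread_create(&threads[i], nullptr, worker, &s);
        assert(rc == 0);
        (void)rc;
    }

    pthread_mutex_lock(&s.mutex);
    while (s.done < s.target) {             // loop guards against spurious wakeups
        pthread_cond_wait(&s.cond, &s.mutex);
    }
    int result = s.done;
    pthread_mutex_unlock(&s.mutex);

    for (int i = 0; i < n; ++i) {
        pthread_join(threads[i], nullptr);
    }
    pthread_cond_destroy(&s.cond);
    pthread_mutex_destroy(&s.mutex);
    return result;
}
```

The waiter never sleeps to guess when the workers are finished; it learns about their progress only through the mutex and the condition variable.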

Some other functions can be used where appropriate:

1. pthread_once, wrapped as muduo::Singleton<T>. In fact, it is often simpler to use a global variable directly.

2. pthread_key*, wrapped as muduo::ThreadLocal<T>. Consider replacing it with __thread.

Reading this, I wondered what pthread_key actually is. I remembered that it is covered in "Unix Network Programming", so I went back to Section 26.5, Thread-Specific Data, to review pthread_key_create.

It is dangerous to bring a non-reentrant function that uses static variables into a multithreaded program, because a single static variable cannot hold a separate value for each thread.

One fix is to use thread-specific data. This approach is not simple, but it has the advantage that the calling sequence does not change; only the code inside the function has to change.

Using thread-specific data is an effective way to make an existing function thread-safe.

Each system supports only a limited number of thread-specific data items. POSIX requires this limit to be no less than 128.

pthread_key_create creates a thread-specific data key that is not currently in use.

Along with the key, a pointer to a destructor function is supplied.

I ran the demo from the book:

#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
static pthread_key_t r1_key;
static pthread_once_t r1_once = PTHREAD_ONCE_INIT;
#define MAXLINE 1024

typedef struct{
    int r1_cnt;
    char *r1_bufptr;
    char r1_buf[MAXLINE];
}Rline;

static void readline_destructor(void* ptr)
{
    free(ptr);
}

static void readline_once(void)
{
    pthread_key_create(&r1_key,readline_destructor);
}

static ssize_t my_read(Rline *tsd,int fd,char* ptr)
{
    if(tsd->r1_cnt <= 0)
    {
        again:
        if((tsd->r1_cnt = read(fd,tsd->r1_buf,MAXLINE)) < 0)
        {
            if(errno == EINTR)
            {
                goto again;
            }
            return (-1);
        }else if(tsd->r1_cnt == 0)
        {
            return 0;
        }
        tsd->r1_bufptr = tsd->r1_buf;
    }

    tsd->r1_cnt--;
    *ptr = *tsd->r1_bufptr++;
    return(1);
}

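// Reads one line (at most maxlen-1 bytes) into vptr; returns the number of
// bytes stored, 0 on EOF with no data, or -1 on error.
ssize_t readline(int fd,void *vptr,size_t maxlen)
{
    ssize_t n,rc;

    char c, *ptr;

    void *tsd;

    // Create the key exactly once per process.
    pthread_once(&r1_once,readline_once);

    // First call in this thread: allocate its zero-initialized Rline.
    if((tsd = pthread_getspecific(r1_key)) == nullptr)
    {
        tsd = calloc(1,sizeof(Rline));
        pthread_setspecific(r1_key,tsd);
    }

    ptr = (char*)vptr;

    for(n=1;n<(ssize_t)maxlen;n++)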
    {
        if((rc = my_read((Rline*)tsd, fd, &c)) == 1)
        {
            *ptr++ = c;

            if(c == '\n')
            {
                break;
            }
        }else if(rc == 0)
        {
            *ptr = 0;
            return (n-1);
        }else{
            return -1;
        }
    }

    *ptr = 0;
    return n;
}
int main()
{
    int fd = open("/home/zhanglei/ourc/test/demoParser.y",O_RDWR);
    if(fd <0)
    {
        return -1;
    }
    char buf[BUFSIZ];
    int res = readline(fd,buf,BUFSIZ);
    if(res <0)
    {
        return -1;
    }
    printf("%d\n",res);
    printf("%s\n",buf);
}

Destructor:

Our destructor simply frees the memory that was allocated earlier for the thread.

One-time function:

Our one-time function is called exactly once by pthread_once; it just creates the key used by readline.

The Rline structure contains the three variables that caused the problem described earlier when they were declared static in Figure 3-18. Each thread that calls readline gets its own dynamically allocated Rline structure, which readline allocates and the destructor later frees.

my_read function

The first parameter of this function is now a pointer to the Rline structure that was allocated for the calling thread.

Allocate thread-specific data

We first call pthread_once, so that the first thread in the process to call readline creates the thread-specific data key.

Get the thread-specific data pointer

pthread_getspecific returns the pointer to the Rline structure for the calling thread. If this is the first time this thread has called readline, it returns a null pointer. In that case we allocate space for an Rline structure, and calloc initializes its r1_cnt member to 0. We then call pthread_setspecific to store this pointer for the thread. The next time the thread calls readline, pthread_getspecific returns the pointer that was just stored.

Summary

At this point I understand the basic usage: pthread_once creates the thread-specific key once, and each thread then looks up its own data under that key. If nothing has been stored yet, the thread allocates the data and stores it.

Now let's look at how muduo applies thread-specific data in practice.

One point to note first: as mentioned above, a process can only have a limited number of thread-specific data keys; some POSIX systems provide only 128. The ThreadLocal design in the book is very simple. Here is the implementation:

namespace muduo
{

template<typename T>
class ThreadLocal : noncopyable
{
 public:
  ThreadLocal()
  {
    MCHECK(pthread_key_create(&pkey_, &ThreadLocal::destructor));
  }

  ~ThreadLocal()
  {
    MCHECK(pthread_key_delete(pkey_));
  }

  T& value()
  {
    T* perThreadValue = static_cast<T*>(pthread_getspecific(pkey_));
    if (!perThreadValue)
    {
      T* newObj = new T();
      MCHECK(pthread_setspecific(pkey_, newObj));
      perThreadValue = newObj;
    }
    return *perThreadValue;
  }

 private:

  static void destructor(void *x)
  {
    T* obj = static_cast<T*>(x);
    typedef char T_must_be_complete_type[sizeof(T) == 0 ? -1 : 1];
    T_must_be_complete_type dummy; (void) dummy;
    delete obj;
  }

 private:
  pthread_key_t pkey_;
};

}  // namespace muduo

The constructor calls pthread_key_create to create the key and register ThreadLocal::destructor as its cleanup function. The class destructor calls pthread_key_delete to destroy the key. value() returns the calling thread's object, creating it on first use, and the static destructor function frees that object's memory when the thread exits.

Not recommended:

pthread_rwlock: read-write locks should be used with caution

sem_*: the semaphore family

pthread_cancel and pthread_kill: their presence means there is a problem with the program design

I strongly agree with this advice, because "UNIX Network Programming, Volume 2" compares the performance of the various locks, and for the case of incrementing a value in memory the mutex is the most efficient. The difficulty of C++ multithreaded programming lies in understanding the relationship between library functions and system calls.

4.2 Thread safety of the C/C++ libraries

The arrival of multithreading affected traditional programming in several ways:

1. errno is no longer a single global variable, because different threads may be executing different system library functions at the same time

2. Pure functions are not affected. Functions that touch global state, such as malloc/free, printf, fread and fseek, can be made thread-safe by locking internally

3. Functions that return or use static storage cannot be made thread-safe, so reentrant versions are provided instead, for example asctime_r, ctime_r, gmtime_r, strerror_r and strtok_r

4. The traditional fork model is no longer suitable for multithreaded programs
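
As a small illustration of point 3, here is a sketch built on strtok_r. The split helper is a hypothetical name; the point is that the tokenizer's position lives in a caller-supplied pointer instead of a hidden static variable, so two threads can tokenize different strings at the same time.

```cpp
#include <cassert>
#include <string.h>
#include <string>
#include <vector>

// Splits text on any character in delims using strtok_r, which keeps its
// position in the caller-supplied save pointer instead of a hidden static.
std::vector<std::string> split(const std::string& text, const char* delims)
{
    std::vector<char> buf(text.begin(), text.end());
    buf.push_back('\0');                    // strtok_r modifies its input buffer

    std::vector<std::string> out;
    char* save = nullptr;
    for (char* tok = strtok_r(buf.data(), delims, &save);
         tok != nullptr;
         tok = strtok_r(nullptr, delims, &save)) {
        out.push_back(tok);
    }
    return out;
}
```

For example, split("a,b;c", ",;") yields the three tokens a, b and c.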

4.3 Thread ID on Linux

POSIX provides the pthread_self function to return the identifier of the calling thread; its type is pthread_t. pthread_t is not necessarily a numeric type, it may also be a structure, so pthreads provides the pthread_equal function to compare whether two thread identifiers are equal.

But this brings some problems:

1. pthread_t cannot be printed, because its exact type is unknown, so it cannot be shown in logs as a thread id

2. pthread_t values cannot be ordered or hashed, so they cannot be used as keys in an associative container

3. There is no way to define an invalid pthread_t value to indicate that a thread does not exist

4. pthread_t is only meaningful within the process and cannot be related to the operating system's task scheduling

In addition, glibc's pthreads implementation actually uses pthread_t as a pointer to a structure, and that memory block is easily reused, so a new thread may get the same pthread_t value as one that has already exited.

So pthread_t is not suitable as a thread identifier in a program.

On Linux, it is recommended to use the return value of the gettid system call as the thread id:

1. Its type is pid_t, a small integer that is easy to print in logs

2. On modern Linux it is the id of the task that the kernel schedules, so the thread is easy to find under /proc/tid or /proc/pid/task/tid

3. It is globally unique at any given moment, and because Linux allocates new pids in increasing order and wraps around, threads started in quick succession still get different ids

4. 0 is an invalid value, because the first process of the operating system, init, has pid 1 (on Ubuntu and similar systems init is systemd)

glibc did not provide a wrapper for this call until version 2.30, so we have to write it ourselves. Here is the core of how muduo does it:

#include <sys/syscall.h>
::syscall(SYS_gettid)

Of course, muduo caches the result to improve efficiency.
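
A minimal sketch of what such caching can look like, assuming a __thread variable as the per-thread cache. This is not muduo's actual code; currentTid and t_cachedTid are hypothetical names.

```cpp
#include <cassert>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

// Per-thread cache; 0 means "not fetched yet" because 0 is never a valid tid.
static __thread pid_t t_cachedTid = 0;

// Returns the kernel thread id of the calling thread, issuing the system
// call only on the first use in each thread.
pid_t currentTid()
{
    if (t_cachedTid == 0) {
        t_cachedTid = static_cast<pid_t>(::syscall(SYS_gettid));
    }
    return t_cachedTid;
}
```

One caveat: a cache like this becomes stale in a child process after fork, because the surviving thread gets a new tid there, so the child would have to reset it.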

4.4 Thread creation and destruction

Thread creation and destruction are basic operations. Creating a thread is much easier than destroying one; you only need to follow these principles:

1. A library should not create its own background threads without telling the user

2. Try to create all threads in the same way

3. Do not start threads before entering the main function

4. Threads are best created during the program's initialization phase

Let's go through these points one by one.

The number of threads a process can create is limited by the size of the address space and by kernel parameters, and the number of threads a machine can run in parallel is limited by the number of CPUs. So the thread count needs careful design: set it according to the number of CPUs, and leave enough resources for critical tasks.

Another reason is that once a process has more than one thread, it is hard to guarantee that fork will not cause problems.

That is how I write my own programs: I never call pthread_create before fork.

Ideally, all threads in a program are created through the same class, so that unified bookkeeping can be done when threads start and stop. For example, calling muduo::CurrentThread::tid() once caches the thread id, so later lookups do not trap into the kernel. You can also count how many threads are currently active in the process, how many have been created in total, and what each thread is for. Threads can be named through the class, or a singleton threadManager can track the currently active threads to make debugging easier.

But this is not always possible. A third-party library may start its own wild threads, so the code must check whether the cached thread id is valid every time, instead of assuming it has been cached and returning it directly. If a library provides asynchronous callbacks, it must document which thread invokes the user's callback, so that the user knows whether time-consuming operations are allowed there and whether they would block other tasks.

Do not start threads before main, because this affects the safe construction of global objects. C++ finishes constructing global objects before entering main. The construction order between translation units is unspecified, but the construction is still sequential: the objects are built one after another on the main thread, with no need to think about thread safety. If a global object starts a thread, that assumption is broken and it becomes dangerous: a thread may access a global variable that has not been initialized yet, and this kind of obscure bug is very hard to track down. If a library needs to create threads, it must do so after main has been entered.

The number of threads should be related to the number of CPUs; do not create a thread per connection. Threads are best created in the initialization phase, so that the cost is about 1/10 less than the cost of frequently creating and destroying threads.
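
A rough sketch of the "size by CPU count, create up front" idea, assuming std::thread in place of muduo's thread class. chooseWorkerCount and runOnce are hypothetical helpers, and the lambda is a placeholder for real work.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Picks a worker count from the number of hardware threads, keeping at
// least one. hardware_concurrency() may return 0 when it cannot tell.
unsigned chooseWorkerCount()
{
    unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? 1 : n;
}

// Creates all workers up front, lets each one run a trivial task once,
// joins them, and returns how many tasks ran.
int runOnce(unsigned workers)
{
    std::atomic<int> ran(0);
    std::vector<std::thread> pool;
    pool.reserve(workers);
    for (unsigned i = 0; i < workers; ++i) {
        pool.emplace_back([&ran] { ++ran; });   // placeholder task
    }
    for (std::thread& t : pool) {
        t.join();    // each worker ends by returning from its function
    }
    return ran.load();
}
```

A program following the advice above would call something like runOnce(chooseWorkerCount()) once during startup, with long-lived workers in place of the one-shot lambda, and would subtract a few threads if critical tasks need reserved CPU time.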

There are several ways for a thread to end:

1. Natural death: the thread returns from its main function and exits normally

2. Abnormal death: the thread's main function throws an exception, or the thread triggers a fatal signal such as a segfault

3. Suicide: the thread calls pthread_exit

4. Killing: another thread calls pthread_cancel to force it to terminate

The only normal way for a thread to terminate is natural death. Any idea of forcibly terminating a thread from the outside is wrong. pthread_cancel gives the thread no chance to clean up its resources, and no chance to release a lock it already holds.

If you really need to be able to terminate a long-running IO task, but do not want to check a global flag periodically, consider forking that part of the code into a new process. Killing a process with kill(2) is much safer than killing a thread inside the process. The forked process can communicate with this process through a pipe, a socketpair, or TCP.
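
The fork-and-kill idea can be sketched as follows. This is an illustration under assumptions, not code from the book: pause() stands in for the slow IO, a one-byte pipe message stands in for real progress reporting, and runAndKillChild is a hypothetical name.

```cpp
#include <cassert>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs a stand-in "long IO task" in a child process, reads one progress
// byte from it over a pipe, then terminates it with kill(2).
// Returns the signal that ended the child, or -1 on error.
int runAndKillChild()
{
    int fds[2];
    if (pipe(fds) != 0) {
        return -1;
    }
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        // Child: report that work has started, then block indefinitely.
        close(fds[0]);
        const char started = 's';
        ssize_t ignored = write(fds[1], &started, 1);
        (void)ignored;
        for (;;) {
            pause();             // placeholder for the slow IO
        }
    }
    // Parent: wait for the progress byte, then end the child from outside.
    close(fds[1]);
    char c = 0;
    ssize_t n = read(fds[0], &c, 1);
    close(fds[0]);
    assert(n == 1 && c == 's');
    (void)n;

    kill(pid, SIGKILL);
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

When the child dies, the kernel reclaims its memory, descriptors and locks as a unit, which is why this is safer than cancelling a thread that shares state with the rest of the process.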

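An important principle in the book is that an object's lifetime should generally be longer than the lifetime of the threads that use it.

Summary:

This section says that threads should not be killed from the outside; it is best for threads to die naturally. Threads should be created after entering main, and we sometimes have to consider the safety problems caused by wild threads from third-party libraries.
An important principle is that a thread's lifetime must be shorter than the lifetime of the objects it uses.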

4.4.2 exit(3) is not thread-safe

In C++, exit(3) does more than terminate the process: it also destroys global objects and function-local static objects that have already been constructed. This creates the potential for deadlock. Consider the following example:

#include <cstdio>
#include <cstdlib>
#include <pthread.h>

class noncopyable{
protected:
    noncopyable() = default;
    ~noncopyable() = default;

private:
    noncopyable(const noncopyable&) = delete;
    const noncopyable& operator=( const noncopyable& ) = delete;
};


class MutexLock :public noncopyable{
public:
    MutexLock()
    {
        pthread_mutexattr_init(&mutexattr);
        pthread_mutex_init(&mutex, nullptr);
    }

    MutexLock(int type)
    {
        int res;
        pthread_mutexattr_init(&mutexattr);
        res = pthread_mutexattr_settype(&mutexattr,type);
        pthread_mutex_init(&mutex, &mutexattr);
    }

    ~MutexLock()
    {
        pthread_mutex_destroy(&mutex);
        pthread_mutexattr_destroy(&mutexattr);   // attributes live as long as the lock
    }

    int lock()
    {
        int res = pthread_mutex_lock(&mutex);
        return res;
    }

    void unLock()
    {
        pthread_mutex_unlock(&mutex);
    }

    pthread_mutex_t* getMutex()
    {
        return &mutex;
    }
private:
    pthread_mutex_t mutex;
    pthread_mutexattr_t mutexattr;
};

class MutexLockGuard
{
public:
    MutexLockGuard(MutexLock & mutex)
            : _mutex(mutex)
    {
        _mutex.lock();
    }

    ~MutexLockGuard()
    {
        _mutex.unLock();
    }

private:
    MutexLock & _mutex;
};

void someFunctionMayCallExit()
{
    exit(1);
}

class GlobalObject
{
public:
    void doit()
    {
        MutexLockGuard lock(mutex_);
        someFunctionMayCallExit();
    }

    ~GlobalObject()
    {
        printf("GlobalObject:~GlobalObject\n");
        MutexLockGuard lock(mutex_);
        printf("GlobalObject:~GlobalObject cleaning\n");
    }

private:
    MutexLock mutex_;
};

GlobalObject g_obj;

int main()
{
    g_obj.doit();
}

This is an interesting program. When I ran it, it deadlocked. After calling exit the program should have terminated normally, but instead it hung.

Why does it deadlock?

doit holds the mutex when it calls exit. exit then runs the destructor of the global object, ~GlobalObject(), which tries to lock the same mutex. The mutex is still locked, so the destructor waits forever.
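
One possible way around this particular deadlock is to terminate with _exit(2), which ends the process without running global destructors. The sketch below is an assumption-laden illustration, not the book's recommendation: Guarded mirrors GlobalObject but uses std::mutex to stay short, and the scenario runs in a forked child only so that the caller survives and can observe the exit status.

```cpp
#include <cassert>
#include <mutex>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Same shape as GlobalObject, but built on std::mutex (an assumption made
// to keep the sketch short).
class Guarded
{
public:
    void doit(int code)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        ::_exit(code);           // terminates at once; ~Guarded() never runs
    }

    ~Guarded()
    {
        // With exit() instead of _exit(), this lock attempt would typically hang.
        std::lock_guard<std::mutex> lock(mutex_);
    }

private:
    std::mutex mutex_;
};

Guarded g_guarded;

// Runs doit() in a child process so the caller survives, and returns the
// child's exit status (or -1 on error).
int exitCodeOfChild(int code)
{
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        g_guarded.doit(code);    // does not return
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The trade-off is that _exit also skips atexit handlers and does not flush stdio buffers, so anything that must be written out has to be flushed explicitly beforehand.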

Here is another example, in which calling a pure virtual function crashes the program. Suppose there is a strategy base class, and at runtime we choose among different stateless strategies. Since the strategies are stateless, the derived objects can be shared instead of being created anew each time. Taking a calendar base class and the holidays of different countries as an example, the factory function returns a reference to a global object rather than creating a new derived-class object on every call.

In such a program, exit destroys the global calendar objects. If one thread calls exit and the global objects are destroyed while another thread is calling isHoliday, the second thread will crash with a core dump.
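
The calendar example can be reconstructed roughly as follows. This is my own sketch of the design described above, not the book's listing: the Date struct, the two concrete calendars and the single holiday each one knows about are all simplifications made up for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical date type; only month and day matter for this sketch.
struct Date {
    int month;
    int day;
};

// Stateless strategy interface.
class Calendar
{
public:
    virtual ~Calendar() {}
    virtual bool isHoliday(const Date& d) const = 0;
};

class AmericanCalendar : public Calendar
{
public:
    bool isHoliday(const Date& d) const override
    {
        return d.month == 7 && d.day == 4;      // one holiday only, for brevity
    }
};

class BritishCalendar : public Calendar
{
public:
    bool isHoliday(const Date& d) const override
    {
        return d.month == 12 && d.day == 26;    // one holiday only, for brevity
    }
};

// Global strategy objects: exit() destroys them even if other threads are
// still inside isHoliday().
AmericanCalendar g_american;
BritishCalendar g_british;

// Factory returns a reference to a shared global instead of a new object.
const Calendar& getCalendar(const std::string& region)
{
    if (region == "US") {
        return g_american;
    }
    return g_british;
}
```

While a derived object is being destroyed, its virtual table pointer is reset to that of the base class, so a concurrent getCalendar("US").isHoliday(...) call from another thread can end up dispatching to the pure virtual Calendar::isHoliday and abort.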

So calling exit in a multithreaded program is not easy to get right. We need to design the destruction order carefully, so that no thread accesses an object that has already been destroyed.

4.5 Make good use of the __thread keyword

__thread is GCC's built-in thread-local storage facility, and it is faster than pthread_key_t. Access to a __thread variable is about as efficient as access to a global variable.

int g_var;
__thread int t_var;

void foo()
{
    g_var = 1;
    t_var = 2;
}

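__thread cannot be applied to arbitrary class types; it can only be applied to POD objects. The POD objects mentioned in the book are defined as follows.

POD stands for Plain Old Data. Put simply, if a class or struct still holds the same data after a binary copy, it is a POD type.

Definition of standard layout:
1. All non-static members have the same access control

2. At most one class in the inheritance tree has non-static data members

3. The first non-static member of a derived class must not be of a base class type

4. No virtual functions

5. No virtual base classes

6. All non-static members are themselves of standard-layout types

An important reason class types cannot be used is that __thread cannot call a constructor: the variable can only be initialized with a compile-time constant.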

#include <pthread.h>
#include <cstdio>
#include <cstdlib>
#include <assert.h>
#include <stdint.h>

class A{
public:
    int b;
    A(int data)
    {
        a = data;
    }
private:
    int a;
};

__thread class A a = 3;   // compile error: this needs a constructor call, which __thread does not allow
int main(int argc, char const *argv[])
{
    a.b = 2;
    return 0;
}
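
For comparison, C++11's thread_local keyword does accept class types with constructors. The sketch below is an aside and an assumption on my part (the book's advice is about __thread); t_name and renameInOtherThread are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <thread>

// C++11 thread_local accepts class types; each thread constructs its own
// copy on first use and destroys it when the thread exits.
thread_local std::string t_name = "unnamed";

// Changes the name inside a new thread and reports what that thread saw
// before and after the assignment.
void renameInOtherThread(std::string* before, std::string* after)
{
    std::thread t([before, after] {
        *before = t_name;        // fresh per-thread copy: "unnamed"
        t_name = "worker";
        *after = t_name;
    });
    t.join();
}
```

The price is a possible run-time check on access to see whether the object has been constructed yet, which plain __thread POD variables do not need.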

The demo below checks whether a variable declared with __thread has an independent instance in each thread:

#include <pthread.h>
#include <cstdio>
#include <cstdlib>
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

__thread uint64_t pkey = 0;

void* run2( void* arg )
{
    pkey = 8;
    printf("run2-ptr:%p\n",&pkey);
    printf("run2:%lu\n",(unsigned long)pkey);
    return NULL;
}

void* run1( void* arg )
{
    printf("run1-ptr:%p\n",&pkey);
    printf("run1:%lu\n",(unsigned long)pkey);

    return NULL;
}

int main(int argc, char const *argv[])
{
    pthread_t threads[2];
    pthread_create( &threads[1], NULL, run2, NULL );
    sleep(1);
    pthread_create( &threads[0], NULL, run1, NULL );
    pthread_join( threads[0], NULL );
    pthread_join( threads[1], NULL );
    return 0;
}

Here we see the effect of __thread: run2 sets pkey to 8 in the first thread, but run1, running later in the second thread, still prints 0. Without __thread it would have printed 8. Each thread has its own copy, and a change made by one thread is not visible to the other.

4.6 Multithreading and IO

The book says that file IO is thread-safe. I am not sure about this. I have always used pread and pwrite for file IO. In demos and experiments I wrote earlier, I found that when multiple threads operate on the same socket, a lock is indeed necessary; without one, problems occur.

[email protected]:LeiZhang-Hunter/sendDemo.git

In fact, this question matters little in practice. Even if each read and write call is atomic, problems arise easily when multiple threads read and write the contents of the same file. Race conditions are unavoidable without locking, and ordering is also a problem, so I think each descriptor should be operated on by only one thread as far as possible.
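
For plain files, pread is one way to sidestep the shared file position when a descriptor really does have to be shared. A minimal sketch, assuming a scratch file under /tmp (a hypothetical location) and two hypothetical helpers, readAt and demoParallelRead:

```cpp
#include <cassert>
#include <cstdlib>
#include <fcntl.h>
#include <string>
#include <thread>
#include <unistd.h>

// Reads up to len bytes at offset with pread(2). The offset is passed in
// explicitly, so threads sharing fd do not disturb a common file position.
std::string readAt(int fd, off_t offset, size_t len)
{
    std::string out(len, '\0');
    ssize_t n = pread(fd, &out[0], len, offset);
    if (n < 0) {
        n = 0;                   // treat errors as "nothing read" in this sketch
    }
    out.resize(static_cast<size_t>(n));
    return out;
}

// Writes a small temporary file, then has two threads read different
// halves of it through the same descriptor.
bool demoParallelRead(std::string* first, std::string* second)
{
    char path[] = "/tmp/pread_demo_XXXXXX";   // hypothetical scratch location
    int fd = mkstemp(path);
    if (fd < 0) {
        return false;
    }
    unlink(path);                              // file vanishes once fd is closed
    const char data[] = "helloworld";
    if (write(fd, data, 10) != 10) {
        close(fd);
        return false;
    }

    std::thread a([&] { *first = readAt(fd, 0, 5); });
    std::thread b([&] { *second = readAt(fd, 5, 5); });
    a.join();
    b.join();
    close(fd);
    return true;
}
```

This only removes the race on the file offset. It does not help with sockets, which have no offset, and it does not order concurrent writes to overlapping regions.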

4.7 Use RAII to encapsulate the descriptor

Every program starts with three well-known descriptors:

0, 1 and 2: standard input, standard output and standard error. The POSIX standard stipulates that opening a file must return the smallest descriptor number not currently in use. Operating on a descriptor from multiple threads is much like writing an fpm interface: if several threads frequently read and close the same descriptor, problems are inevitable. It is like a "like" endpoint that is hit repeatedly with likes and unlikes. Without locking, the endpoint is dangerous: the count can easily be inflated, and worse, frequent unlikes may even drive it negative. In the same way, concurrent close and read on a descriptor can mix descriptors up: one thread is still reading when another thread closes the descriptor, the number is handed out again by the next open, and the reader ends up working on the wrong file. Many dangerous things can follow.

In C++, RAII is used to handle this. A Socket object wraps the descriptor, and the descriptor is closed in the destructor. As long as the Socket object is alive, no other Socket object can have the same descriptor. Of course, such objects should not be managed with a bare new and delete, which is very dangerous; holding them through a smart pointer is much safer.

I very much agree with the book's idea: use new and delete directly as little as possible, and use smart pointers as much as possible.
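
A minimal sketch of such a wrapper. FileDescriptor is an illustrative name, not muduo's Socket class, and /dev/null is used only because it is always available to open.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>
#include <utility>

// Move-only owner of a file descriptor: closes it exactly once, in the
// destructor. The class name is illustrative, not muduo's Socket.
class FileDescriptor
{
public:
    explicit FileDescriptor(int fd = -1) : fd_(fd) {}
    ~FileDescriptor() { reset(); }

    FileDescriptor(const FileDescriptor&) = delete;
    FileDescriptor& operator=(const FileDescriptor&) = delete;

    FileDescriptor(FileDescriptor&& other) noexcept : fd_(other.fd_) { other.fd_ = -1; }
    FileDescriptor& operator=(FileDescriptor&& other) noexcept
    {
        if (this != &other) {
            reset();
            fd_ = other.fd_;
            other.fd_ = -1;
        }
        return *this;
    }

    int get() const { return fd_; }
    bool valid() const { return fd_ >= 0; }

private:
    void reset()
    {
        if (fd_ >= 0) {
            ::close(fd_);
            fd_ = -1;
        }
    }

    int fd_;
};

// Opens /dev/null, moves ownership once, lets the wrapper go out of scope,
// and returns the descriptor number so the caller can check it was closed.
int openAndRelease()
{
    FileDescriptor fd(::open("/dev/null", O_RDONLY));
    assert(fd.valid());
    FileDescriptor moved(std::move(fd));
    assert(!fd.valid());         // ownership has been transferred
    return moved.get();
}
```

Because copying is disabled, two live objects can never own the same descriptor number. When several threads need the same descriptor, holding the wrapper in a std::shared_ptr keeps it open until the last user is done.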

4.8 RAII and fork

When writing C/C++, we always make sure that an object's constructor and destructor are called in pairs, otherwise resources leak. But once we use fork, this assumption is broken.

After fork, the child process inherits the address space and the file descriptors, so RAII classes that manage dynamic memory and file descriptors continue to work normally. But the child process does not inherit everything.

For example, the child does not inherit:

1. The parent's memory locks: mlock, mlockall

2. The parent's file locks: fcntl(2)

3. Some of the parent's timers: setitimer, alarm, timer_create

4. Others; see man 2 fork for the detailed list

Specifically, the following are not inherited:

1) Process ID

2) Memory locks

3) The set of pending signals

4) Semaphore adjustments

5) File locks

6) Timers created with the timer family of functions

7) Outstanding IO operations of the parent process

4.9 Multithreading and fork

The book explains that fork only clones the calling thread (the control thread); the other threads disappear in the child. In other words, you cannot fork a child process that has the same set of threads as the parent. I will write a demo below to check this. After fork, the child has exactly one thread, and all other threads have vanished. This can create a very dangerous situation: another thread may happen to be inside a critical section, holding a lock, when it suddenly disappears, and it never gets the chance to unlock. If the child process then tries to lock the same mutex, it deadlocks immediately.

In other words, if the process has a shared lock that is held by another thread at the moment of fork, only the control thread survives in the child. The thread that held the lock is gone, so as soon as the child uses that lock the result is disastrous: a deadlock.

For this reason, after fork the child process should not call:

1. malloc: malloc accesses global state and almost certainly takes a lock (I am not entirely sure about this, since I have not read the malloc source code)

2. Anything that may allocate or free memory: new, map::insert, snprintf, ...

3. Any pthreads function. You cannot use pthread_cond_signal to notify the parent process; only a pipe can be used

4. The printf family of functions

5. Any function other than the async-signal-safe functions listed in man 7 signal
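
A small sketch of a child that stays within those limits, using only close, write and _exit, and notifying the parent through a pipe. messageFromChild is a hypothetical name, and the "ready" message is just a placeholder.

```cpp
#include <cassert>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Forks, and in the child uses only async-signal-safe calls (close, write,
// _exit) to report back through a pipe. Returns what the parent read.
std::string messageFromChild()
{
    int fds[2];
    if (pipe(fds) != 0) {
        return "";
    }
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return "";
    }
    if (pid == 0) {
        close(fds[0]);
        static const char msg[] = "ready";     // no allocation, no printf
        ssize_t ignored = write(fds[1], msg, sizeof(msg) - 1);
        (void)ignored;
        _exit(0);
    }
    // Parent: read until the child closes its end of the pipe by exiting.
    close(fds[1]);
    std::string out;
    char buf[16];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof(buf))) > 0) {
        out.append(buf, static_cast<size_t>(n));
    }
    close(fds[0]);
    int status = 0;
    waitpid(pid, &status, 0);
    return out;
}
```

In real programs the most common safe pattern is for the child to call one of the exec functions right after fork, which replaces the inherited address space, including any locks left held by vanished threads.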

Here is an example I wrote to verify what happens to threads after fork:

#include <pthread.h>
#include <cstdio>
#include <cstdlib>
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

__thread uint64_t pkey = 0;

void* run2( void* arg )
{
    while(1)
    {
        printf("%d\n",getpid());
        sleep(2);
    }
    return NULL;
}

void* run1( void* arg )
{

    while(1)
    {
        printf("%d\n",getpid());
        sleep(2);
    }
    return NULL;
}

int main(int argc, char const *argv[])
{
    printf("parent:%d\n",getpid());
    pthread_t threads[2];
    pthread_create( &threads[1], NULL, run2, NULL );
    sleep(1);
    pthread_create( &threads[0], NULL, run1, NULL );
    pid_t pid = fork();
    if(pid > 0)
    {
        pthread_join( threads[0], NULL );
        pthread_join( threads[1], NULL );
    }else{
        printf("son:%d\n",getpid());
        while (1)
        {
            sleep(2);
        }
    }

    return 0;
}

Output:

parent:31424
31424
31424
son:31436
31424
31424
31424
31424
31424
31424

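This confirms the statement in the book: in the child only the control thread is running, and the other threads are not. The two worker threads keep printing the parent's pid (31424), while the child's pid (31436) appears only once, printed by the thread that called fork.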

4.10 Multithreading and signal

Try not to use signals in multithreaded programs.

Chapter 4 C++ Multithreaded System Programming Essentials