POSIX Multithreaded Programming (4): Thread Synchronization

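Any threaded program beyond the trivial usually needs to share data between threads, or to have threads perform a set of operations in a consistent order.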

Invariants and critical sections

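An invariant is an assumption made by the program, especially an assumption about the relationship between sets of variables. For example, when you use a queue to hold data, you give the queue a head pointer that points to its first element, and each element in turn contains a pointer to the next element. The invariant is that the head pointer is either NULL or points to a valid element, and that the same holds for every element's next pointer.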

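A critical section (sometimes called a serial region) is a section of code that affects shared data. In general, every critical section corresponds to a data invariant, and vice versa. When a thread wants to modify an invariant, it locks a mutex; when another thread needs to modify the same invariant, it tries to lock the same mutex. That second thread then waits until the first thread unlocks the mutex, and only then makes its change.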
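The queue invariant and its critical sections can be sketched as follows. This is only an illustration: the names node_t, queue_t, queue_push and queue_pop are hypothetical and do not come from the article.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical queue element: each element points to the next one. */
typedef struct node_tag {
    struct node_tag *next;
    int              value;
} node_t;

/* Invariant: head is NULL or points to a valid element, and every
 * next pointer is NULL or points to a valid element. */
typedef struct queue_tag {
    pthread_mutex_t mutex;  /* protects head and every next pointer */
    node_t         *head;
} queue_t;

/* Critical section: both pointer updates happen while the mutex is
 * held, so no other thread can interleave its own update with ours. */
int queue_push(queue_t *q, node_t *n)
{
    int status;

    assert(q != NULL && n != NULL);
    status = pthread_mutex_lock(&q->mutex);
    if (status != 0)
        return status;
    n->next = q->head;
    q->head = n;
    return pthread_mutex_unlock(&q->mutex);
}

/* Removes and returns the first element, or NULL if the queue is empty. */
node_t *queue_pop(queue_t *q)
{
    node_t *n;

    assert(q != NULL);
    if (pthread_mutex_lock(&q->mutex) != 0)
        return NULL;
    n = q->head;
    if (n != NULL)
        q->head = n->next;
    pthread_mutex_unlock(&q->mutex);
    return n;
}
```

Because every function that touches head locks the same mutex, two threads pushing at the same time cannot lose an element.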

Mutexes

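The most general and most common way to synchronize threads is to make sure that all memory accesses to the same (or related) data are "mutually exclusive": only one thread at a time is allowed to write the data, and the other threads must wait.
Mutex: the word combines "mut", the beginning of "mutual", with "ex", the beginning of "exclusion".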

Experience shows that a synchronization model built from mutexes together with condition variables is simple, flexible, and efficient.

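Once thread A has locked a mutex, thread B blocks if it tries to lock the same mutex, because the mutex is already locked.
When thread A unlocks the mutex, thread B is unblocked and locks the mutex.
If thread B does not want to block while locking, it can call pthread_mutex_trylock to try to lock the mutex. If the mutex is already locked, the call returns EBUSY immediately.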
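The non-blocking behavior of pthread_mutex_trylock can be sketched like this. The names demo_mutex, try_thread and trylock_demo are hypothetical; the calling thread plays the role of thread A and the created thread plays thread B.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Thread B: tries to lock without blocking and reports the result. */
static void *try_thread(void *arg)
{
    int *result = arg;

    *result = pthread_mutex_trylock(&demo_mutex);
    if (*result == 0)
        pthread_mutex_unlock(&demo_mutex);
    return NULL;
}

/* Thread A: optionally holds the mutex while thread B runs.
 * Returns the value thread B got from pthread_mutex_trylock. */
int trylock_demo(int hold)
{
    pthread_t b;
    int result = -1;
    int status;

    if (hold) {
        status = pthread_mutex_lock(&demo_mutex);
        assert(status == 0);
    }
    status = pthread_create(&b, NULL, try_thread, &result);
    assert(status == 0);
    status = pthread_join(b, NULL);
    assert(status == 0);
    if (hold) {
        status = pthread_mutex_unlock(&demo_mutex);
        assert(status == 0);
    }
    return result;
}
```

Calling trylock_demo(1) yields EBUSY because thread A still holds the mutex; trylock_demo(0) yields 0 because the mutex is free.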

Creating and destroying mutexes

#include <pthread.h>

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
int pthread_mutex_init(pthread_mutex_t *mutex, const pthread_mutexattr_t *attr);
int pthread_mutex_destroy(pthread_mutex_t *mutex);

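Mutexes are usually declared with external or static storage class.
1. If other files need to use the mutex, declare it extern.
2. If it is used only within one file, declare it static.
You can use the PTHREAD_MUTEX_INITIALIZER macro to initialize a statically allocated mutex with default attributes.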

#include <pthread.h>

typedef struct my_struct_tag{
    pthread_mutex_t mutex; /* the mutex */
    int             value; /* the value it protects */
} my_struct_t;

my_struct_t data = {PTHREAD_MUTEX_INITIALIZER, 0}; 
int main()
{
    return 0;
}

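If you allocate a data structure that contains a mutex dynamically, with malloc or new, you should call pthread_mutex_init to initialize the mutex dynamically. The same function can also initialize a statically declared mutex, but you must make sure that every mutex is initialized before it is used, and that it is initialized only once.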

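One way to guarantee this is to initialize the mutex before creating any threads; another is to perform the initialization through pthread_once.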

#include <pthread.h>
#include <stdlib.h> /* malloc, free; add other headers as needed */

typedef struct my_struct_tag{
    pthread_mutex_t mutex; /* the mutex */
    int             value; /* the value it protects */
} my_struct_t;

int main()
{
    my_struct_t* data;
    int status;
    data = malloc(sizeof(my_struct_t));
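    // allocation failure is not handled

    status = pthread_mutex_init(&data->mutex, NULL); // initialize the mutex
    // returns 0 on success, a nonzero error number on failure

    status = pthread_mutex_destroy(&data->mutex); // destroy the mutex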

    free(data);
    return 0;
}

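Notes:
- A mutex that was initialized dynamically with pthread_mutex_init should be released with pthread_mutex_destroy.
- A mutex that was initialized statically with the PTHREAD_MUTEX_INITIALIZER macro does not need to be destroyed.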
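For the pthread_once route to one-time initialization, a minimal sketch might look like the following. The names once_block, once_mutex, init_calls and once_demo are hypothetical, and init_calls exists only to make the "exactly once" behavior observable.

```c
#include <assert.h>
#include <pthread.h>

#define ONCE_THREADS 4

static pthread_once_t  once_block = PTHREAD_ONCE_INIT;
static pthread_mutex_t once_mutex;      /* initialized by once_init_routine */
static int             init_calls = 0;  /* how many times the initializer ran */

/* Runs exactly once, no matter how many threads call pthread_once. */
static void once_init_routine(void)
{
    int status = pthread_mutex_init(&once_mutex, NULL);

    assert(status == 0);
    init_calls++;
}

static void *once_thread(void *arg)
{
    int status;

    (void)arg;
    /* Every thread calls pthread_once before it touches the mutex. */
    status = pthread_once(&once_block, once_init_routine);
    assert(status == 0);
    status = pthread_mutex_lock(&once_mutex);
    assert(status == 0);
    status = pthread_mutex_unlock(&once_mutex);
    assert(status == 0);
    return NULL;
}

/* Starts several threads and returns the number of initializer runs. */
int once_demo(void)
{
    pthread_t threads[ONCE_THREADS];
    int i, status;

    for (i = 0; i < ONCE_THREADS; i++) {
        status = pthread_create(&threads[i], NULL, once_thread, NULL);
        assert(status == 0);
    }
    for (i = 0; i < ONCE_THREADS; i++) {
        status = pthread_join(threads[i], NULL);
        assert(status == 0);
    }
    return init_calls;
}
```

This pattern is useful when no single place in the program is guaranteed to run before the first thread that needs the mutex.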

Locking and unlocking mutexes

int pthread_mutex_lock(pthread_mutex_t *mutex);    // lock
int pthread_mutex_trylock(pthread_mutex_t *mutex); // try to lock without blocking
int pthread_mutex_unlock(pthread_mutex_t *mutex);  // unlock

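If the calling thread locks a mutex that it has already locked, one of two things happens, depending on the mutex type:
1. The call may return an error (EDEADLK).
2. The call may "self-deadlock", leaving the unfortunate thread waiting forever.
If a different thread tries to lock a mutex that is already locked, it simply blocks.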

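You cannot unlock a mutex that is already unlocked, and you cannot unlock a mutex that was locked by another thread.
If you need an "unowned" lock, use a semaphore instead.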
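To see the EDEADLK case rather than a self-deadlock, you can ask for an error-checking mutex through a mutex attribute. The sketch below assumes a platform that supports PTHREAD_MUTEX_ERRORCHECK; the helper names errorcheck_mutex_init and errorcheck_demo are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L  /* for PTHREAD_MUTEX_ERRORCHECK */
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Initializes *mutex as an error-checking mutex. Returns 0 on success. */
int errorcheck_mutex_init(pthread_mutex_t *mutex)
{
    pthread_mutexattr_t attr;
    int status = pthread_mutexattr_init(&attr);

    if (status != 0)
        return status;
    status = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (status == 0)
        status = pthread_mutex_init(mutex, &attr);
    pthread_mutexattr_destroy(&attr);
    return status;
}

/* Stores the result of relocking a mutex the caller already holds,
 * and the result of unlocking a mutex that is not locked. */
void errorcheck_demo(int *relock, int *unlock_again)
{
    pthread_mutex_t m;
    int status = errorcheck_mutex_init(&m);

    assert(status == 0);
    status = pthread_mutex_lock(&m);
    assert(status == 0);
    *relock = pthread_mutex_lock(&m);          /* already held by this thread */
    status = pthread_mutex_unlock(&m);
    assert(status == 0);
    *unlock_again = pthread_mutex_unlock(&m);  /* no longer locked */
    status = pthread_mutex_destroy(&m);
    assert(status == 0);
}
```

With an error-checking mutex the second lock reports EDEADLK and the extra unlock reports EPERM, which makes these mistakes much easier to find during development.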

A simple usage example 1

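#include <pthread.h>
#include <sched.h>   /* sched_yield */
#include <time.h>
/* errors.h is not shown in this article; it is assumed to pull in
 * <stdio.h>, <stdlib.h>, <string.h> and <unistd.h>. */
#include "errors.h"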

typedef struct alarm_tag {
    struct alarm_tag *     link;
    int                  second;
    time_t                 time;
    char            message[64];

} alarm_t;

pthread_mutex_t alarm_mutex = PTHREAD_MUTEX_INITIALIZER;

alarm_t* alarm_list = NULL; 

// Thread start routine
void * alarm_thread(void* arg)
{
    alarm_t* alarm;
    int sleep_time;
    time_t now;
    int status;

    while (1)
    {
        status = pthread_mutex_lock(&alarm_mutex);
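        // return value not checked

        alarm = alarm_list; // take the head of the queue

        if (alarm == NULL)
        {
            // The queue is empty: sleep for one second with the mutex
            // unlocked so that the main thread can add new alarms
            sleep_time = 1;
        }
        else
        {
            alarm_list = alarm->link; // advance the head to the next alarm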

            // Get the current time. If the alarm has already expired,
            // do not sleep; otherwise sleep for the remaining seconds
            now = time(NULL);
            if (alarm->time <= now)
                sleep_time = 0;
            else
                sleep_time = alarm->time - now;

#ifdef DEBUG
            printf("[waiting: %ld(%d)\"%s\"]\n", (long)alarm->time, sleep_time,
                alarm->message);
#endif
        }

        status = pthread_mutex_unlock(&alarm_mutex);
        // return value not checked

        if (sleep_time > 0)
            sleep(sleep_time);
        else
            sched_yield(); // voluntarily yield the processor

        if (alarm != NULL)
        {
            printf("(%d) %s\n", alarm->second, alarm->message);
            free(alarm);
        }
    }

    return arg;
}

int main()
{
    int status;
    char line[128];
    alarm_t* alarm, **last, *next;
    pthread_t thread;

    status = pthread_create(&thread, NULL, alarm_thread, NULL);
    // return value not checked

    while (1) {
        printf("Alarm> ");

        if (fgets(line, sizeof(line), stdin) == NULL)
            exit(0);

        if (strlen(line) <= 1)
            continue;

        alarm = (alarm_t*)malloc(sizeof(alarm_t));
        // allocation failure is not handled

        // %63 leaves room for the terminating NUL in message[64]
        if (sscanf(line, "%d %63[^\n]", &alarm->second, alarm->message) < 2)
        {
            printf("Bad command\n");
            free(alarm);
        }
        else
        {
            status = pthread_mutex_lock(&alarm_mutex);
            // failure is not handled

            alarm->time = time(NULL) + alarm->second;

            // Insert the alarm into the queue, sorted by expiration time
            last = &alarm_list;
            next = *last;
            while (next != NULL)
            {
                if (next->time >= alarm->time)
                {
                    alarm->link = next;
                    *last = alarm;
                    break;
                }

                last = &next->link;
                next = next->link;
            }

            // Reached the end of the list (or the list was empty): append
            if (next == NULL)
            {
                *last = alarm;
                alarm->link = NULL;
            }

#ifdef DEBUG
            printf("[list: ");
            for (next = alarm_list; next != NULL; next = next->link)
                printf("%ld(%ld)[\"%s\"] ", (long)next->time,
                    (long)(next->time - time(NULL)), next->message);
            printf("]\n");
#endif

            // Unlock
            status = pthread_mutex_unlock(&alarm_mutex);
            // return value not checked
        }
    }
    return 0;
}

Output

Alarm> 30 3333333333
[list: 1522755769(30)["3333333333"] ]
Alarm> Alarm> [waiting: 1522755769(29)"3333333333"]
20 2222222222222
[list: 1522755766(20)["2222222222222"] ]
Alarm> Alarm> 24 24444444444444444
[list: 1522755766(14)["2222222222222"] 1522755776(24)["24444444444444444"] ]
Alarm> Alarm> (30) 3333333333
[waiting: 1522755766(0)"2222222222222"]
(20) 2222222222222
[waiting: 1522755776(7)"24444444444444444"]
(24) 24444444444444444

Alarm> Alarm> 30 3333333333333
[list: 1522756022(30)["3333333333333"] ]
Alarm> Alarm> 10 [waiting: 1522756022(29)"3333333333333"]
1111111111111111111
[list: 1522756005(10)["1111111111111111111"] ]
Alarm> Alarm> (30) 3333333333333
[waiting: 1522756005(0)"1111111111111111111"]
(10) 1111111111111111111

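Limitation:
If you enter an alarm with a long delay and then one with a shorter delay, the alarm thread cannot respond to the second one while it is asleep waiting for the first. In the output above, the 20-second alarm is printed only after the 30-second alarm has expired.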

A simple example 2

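#include <pthread.h>
#include <time.h>
/* errors.h is not shown in this article; it is assumed to pull in
 * <stdio.h>, <unistd.h> and <errno.h>. */
#include "errors.h"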

#define SPIN 10000000

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

long counter;
time_t end_time;

// Thread start routine
void * counter_thread(void* arg)
{
    int status;
    int spin;

    while (time(NULL) < end_time)
    {
        status = pthread_mutex_lock(&mutex);
        // return value not checked

        for (spin = 0; spin < SPIN; spin++)
            counter++;

        status = pthread_mutex_unlock(&mutex);
        // return value not checked

        sleep(1);
    }

    printf("Counter is %#lx\n", counter); // the # flag prefixes the number with 0x
    return NULL;
}

void* monitor_thread(void* arg)
{
    int status;
    int misses = 0;

    while (time(NULL) < end_time)
    {
        sleep(3);
        status = pthread_mutex_trylock(&mutex);
        if (status != EBUSY)
        {
            printf("Counter is %ld-%d\n", counter/SPIN, status);
            status = pthread_mutex_unlock(&mutex);
        }
        else
            misses++;
    }

    printf("Monitor thread missed %d times.\n", misses);
    return NULL;
}

int main()
{
    int status;
    pthread_t counter_thread_id;
    pthread_t monitor_thread_id;

    end_time = time(NULL) + 60;
    status = pthread_create(&counter_thread_id, NULL, counter_thread, NULL);
    status = pthread_create(&monitor_thread_id, NULL, monitor_thread, NULL);

    status = pthread_join(counter_thread_id, NULL);
    status = pthread_join(monitor_thread_id, NULL);


    return 0;
}

Output:

Counter is 3-0
Counter is 6-0
Counter is 9-0
Counter is 12-0
Counter is 15-0
Counter is 18-0
Counter is 21-0
Counter is 24-0
Counter is 29-0
Counter is 32-0
Counter is 35-0
Counter is 38-0
Counter is 41-0
Counter is 44-0
Counter is 47-0
Counter is 50-0
Counter is 55-0
Counter is 0x21f98280
Counter is 57-0
Monitor thread missed 2 times.

Atomicity

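"Atomic" literally means indivisible, but in the context of threads it means that a thread never sees anything that would confuse it.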

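What we mean by "atomicity" is simply this: when several threads run at the same time on several processors, no thread ever observes a broken invariant (an intermediate or inconsistent state).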

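When you need to protect two shared variables, you have two basic strategies:
1. Give each variable its own "small" mutex.
2. Give both variables one "big" mutex.
Which is better?
It depends on how many threads use the shared data and how they use it.
The main design factors are:
1. Mutexes are not free: locking and unlocking take time. A program that locks fewer mutexes usually runs faster. So use as few mutexes as you can get away with, and let each one protect as large a region as makes sense.
2. A mutex by nature serializes execution. If many threads frequently lock the same mutex, they spend most of their time waiting, which hurts performance. If the data (or code) protected by a mutex consists of unrelated pieces, you can split the big mutex into several small ones to improve performance: fewer threads need any given small mutex at the same moment, so they wait less. So use enough mutexes (up to the point where more stop helping), and let each one protect as small a region as possible.
3. These two points pull in opposite directions. Once you understand how mutexes perform in your program, you can strike the right balance between them.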
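The "small mutex" strategy can be sketched as below: two unrelated counters, each with its own mutex, so that a thread updating one never waits for a thread updating the other. The names guarded_counter_t, hits, misses and split_mutex_demo are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

#define UPDATES 100000

/* A value paired with the "small" mutex that protects only that value. */
typedef struct guarded_counter_tag {
    pthread_mutex_t mutex;
    long            value;
} guarded_counter_t;

static guarded_counter_t hits   = { PTHREAD_MUTEX_INITIALIZER, 0 };
static guarded_counter_t misses = { PTHREAD_MUTEX_INITIALIZER, 0 };

/* Increments the counter passed in arg UPDATES times. */
static void *bump(void *arg)
{
    guarded_counter_t *c = arg;
    int i;

    for (i = 0; i < UPDATES; i++) {
        pthread_mutex_lock(&c->mutex);
        c->value++;
        pthread_mutex_unlock(&c->mutex);
    }
    return NULL;
}

/* Runs one thread per counter and returns the sum of both counters. */
long split_mutex_demo(void)
{
    pthread_t t1, t2;
    int status;

    status = pthread_create(&t1, NULL, bump, &hits);
    assert(status == 0);
    status = pthread_create(&t2, NULL, bump, &misses);
    assert(status == 0);
    status = pthread_join(t1, NULL);
    assert(status == 0);
    status = pthread_join(t2, NULL);
    assert(status == 0);
    return hits.value + misses.value;
}
```

If the two counters had to change together to keep an invariant, a single "big" mutex covering both would be the right choice instead.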

Using multiple mutexes

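Using several mutexes at the same time adds complexity. The worst case is deadlock, where two threads each hold one mutex while waiting for the mutex held by the other.
Deadlock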

First thread	Second thread
pthread_mutex_lock(&mutex_a)	pthread_mutex_lock(&mutex_b)
pthread_mutex_lock(&mutex_b)	pthread_mutex_lock(&mutex_a)

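Avoiding deadlock:
1. Fixed locking order: every thread locks mutex_a first and then mutex_b; when unlocking, release mutex_b first and then mutex_a.
2. Try and back off: after locking the first mutex, lock the others with pthread_mutex_trylock. If a trylock fails, release all the mutexes you hold and start over.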

Lock order: mutex 1, mutex 2, mutex 3, ..., mutex n
Unlock order: mutex n, ..., mutex 3, mutex 2, mutex 1
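The try-and-back-off approach can be sketched as follows. Two threads deliberately lock the same pair of mutexes in opposite orders, which would risk deadlock with plain pthread_mutex_lock. The helper names lock_both, unlock_both and backoff_demo are hypothetical, and yielding after a failed attempt is one possible policy, not the only one.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>

#define ITERATIONS 10000

static pthread_mutex_t mutex_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t mutex_b = PTHREAD_MUTEX_INITIALIZER;
static long shared_total;  /* modified only while both mutexes are held */

/* Locks first, then tries second. On EBUSY, releases first and retries. */
static int lock_both(pthread_mutex_t *first, pthread_mutex_t *second)
{
    for (;;) {
        int status = pthread_mutex_lock(first);

        if (status != 0)
            return status;
        status = pthread_mutex_trylock(second);
        if (status == 0)
            return 0;                  /* now holding both mutexes */
        pthread_mutex_unlock(first);   /* back off: release what we hold */
        if (status != EBUSY)
            return status;             /* a real error, not contention */
        sched_yield();                 /* let the other thread make progress */
    }
}

static void unlock_both(pthread_mutex_t *first, pthread_mutex_t *second)
{
    pthread_mutex_unlock(second);
    pthread_mutex_unlock(first);
}

static void *forward_thread(void *arg)   /* locks a, then b */
{
    int i;

    (void)arg;
    for (i = 0; i < ITERATIONS; i++) {
        if (lock_both(&mutex_a, &mutex_b) == 0) {
            shared_total++;
            unlock_both(&mutex_a, &mutex_b);
        }
    }
    return NULL;
}

static void *backward_thread(void *arg)  /* locks b, then a */
{
    int i;

    (void)arg;
    for (i = 0; i < ITERATIONS; i++) {
        if (lock_both(&mutex_b, &mutex_a) == 0) {
            shared_total++;
            unlock_both(&mutex_b, &mutex_a);
        }
    }
    return NULL;
}

/* Runs both threads to completion and returns the final total. */
long backoff_demo(void)
{
    pthread_t f, b;
    int status;

    shared_total = 0;
    status = pthread_create(&f, NULL, forward_thread, NULL);
    assert(status == 0);
    status = pthread_create(&b, NULL, backward_thread, NULL);
    assert(status == 0);
    status = pthread_join(f, NULL);
    assert(status == 0);
    status = pthread_join(b, NULL);
    assert(status == 0);
    return shared_total;
}
```

Backing off costs extra lock and unlock calls under contention, so a fixed locking order is usually preferable when the code can be arranged that way.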

Lock chaining

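"Lock chaining" is a special case of a lock hierarchy in which the scopes of two locks overlap.
After locking the first mutex, the code enters a region that requires another mutex. Once that second mutex is locked, the first is no longer needed and can be released.
This technique is typically used to traverse data structures such as trees or linked lists. Each node gets its own mutex, instead of one mutex that locks the whole data structure and prevents any parallel access. The traversal code first locks the queue head or the root node, finds the node it wants, locks that node, and then releases the mutex of the root node or queue head.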
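A linked-list traversal with lock chaining might be sketched as follows. The names chain_node_t and chain_find are hypothetical, and the sketch assumes the first node of the list is never removed, so the head pointer itself needs no separate protection.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef struct chain_node_tag {
    pthread_mutex_t        mutex;  /* protects value and next of this node */
    struct chain_node_tag *next;
    int                    value;
} chain_node_t;

/* Returns the first node holding value, with its mutex still locked,
 * or NULL if there is no such node. The caller unlocks the result. */
chain_node_t *chain_find(chain_node_t *head, int value)
{
    chain_node_t *cur = head;
    chain_node_t *next;
    int status;

    if (cur == NULL)
        return NULL;
    status = pthread_mutex_lock(&cur->mutex);
    assert(status == 0);
    while (cur->value != value) {
        next = cur->next;
        if (next == NULL) {
            pthread_mutex_unlock(&cur->mutex);
            return NULL;
        }
        /* Lock the next node before releasing the current one:
         * for a moment the scopes of the two locks overlap. */
        status = pthread_mutex_lock(&next->mutex);
        assert(status == 0);
        pthread_mutex_unlock(&cur->mutex);
        cur = next;
    }
    return cur;
}
```

Because a thread never holds more than two node mutexes at a time, other threads can work on different parts of the list in parallel.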

Condition variables

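A condition variable is used to communicate information about the state of shared data. You can use one to signal that a queue has become empty, or non-empty, or any other state of shared data that a thread needs to act on.
- A wait on a condition variable always returns with the mutex locked.
- The job of a condition variable is signaling, not mutual exclusion.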

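Why is the mutex not created as part of the condition variable?
First, mutexes are used on their own as well as together with condition variables.
Second, one mutex is often associated with several condition variables.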
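One mutex serving two condition variables can be sketched with a small bounded buffer. The names ring_t, ring_put, ring_get and ring_demo are hypothetical. Each wait sits in a while loop that re-tests the shared state, since a wakeup alone does not prove that the state is what the thread needs.

```c
#include <assert.h>
#include <pthread.h>

#define RING_SIZE 4
#define ITEMS     100

typedef struct ring_tag {
    pthread_mutex_t mutex;      /* protects slots, head, tail and count */
    pthread_cond_t  not_empty;  /* signaled after an item is added */
    pthread_cond_t  not_full;   /* signaled after an item is removed */
    int             slots[RING_SIZE];
    int             head, tail, count;
} ring_t;

static ring_t ring = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER,
    PTHREAD_COND_INITIALIZER, {0}, 0, 0, 0
};

void ring_put(ring_t *r, int item)
{
    pthread_mutex_lock(&r->mutex);
    while (r->count == RING_SIZE)   /* re-test the state after each wakeup */
        pthread_cond_wait(&r->not_full, &r->mutex);
    r->slots[r->tail] = item;
    r->tail = (r->tail + 1) % RING_SIZE;
    r->count++;
    pthread_cond_signal(&r->not_empty);
    pthread_mutex_unlock(&r->mutex);
}

int ring_get(ring_t *r)
{
    int item;

    pthread_mutex_lock(&r->mutex);
    while (r->count == 0)
        pthread_cond_wait(&r->not_empty, &r->mutex);
    item = r->slots[r->head];
    r->head = (r->head + 1) % RING_SIZE;
    r->count--;
    pthread_cond_signal(&r->not_full);
    pthread_mutex_unlock(&r->mutex);
    return item;
}

/* Producer thread: puts the numbers 1..ITEMS into the ring. */
static void *producer(void *arg)
{
    int i;

    (void)arg;
    for (i = 1; i <= ITEMS; i++)
        ring_put(&ring, i);
    return NULL;
}

/* Consumes ITEMS numbers and returns their sum. */
long ring_demo(void)
{
    pthread_t t;
    long sum = 0;
    int i, status;

    status = pthread_create(&t, NULL, producer, NULL);
    assert(status == 0);
    for (i = 0; i < ITEMS; i++)
        sum += ring_get(&ring);
    status = pthread_join(t, NULL);
    assert(status == 0);
    return sum;
}
```

Both condition variables describe the same shared data (count), which is why they share one mutex.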

Creating and destroying condition variables

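Static creation: pthread_cond_t mycon = PTHREAD_COND_INITIALIZER;
- Statically created condition variables and mutexes do not need to be destroyed.
Dynamic creation: int pthread_cond_init(pthread_cond_t *cond, const pthread_condattr_t *cond_attr);
- Dynamically created condition variables and mutexes must be destroyed with the corresponding destroy function.
Destroying a condition variable: int pthread_cond_destroy(pthread_cond_t *cond);
Signaling: int pthread_cond_signal(pthread_cond_t *cond);
- pthread_cond_signal restarts one of the threads waiting on the condition variable. If no thread is waiting, nothing happens. If several threads are waiting, only one is restarted, and you cannot choose which.
Broadcasting: int pthread_cond_broadcast(pthread_cond_t *cond);
- pthread_cond_broadcast restarts all the threads waiting on the condition variable. If no thread is waiting, nothing happens.
Blocking wait: int pthread_cond_wait(pthread_cond_t *cond, pthread_mutex_t *mutex);
- pthread_cond_wait atomically unlocks the mutex (as if by pthread_mutex_unlock) and waits for the condition variable to be signaled. The thread is suspended and uses no CPU time until the condition variable is signaled. The application must lock the mutex before calling pthread_cond_wait. Before it returns, pthread_cond_wait locks the mutex again (as if by pthread_mutex_lock). Unlocking the mutex and suspending on the condition variable happen as a single atomic step. So, as long as every thread locks the mutex before signaling the condition variable, the condition variable cannot be signaled between the moment a thread locks the mutex and the moment it starts waiting.
Blocking wait with a timeout: int pthread_cond_timedwait(pthread_cond_t *cond, pthread_mutex_t *mutex, const struct timespec *abstime);
- pthread_cond_timedwait atomically unlocks the mutex and waits on the condition variable just like pthread_cond_wait, but it also bounds the waiting time. If cond has not been signaled by the time given in abstime, the mutex is locked again and pthread_cond_timedwait returns the error ETIMEDOUT. The abstime parameter specifies an absolute time with the same origin as time and gettimeofday: an abstime of 0 corresponds to 00:00:00 GMT, January 1, 1970.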
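Since abstime is an absolute time, a relative timeout has to be added to the current time first. A minimal sketch, assuming a default-initialized condition variable (which measures abstime against CLOCK_REALTIME); the names flag, wait_for_flag and set_flag are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

static pthread_mutex_t flag_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  flag_cond  = PTHREAD_COND_INITIALIZER;
static int             flag = 0;  /* shared state the waiter is interested in */

/* Waits at most `seconds` for flag to become nonzero.
 * Returns 0 if the flag is set, otherwise the error from the wait
 * (ETIMEDOUT when the time ran out). */
int wait_for_flag(int seconds)
{
    struct timespec abstime;
    int status = 0;
    int result;

    /* Convert the relative timeout into an absolute time. */
    clock_gettime(CLOCK_REALTIME, &abstime);
    abstime.tv_sec += seconds;

    pthread_mutex_lock(&flag_mutex);
    while (!flag && status == 0)
        status = pthread_cond_timedwait(&flag_cond, &flag_mutex, &abstime);
    result = flag ? 0 : status;
    pthread_mutex_unlock(&flag_mutex);
    return result;
}

/* Sets the flag and wakes one waiter. */
void set_flag(void)
{
    pthread_mutex_lock(&flag_mutex);
    flag = 1;
    pthread_cond_signal(&flag_cond);
    pthread_mutex_unlock(&flag_mutex);
}
```

Computing abstime once, outside the loop, keeps the overall deadline fixed even if the thread wakes up several times before the flag is set.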