How should we handle common concurrency bugs?

Researchers have spent many years studying bugs in concurrent programs. Much of the early work focused on deadlock, a topic touched on in previous chapters and studied in depth here [C+71]. More recent work has examined other kinds of common concurrency bugs (i.e., non-deadlock bugs). In this chapter, we take a brief look at examples of both kinds of problems, in order to better understand what to watch out for. Thus, the key question of this chapter is:

The key question: how to handle common concurrency bugs

Concurrency bugs tend to follow a few common patterns. Knowing what to avoid is the first step toward writing robust, correct concurrent code.

32.1 What Types of Bugs Exist?

The first and most obvious question is: what types of bugs arise in complex concurrent programs? In general, this is a difficult question, but fortunately others have done the work for us. Specifically, Lu et al. [L+08] analyzed a number of popular concurrent applications in detail, in order to understand what kinds of bugs arise in practice.

The study focused on four major open-source applications: MySQL (a popular database management system), Apache (a well-known web server), Mozilla (the famous web browser), and OpenOffice (an open-source version of the Microsoft Office suite). The researchers examined concurrency bugs that had already been found and fixed in each code base, turning the developers' work into a quantitative bug analysis. Understanding these results helps us see which types of concurrency problems actually occur in mature code bases.

Table 32.1 summarizes what Lu and colleagues found. As you can see, of the 105 bugs in total, the majority (74) were non-deadlock bugs, while the remaining 31 were deadlocks. Note also how the counts vary by application: OpenOffice had only 8 concurrency bugs, while Mozilla had nearly 60.

Table 32.1 Concurrency bugs in modern applications

Application   What it does      Non-Deadlock   Deadlock
MySQL         Database server        14             9
Apache        Web server             13             4
Mozilla       Web browser            41            16
OpenOffice    Office suite            6             2
Total                                74            31

We now dig into these two types of bugs in more detail. For the first type, non-deadlock bugs, we discuss examples drawn from the study. For the second type, deadlock bugs, we discuss the large body of work that has been done on preventing, avoiding, and handling deadlock.

32.2 Non-Deadlock Bugs

Lu's study shows that non-deadlock bugs make up the majority of concurrency problems. How do they arise? And how do we fix them? We now focus on two major types: atomicity-violation bugs and order-violation bugs.

Atomicity-Violation Bugs

The first type of bug is the atomicity violation. Here is an example that appeared in MySQL. See if you can spot the problem yourself first.

Thread 1::
if (thd->proc_info) {
  ...
  fputs(thd->proc_info, ...);
  ...
}

Thread 2::
thd->proc_info = NULL;

In this example, two threads access the proc_info field of the structure thd. The first thread checks that proc_info is non-NULL and then prints its value; the second thread sets it to NULL. Clearly, if the first thread is interrupted after the check but before the call to fputs(), and the second thread then sets the pointer to NULL, the first thread will dereference a NULL pointer when it resumes, crashing the program.

The more formal definition of an atomicity violation, according to Lu et al., is: "the desired serializability among multiple memory accesses is violated (i.e., a code region is intended to be atomic, but the atomicity is not enforced during execution)." In our example, the code assumes that the non-NULL check of proc_info and the fputs() call that prints it are atomic; when that assumption does not hold, the code fails.

Finding a fix for this type of problem is usually (but not always) straightforward. Can you think of how to fix the code above?

In this solution, we simply add locks around the shared-variable accesses, ensuring that each thread holds a lock (proc_info_lock) whenever it accesses the proc_info field. Of course, any other code that accesses this structure should acquire the lock first as well.

pthread_mutex_t proc_info_lock = PTHREAD_MUTEX_INITIALIZER;

Thread 1::
pthread_mutex_lock(&proc_info_lock);
if (thd->proc_info) {
  ...
  fputs(thd->proc_info, ...);
  ...
}
pthread_mutex_unlock(&proc_info_lock);

Thread 2::
pthread_mutex_lock(&proc_info_lock);
thd->proc_info = NULL;
pthread_mutex_unlock(&proc_info_lock);

Order-Violation Bugs

Another common type of non-deadlock bug identified by Lu et al. is the order violation. Here is a simple example. Again, see if you can figure out why the code below is buggy.

Thread 1::
void init() {
    ...
    mThread = PR_CreateThread(mMain, ...);
    ...
}

Thread 2::
void mMain(...) {
    ...
    mState = mThread->State;
    ...
}

As you may have noticed, the code in Thread 2 seems to assume that the variable mThread has already been initialized (i.e., is not NULL). However, if Thread 2 runs before Thread 1 has executed, it will crash on a NULL-pointer dereference (assuming the initial value of mThread is NULL; if not, even stranger things could happen, as Thread 2 would read from and dereference some arbitrary memory location).

The more formal definition of an order violation is: "the desired order between two (groups of) memory accesses is flipped (i.e., A should always execute before B, but that order is not enforced during execution)" [L+08].

The fix for this kind of bug is to enforce the ordering. As discussed in detail earlier, condition variables are a simple and reliable way to add such synchronization to modern code. For the example above, we can rewrite the code like this:

pthread_mutex_t mtLock = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  mtCond = PTHREAD_COND_INITIALIZER;
int mtInit             = 0;

Thread 1::
void init() {
    ...
    mThread = PR_CreateThread(mMain, ...);

    // signal that the thread has been created...
    pthread_mutex_lock(&mtLock);
    mtInit = 1;
    pthread_cond_signal(&mtCond);
    pthread_mutex_unlock(&mtLock);
    ...
}

Thread 2::
void mMain(...) {
    ...
    // wait for the thread to be initialized...
    pthread_mutex_lock(&mtLock);
    while (mtInit == 0)
        pthread_cond_wait(&mtCond, &mtLock);
    pthread_mutex_unlock(&mtLock);

    mState = mThread->State;
    ...
}

In this fixed-up code, we have added a lock (mtLock), a condition variable (mtCond), and a state variable (mtInit). When the initialization code runs, it sets mtInit to 1 and signals that it has done so. If Thread 2 had run before this point, it will be waiting for that signal and the corresponding state change; if it runs later, it will check the state, see that initialization has already happened (i.e., mtInit is 1), and proceed normally. Note that we could have used mThread itself as the state variable, but did not, for the sake of simplicity. When ordering matters between threads, condition variables (or semaphores) can solve the problem.

Non-Deadlock Bugs: Summary

In Lu et al.'s study, the large majority (97%) of non-deadlock bugs were either atomicity violations or order violations. Thus, by carefully studying these error patterns, programmers can do a better job of avoiding them. Moreover, as automated code-checking tools develop, they should focus on these two types of bugs, since they account for most of the non-deadlock bugs found in practice.

Unfortunately, not all bugs are as easy to fix as our examples. Some require a deeper understanding of the application, as well as significant restructuring of code and data structures. Read Lu et al.'s excellent (and readable) paper for more details.

32.3 Deadlock Bugs

Beyond the concurrency bugs mentioned above, a classic problem that arises in many complex concurrent systems is deadlock. Deadlock occurs, for example, when Thread 1 holds lock L1 and is waiting for another lock, L2, while Thread 2, which holds L2, is waiting for L1 to be released. The following code fragment can produce such a deadlock:

Thread 1:    Thread 2:
lock(L1);    lock(L2);
lock(L2);    lock(L1);

Note that running this code does not necessarily produce deadlock. It can happen if, say, Thread 1 acquires lock L1 and a context switch then occurs to Thread 2. Thread 2 acquires L2 and tries to acquire L1. Now we have a deadlock: the two threads wait for each other. Figure 32.1 shows the situation; the presence of a cycle in the graph indicates the deadlock.

(Figure 32.1: The deadlock dependency graph.)

The figure should make the problem clear. How should programmers write their code so as to handle deadlock?

The key question: how to deal with deadlock

When we build systems, how can we prevent deadlock, avoid it, or at least detect it and recover from it? Is this a real problem in systems today?

Why Do Deadlocks Occur?

You might be thinking that deadlocks like the example above are easy to avoid. For instance, if Thread 1 and Thread 2 both grabbed the locks in the same order, the deadlock could never occur. So why do deadlocks happen?

One reason is that large code bases have complex dependencies between components. Take the operating system as an example. The virtual memory system needs to access the file system in order to read memory pages from disk; the file system in turn needs to interact with the virtual memory system to request memory to hold the blocks it reads. Thus, when designing locking strategies for large systems, you must take care to avoid deadlocks caused by circular dependencies.

Another reason is encapsulation. Software developers are trained to hide implementation details behind modular interfaces, making software easier to build. Unfortunately, modularity and locking do not mix well. As Jula et al. point out [J+08], some seemingly innocuous interfaces can lead straight to deadlock. Take the Java Vector class and its AddAll() method, called like this:

Vector v1, v2; 
v1.AddAll(v2);

Internally, this method needs to be thread-safe, so it must acquire locks on both the vector being added to (v1) and the parameter (v2). Suppose it locks v1 first, and then v2. If some other thread calls v2.AddAll(v1) at nearly the same time, a deadlock can arise.
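The hazard can be sketched in C (the vec_t type and function names here are illustrative, not from the Java library): an add-all routine that locks the destination first and the source second is perfectly correct in isolation, yet two concurrent, mirrored calls acquire the same two locks in opposite orders.

```c
#include <pthread.h>

#define VEC_CAP 16

// Illustrative vector type, protected by its own lock.
typedef struct {
    pthread_mutex_t lock;
    int items[VEC_CAP];
    int len;
} vec_t;

void vec_init(vec_t *v) {
    pthread_mutex_init(&v->lock, NULL);
    v->len = 0;
}

// Appends all of src onto dst. The locking is hidden inside the interface:
// dst is locked first, then src. Thread A running vec_add_all(&v1, &v2)
// concurrently with thread B running vec_add_all(&v2, &v1) acquires the
// same two locks in opposite orders -- a potential deadlock the callers
// cannot see.
void vec_add_all(vec_t *dst, vec_t *src) {
    pthread_mutex_lock(&dst->lock);
    pthread_mutex_lock(&src->lock);
    for (int i = 0; i < src->len && dst->len < VEC_CAP; i++)
        dst->items[dst->len++] = src->items[i];
    pthread_mutex_unlock(&src->lock);
    pthread_mutex_unlock(&dst->lock);
}
```

A single call behaves exactly as a user would expect; the danger only appears under concurrent, reversed use, which is what makes such interface-hidden locking so treacherous.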

Conditions for Deadlock

Four conditions must all hold for a deadlock to occur [C+71].

  • Mutual exclusion: threads claim exclusive access to the resources they require (e.g., a thread grabs a lock).
  • Hold-and-wait: threads hold resources already allocated to them (e.g., locks they have acquired) while waiting for additional resources (e.g., locks they wish to acquire).
  • No preemption: resources (such as locks) cannot be forcibly taken away from the threads holding them.
  • Circular wait: there exists a circular chain of threads such that each thread holds one or more resources that are being requested by the next thread in the chain.

If any one of these four conditions does not hold, deadlock cannot occur. Thus, we first explore techniques to prevent deadlock; each strategy seeks to prevent one of the conditions from arising, and thus is one approach to solving the deadlock problem.

Prevention

Circular Wait

Probably the most practical prevention technique (and certainly one frequently used) is to write the locking code so that a circular wait can never arise. The most straightforward way is to impose a total ordering on lock acquisition. For example, if there are only two locks in the system (L1 and L2), always acquiring L1 before L2 avoids deadlock. Such strict ordering ensures that no cyclical wait arises, and hence no deadlock.

Of course, more complex systems have more than two locks, and a total ordering over all of them may be difficult to achieve. Thus, a partial ordering can be a useful way to arrange lock acquisition so as to avoid deadlock. An excellent example of partial lock ordering is the memory-mapping code in Linux [T+94]. The comment at the top of that source code describes ten different groups of lock-acquisition orderings, including simple relationships, such as "i_mutex before i_mmap_mutex," and more complex ones, such as "i_mmap_mutex before private_lock before swap_lock before mapping->tree_lock."

As you can imagine, both total and partial ordering require careful design and implementation of a locking strategy. Further, the ordering is merely a convention; a careless programmer can easily ignore it and cause deadlock. Finally, lock ordering requires a deep understanding of the code base and how its various routines call one another; a single mistake can result in the "D" word [1].

Tip: enforce lock ordering by lock address

When a function must grab more than one lock, we need to be careful about deadlock. For example, consider a function do_something(mutex_t *m1, mutex_t *m2). If the function always grabs m1 first and then m2, a deadlock can arise when one thread calls do_something(L1, L2) while another thread calls do_something(L2, L1).

To avoid this particular issue, the clever programmer can use the address of each lock to impose an acquisition order. By acquiring the locks either in high-to-low or low-to-high address order, do_something() guarantees that it always grabs the locks in the same fixed order, regardless of the order in which its arguments are passed. The code looks like this:

if (m1 > m2) { // grab locks in high-to-low address order
  pthread_mutex_lock(m1);
  pthread_mutex_lock(m2);
} else {
  pthread_mutex_lock(m2);
  pthread_mutex_lock(m1);
}
// Code assumes that m1 != m2 (they are not the same lock)

With this simple technique, a programmer can ensure a simple, efficient, and deadlock-free acquisition of multiple locks.
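As a concrete illustration of the tip above, here is a hedged sketch in C (the helper names lock_both, unlock_both, and run_demo are mine, not from the book): two threads name the same pair of pthread mutexes in opposite orders, but a small helper sorts the locks by address before acquiring them, so the actual acquisition order is always identical and deadlock cannot arise.

```c
#include <pthread.h>
#include <stdint.h>

static pthread_mutex_t La = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t Lb = PTHREAD_MUTEX_INITIALIZER;
static int counter = 0;

// Acquire two distinct locks in a fixed, address-based order,
// regardless of the order the caller names them in.
static void lock_both(pthread_mutex_t *m1, pthread_mutex_t *m2) {
    if ((uintptr_t)m1 > (uintptr_t)m2) {
        pthread_mutex_t *tmp = m1; m1 = m2; m2 = tmp;
    }
    pthread_mutex_lock(m1);
    pthread_mutex_lock(m2);
}

static void unlock_both(pthread_mutex_t *m1, pthread_mutex_t *m2) {
    pthread_mutex_unlock(m1);   // unlock order does not matter
    pthread_mutex_unlock(m2);
}

// The two workers pass the locks in opposite orders on purpose.
static void *worker_ab(void *arg) {
    for (int i = 0; i < 100000; i++) {
        lock_both(&La, &Lb);
        counter++;
        unlock_both(&La, &Lb);
    }
    return NULL;
}

static void *worker_ba(void *arg) {
    for (int i = 0; i < 100000; i++) {
        lock_both(&Lb, &La);
        counter++;
        unlock_both(&Lb, &La);
    }
    return NULL;
}

int run_demo(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker_ab, NULL);
    pthread_create(&t2, NULL, worker_ba, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return counter;   // 200000 if every increment was protected
}
```

Without the address sort in lock_both(), this pair of workers is exactly the do_something(L1, L2) / do_something(L2, L1) pattern described above and could deadlock.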

Hold and wait

The hold-and-wait condition for deadlock can be avoided by acquiring all locks at once, atomically. In practice, this could be achieved as follows:

lock(prevention);
lock(L1);
lock(L2);
...
unlock(prevention);

By first grabbing the prevention lock, this code guarantees that no untimely thread switch can occur in the midst of lock acquisition, and thus deadlock is avoided. Of course, it requires that any thread, any time it grabs locks, first grabs the global prevention lock. For example, another thread could then grab L1 and L2 in a different order without risk, because it would be holding the prevention lock while doing so.

Note that this solution is problematic for a number of reasons. As before, it does not work well with encapsulation: the approach requires us to know exactly which locks must be grabbed, and to grab them ahead of time. And because all locks must be acquired up front (at once), rather than when they are truly needed, it is likely to reduce concurrency.

No Preemption

Because we generally view locks as held until unlock is called, acquiring multiple locks often gets us into trouble: while waiting for one lock, we are holding another. Many thread libraries provide a more flexible interface to avoid this situation. Specifically, the routine trylock() attempts to grab the lock; if the lock is already held, it returns -1 instead of blocking, and you can try again later.

This interface can be used to build a deadlock-free lock-acquisition protocol:

top:
  lock(L1);
  if (trylock(L2) == -1) {
    unlock(L1);
    goto top;
  }

Note that another thread could follow the same protocol but grab the locks in the other order (L2 then L1), and the program would still be deadlock-free. One new problem does arise, however: livelock. It is possible that two threads repeatedly attempt this sequence and repeatedly fail to acquire both locks. In that case, the system is running through this code over and over (so it is not deadlocked), yet no progress is being made, hence the name livelock. There are solutions to the livelock problem: for example, each thread could wait a random amount of time at the end of the loop before retrying, which reduces the repeated interference between competing threads.
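A runnable sketch of the trylock-plus-random-backoff idea, using the real pthread API (note that pthread_mutex_trylock returns 0 on success rather than the -1 of the pseudocode above; the helper names and the tiny pseudo-random generator are mine):

```c
#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t L1 = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t L2 = PTHREAD_MUTEX_INITIALIZER;
static int shared = 0;

// Grab 'first', then *try* 'second'. On failure, release 'first' and back
// off for a short pseudo-random interval, so two threads looping in
// opposite orders do not keep failing in lockstep (livelock).
static void lock_two_backoff(pthread_mutex_t *first, pthread_mutex_t *second,
                             unsigned int *seed) {
    for (;;) {
        pthread_mutex_lock(first);
        if (pthread_mutex_trylock(second) == 0)  // 0 means acquired
            return;
        pthread_mutex_unlock(first);
        *seed = *seed * 1103515245u + 12345u;    // tiny LCG, good enough here
        usleep(*seed % 100);                     // random wait, then retry
    }
}

static void *worker_12(void *arg) {
    unsigned int seed = 1;
    for (int i = 0; i < 1000; i++) {
        lock_two_backoff(&L1, &L2, &seed);
        shared++;
        pthread_mutex_unlock(&L2);
        pthread_mutex_unlock(&L1);
    }
    return NULL;
}

static void *worker_21(void *arg) {
    unsigned int seed = 2;
    for (int i = 0; i < 1000; i++) {
        lock_two_backoff(&L2, &L1, &seed);   // opposite order: still safe
        shared++;
        pthread_mutex_unlock(&L1);
        pthread_mutex_unlock(&L2);
    }
    return NULL;
}

int run_backoff_demo(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, worker_12, NULL);
    pthread_create(&b, NULL, worker_21, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return shared;   // 2000: both workers finished, no deadlock
}
```

Because neither thread ever blocks on a lock while holding another, deadlock is impossible here; the random backoff merely keeps the retries from colliding forever.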

One final point about this approach: using trylock may be harder than it looks. The first problem, again, is encapsulation: if one of these locks is buried inside some routine that is being called, the jump back to the beginning is difficult to implement. And if the code has acquired other resources along the way, it must make sure to release them too. For example, if after grabbing L1 the code allocated some memory, it would have to free that memory when the attempt on L2 fails, before jumping back to the top. That said, in some scenarios (e.g., the Java vector method mentioned earlier), this approach works quite well.

Mutual Exclusion

The final prevention technique is to avoid the need for mutual exclusion entirely. In general, this is difficult: code naturally has critical sections, and mutual exclusion is hard to avoid. So what can we do?

Herlihy developed the idea of designing various wait-free data structures [H91]. The idea is simple: using powerful hardware instructions, we can build data structures that require no explicit locking.

As a simple example, suppose we have a compare-and-swap instruction, an atomic instruction provided by the hardware that does the following:

int CompareAndSwap(int *address, int expected, int new) {
  if (*address == expected) {
    *address = new;
    return 1; // success
  }
  return 0;   // failure
}

Now suppose we want to atomically increment a value by a given amount. We could do it like this:

void AtomicIncrement(int *value, int amount) {
  int old;
  do {
    old = *value;
  } while (CompareAndSwap(value, old, old + amount) == 0);
}

Instead of acquiring a lock, updating the value, and then releasing the lock, we use the compare-and-swap instruction to repeatedly try to update the value to the new amount. No lock is used, so no deadlock can arise (though livelock is still a possibility).
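Real hardware exposes compare-and-swap portably through C11's <stdatomic.h>; here is a hedged sketch of the same retry loop written against it (the function and variable names are mine):

```c
#include <stdatomic.h>
#include <pthread.h>

static atomic_int value = 0;

// The same idea as AtomicIncrement above, written with C11's portable CAS.
// The "weak" variant may fail spuriously, which is harmless inside a retry
// loop; on failure, 'old' is automatically refreshed with the current value.
static void atomic_increment(atomic_int *v, int amount) {
    int old = atomic_load(v);
    while (!atomic_compare_exchange_weak(v, &old, old + amount))
        ;  // retry with the refreshed 'old'
}

static void *adder(void *arg) {
    for (int i = 0; i < 100000; i++)
        atomic_increment(&value, 1);
    return NULL;
}

int run_cas_demo(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, adder, NULL);
    pthread_create(&t2, NULL, adder, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return atomic_load(&value);   // 200000: no update was lost
}
```

Two threads hammer the counter concurrently, yet every increment survives, with no lock anywhere in the code.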

Let's consider a more complex example: list insertion. Here is code that inserts an element at the head of a list:

void insert(int value) {
  node_t *n = malloc(sizeof(node_t));
  assert(n != NULL);
  n->value = value;
  n->next  = head;
  head     = n;
}

If called by multiple threads at the same time, this code has a race in its critical section (see if you can figure out why). Of course, we could solve this by surrounding the relevant code with a lock:

void insert(int value) {
  node_t *n = malloc(sizeof(node_t));
  assert(n != NULL);
  n->value = value;
  lock(listlock);    // begin critical section
  n->next = head;
  head    = n;
  unlock(listlock);  // end critical section
}

In the solution above, we used a traditional lock [2]. Instead, let us try to perform the insertion in a lock-free manner using the compare-and-swap instruction. One possible implementation:

void insert(int value) {
  node_t *n = malloc(sizeof(node_t));
  assert(n != NULL);
  n->value = value;
  do {
    n->next = head;
  } while (CompareAndSwap(&head, n->next, n) == 0);
}

The code first points the new node's next pointer at the current head of the list, and then tries to swap the new node into position as the new head. If some other thread successfully changed the value of head in the meantime, the swap fails, and this thread retries with the new head.
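The same insertion can be written against C11 atomics; conveniently, on failure atomic_compare_exchange_weak writes the current head back into n->next, so the retry loop needs no explicit reload (a sketch; the names lockfree_insert and list_head are mine):

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <assert.h>

typedef struct node {
    int value;
    struct node *next;
} node_t;

static _Atomic(node_t *) list_head = NULL;

// Lock-free push onto the head of the list. If another thread changes
// list_head between our load and our CAS, the CAS fails and deposits the
// latest head into n->next, and we simply try again.
void lockfree_insert(int value) {
    node_t *n = malloc(sizeof(node_t));
    assert(n != NULL);
    n->value = value;
    n->next = atomic_load(&list_head);
    while (!atomic_compare_exchange_weak(&list_head, &n->next, n))
        ;  // n->next now holds the current head; retry
}
```

The "expected" argument being &n->next is the whole trick: a failed CAS both detects the race and repairs the node's link in one step.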

Of course, insertion alone does not make a useful list; deletion, lookup, and other operations are needed too. If you are interested, explore the rich literature on wait-free synchronization.

Deadlock Avoidance via Scheduling

Instead of deadlock prevention, some scenarios are better suited to deadlock avoidance. Avoidance requires some global knowledge, including which locks the various threads might grab during their execution, so that subsequent scheduling can guarantee no deadlock occurs.

For example, suppose we need to schedule four threads on two processors. Assume further that we know Thread 1 (T1) needs locks L1 and L2, T2 also needs L1 and L2, T3 needs only L2, and T4 needs no locks at all. Table 32.2 shows these lock needs.

Table 32.2 Lock needs by thread

       T1    T2    T3    T4
L1    yes   yes    no    no
L2    yes   yes   yes    no

A smart scheduler can then ensure that T1 and T2 never run at the same time, in which case no deadlock can occur. Here is one such schedule:

(Figure: a schedule in which T1 and T2 never run at the same time.)

Note that it is fine for T3 to overlap with either T1 or T2. Even though T3 grabs lock L2, it can never cause a deadlock by running concurrently with the other threads, because it only grabs one lock.

Let's look at one more example, with more contention. In this one, there is more contention for the same resources (again, locks L1 and L2). The lock/thread contention is shown in Table 32.3.

Table 32.3 Lock and thread contention

       T1    T2    T3    T4
L1    yes   yes   yes    no
L2    yes   yes   yes    no

In particular, threads T1, T2, and T3 all need to hold both locks L1 and L2 at some point during their execution. Here is a feasible schedule that guarantees no deadlock:

(Figure: a conservative schedule in which T1, T2, and T3 run one after another on the same processor.)

As you can see, T1, T2, and T3 all run on the same processor. This conservative static scheme clearly lengthens the total time to complete the tasks. Although it might have been possible to run these tasks concurrently, the fear of deadlock prevents us from doing so, and the cost is performance.

One famous example of a similar approach is Dijkstra's Banker's Algorithm [D64], and many comparable schemes are described in the literature. Unfortunately, they are only useful in very limited settings, for example, in an embedded system where one knows the full set of tasks and the locks they need. Moreover, as in the second example above, such approaches limit concurrency. Thus, deadlock avoidance via scheduling is not a widely used general-purpose solution.

Detect and Recover

One final commonly used strategy is to allow deadlocks to occasionally occur, and then take action once a deadlock has been detected. For example, if an operating system froze once a year, you would just reboot it and happily (or grumpily) get on with your work. If deadlocks are rare, such a non-solution is quite pragmatic.

Tip: don't always do it perfectly (Tom West's law)

Tom West, the protagonist of the classic computer-industry book Soul of a New Machine [K81], has a wonderful engineering maxim: "Not everything worth doing is worth doing well." If a bad thing happens rarely, and its impact is small, one should not spend a great deal of effort preventing it. Of course, if you are building the space shuttle, and the bad thing is the shuttle blowing up, you should ignore this advice.

Many database systems employ deadlock detection and recovery techniques. A deadlock detector runs periodically, building a resource graph and checking it for cycles. When a cycle (deadlock) occurs, the system needs to be restarted. If more intricate repair of data structures is required, a human may get involved.
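The heart of such a detector is just cycle detection over a "waits-for" graph. A minimal sketch in C (the representation and names are mine; real systems would build this graph from their lock tables):

```c
#include <string.h>

#define MAX_THREADS 8

// waits_for[i][j] != 0 means thread i is waiting for a resource that
// thread j currently holds. A deadlock exists iff this graph has a cycle.
int waits_for[MAX_THREADS][MAX_THREADS];

// Depth-first search; state: 0 = unvisited, 1 = on current path, 2 = done.
static int visit(int u, int *state) {
    state[u] = 1;
    for (int v = 0; v < MAX_THREADS; v++) {
        if (!waits_for[u][v])
            continue;
        if (state[v] == 1)               // back edge: a cycle
            return 1;
        if (state[v] == 0 && visit(v, state))
            return 1;
    }
    state[u] = 2;
    return 0;
}

// Returns 1 if the waits-for graph contains a cycle (deadlock), else 0.
int deadlock_detected(void) {
    int state[MAX_THREADS];
    memset(state, 0, sizeof(state));
    for (int u = 0; u < MAX_THREADS; u++)
        if (state[u] == 0 && visit(u, state))
            return 1;
    return 0;
}
```

Running this check periodically, and choosing a victim to abort when it reports a cycle, is essentially what database deadlock detectors do.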

More detail on database concurrency, deadlock, and related issues can be found elsewhere [B+87, K87]. Read those works, or better yet, take a course on databases, to learn more about this rich and interesting topic.

32.4 Summary

In this chapter, we have studied the types of bugs that occur in concurrent programs. The first type, the non-deadlock bug, is surprisingly common and usually easy to fix. It includes atomicity violations, in which a sequence of instructions that should have executed together did not, and order violations, in which the ordering needed between two threads was not enforced.

We also briefly discussed deadlock: why it occurs, and what can be done about it. The problem is nearly as old as concurrency itself, and hundreds of papers have been written about it. In practice, the best solution is to carefully design a lock-acquisition order yourself, thus preventing deadlock from arising in the first place. Wait-free approaches also show promise; some wait-free implementations have found their way into widely used libraries and systems, including Linux. However, such approaches are not general enough, and designing a new wait-free data structure is so complex as to limit their practicality. Perhaps the best solution is to develop new concurrent programming models: in systems such as MapReduce (from Google) [GD02], programmers can accomplish certain kinds of parallel computation without any locks at all. Locks inevitably bring all kinds of trouble; perhaps we should avoid them unless we are sure we truly need them.

This article is excerpted from the newly released Operating Systems (《操作系统导论》).

Authors: Remzi H. Arpaci-Dusseau and Andrea C. Arpaci-Dusseau

Translator: Wang Haipeng (王海鹏)

  • A renowned American operating systems textbook
  • Organized around the three major themes of operating systems: virtualization, concurrency, and persistence
  • Douban rating of the original edition: 9.7

The book is organized around the three major concepts of virtualization, concurrency, and persistence, and introduces all the main components of a modern system (including scheduling, virtual memory management, disk and I/O subsystems, and file systems). It has 50 chapters in three parts, covering virtualization, concurrency, and persistence respectively. Each theme is introduced with a dialogue, and the incisive, humorous writing strives to help readers understand the principles of virtualization, concurrency, and persistence in operating systems.
The content is comprehensive, and the book gives real, runnable code (not pseudocode) along with corresponding exercises, making it well suited both for instructors teaching related courses at institutions of higher learning and for students studying on their own.

This book has the following features:
● Clear themes, focusing on the three major topics of operating systems: virtualization, concurrency, and persistence.
● Dialogues introduce the background, raise questions, and then explain the principles, prompting hands-on practice.
● Numerous "asides" and "tips" broaden readers' knowledge and add fun.
● Real code, rather than pseudocode, gives readers a deeper and more thorough understanding of operating systems.
● Many homework exercises, simulations, and projects encourage readers to practice.
● Teaching aids are provided for instructors.

The following teaching resources are available to instructors:

  • Lecture slides and teaching notes.
  • Exam questions with suggested answers.
  • Discussion questions and assignments.
  • Project descriptions and guidance.
  • If you are an instructor and would like these teaching resources, please send an email to [email protected] to apply.


Origin blog.csdn.net/epubit17/article/details/92377275