C++ concurrent programming: C++11 atomic operations and the memory model

Part 1: What you need to know first

1. Factors that affect a program's final output

  • The order in which the code is written (this one goes without saying)

    It is not elaborated here. The sample C++ code is as follows:

    #include <thread>
    #include <atomic>
    #include <iostream>
    using namespace std;
    atomic<int> a {0};
    atomic<int> b {0};
    void ValueSet(int) {
    	int t = 1;
    	a = t;
    	b = 2;
    }
    void Observer(int) {
    	cout << "(" << a << ", " << b << ")" << endl; // output may differ from run to run
    }
    int main() {
    	thread t1(ValueSet, 0);
    	thread t2(Observer, 0);
    	t1.join();
    	t2.join();
    	cout << "Got (" << a << ", " << b << ")" << endl; // Got (1, 2)
    }
    

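    In other words, what you see is not necessarily what you get. The written order alone does not determine the result of a run. Depending on how the two threads interleave, Observer may print (0, 0), (1, 0), (1, 2) and so on. The last line of main always prints Got (1, 2), because both threads have been joined by then.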

  • The strength of the platform's memory model, which governs the order in which the CPU executes machine instructions

    • Memory model
      A memory model is usually a hardware concept. It describes the order in which the processor executes machine instructions (you can think of them as assembly instructions).
    • Strong memory model (e.g. x86): the CPU executes machine instructions essentially in the order in which they were generated
    • Weak memory model (e.g. PowerPC): the CPU may execute the generated machine instructions out of order
    • Note: why do weak memory models exist?
      In short, a weak memory model lets the processor exploit more instruction-level parallelism, so it can execute code faster.
    • Below is the pseudo-assembly generated for "t = 1; a = t; b = 2;" in the sample code above (it only approximates the real machine instructions)
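    	1: Loadi reg3, 1;			# load the immediate value 1 into register reg3
    	2: Move reg4, reg3;			# copy the contents of reg3 into reg4
    	3: Store reg4, a;			# store the contents of reg4 to memory address a
    	4: Loadi reg5, 2;			# load the immediate value 2 into register reg5
    	5: Store reg5, b;			# store the contents of reg5 to memory address b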
    

    On a CPU with a strong memory model, the execution order is always 1->2->3->4->5.
    On a CPU with a weak memory model, the order may be 1->2->3->4->5. However, instructions 1, 2 and 3 do not affect instructions 4 and 5, because they use different registers and different memory addresses. So the order may also be, for example, 1->4->2->5->3.

  • Compiler optimizations

    • As an optimization, the compiler may reorder the machine instructions it generates (shown here approximately as assembly), taking the target's memory model into account. On weak memory models it must also insert memory barriers wherever a particular execution order is required.
    • Assembly generated for a strong memory model
      a and b are atomic variables, so by default they use sequential consistency. This forbids the compiler from reordering the instructions that access a and b. On a CPU with a strong memory model, the execution order is therefore always 1->2->3->4->5. (Strictly speaking, x86 can still let a later load overtake an earlier store. For that reason, x86 compilers typically emit xchg, or mov followed by mfence, for a sequentially consistent store.)
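    	1: Loadi reg3, 1;			# load the immediate value 1 into register reg3
    	2: Move reg4, reg3;			# copy the contents of reg3 into reg4
    	3: Store reg4, a;			# store the contents of reg4 to memory address a
    	4: Loadi reg5, 2;			# load the immediate value 2 into register reg5
    	5: Store reg5, b;			# store the contents of reg5 to memory address b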
    
    • Assembly generated for a weak memory model
      a and b are atomic variables with the default sequentially consistent ordering, so the compiler again must not reorder the instructions that access a and b. It therefore generates the same instruction sequence as for the strong memory model. However, a CPU with a weak memory model may execute instructions out of order. The compiler must therefore also insert a memory fence, which forces the store to a (instruction 3) to complete before the store to b (instruction 6).
      Sync: the processor executes the instructions after sync only once every instruction that entered the pipeline before it has completed (the pipeline is drained). As a result, operations before the sync always complete before the instructions after it.
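    	1: Loadi reg3, 1;			# load the immediate value 1 into register reg3
    	2: Move reg4, reg3;			# copy the contents of reg3 into reg4
    	3: Store reg4, a;			# store the contents of reg4 to memory address a
    	4: Sync					# memory fence
    	5: Loadi reg5, 2;			# load the immediate value 2 into register reg5
    	6: Store reg5, b;			# store the contents of reg5 to memory address b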
    

Part 2: The C++11 memory model

1. The memory order enumeration

Enumeration value      Rule
memory_order_relaxed   Guarantees atomicity only; imposes no ordering on other memory operations
memory_order_acquire   No read or write in this thread that follows this atomic load may be performed before it
memory_order_release   No read or write in this thread that precedes this atomic store may be performed after it, so earlier writes complete first
memory_order_acq_rel   Combines memory_order_acquire and memory_order_release (for read-modify-write operations)
memory_order_consume   Like acquire, but only for later operations in this thread that depend on the value this atomic operation loaded
memory_order_seq_cst   Acquire/release semantics plus a single total order of all seq_cst operations that every thread agrees on (the default)

2. Which memory orders each kind of operation accepts

  • Atomic store operations (store) may use memory_order_relaxed, memory_order_release or memory_order_seq_cst.

  • Atomic load operations (load) may use memory_order_relaxed, memory_order_consume, memory_order_acquire or memory_order_seq_cst.

  • Read-modify-write (RMW) operations read and write in one indivisible step. Examples are the test_and_set() operation of the atomic_flag type mentioned earlier and the compare_exchange_strong() operation of the atomic class template. RMW operations may use memory_order_relaxed, memory_order_consume, memory_order_acquire, memory_order_release, memory_order_acq_rel or memory_order_seq_cst.

  • Operator functions such as "operator=" and "operator+=" are simply wrappers. They perform the corresponding atomic operation with memory_order_seq_cst as the memory_order argument.

3. Commonly used C++11 memory orderings

  • Sequentially consistent
    memory_order_seq_cst imposes a stricter ordering on atomic operations than many algorithms need. This can keep a multithreaded program from reaching its full performance. In the example below, Thread2 always prints 1.
	#include <thread>
	#include <atomic>
	#include <iostream>
	using namespace std;
	atomic<int> a;
	atomic<int> b;
	void Thread1(int) {
		int t = 1;
		a.store(t, memory_order_seq_cst);
		b.store(2, memory_order_seq_cst);
	}
	void Thread2(int) {
		while(b.load(memory_order_seq_cst) != 2); // spin-wait until b == 2
		cout << a.load(memory_order_seq_cst) << endl; // prints 1
	}
	int main() {
		thread t1(Thread1, 0);
		thread t2(Thread2, 0);
		t1.join();
		t2.join();
		return 0;
	}
  • Relaxed
    memory_order_relaxed places no requirement on memory ordering, so the program's result is not guaranteed. In the example below, Thread2 may print 0 even after it has seen b == 2.
	#include <thread>
	#include <atomic>
	#include <iostream>
	using namespace std;
	atomic<int> a;
	atomic<int> b;
	void Thread1(int) {
		int t = 1;
		a.store(t, memory_order_relaxed);
		b.store(2, memory_order_relaxed);
	}
	void Thread2(int) {
		while(b.load(memory_order_relaxed) != 2); // spin-wait until b == 2
		cout << a.load(memory_order_relaxed) << endl; // may print 0 or 1
	}
	int main() {
		thread t1(Thread1, 0);
		thread t2(Thread2, 0);
		t1.join();
		t2.join();
		return 0;
	}
  • Release-acquire
    The release store to b guarantees that a.store happens before b.store.
    The acquire load of b guarantees that b.load happens before a.load.
    Together they fully guarantee correct behavior: once Thread2 sees b == 2, a is guaranteed to be 1. The print statement does not run until the spin-wait ends, so it always prints 1.
	#include <thread>
	#include <atomic>
	#include <iostream>
	using namespace std;
	atomic<int> a;
	atomic<int> b;
	void Thread1(int) {
		int t = 1;
		a.store(t, memory_order_relaxed);
		b.store(2, memory_order_release); // all writes before this store complete before it
	}
	void Thread2(int) {
		while(b.load(memory_order_acquire) != 2); // reads after this load cannot happen before it
		cout << a.load(memory_order_relaxed) << endl; // 1
	}
	int main() {
		thread t1(Thread1, 0);
		thread t2(Thread2, 0);
		t1.join();
		t2.join();
		return 0;
	}
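  • Release-consume
    This ordering guarantees that ptr.load(memory_order_consume) happens before operations that depend on the loaded value, such as dereferencing *ptr.
    It does not guarantee that it happens before data.load(memory_order_relaxed), which does not depend on ptr. (In practice, mainstream compilers implement consume as acquire, and since C++17 the standard discourages the use of memory_order_consume.)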
	#include <thread>
	#include <atomic>
	#include <cassert>
	#include <string>
	using namespace std;
	atomic<string*> ptr;
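	atomic<int> data; // written as ::data below: with using namespace std, plain data is ambiguous with std::data (C++17)
	void Producer() {
		string* p = new string("Hello");
		::data.store(42, memory_order_relaxed);
		ptr.store(p, memory_order_release);
	}
	void Consumer() {
		string* p2;
		while (!(p2 = ptr.load(memory_order_consume)))
			;
		assert(*p2 == "Hello"); // always holds: *p2 depends on the consumed pointer
		assert(::data.load(memory_order_relaxed) == 42); // may fail: data does not depend on ptr
		delete p2;
	}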
	int main() {
		thread t1(Producer);
		thread t2(Consumer);
		t1.join();
		t2.join();
	}
  • Acquire-release
    memory_order_acq_rel is commonly used with the basic synchronization primitive CAS (compare-and-swap). CAS corresponds to the compare_exchange_strong member function of atomic.

Origin blog.csdn.net/wangdamingll/article/details/104446740