OpenMP basic concepts

OpenMP is a multi-threaded programming scheme for shared-memory parallel systems; the supported programming languages include C, C++ and Fortran. OpenMP provides a high-level abstraction for describing parallel algorithms and is particularly well suited to writing parallel programs for multi-core CPUs. The programmer adds compiler pragmas to the code and the compiler parallelizes it automatically, which greatly reduces the difficulty and complexity of parallel programming. When the compiler does not support OpenMP, the program simply degenerates into an ordinary (serial) program; the OpenMP directives already present in the code do not affect normal compilation.

 

Enabling OpenMP in Visual Studio is very simple, and most mainstream compilers have built-in OpenMP support. Right-click the project -> Properties -> Configuration Properties -> C/C++ -> Language -> OpenMP Support, and select "Yes".

 

OpenMP execution model

 

OpenMP uses the fork-join execution model. At first there is only one main (master) thread; when parallel computation is needed, a number of branch threads are forked to perform the parallel task. When the parallel code has finished executing, the branch threads join again and control flow returns to the single main thread.
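A minimal sketch of the fork-join pattern (the printed wording is mine, not from the original post): omp_get_num_threads() reports one thread outside the parallel region and a whole team of threads inside it.

    #include <cstdio>
    #include <omp.h>

    int main()
    {
        // Only the master thread runs here.
        printf("before parallel region: %d thread(s)\n", omp_get_num_threads());

    #pragma omp parallel   // fork: a team of threads executes this block
        {
            printf("inside parallel region: thread %d of %d\n",
                   omp_get_thread_num(), omp_get_num_threads());
        }                  // implicit barrier and join at the closing brace

        // Back to a single master thread.
        printf("after parallel region: %d thread(s)\n", omp_get_num_threads());
        return 0;
    }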

A typical schematic of the fork-join execution model is shown below:

 

 

The OpenMP programming model is thread-based and is driven by compiler directives. Three kinds of elements are available for controlling parallelization: compiler directives, a set of API (runtime library) functions, and environment variables.

 

Compiler directives

 

A compiler directive starts with #pragma omp, followed by a specific directive name and optional clauses, in the format: #pragma omp directive [clause [, clause] ...] . Common directives are listed below (a short sketch combining several of them follows the list):

  • parallel : used before a structured block; the block is executed in parallel by multiple threads;
  • for : used before a for loop; distributes the loop iterations among multiple threads for parallel execution (work sharing); the programmer must ensure there are no data dependences between iterations;
  • parallel for : a combination of parallel and for, used before a for loop; the loop body is executed in parallel by multiple threads, so it both creates the parallel region and shares the work among the threads;
  • sections : used before a region containing code blocks that may be executed in parallel; each block marked with a section directive is a unit of work sharing (note the distinction between sections and section);
  • parallel sections : a combination of parallel and sections, analogous to parallel for;
  • single : inside a parallel region, indicates that a piece of code is executed by only one thread;
  • critical : used before a critical section; ensures that only one OpenMP thread enters it at a time;
  • flush : ensures that every OpenMP thread has a consistent view of the listed data;
  • barrier : synchronizes the threads inside a parallel region; a thread that reaches the barrier stops and waits until all threads have reached it, then all continue;
  • atomic : specifies that a memory operation must be performed atomically;
  • master : specifies that a block of code is executed only by the master thread;
  • threadprivate : specifies that one or more variables are thread-private; as explained later, this is distinct from the private clause.
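As a sketch of how several of these directives fit together (the variable names and the work done are placeholders chosen for illustration, not from the original post):

    #include <cstdio>
    #include <omp.h>

    int main()
    {
        int counter = 0;

    #pragma omp parallel num_threads(4)
        {
    #pragma omp single
            printf("single: printed by exactly one thread\n");

    #pragma omp critical
            counter++;        // only one thread at a time updates counter

    #pragma omp barrier       // wait until every thread has passed the critical section

    #pragma omp master
            printf("master: counter = %d\n", counter);
        }

    #pragma omp parallel sections
        {
    #pragma omp section
            printf("section A, thread %d\n", omp_get_thread_num());
    #pragma omp section
            printf("section B, thread %d\n", omp_get_thread_num());
        }
        return 0;
    }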

 

Common OpenMP clauses are as follows (a short sketch using a few of them follows the list):


  • private : specifies that one or more variables each have their own private copy in every thread;
  • firstprivate : like private, but the private copy in each thread is initialized with the value that the variable of the same name had in the master thread when entering the parallel or work-sharing region;
  • lastprivate : specifies that, after the parallel region, the value of one or more thread-private variables is copied back to the variable of the same name in the master thread; the copy is done by the thread that executes the last section (for sections) or the last iteration (for loops);
  • reduction : specifies that one or more variables are private and that a reduction operation is applied to them at the end of the parallel region, with the result stored in the variable of the same name in the master thread;
  • nowait : lets the threads skip the implicit barrier at the end of a work-sharing construct;
  • num_threads : specifies the number of threads in the parallel region;
  • schedule : specifies how the iterations of a work-sharing loop are scheduled (distributed) among the threads;
  • shared : specifies that one or more variables are shared among the threads;
  • ordered : specifies that a section of code inside a work-sharing loop must be executed in serial (iteration) order;
  • copyprivate : used with the single directive; broadcasts the value of the listed thread-private variables from the single thread to the other threads in the parallel region;
  • copyin : specifies that threadprivate variables are initialized with the value of the variable of the same name in the master thread;
  • default : specifies the default data-sharing attribute of variables in the parallel region; the default is shared.
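A short sketch using a few of these clauses (the loop bounds and variable names are my own placeholders, not from the original post):

    #include <cstdio>
    #include <omp.h>

    int main()
    {
        const int n = 1000;
        int sum = 0;
        int last_i = -1;

        // Iterations are handed out in chunks of 100 (schedule), partial sums are
        // combined at the end (reduction), and the value of last_i from the final
        // iteration is copied back to the master thread (lastprivate).
    #pragma omp parallel for num_threads(4) schedule(static, 100) reduction(+:sum) lastprivate(last_i)
        for (int i = 0; i < n; i++)
        {
            sum += i;
            last_i = i;
        }

        printf("sum = %d, last_i = %d\n", sum, last_i);   // sum = 499500, last_i = 999
        return 0;
    }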

 

API functions

 

In addition to the compiler directives described above, OpenMP also provides a set of API (runtime library) functions for controlling the behavior of the concurrent threads. Some commonly used OpenMP API functions are described below:
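A small sketch of some commonly used runtime functions (these are standard OpenMP API calls; the printed wording is mine):

    #include <cstdio>
    #include <omp.h>

    int main()
    {
        omp_set_num_threads(4);                    // request 4 threads for later parallel regions
        printf("max threads: %d\n", omp_get_max_threads());
        printf("processors : %d\n", omp_get_num_procs());

        double start = omp_get_wtime();            // wall-clock timer

    #pragma omp parallel
        {
            if (omp_get_thread_num() == 0)         // thread id within the team
                printf("team size  : %d, in parallel: %d\n",
                       omp_get_num_threads(), omp_in_parallel());
        }

        printf("elapsed    : %f s\n", omp_get_wtime() - start);
        return 0;
    }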

 

Environment Variables

 

OpenMP defines several environment variables, and the behavior of an OpenMP program can be controlled through them. Commonly used environment variables (a small sketch for checking their effect follows the list):

  • OMP_SCHEDULE : controls the scheduling of parallelized for loops; its value is the loop scheduling type;
  • OMP_NUM_THREADS : the number of threads in a parallel region;
  • OMP_DYNAMIC : determines whether dynamic adjustment of the number of threads in a parallel region is allowed;
  • OMP_NESTED : indicates whether nested parallelism is enabled.
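Their effect can be checked with a small program like the one below; the commands in the comment are just one assumed way to set the variables from a Linux-style shell (in Visual Studio they can be set in the project's debugging environment instead):

    // Example (assumed) usage from a shell:
    //   export OMP_NUM_THREADS=4
    //   export OMP_DYNAMIC=false
    //   ./a.out
    #include <cstdio>
    #include <omp.h>

    int main()
    {
        printf("OMP_NUM_THREADS -> max threads: %d\n", omp_get_max_threads());
        printf("OMP_DYNAMIC     -> dynamic    : %d\n", omp_get_dynamic());
        printf("OMP_NESTED      -> nested     : %d\n", omp_get_nested());
        return 0;
    }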

 

A simple example of using parallel

 

The parallel directive creates a parallel region; the code in the braces that follow it is executed in parallel by all the threads:

 

 
    #include <iostream>
    #include <cstdlib>
    #include <omp.h>

    using namespace std;

    int main()
    {
    #pragma omp parallel
        {
            cout << "Test" << endl;
        }
        system("pause");
        return 0;
    }

 

 

Running the above program produces the following output:

 

The program prints "Test" four times, which shows that the statement in the parallel block was executed once by each of four threads; four is the default number of threads here. The number of threads created can also be controlled explicitly with the num_threads clause:

 

 
    #include <iostream>
    #include <cstdlib>
    #include <omp.h>

    using namespace std;

    int main()
    {
    #pragma omp parallel num_threads(6)
        {
            cout << "Test" << endl;
        }
        system("pause");
        return 0;
    }

 

 

Compiling and running it gives the following output:

 

The program explicitly requests six threads, so the parallel block is executed six times. The blank second line appears because the threads run independently: one thread printed the characters "Test" but had not yet printed the newline when another thread printed its own "Test" right after it.
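If the interleaving is undesirable, one way to avoid it (a sketch, not from the original post) is to wrap the output in a critical directive so that only one thread prints at a time:

    #include <iostream>
    #include <cstdlib>
    #include <omp.h>

    using namespace std;

    int main()
    {
    #pragma omp parallel num_threads(6)
        {
    #pragma omp critical
            {
                cout << "Test" << endl;   // only one thread at a time executes this block
            }
        }
        system("pause");
        return 0;
    }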

 

A simple example of using parallel for

 

Using the parallel directive alone only creates a parallel region in which multiple threads all execute the same task, which has little practical value. parallel for creates a parallel region and, in addition, distributes the iterations of the loop among multiple threads, so the computation runs faster. The number of threads can be left to the system default or specified with the num_threads clause.

 

 
    #include <iostream>
    #include <cstdlib>
    #include <omp.h>

    using namespace std;

    int main()
    {
    #pragma omp parallel for num_threads(6)
        for (int i = 0; i < 12; i++)
        {
            printf("OpenMP Test, thread id: %d\n", omp_get_thread_num());
        }
        system("pause");
        return 0;
    }

 

 

Running it gives the following output:

 

The program specifies 6 threads and 12 loop iterations; the output shows that each thread was assigned 12/6 = 2 iterations.
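With the usual default (static) scheduling the iterations are handed out in contiguous chunks; the schedule clause mentioned earlier changes how they are assigned. A small sketch (the iteration count and chunk size are chosen by me for illustration):

    #include <cstdio>
    #include <omp.h>

    int main()
    {
        // schedule(static, 1) hands out iterations one at a time in round-robin
        // order, so thread 0 gets i = 0, 3, 6, thread 1 gets i = 1, 4, 7, etc.
    #pragma omp parallel for num_threads(3) schedule(static, 1)
        for (int i = 0; i < 9; i++)
        {
            printf("i = %d handled by thread %d\n", i, omp_get_thread_num());
        }
        return 0;
    }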

 

OpenMP speedup, and a comparison of different thread counts

 

 

 
    #include <iostream>
    #include <cstdlib>
    #include <omp.h>

    using namespace std;

    void test()
    {
        // busy-work loop (an optimizing compiler may remove it in release builds)
        for (int i = 0; i < 80000; i++)
        {
        }
    }

    int main()
    {
        double startTime = omp_get_wtime();

        // 2 threads
    #pragma omp parallel for num_threads(2)
        for (int i = 0; i < 80000; i++)
        {
            test();
        }
        double endTime = omp_get_wtime();
        printf("2 threads, execution time: %f\n", endTime - startTime);
        startTime = endTime;

        // 4 threads
    #pragma omp parallel for num_threads(4)
        for (int i = 0; i < 80000; i++)
        {
            test();
        }
        endTime = omp_get_wtime();
        printf("4 threads, execution time: %f\n", endTime - startTime);
        startTime = endTime;

        // 8 threads
    #pragma omp parallel for num_threads(8)
        for (int i = 0; i < 80000; i++)
        {
            test();
        }
        endTime = omp_get_wtime();
        printf("8 threads, execution time: %f\n", endTime - startTime);
        startTime = endTime;

        // 12 threads
    #pragma omp parallel for num_threads(12)
        for (int i = 0; i < 80000; i++)
        {
            test();
        }
        endTime = omp_get_wtime();
        printf("12 threads, execution time: %f\n", endTime - startTime);
        startTime = endTime;

        // without OpenMP
        for (int i = 0; i < 80000; i++)
        {
            test();
        }
        endTime = omp_get_wtime();
        printf("no OpenMP multithreading, execution time: %f\n", endTime - startTime);

        system("pause");
        return 0;
    }

 

 

The program above runs the same throwaway workload with 2, 4, 8, and 12 threads, and once without OpenMP; the output is as follows:

 

As can be seen, the execution time of the program optimized with OpenMP is roughly 1/4 of the original. However, more threads do not always mean higher efficiency: once the number of threads reaches about 4 to 8, simply increasing the thread count no longer improves performance.


Origin: www.cnblogs.com/mazhenyu/p/11328932.html