Speeding up computation in C++ with OpenMP parallel directives

#include <iostream>
#include <omp.h>

int main() {
    int sum = 0;
    int a[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
#pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < 10; i++)
        sum = sum + a[i];
    std::cout << "sum: " << sum << std::endl;
    return 0;
}

#pragma omp parallel for tells the compiler to execute the iterations of the for loop in parallel.

The sum variable is declared as a reduction variable (with the + operator); otherwise concurrent updates to sum could conflict, producing inconsistent results.

Other operators can also be used in a reduction. The general form is:

reduction(operator: var1, var2, ...)

The supported operators and the default initial value of each thread's private copy of the reduction variable are as follows:

Operator | Data type               | Default initial value
+        | integer, floating point | 0
-        | integer, floating point | 0
*        | integer, floating point | 1
&        | integer                 | all bits set to 1
|        | integer                 | 0
^        | integer                 | 0
&&       | integer                 | 1
||       | integer                 | 0

Take the maximum value in parallel

#include <iostream>

int main() {
    int max = 0;
    int a[10] = {11, 2, 33, 49, 113, 20, 321, 250, 689, 16};
#pragma omp parallel for
    for (int i = 0; i < 10; i++)
    {
        int temp = a[i];
#pragma omp critical
        {
            if (temp > max)
                max = temp;
        }
    }
    std::cout << "max: " << max << std::endl;
    return 0;
}

#pragma omp critical declares a critical section, which avoids conflicting updates, but it also forces threads to wait for one another, so parallel efficiency may drop.

Another parallel execution construct: sections

#pragma omp parallel sections
{
    #pragma omp section
    {
        function1();
    }
    #pragma omp section
    {
        function2();
    }
}

To compile with OpenMP in a ROS package, add the following to CMakeLists.txt:

find_package(OpenMP)
if (OPENMP_FOUND)
    set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${OpenMP_C_FLAGS}")
    set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OpenMP_CXX_FLAGS}")
    set (CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${OpenMP_EXE_LINKER_FLAGS}")
endif()
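With newer CMake (3.9+), the same can be achieved with the imported target provided by FindOpenMP, which avoids editing the global flags; a sketch, where my_node is a placeholder for your executable target:

```cmake
find_package(OpenMP)
if(OpenMP_CXX_FOUND)
    # my_node is a placeholder target name for this example
    target_link_libraries(my_node PUBLIC OpenMP::OpenMP_CXX)
endif()
```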

Specifying the scheduling strategy

#pragma omp parallel for schedule(static, 2) // static scheduling: the loop iterations are divided into chunks of two, and each chunk is assigned as one task

The scheduling strategies, what they do, and when they apply:

static: the iteration range is divided into equal chunks and each thread is assigned its chunks up front. Suitable when the cores perform about equally.

dynamic: the iteration range is divided into chunks; when a thread finishes one chunk, it picks up the next unexecuted one. Suitable when the cores differ greatly in speed.

guided: the iteration range is divided into chunks that shrink from large to small; it operates like dynamic, but since the early chunks are larger than dynamic's, the number of scheduling events (and hence the overhead) is reduced.

runtime: one of the three strategies above is selected at run time (for example via the OMP_SCHEDULE environment variable); the default is static.

 

Synchronization mechanisms

single: placed before a code block that should be executed by only one thread; the block that follows is executed by a single thread, while the other threads wait at the implicit barrier at its end (unless nowait is given).

barrier: synchronizes the threads in a parallel region. Every thread stops when it reaches the barrier and continues only after all threads have arrived.

master: #pragma omp master declares that the corresponding block is executed only by the master thread.

nowait removes the implicit barrier at the end of a construct. Its usage is as follows:

#pragma omp for nowait // note: nowait cannot be used on the combined #pragma omp parallel for

or

#pragma omp single nowait

#include <iostream>
#include <omp.h> // header file required for OpenMP

int main()
{
#pragma omp parallel
    {
#pragma omp for nowait
        for (int i = 0; i < 1000; ++i) 
        {
            std::cout << i << "+" << std::endl;
        }
#pragma omp for
        for (int j = 0; j < 10; ++j) 
        {
            std::cout << j << "-" << std::endl;
        }
    }
    
    return 0;
}

Because of nowait, a thread that finishes its share of the first loop does not wait for the others and moves straight on to the second loop, so the "+" output of the first loop and the "-" output of the second loop appear interleaved.

#include <iostream>
#include <omp.h> // header file required for OpenMP

int main()
{
#pragma omp parallel
    {
#pragma omp master
        {
            for (int j = 0; j < 10; ++j) 
            {
                std::cout << j << "-" << std::endl;
            }
        }

        std::cout << "This will be printed once per thread." << std::endl;
    }

    return 0;
}

In my comparison, parallelizing a single-level for loop pays off only when the loop does a large amount of work; otherwise the parallel version is no faster and can even be slower.



Origin: blog.csdn.net/li4692625/article/details/109377402