CMake+OpenMP accelerated computing test

Write in front

1. This article uses CMake to compile and test the effect of OpenMP.
2. Platform/environment: Windows/Linux, CMake.
3. Please credit the source when reprinting:
https://blog.csdn.net/qq_41102371/article/details/131629705

code

The code runs the same for loop three ways: accelerated with OpenMP, accelerated with OpenMP plus a critical section, and serially without OpenMP. The directory structure is as follows; put test_openmp.cpp and CMakeLists.txt into src:

test_openmp/
├── compile.bat
├── run.bat
└── src/
    ├── test_openmp.cpp
    └── CMakeLists.txt

test_openmp.cpp

#include <cstdlib>
#include <iostream>
#include <vector>
#include <chrono>

#include <omp.h>

void computeWithOpenMP(const std::vector<int> &data)
{
    std::vector<int> result(data.size());
#pragma omp parallel for
    for (int i = 0; i < static_cast<int>(data.size()); ++i)
    {
        if (i >= 0 && i <= 1000000)
        {
            // parallel computation with OpenMP
            result[i] = data[i] * 2;
        }
    }
}

void computeWithOpenMPCritical(const std::vector<int> &data)
{
    std::vector<int> result(data.size());
#pragma omp parallel for
    for (int i = 0; i < static_cast<int>(data.size()); ++i)
    {
        if (i >= 0 && i <= 1000000)
        {
            // parallel computation with OpenMP, serialized by critical
#pragma omp critical
            result[i] = data[i] * 2;
        }
    }
}

void computeWithoutOpenMP(const std::vector<int> &data)
{
    std::vector<int> result(data.size());
    for (int i = 0; i < static_cast<int>(data.size()); ++i)
    {
        if (i >= 0 && i <= 1000000)
        {
            // serial computation without OpenMP
            result[i] = data[i] * 2;
        }
    }
}


int main(int argc, char **argv)
{
#ifdef _OPENMP
    std::cout << "use _OPENMP" << std::endl;
    std::cout << "max threads: " << omp_get_max_threads() << std::endl;
#else
    std::cout << "no _OPENMP" << std::endl;
#endif
    if (argc < 2)
    {
        std::cerr << "usage: " << argv[0] << " <size>" << std::endl;
        return 1;
    }
    int size = std::atoi(argv[1]);
    std::vector<int> data(size, 1);

    // computation accelerated with OpenMP
    auto start = std::chrono::high_resolution_clock::now();
    computeWithOpenMP(data);
    auto end = std::chrono::high_resolution_clock::now();
    auto durationOpenMP = std::chrono::duration_cast<std::chrono::microseconds>(end - start).count() / 1000.0;

    // OpenMP-accelerated, with a critical section
    start = std::chrono::high_resolution_clock::now();
    computeWithOpenMPCritical(data);
    end = std::chrono::high_resolution_clock::now();
    auto durationOpenMPCritical = std::chrono::duration_cast<std::chrono::microseconds>(end - start).count() / 1000.0;

    // computation without OpenMP
    start = std::chrono::high_resolution_clock::now();
    computeWithoutOpenMP(data);
    end = std::chrono::high_resolution_clock::now();
    auto durationNoOpenMP = std::chrono::duration_cast<std::chrono::microseconds>(end - start).count() / 1000.0;

    // print the timing results
    std::cout << "With OpenMP: " << durationOpenMP << " ms" << std::endl;
    std::cout << "With OpenMPCritical: " << durationOpenMPCritical << " ms" << std::endl;
    std::cout << "No OpenMP: " << durationNoOpenMP << " ms" << std::endl;
    return 0;
}
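One caveat worth noting (this guard is my addition, not part of the original program): all three compute functions discard `result`, so an optimizing Release build may remove the loops entirely and report near-zero times. A minimal sketch of a variant that returns a checksum so the work cannot be optimized away (the name `computeWithChecksum` is made up for illustration):

```cpp
#include <numeric>
#include <vector>

// Returning a value derived from the loop forces the compiler to keep the work.
long long computeWithChecksum(const std::vector<int> &data)
{
    std::vector<int> result(data.size());
    for (std::size_t i = 0; i < data.size(); ++i)
        result[i] = data[i] * 2;
    // Sum with a long long accumulator to avoid int overflow on large inputs.
    return std::accumulate(result.begin(), result.end(), 0LL);
}
```

Printing or otherwise using the returned checksum in main keeps the measurement honest.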

CMakeLists.txt

cmake_minimum_required(VERSION 3.18)
project(TestOpenMP)

find_package(OpenMP)
add_executable(test_openmp ./test_openmp.cpp)
if(OpenMP_CXX_FOUND)
    target_link_libraries(test_openmp OpenMP::OpenMP_CXX)
endif()

compile.bat

cmake -DCMAKE_BUILD_TYPE=Release -S ./src -B ./build
cmake --build ./build --config Release --target ALL_BUILD

run.bat

.\build\Release\test_openmp.exe 500000000

The parameter 500000000 is the number of elements; try different values during testing to see the effect.

Compile and run

compile

cd test_openmp
./compile.bat

run

./run.bat

The following is the result for a data size of 500000000 on an i7-12700H laptop (timing screenshot omitted).
Use the first line of output, use _OPENMP or no _OPENMP, to determine whether OpenMP is properly configured and available.

About the acceleration effect

In actual measurements, the speedup varies with the hardware, the work done inside the for loop, the data size, and sometimes luck.
The following is ChatGPT's answer about the acceleration effect of OpenMP:
When parallelizing with OpenMP, a performance improvement is not guaranteed. It depends on multiple factors, including but not limited to the following:

  1. Parallelization overhead : Parallelization requires additional overhead for synchronization and task division between threads. When the problem size is small, these overheads may outweigh the performance gains from parallelization, resulting in performance degradation.

  2. Data access pattern : The effectiveness of parallelization is affected by the data access pattern. Parallelization can cause performance degradation if data access creates race conditions or causes cache conflicts. In some cases, serial execution may be more efficient.

  3. Hardware resource limitations : The effectiveness of parallelization is affected by the limitations of available hardware resources. If the system has a limited number of physical cores and the problem size is large, the effectiveness of parallelization may be limited.

  4. Task granularity: finer-grained tasks can give better load balancing, but if tasks are too small, the synchronization overhead between threads may exceed the performance benefits of parallelization.
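As a concrete illustration of the synchronization-overhead point above (this alternative is not part of the article's test program): when each iteration only folds its result into one shared value, OpenMP's `reduction` clause avoids the per-iteration locking that `critical` causes, because every thread accumulates a private partial sum that is combined once at the end. A hedged sketch:

```cpp
#include <vector>

// Sum 2*data[i] without a critical section: reduction gives each thread a
// private copy of sum and combines the copies after the loop.
long long sumDoubled(const std::vector<int> &data)
{
    long long sum = 0;
#pragma omp parallel for reduction(+ : sum)
    for (int i = 0; i < static_cast<int>(data.size()); ++i)
        sum += data[i] * 2;
    return sum;
}
```

Compiled without OpenMP support, the pragma is ignored and the loop runs serially with the same result.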

When using OpenMP for parallelization, you can try the following methods to determine whether OpenMP should be used:

  • Based on experience: Evaluate problem size, computational complexity, and hardware environment based on past experience. If the problem is large, computationally complex, and you have sufficient hardware resources, it may be worth trying to parallelize it with OpenMP.

  • Performance analysis: use profiling tools (such as perf, gprof, Intel VTune, etc.) to find the performance bottlenecks of the serial code and check for potential parallelization opportunities. Profiling shows which code segments stand to gain the most, so those can be parallelized first.

  • Experimental verification: based on the problem size and computational complexity, run experiments on small-scale samples. Compare serial and parallel performance, and decide whether to use OpenMP based on the results.

It is important to remember that parallelization does not always lead to performance gains, and sometimes it can even lead to performance degradation. When deciding whether to use OpenMP for parallelization, you need to conduct reasonable evaluation and experimental verification, combined with the characteristics of the problem and the hardware environment, to make an informed decision.
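The "experimental verification" step above can be sketched as a small timing helper (my own sketch, using the same std::chrono timing as the article's test program; `runMs` is a made-up name):

```cpp
#include <chrono>
#include <vector>

// Time one run of the doubling loop; `parallel` toggles the OpenMP version.
double runMs(const std::vector<int> &data, std::vector<int> &result, bool parallel)
{
    auto t0 = std::chrono::high_resolution_clock::now();
    if (parallel)
    {
#pragma omp parallel for
        for (int i = 0; i < static_cast<int>(data.size()); ++i)
            result[i] = data[i] * 2;
    }
    else
    {
        for (int i = 0; i < static_cast<int>(data.size()); ++i)
            result[i] = data[i] * 2;
    }
    return std::chrono::duration<double, std::milli>(
               std::chrono::high_resolution_clock::now() - t0).count();
}
```

Call it once with `parallel=false` and once with `parallel=true` on the same data and compare the returned milliseconds; if the serial run is not slower, parallelization is not paying off at that data size.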

reference

CMakeLists adds OpenMP support
CMake+OpenMP compiles and runs the simplest c++ code

over

I mainly work on common point cloud algorithms for laser/image 3D reconstruction, registration, and segmentation. For technical exchange or consulting, please send a private message.
