Analyzing and fixing a data race in nested loops under OpenMP parallel for

Preface

A classmate ran into a bug in a program that used parallel for on a nested loop. The code produced wrong results, and the cause was not obvious at first. It turned out to be very simple: in a nested loop where the outer loop iterates over i and the inner loop iterates over j, a j that is declared before the parallel region is shared by all threads. Once several threads are running (the thread count is set with omp_set_num_threads()), one thread can be in the middle of using j when another thread changes it.

omp parallel for

Test program:

#include <omp.h>
#include <stdio.h>
using namespace std;

#define NUM_THREADS 2

int main()
{
    int i;
    printf("initila i address is %p\n", &i);
    
    omp_set_num_threads(NUM_THREADS);

    // i is the loop variable of the parallel for, so each thread gets its own copy
    #pragma omp parallel for
    for (i = 0; i < 10; ++i)
    {
        printf("thread is %d, i=%d, i address is %p\n", 
                omp_get_thread_num(), i, &i);
    }

    return 0;
}

Here i is declared before the parallel region starts. Running the program gives the following result:

initila i address is 0x7ffc060c5034
thread is 0, i=0, i address is 0x7ffc060c4fc4
thread is 0, i=1, i address is 0x7ffc060c4fc4
thread is 0, i=2, i address is 0x7ffc060c4fc4
thread is 0, i=3, i address is 0x7ffc060c4fc4
thread is 0, i=4, i address is 0x7ffc060c4fc4
thread is 1, i=5, i address is 0x7ff8e59dde04
thread is 1, i=6, i address is 0x7ff8e59dde04
thread is 1, i=7, i address is 0x7ff8e59dde04
thread is 1, i=8, i address is 0x7ff8e59dde04
thread is 1, i=9, i address is 0x7ff8e59dde04

Although i is declared before the parallel region, each of the two threads works on a new i at a different address. This is because OpenMP automatically makes the loop variable of a parallel for private to each thread.

If we change the program to a nested loop, it looks like this:

#include <omp.h>
#include <stdio.h>
#include <iostream>
using namespace std;

#define NUM_THREADS 2

int main()
{
    int i, j;
    printf("initila i address is %p, initial j address is %p\n", &i, &j);
    
    omp_set_num_threads(NUM_THREADS);

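    // Only the loop variable of the parallelized loop (i) is made private
    // automatically; j, declared before the region, stays shared.
    #pragma omp parallel for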
    for (i = 0; i < 4; ++i)
    {
        for (j = 0; j < 1; ++j)
        {
            printf("thread is %d, i=%d, i address is %p, j=%d, j address is %p\n", 
                    omp_get_thread_num(), i, &i, j, &j);
        }
    }

    return 0;
}

The result is as follows:

initila i address is 0x7ffd9bc701a8, initial j address is 0x7ffd9bc701ac
thread is 0, i=0, i address is 0x7ffd9bc70134, j=0, j address is 0x7ffd9bc701ac
thread is 0, i=1, i address is 0x7ffd9bc70134, j=0, j address is 0x7ffd9bc701ac
thread is 1, i=2, i address is 0x7f947b8e2e04, j=0, j address is 0x7ffd9bc701ac
thread is 1, i=3, i address is 0x7f947b8e2e04, j=0, j address is 0x7ffd9bc701ac

There are two threads in total, and both i and j are declared before the parallel region. OpenMP parallel for creates a new i for each thread but does not create a new j: every thread still uses the original j at the original address. As a result, the j of the inner loop is subject to a data race between threads.

Solution

parallel for only distributes the iterations of the outermost loop across threads. All inner-loop iterations that belong to one outer iteration run on a single thread; to spread them across multiple threads as well, use the collapse clause.
Where possible, declare the inner loop variable in the loop itself: for (int j = 0; j < xxx; ++j). Each thread then gets its own j, which avoids the data race. Alternatively, #pragma omp parallel for private(j) forces each thread to create a new j.
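
The collapse clause mentioned above is not demonstrated in the test programs, so here is a minimal sketch of it. The function name sum_index_pairs and its parameters are made up for this illustration. With collapse(2), OpenMP treats the i and j loops as one combined iteration space and makes both loop variables private, so j may be declared before the region without being shared. The reduction clause gives each thread its own partial total.

```cpp
// Hypothetical helper: returns the sum of (i + j) over all pairs
// 0 <= i < rows, 0 <= j < cols.
long sum_index_pairs(int rows, int cols)
{
    long total = 0;
    int i, j;  // both declared before the parallel region

    // collapse(2) merges the two perfectly nested loops; i and j are the
    // loop variables of the collapsed loops and are therefore private.
    #pragma omp parallel for collapse(2) reduction(+:total)
    for (i = 0; i < rows; ++i)
    {
        for (j = 0; j < cols; ++j)
        {
            total += i + j;
        }
    }
    return total;
}
```

For example, sum_index_pairs(4, 3) returns 30. Note that collapse needs the loops to be perfectly nested, with no statements between the two for lines.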

Modification 1

    #pragma omp parallel for
    for (i = 0; i < 4; ++i)
    {
        for (int j = 0; j < 1; ++j)
        {
            printf("thread is %d, i=%d, i address is %p, j=%d, j address is %p\n", 
                    omp_get_thread_num(), i, &i, j, &j);
        }
    }

Result:

initila i address is 0x7ffeddabd0b0, initial j address is 0x7ffeddabd0b4
thread is 0, i=0, i address is 0x7ffeddabd040, j=0, j address is 0x7ffeddabd044
thread is 0, i=1, i address is 0x7ffeddabd040, j=0, j address is 0x7ffeddabd044
thread is 1, i=2, i address is 0x7f8b40564e00, j=0, j address is 0x7f8b40564e04
thread is 1, i=3, i address is 0x7f8b40564e00, j=0, j address is 0x7f8b40564e04
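
To see the same fix in a computation rather than in printed addresses, here is a small sketch. The function row_sums and its parameters are hypothetical and not part of the original program. Because j and acc are declared inside the outer loop body, each outer iteration (and therefore each thread) has its own copy.

```cpp
#include <vector>

// Hypothetical example: returns the sum of each row of a rows x cols
// matrix stored row by row in `matrix`.
std::vector<long> row_sums(const std::vector<int>& matrix, int rows, int cols)
{
    std::vector<long> sums(rows, 0);

    #pragma omp parallel for
    for (int i = 0; i < rows; ++i)
    {
        long acc = 0;                   // declared inside: never shared
        for (int j = 0; j < cols; ++j)  // j is local to this iteration
        {
            acc += matrix[i * cols + j];
        }
        sums[i] = acc;                  // each i writes a distinct element
    }
    return sums;
}
```

Calling row_sums on the 2 x 3 matrix {1, 2, 3, 4, 5, 6} gives {6, 15}, regardless of how many threads are used.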

Modification 2

    #pragma omp parallel for private(j)
    for (i = 0; i < 4; ++i)
    {
        for (j = 0; j < 1; ++j)
        {
            printf("thread is %d, i=%d, i address is %p, j=%d, j address is %p\n", 
                    omp_get_thread_num(), i, &i, j, &j);
        }
    }

Result:

initila i address is 0x7ffe42cfda00, initial j address is 0x7ffe42cfda04
thread is 0, i=0, i address is 0x7ffe42cfd990, j=0, j address is 0x7ffe42cfd994
thread is 0, i=1, i address is 0x7ffe42cfd990, j=0, j address is 0x7ffe42cfd994
thread is 1, i=2, i address is 0x7fc7c6054e00, j=0, j address is 0x7fc7c6054e04
thread is 1, i=3, i address is 0x7fc7c6054e00, j=0, j address is 0x7fc7c6054e04
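
A related safeguard, which goes beyond what the test programs above show, is the default(none) clause. It requires every variable used inside the parallel region to be listed with an explicit data-sharing attribute, so a forgotten private(j) is reported by the compiler instead of turning into a silent race. The sketch below is only an illustration under that assumption; the function row_sums_checked and its parameters are hypothetical.

```cpp
#include <vector>

// Hypothetical example: sums each row of a rows x cols matrix stored
// row by row, with i and j declared before the parallel region.
std::vector<long> row_sums_checked(std::vector<int>& matrix, int rows, int cols)
{
    std::vector<long> sums(rows, 0);
    int* in = matrix.data();
    long* out = sums.data();
    int i, j;  // declared outside the loop, as in the test program

    // default(none): nothing is shared implicitly. i is private because it
    // is the loop variable; j must be listed, and dropping private(j)
    // should make an OpenMP compiler reject this loop.
    #pragma omp parallel for default(none) shared(in, out, rows, cols) private(j)
    for (i = 0; i < rows; ++i)
    {
        out[i] = 0;
        for (j = 0; j < cols; ++j)
        {
            out[i] += in[i * cols + j];
        }
    }
    return sums;
}
```

For the 3 x 2 matrix {1, 2, 3, 4, 5, 6}, row_sums_checked returns {3, 7, 11}.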

Origin blog.csdn.net/qq_43219379/article/details/124765327