In-depth understanding of the implementation principle of glibc barrier

Among multi-thread synchronization mechanisms, a barrier makes a group of threads stop at a common point until all of them have arrived, and then lets them continue together. The effect is as follows:

[Figure: barrier]

In glibc, this is provided by pthread_barrier_wait.

#include <pthread.h>
int pthread_barrier_wait(pthread_barrier_t *barrier);

This article starts from pthread_barrier_wait and explains the implementation principles behind it.

The structure of pthread_barrier_t

pthread_barrier_t is defined in sysdeps/nptl/bits/pthreadtypes.h and is a union with two members. The first member is an array of char, which only reserves storage; the second, a long int, forces the alignment.

typedef union
{
  char __size[__SIZEOF_PTHREAD_BARRIER_T];
  long int __align;
} pthread_barrier_t;

The layout of the bytes in this char array is described by another structure, pthread_barrier, which is defined in sysdeps/nptl/internaltypes.h.

This is the real definition of the barrier. It has five fields.

struct pthread_barrier
{
  unsigned int in;
  unsigned int current_round;
  unsigned int count;
  int shared;
  unsigned int out;
};

The meaning of each field is as follows:

  • in: The cumulative number of threads that have entered the barrier, counted across all rounds since the last reset.

  • current_round: The base of the current round, that is, the value of in at which the most recently completed round ended. Barriers are reusable: if a barrier is set up for two threads, then after two threads have arrived and been released, the barrier keeps working and the next two threads form a new round.

  • count: The number of threads that must reach the barrier in each round.

  • shared: Whether the barrier is shared between multiple processes.

  • out: The cumulative number of threads that have left the barrier.

current_round is the hardest field to understand. Keep in mind that a barrier can be used many times: after one batch of threads arrives at the barrier and leaves together, the next batch can arrive and leave together in the same way. current_round keeps track of where each batch begins, and it is explained in depth in the source code walkthrough below.
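The arithmetic behind the rounds can be modeled without any threads. In the sketch below, the "ticket" is the value i that a thread obtains by incrementing in. round_base uses the same formula glibc uses for the new current_round; release_point and print_round_table are names made up for this illustration.

```c
#include <stdio.h>

/* Value published as current_round by the thread with ticket i when it
   completes a round: the largest multiple of count that is <= i. */
unsigned int round_base(unsigned int i, unsigned int count)
{
    return i - i % count;
}

/* Smallest value of current_round at which ticket i may leave:
   the smallest multiple of count that is >= i. */
unsigned int release_point(unsigned int i, unsigned int count)
{
    return (i + count - 1) / count * count;
}

/* Prints, for tickets 1..entries, the round each ticket belongs to. */
void print_round_table(unsigned int entries, unsigned int count)
{
    for (unsigned int i = 1; i <= entries; i++)
        printf("ticket %u: round %u, leaves when current_round reaches %u\n",
               i, (i - 1) / count + 1, release_point(i, count));
}
```

With count = 2, tickets 1 and 2 leave once current_round reaches 2, and tickets 3 and 4 leave once it reaches 4, which is why the field only ever moves forward in steps of count.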

pthread_barrier_wait source code analysis

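First, pthread_barrier_wait increments the number of threads that have entered the barrier (bar->in) by 1, and the variable i holds the value after the increment. The increment uses acq_rel memory order: the release half makes this thread's writes before the barrier visible to the threads that leave in the same round, and the acquire half lets this thread see the writes of the threads that entered before it, as well as any earlier reset. All the branches below are decided by the value of i.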

In addition, the count value is also read in.

  struct pthread_barrier *bar = (struct pthread_barrier *) barrier;

  unsigned int i;

 reset_restart:

  i = atomic_fetch_add_acq_rel (&bar->in, 1) + 1;
  unsigned int count = bar->count;
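atomic_fetch_add_acq_rel is a glibc-internal macro. A rough equivalent in standard C11 is atomic_fetch_add_explicit with memory_order_acq_rel, which is what the following sketch assumes. It shows why every thread gets a distinct value of i: fetch-add returns the old value atomically, so no two callers can observe the same one. TICKET_THREADS, take_ticket and tickets_are_unique are illustrative names.

```c
#include <pthread.h>
#include <stdatomic.h>

#define TICKET_THREADS 4                 /* hypothetical thread count */

static atomic_uint in_counter;           /* plays the role of bar->in */
static unsigned int tickets[TICKET_THREADS];

static void *take_ticket(void *arg)
{
    unsigned int *slot = arg;
    /* fetch_add returns the old value; +1 gives this thread's 1-based ticket. */
    *slot = atomic_fetch_add_explicit(&in_counter, 1, memory_order_acq_rel) + 1;
    return NULL;
}

/* Returns 1 if every thread got a distinct ticket in 1..TICKET_THREADS. */
int tickets_are_unique(void)
{
    pthread_t tids[TICKET_THREADS];
    int seen[TICKET_THREADS + 1] = {0};

    atomic_store(&in_counter, 0);
    for (int t = 0; t < TICKET_THREADS; t++)
        pthread_create(&tids[t], NULL, take_ticket, &tickets[t]);
    for (int t = 0; t < TICKET_THREADS; t++)
        pthread_join(tids[t], NULL);

    for (int t = 0; t < TICKET_THREADS; t++) {
        unsigned int i = tickets[t];
        if (i < 1 || i > TICKET_THREADS || seen[i]++)
            return 0;
    }
    return 1;
}
```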

The next block handles the case where in exceeds its upper limit. Because the barrier is reusable (with count set to 2, two threads pass in the first round, two more in the second, and so on), bar->in keeps growing and could eventually overflow. To prevent this, max_in_before_reset is BARRIER_IN_THRESHOLD rounded down to a multiple of count. A thread whose i exceeds this limit calls futex_wait and sleeps until another thread performs the reset at the end of pthread_barrier_wait, and then retries from reset_restart.

    unsigned int max_in_before_reset = BARRIER_IN_THRESHOLD
                                       - BARRIER_IN_THRESHOLD % count;

    if (i > max_in_before_reset)
    {
        while (i > max_in_before_reset)
        {
            futex_wait_simple (&bar->in, i, bar->shared);
            i = atomic_load_relaxed (&bar->in);
        }
        goto reset_restart;
    }
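The limit is easy to compute by hand. The sketch below assumes that BARRIER_IN_THRESHOLD is UINT_MAX / 2, which is how I recall it being defined in sysdeps/nptl/internaltypes.h; check this against your glibc version. DEMO_IN_THRESHOLD is a stand-in name so that the example does not pretend to be the real macro.

```c
#include <limits.h>

/* Assumption: glibc defines BARRIER_IN_THRESHOLD as UINT_MAX / 2. */
#define DEMO_IN_THRESHOLD (UINT_MAX / 2)

/* Largest multiple of count that does not exceed the threshold, so that the
   reset always falls exactly on a round boundary. */
unsigned int max_in_before_reset(unsigned int count)
{
    return DEMO_IN_THRESHOLD - DEMO_IN_THRESHOLD % count;
}
```

With a 32-bit unsigned int the threshold is 2147483647, so for count = 2 the limit is 2147483646. Rounding down to a multiple of count means that no round is ever split across a reset.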

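Next, read the base of the current round, cr. If i >= cr + count, enough threads have arrived to complete a round: this thread advances current_round to the largest multiple of count that does not exceed i and wakes the threads that arrived earlier and are sleeping. The second argument of futex_wake is INT_MAX, which means all waiters are woken. If i <= cr after the update, this thread's own round is complete and it jumps straight to the exit path; otherwise it falls through to the wait loop.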

    unsigned cr = atomic_load_relaxed (&bar->current_round);
    while (cr + count <= i)
    {
        unsigned int newcr = i - i % count;
        if (atomic_compare_exchange_weak_release (&bar->current_round, &cr,
                                                  newcr))
        {
            cr = newcr;
            futex_wake (&bar->current_round, INT_MAX, bar->shared);
            if (i <= cr)
                goto ready_to_leave;
            else
                break;
        }
    }
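The compare-and-exchange loop can be tried in isolation with C11 atomics. This is a sketch under the assumption that atomic_compare_exchange_weak_release corresponds to atomic_compare_exchange_weak_explicit with release order on success. try_finish_round is a hypothetical helper, and the futex_wake call is only indicated by a comment.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Returns true if the caller (ticket i) may leave immediately, false if it
   still has to wait for current_round to catch up with i. */
bool try_finish_round(atomic_uint *current_round, unsigned int i,
                      unsigned int count)
{
    unsigned int cr = atomic_load_explicit(current_round, memory_order_relaxed);

    while (cr + count <= i) {
        unsigned int newcr = i - i % count;
        /* On failure, cr is refreshed with the value currently stored, and
           the loop condition is evaluated again. */
        if (atomic_compare_exchange_weak_explicit(current_round, &cr, newcr,
                                                  memory_order_release,
                                                  memory_order_relaxed)) {
            cr = newcr;
            /* glibc calls futex_wake here to release the sleeping waiters. */
            break;
        }
    }
    return i <= cr;
}
```

Starting from current_round = 0 and count = 2, ticket 1 leaves the field untouched and has to wait, while ticket 2 moves it to 2 and may leave. A ticket of 5 would move it from 2 to 4 and still has to wait, because its own round ends at 6.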

In the opposite case, i > cr means that this thread's round is not complete yet because not enough threads have entered the barrier, so the thread calls futex_wait and sleeps until current_round changes.

    while (i > cr)
    {
        futex_wait_simple (&bar->current_round, cr, bar->shared);
        cr = atomic_load_relaxed (&bar->current_round);
    }
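futex_wait_simple and futex_wake are glibc-internal wrappers. The sketch below approximates them with the raw Linux futex system call, so it is Linux-only, and the helper names (futex_wait_word, futex_wake_all, wake_demo) are invented here. The point it illustrates is that a futex wait only sleeps if the word still holds the expected value, which is why the loop above re-reads current_round after every wake-up.

```c
#define _GNU_SOURCE
#include <limits.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

static _Atomic uint32_t round_word;      /* stands in for bar->current_round */

/* Sleep only if *addr still equals expected; otherwise return at once. */
static long futex_wait_word(_Atomic uint32_t *addr, uint32_t expected)
{
    return syscall(SYS_futex, addr, FUTEX_WAIT_PRIVATE, expected, NULL, NULL, 0);
}

/* Wake up to INT_MAX threads sleeping on addr, i.e. all of them. */
static long futex_wake_all(_Atomic uint32_t *addr)
{
    return syscall(SYS_futex, addr, FUTEX_WAKE_PRIVATE, INT_MAX, NULL, NULL, 0);
}

static void *waiter(void *arg)
{
    (void) arg;
    uint32_t cr = atomic_load_explicit(&round_word, memory_order_relaxed);
    while (cr == 0) {                    /* same re-check loop as in glibc */
        futex_wait_word(&round_word, cr);
        cr = atomic_load_explicit(&round_word, memory_order_relaxed);
    }
    return (void *) (uintptr_t) cr;
}

/* Starts a waiter, publishes new_round (must be nonzero), wakes the waiter
   and returns the value the waiter observed. */
unsigned int wake_demo(unsigned int new_round)
{
    pthread_t tid;
    void *seen;

    atomic_store(&round_word, 0);
    pthread_create(&tid, NULL, waiter, NULL);
    atomic_store_explicit(&round_word, new_round, memory_order_release);
    futex_wake_all(&round_word);
    pthread_join(tid, &seen);
    return (unsigned int) (uintptr_t) seen;
}
```

If the new value is stored before the waiter reaches the system call, the wait returns immediately because the word no longer matches the expected value, so the wake-up cannot be lost.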

Finally, the overflow problem mentioned above must be handled on the way out. Each leaving thread increments out. When out reaches max_in_before_reset, current_round, out and in are all set back to 0 and the threads waiting on in are woken. This amounts to a reset: afterwards the barrier is in the same state as right after pthread_barrier_init was called.

 ready_to_leave:
    /* Synchronize with the thread that finished the round. */
    atomic_thread_fence_acquire ();

    unsigned int o;
    o = atomic_fetch_add_release (&bar->out, 1) + 1;
    if (o == max_in_before_reset)
    {
        atomic_thread_fence_acquire ();
        atomic_store_relaxed (&bar->current_round, 0);
        atomic_store_relaxed (&bar->out, 0);

        int shared = bar->shared;
        atomic_store_release (&bar->in, 0);
        futex_wake (&bar->in, INT_MAX, shared);
    }
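Putting the pieces together, the following is a deliberately simplified model of the algorithm, not glibc's implementation. It uses C11 atomics, spins with sched_yield where glibc sleeps on a futex, and leaves out the shared flag and the overflow reset. All names (mini_barrier, mb_worker, mini_barrier_demo and the MB_ constants) are invented for this sketch.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

typedef struct {
    atomic_uint in;             /* total entries so far */
    atomic_uint current_round;  /* tickets <= this value may leave */
    unsigned int count;         /* threads per round */
    atomic_uint out;            /* total exits so far */
} mini_barrier;

void mini_barrier_init(mini_barrier *bar, unsigned int count)
{
    atomic_init(&bar->in, 0);
    atomic_init(&bar->current_round, 0);
    atomic_init(&bar->out, 0);
    bar->count = count;
}

/* Returns 1 for exactly one thread per round, 0 for the others. */
int mini_barrier_wait(mini_barrier *bar)
{
    unsigned int count = bar->count;
    unsigned int i =
        atomic_fetch_add_explicit(&bar->in, 1, memory_order_acq_rel) + 1;
    unsigned int cr =
        atomic_load_explicit(&bar->current_round, memory_order_relaxed);

    /* Enough threads have arrived: try to publish the new round base. */
    while (cr + count <= i) {
        unsigned int newcr = i - i % count;
        if (atomic_compare_exchange_weak_explicit(&bar->current_round, &cr,
                newcr, memory_order_release, memory_order_relaxed)) {
            cr = newcr;
            break;
        }
    }
    /* Our round is not finished yet: spin where glibc would futex_wait. */
    while (i > cr) {
        sched_yield();
        cr = atomic_load_explicit(&bar->current_round, memory_order_relaxed);
    }
    /* Pair with the release CAS of the thread that finished the round. */
    atomic_thread_fence(memory_order_acquire);
    atomic_fetch_add_explicit(&bar->out, 1, memory_order_release);
    return i % count == 0;
}

#define MB_THREADS 3            /* hypothetical demo parameters */
#define MB_ROUNDS 50

static mini_barrier mb;
static atomic_uint arrived;
static atomic_int errors, serials;

static void *mb_worker(void *arg)
{
    (void) arg;
    for (unsigned int r = 1; r <= MB_ROUNDS; r++) {
        atomic_fetch_add(&arrived, 1);
        if (mini_barrier_wait(&mb))
            atomic_fetch_add(&serials, 1);
        /* After round r, all MB_THREADS * r arrivals must be visible. */
        if (atomic_load(&arrived) < MB_THREADS * r)
            atomic_fetch_add(&errors, 1);
    }
    return NULL;
}

/* Returns 1 if no thread left a round early and the counters add up. */
int mini_barrier_demo(void)
{
    pthread_t tids[MB_THREADS];

    atomic_store(&arrived, 0);
    atomic_store(&errors, 0);
    atomic_store(&serials, 0);
    mini_barrier_init(&mb, MB_THREADS);
    for (int t = 0; t < MB_THREADS; t++)
        pthread_create(&tids[t], NULL, mb_worker, NULL);
    for (int t = 0; t < MB_THREADS; t++)
        pthread_join(tids[t], NULL);
    return atomic_load(&errors) == 0 && atomic_load(&serials) == MB_ROUNDS
           && atomic_load(&mb.out) == MB_THREADS * MB_ROUNDS;
}
```

Because the reset is omitted, this model would misbehave once in wraps around, which is exactly the situation the max_in_before_reset logic in glibc is there to prevent.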

Observing changes in the barrier's internal values with gdb

//g++ test.cpp -g
#include <stdio.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>
int a=0;

pthread_mutex_t numlock;
pthread_barrier_t b;

struct pthread_barrier
{
  unsigned int in;
  unsigned int current_round;
  unsigned int count;
  int shared;
  unsigned int out;
};

pthread_barrier *b_real = NULL;

void* handle(void *data)
{
    while(1)
    {
        pthread_mutex_lock(&numlock);
        a++;
        pthread_mutex_unlock(&numlock);
        printf("in = %u\n", b_real->in); // the "in = N" lines in the gdb output
        printf("thread enter wait point\n");
        pthread_barrier_wait(&b);
        sleep(1);
    }
    return 0;
}

int main()
{
    pthread_t t1,t2;
    pthread_barrier_init(&b,NULL,2); // initialize the barrier
    b_real = (pthread_barrier *)&b;  // view the opaque type through the internal layout
    pthread_mutex_init(&numlock,NULL);
    pthread_create(&t1,NULL,handle,NULL);
    pthread_create(&t2,NULL,handle,NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    exit(0);
}

Preparation Phase

Before debugging the program, install glibc's debuginfo packages so that gdb can show what happens inside glibc.

My virtual machine runs the following system:

[root@localhost test4]# cat /etc/redhat-release
Rocky Linux release 9.2 (Blue Onyx)

The debuginfo packages can be downloaded from https://dl.rockylinux.org/stg/rocky/9.2/devel/x86_64/debug/tree/Packages/g/. Find the following two packages, download them to the virtual machine, and install them with yum install.

glibc-debuginfo-2.34-60.el9.x86_64.rpm
glibc-debugsource-2.34-60.el9.x86_64.rpm  

With debuginfo installed, gdb can step into glibc's source code.

Tracing the run - Round 1

Set a breakpoint on the call to pthread_barrier_wait, which is line 31 of test.cpp, and run the program.

[root@localhost test4]# gdb a.out -q
Reading symbols from a.out...
(gdb) b test.cpp:31
Breakpoint 1 at 0x401223: file test.cpp, line 31.
(gdb) r
Starting program: /home/work/cpp_proj/test4/a.out
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff77ff640 (LWP 72128)]
[New Thread 0x7ffff6ffe640 (LWP 72129)]
in = 0
thread enter wait point
in = 0
thread enter wait point
[Switching to Thread 0x7ffff77ff640 (LWP 72128)]

Thread 2 "a.out" hit Breakpoint 1, handle (data=0x0) at test.cpp:31
31              pthread_barrier_wait(&b);
Missing separate debuginfos, use: dnf debuginfo-install libgcc-11.3.1-4.3.el9.x86_64 libstdc++-11.3.1-4.3.el9.x86_64

The output shows that thread 2 has stopped at the statement pthread_barrier_wait(&b) and is about to execute it.

Now set a breakpoint on line 111 of pthread_barrier_wait.c, the statement that increments in by 1.

i = atomic_fetch_add_acq_rel (&bar->in, 1) + 1;

This breakpoint makes it easier to follow how the internal values of pthread_barrier change. To keep the other threads from running in the meantime, temporarily restrict execution to the current thread with set scheduler-locking on.

Then step with next: the program stops at i = atomic_fetch_add_acq_rel (&bar->in, 1) + 1;. Step once more with next and print bar. The value of in is now 1, as expected, because in is incremented by 1 each time a thread enters the barrier.

(gdb) b pthread_barrier_wait.c:111
Breakpoint 2 at 0x7ffff789da90: file pthread_barrier_wait.c, line 111.
(gdb) info thread
  Id   Target Id                                 Frame
  1    Thread 0x7ffff7ec4180 (LWP 72124) "a.out" __futex_abstimed_wait_common64 (private=128, cancel=true, abstime=0x0,
    op=265, expected=72128, futex_word=0x7ffff77ff910) at futex-internal.c:57
* 2    Thread 0x7ffff77ff640 (LWP 72128) "a.out" handle (data=0x0) at test.cpp:31
  3    Thread 0x7ffff6ffe640 (LWP 72129) "a.out" handle (data=0x0) at test.cpp:31
(gdb) set scheduler-locking on
(gdb) n

Thread 2 "a.out" hit Breakpoint 2, ___pthread_barrier_wait (barrier=0x404100 <b>) at pthread_barrier_wait.c:111
111       i = atomic_fetch_add_acq_rel (&bar->in, 1) + 1;
(gdb) n
117       unsigned int max_in_before_reset = BARRIER_IN_THRESHOLD
(gdb) p *bar
$1 = {in = 1, current_round = 0, count = 2, shared = 0, out = 0}

Next, let all threads run again with set scheduler-locking off and resume with continue. Thread 3 hits breakpoint 1 and, after next, also stops at i = atomic_fetch_add_acq_rel (&bar->in, 1) + 1;. Step with next again and print bar: in is now 2, which is also as expected.

(gdb) set scheduler-locking off
(gdb) c
Continuing.
[Switching to Thread 0x7ffff6ffe640 (LWP 72129)]

Thread 3 "a.out" hit Breakpoint 1, handle (data=0x0) at test.cpp:31
31              pthread_barrier_wait(&b);
(gdb) n

Thread 3 "a.out" hit Breakpoint 2, ___pthread_barrier_wait (barrier=0x404100 <b>) at pthread_barrier_wait.c:111
111       i = atomic_fetch_add_acq_rel (&bar->in, 1) + 1;
(gdb) n
117       unsigned int max_in_before_reset = BARRIER_IN_THRESHOLD
(gdb) p *bar
$2 = {in = 2, current_round = 0, count = 2, shared = 0, out = 0}

At this point in = current_round + count, so the round is complete and both threads can leave the barrier and continue.

Tracing the run - Round 2

Resume with continue; both threads leave the barrier and start their second round.

Turn scheduler locking on again so that only the current thread runs. Thread 3 stops at i = atomic_fetch_add_acq_rel (&bar->in, 1) + 1;. Printing bar after the increment shows in = 3, because this is the third entry into the barrier since it was initialized.

current_round is the total number of threads that entered in earlier rounds, so it equals 2. out is the total number of threads that have left the barrier; at this point it equals current_round, that is, 2.

(gdb) c
Continuing.
in = 2
thread enter wait point
in = 2
thread enter wait point

Thread 3 "a.out" hit Breakpoint 1, handle (data=0x0) at test.cpp:31
31              pthread_barrier_wait(&b);
(gdb) set scheduler-locking on
(gdb) n

Thread 3 "a.out" hit Breakpoint 2, ___pthread_barrier_wait (barrier=0x404100 <b>) at pthread_barrier_wait.c:111
111       i = atomic_fetch_add_acq_rel (&bar->in, 1) + 1;
(gdb) n
117       unsigned int max_in_before_reset = BARRIER_IN_THRESHOLD
(gdb) p *bar
$3 = {in = 3, current_round = 2, count = 2, shared = 0, out = 2}

Next, turn scheduler locking off and resume with continue. This time thread 2 stops at i = atomic_fetch_add_acq_rel (&bar->in, 1) + 1;. Printing bar after the increment shows in = 4.

At this point in = current_round + count, so the round is complete and both threads can leave the barrier and continue.

(gdb) set scheduler-locking off
(gdb) c
Continuing.
[Switching to Thread 0x7ffff77ff640 (LWP 72128)]

Thread 2 "a.out" hit Breakpoint 1, handle (data=0x0) at test.cpp:31
31              pthread_barrier_wait(&b);
(gdb) n

Thread 2 "a.out" hit Breakpoint 2, ___pthread_barrier_wait (barrier=0x404100 <b>) at pthread_barrier_wait.c:111
111       i = atomic_fetch_add_acq_rel (&bar->in, 1) + 1;
(gdb) n
117       unsigned int max_in_before_reset = BARRIER_IN_THRESHOLD
(gdb) p *bar
$4 = {in = 4, current_round = 2, count = 2, shared = 0, out = 2}

Summary

Once you read the source code, the behavior becomes clear. By analyzing pthread_barrier_wait.c, this article showed how a barrier makes a batch of threads wait at one point and then continue together. Barriers are reusable, and three fields make this possible: in, current_round and count.

In the case study, glibc's debuginfo was installed and the values of the internal fields were traced in gdb, which confirmed the preceding source code analysis.

Origin blog.csdn.net/qq_31442743/article/details/131533719