Basic problems of multithreaded concurrent programming

This is an old topic, but most discussions of it go in the wrong direction.

The bulk of the discussion centers on how to design a lock that synchronizes access to shared variables. That is putting the cart before the horse:

  • We need to design an overpass, not a traffic light!

In fact, a multithreaded program should not access shared variables at all. If multiple threads really must access a shared variable, the only effective approach is to strictly control the timing of access: first come, first served. Designing locks for this is pure laziness; it merely papers over the problem.

More than a hundred years ago it was already possible to carry several voice channels over a single telephone line, thanks to strict time-slot allocation and multiplexing. Later, as times changed, things got worse, entirely because of another kind of time-slot multiplexing: statistical multiplexing. Modern operating systems and modern packet-switched networks are loyal practitioners of this scheme.

I do not think statistical multiplexing is particularly efficient; it may simply be the solution one is forced to adopt in the face of diverse workloads. In my view, if efficiency is the only criterion, nothing beats strict time-slot multiplexing.

Let me give an example: four threads accessing one shared variable.

First, look at the stricter scheme, which assigns the access order explicitly:

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <semaphore.h>

sem_t sem1;
sem_t sem2;
sem_t sem3;
sem_t sem4;

unsigned long cnt = 0;
#define TARGET	0xffffff

void do_work(void)
{
	int i;

	for (i = 0; i < TARGET; i++) {
		cnt++;
	}
}

void *worker_thread1(void *arg)
{
	sem_wait(&sem1);	/* wait for our time slot */
	do_work();
	sem_post(&sem2);	/* hand the slot to the next thread */
	return NULL;
}

void *worker_thread2(void *arg)
{
	sem_wait(&sem2);
	do_work();
	sem_post(&sem3);
	return NULL;
}

void *worker_thread3(void *arg)
{
	sem_wait(&sem3);
	do_work();
	sem_post(&sem4);
	return NULL;
}

void *worker_thread4(void *arg)
{
	sem_wait(&sem4);
	do_work();
	printf("%lx\n", cnt);	/* last in the chain: print and quit */
	exit(0);
}

int main(void)
{
	pthread_t id1, id2, id3, id4;

	sem_init(&sem1, 0, 0);
	sem_init(&sem2, 0, 0);
	sem_init(&sem3, 0, 0);
	sem_init(&sem4, 0, 0);

	pthread_create(&id1, NULL, worker_thread1, NULL);
	pthread_create(&id2, NULL, worker_thread2, NULL);
	pthread_create(&id3, NULL, worker_thread3, NULL);
	pthread_create(&id4, NULL, worker_thread4, NULL);

	sem_post(&sem1);	/* start the chain */

	getchar();
	return 0;
}

Then look at the more common approach, the locking scheme:

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

pthread_spinlock_t spinlock;

unsigned long cnt = 0;
#define TARGET	0xffffff

void do_work(void)
{
	unsigned long final;
	int i;

	for (i = 0; i < TARGET; i++) {
		pthread_spin_lock(&spinlock);
		cnt++;
		pthread_spin_unlock(&spinlock);
	}
	/* Read under the lock as well; the thread that finishes
	 * last sees the full count and exits the process. */
	pthread_spin_lock(&spinlock);
	final = cnt;
	pthread_spin_unlock(&spinlock);
	if (final == 4UL * TARGET) {
		printf("%lx\n", final);
		exit(0);
	}
}

void *worker_thread1(void *arg)
{
	do_work();
	return NULL;
}

void *worker_thread2(void *arg)
{
	do_work();
	return NULL;
}

void *worker_thread3(void *arg)
{
	do_work();
	return NULL;
}

void *worker_thread4(void *arg)
{
	do_work();
	return NULL;
}

int main(void)
{
	pthread_t id1, id2, id3, id4;

	pthread_spin_init(&spinlock, 0);

	pthread_create(&id1, NULL, worker_thread1, NULL);
	pthread_create(&id2, NULL, worker_thread2, NULL);
	pthread_create(&id3, NULL, worker_thread3, NULL);
	pthread_create(&id4, NULL, worker_thread4, NULL);

	getchar();
	return 0;
}

Now compare the efficiency difference between the two:

[root@localhost linux]# time ./pv
3fffffc

real	0m0.171s
user	0m0.165s
sys	0m0.005s
[root@localhost linux]# time ./spin
3fffffc

real	0m4.852s
user	0m19.097s
sys	0m0.035s

This runs contrary to intuition. Doesn't the first example degenerate into serial execution, wasting the advantage of multiple processors? Isn't the second the "correct" way to write multithreaded code? The measurements say otherwise.

In fact, a shared variable must be accessed serially no matter what; code like this cannot be parallelized at all. Real multithreaded programming therefore means:

  • Eliminate shared variables wherever possible.
  • If a variable must be shared, strictly control the access timing instead of letting threads race for a lock.

Now look at the Linux kernel. Its large number of spinlocks does not really make the kernel multithreaded; they exist purely because "without a spinlock here, there would be problems..."

RSS and per-cpu spinlocks look like the right way to handle this, but serializing a Linux kernel that has already been crumpled into a tangle of shared variables is not easy. Moreover, interrupts cannot have their timing controlled. What about threaded interrupt handling? The results do not seem very good either.

When concurrency runs into efficiency problems, designing an ever more powerful lock means you have acknowledged the problem but do not want to solve it. That is a passive response.

Locks are the source of all evil. Eliminating shared variables, or controlling access timing, is the right path.

So, what is the difference? The difference is just a suit.


The leather shoes in Wenzhou, Zhejiang are wet, so they won’t get fat in the rain.

Origin blog.csdn.net/dog250/article/details/108908750