The cost of concurrent multithreading in Java

A brief review of threads

One reason operating systems introduced threads is that the overhead of process switching is too high: switching a process means switching page tables, often accompanied by page scheduling, so the cost is relatively large. Switching a thread only involves the state belonging to the thread itself (a thread's context consists mainly of the register values, the program counter, and the stack pointer), so the cost is relatively small. Making full use of multithreading can therefore improve the execution efficiency of the system and make fuller use of its resources.

When multiple threads cooperate to complete a task, resource sharing and data synchronization come into play, so a multithreaded program needs an additional thread synchronization mechanism to guarantee that it executes correctly. Correctness is the premise of performance optimization: only once correct operation is assured should we consider how to make full use of existing resources to improve performance. Multithreaded programs also bear the overhead of thread switching and rescheduling, so performance is affected.
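As a minimal sketch of what goes wrong without such a mechanism (the class and field names here are hypothetical), consider two threads incrementing a shared counter: without synchronized, increments can be lost; with it, the result is always correct:

    public class SafeCounter {
        private long count = 0;

        // Without synchronized, two threads can both read the same value of
        // count and both write back count + 1, losing one increment.
        public synchronized void increment() {
            count++;
        }

        public synchronized long get() {
            return count;
        }

        public static void main(String[] args) throws InterruptedException {
            SafeCounter counter = new SafeCounter();
            Runnable task = () -> {
                for (int i = 0; i < 100_000; i++) {
                    counter.increment();
                }
            };
            Thread t1 = new Thread(task);
            Thread t2 = new Thread(task);
            t1.start();
            t2.start();
            t1.join();
            t2.join();
            // Always prints 200000; remove synchronized and it usually prints less.
            System.out.println(counter.get());
        }
    }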

A single-threaded program incurs neither thread scheduling nor synchronization overhead, and needs no locks to keep its data structures consistent, so in certain circumstances it performs better. For independent, homogeneous tasks, however, a single thread cannot exploit the advantage of multi-core processors, and the program's performance scales poorly. For such workloads, the performance improvement from multithreading generally outweighs the overhead introduced by concurrency.
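As a sketch of such an independent, homogeneous workload (summing disjoint ranges is an illustrative assumption), a fixed thread pool sized to the core count lets each core work on its own share without any locking:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class ParallelTasks {
        public static void main(String[] args) throws Exception {
            int cores = Runtime.getRuntime().availableProcessors();
            ExecutorService pool = Executors.newFixedThreadPool(cores);

            // Independent, homogeneous tasks: each sums its own range and
            // shares no mutable state, so no synchronization is needed.
            List<Future<Long>> futures = new ArrayList<>();
            for (int i = 0; i < cores; i++) {
                final long start = i * 1_000_000L;
                futures.add(pool.submit(() -> {
                    long sum = 0;
                    for (long n = start; n < start + 1_000_000L; n++) {
                        sum += n;
                    }
                    return sum;
                }));
            }

            long total = 0;
            for (Future<Long> f : futures) {
                total += f.get(); // blocks until that task completes
            }
            pool.shutdown();
            System.out.println(total);
        }
    }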

The cost of multithreading

To sum up, the cost of using multithreading has three aspects:

  • Design is more complex

    Concurrent programming must guarantee data integrity and the correctness of operations, and synchronization mechanisms are needed to ensure this. Take locks: the lock itself is a form of overhead, and the program has to be structured around acquiring and releasing it correctly, so the design is naturally more complicated. In addition, bugs in multithreaded code are often hard to reproduce.

  • Context switch overhead

    A CPU core runs only one thread at any moment, so a multithreaded program requires thread switching and scheduling. When a switch happens, the CPU must save the current thread's state and then schedule a new thread to execute; this process is called a context switch.

  • Greater resource consumption

    In multithreaded programming, each thread needs memory for its own local stack, which stores thread-private data, so the memory overhead naturally grows with the number of threads (a sketch of this cost follows this list). The overhead introduced by synchronization operations is another important aspect.
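As a rough sketch of the per-thread stack cost mentioned above (the thread count and stack size are illustrative assumptions): each Java thread reserves its own stack, sized by the -Xss flag or, per thread, by a constructor hint that the JVM may round or ignore.

    public class StackCostDemo {
        public static void main(String[] args) throws InterruptedException {
            Runnable idle = () -> {
                try {
                    Thread.sleep(60_000); // keep the thread, and its stack, alive
                } catch (InterruptedException ignored) {
                }
            };
            // 1,000 threads at the common 1 MB default stack size reserve
            // about 1 GB of address space before any heap data is allocated.
            for (int i = 0; i < 1_000; i++) {
                // The fourth argument requests a 256 KB stack for this thread.
                new Thread(null, idle, "idle-" + i, 256 * 1024).start();
            }
            System.out.println("Threads started; observe memory use in a profiler.");
            Thread.sleep(60_000);
        }
    }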


Switching contexts requires access to data structures shared by the operating system and the JVM during thread scheduling. Since the application, the operating system, and the JVM all share the same set of CPUs, the more CPU cycles are consumed inside the operating system and the JVM, the fewer remain available to the application. Moreover, the data a newly scheduled thread needs is usually no longer in the processor's cache, so it initially runs more slowly because of cache misses; and if the required data is not resident in memory at all, a page fault occurs, execution traps into the operating system, and the missing data must be loaded from elsewhere, slowing execution further.

The performance overhead added by synchronization operations has many aspects. Take the visibility guarantees provided by synchronized and volatile as an example: the JMM (Java Memory Model) uses special instructions to keep caches up to date (concretely, through a cache-invalidation mechanism, with other processors using bus snooping to determine whether the data in their own caches has become invalid). These memory instructions are also called "memory barriers" or "memory fences", and they can inhibit compiler optimizations.
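A minimal sketch of that visibility guarantee (the flag and thread names are illustrative): without volatile, the worker below may spin forever on a stale cached value; volatile inserts the barrier instructions that make the write visible:

    public class VolatileVisibility {
        // Without volatile, the JIT may hoist the read of 'running' out of
        // the loop, and the worker can spin forever on a stale value.
        private static volatile boolean running = true;

        public static void main(String[] args) throws InterruptedException {
            Thread worker = new Thread(() -> {
                long iterations = 0;
                while (running) {    // volatile read: always sees the latest write
                    iterations++;
                }
                System.out.println("Stopped after " + iterations + " iterations");
            });
            worker.start();

            Thread.sleep(1000);
            running = false;         // volatile write: made visible to the worker
            worker.join();
        }
    }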


Further analysis of memory synchronization

The performance impact of memory barriers is nevertheless limited; the JMM is designed to be friendly enough that this aspect rarely needs much worry. Synchronization can be divided into contended synchronization and uncontended synchronization. The overhead of uncontended synchronization is small enough that its impact on application performance is limited. Performance bottlenecks are generally caused by contended locks: the more intense the contention, the greater the impact on program performance, and in the worst case the whole program may be unable to make progress at all.
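A crude way to feel the difference (a proper measurement would use a harness such as JMH; the thread and operation counts here are arbitrary assumptions) is to compare a counter that all threads contend on against java.util.concurrent.atomic.LongAdder, which spreads updates across internal cells precisely to reduce contention:

    import java.util.concurrent.atomic.AtomicLong;
    import java.util.concurrent.atomic.LongAdder;

    public class ContentionDemo {
        static final int THREADS = 8;
        static final long OPS = 5_000_000L;

        // Runs 'body' on THREADS threads and returns the elapsed milliseconds.
        static long time(Runnable body) throws InterruptedException {
            Thread[] workers = new Thread[THREADS];
            for (int i = 0; i < THREADS; i++) {
                workers[i] = new Thread(body);
            }
            long start = System.nanoTime();
            for (Thread t : workers) t.start();
            for (Thread t : workers) t.join();
            return (System.nanoTime() - start) / 1_000_000;
        }

        public static void main(String[] args) throws InterruptedException {
            // Contended: every increment is a compare-and-swap on the same
            // variable, so all threads fight over one cache line.
            AtomicLong shared = new AtomicLong();
            long contendedMs = time(() -> {
                for (long i = 0; i < OPS; i++) shared.incrementAndGet();
            });

            // Less contended: LongAdder spreads updates across cells and
            // only combines them when sum() is called.
            LongAdder adder = new LongAdder();
            long spreadMs = time(() -> {
                for (long i = 0; i < OPS; i++) adder.increment();
            });

            System.out.println("AtomicLong: " + contendedMs + " ms, total " + shared.get());
            System.out.println("LongAdder:  " + spreadMs + " ms, total " + adder.sum());
        }
    }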

As mentioned in the previous article, locking too large a scope, or holding multiple locks at once, is problematic. Modern JVMs do perform some optimizations on locks. If the JVM can prove (for example, through escape analysis) that a lock can never be contended, it performs "lock elimination"; the simplest explanation is that a lock no thread can ever compete for is simply removed. And if the JVM detects adjacent blocks of code that synchronize on the same lock, it performs "lock coarsening": merging the multiple lock operations into a single one to remove unnecessary synchronization.
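Two code patterns these optimizations target (an illustrative sketch; whether the JIT actually applies them depends on the JVM and its escape analysis):

    public class LockOptimizations {

        // Lock elimination candidate: the StringBuffer never escapes this
        // method, so escape analysis can prove its internal lock can never
        // be contended, and the JIT may remove the locking entirely.
        public String elisionCandidate(String a, String b) {
            StringBuffer sb = new StringBuffer(); // its methods are synchronized
            sb.append(a);
            sb.append(b);
            return sb.toString();
        }

        private final Object lock = new Object();
        private int x, y;

        // Lock coarsening candidate: adjacent blocks synchronize on the same
        // lock, so the JIT may merge them into one wider region instead of
        // releasing and re-acquiring the lock back to back.
        public void coarseningCandidate() {
            synchronized (lock) { x++; }
            synchronized (lock) { y++; }
            // ...may effectively become: synchronized (lock) { x++; y++; }
        }
    }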
