In-Depth Understanding of the Java Virtual Machine, Part 6: Efficient Concurrency

Disclaimer: This is an original article by the blogger, released under the CC 4.0 BY-SA license. Please include the original source link and this statement when reproducing it.
Original link: https://blog.csdn.net/baron_leizhang/article/details/99675741

First published on my WeChat public account: BaronTalk. Follow me there!

Efficient concurrency is the last installment of this JVM series. It covers how the virtual machine implements multithreading, how threads share and contend for data, and the problems that sharing and contention bring along with their solutions.

1. The Java Memory Model and Threads

We make computers perform multiple tasks at once not only because processors have become more powerful, but more importantly because the gap between a processor's computation speed and the speed of its storage and communication subsystems is enormous: a great deal of time is spent on disk I/O, network communication, and database access. To keep the processor from wasting resources and time while waiting on other resources, we must let the computer run several tasks concurrently to exploit the processor's performance, and we must also meet the demands of highly concurrent servers. Java's memory model and its threading facilities exist precisely to make multitasking better and more efficient.

1.1 Hardware Efficiency and Consistency

The vast majority of computing tasks cannot be completed by the processor alone; at minimum the processor must interact with memory, for example to read operands and store results, and such I/O cannot be eliminated. Since there is an orders-of-magnitude gap between the speed of the processor and that of memory devices, modern computers insert a cache, whose read/write speed is as close as possible to the processor's, as a buffer between the processor and main memory: the data an operation needs is copied into the cache so the operation can proceed quickly, and when the operation finishes, the result is synchronized from the cache back to memory, so the processor does not have to wait on slow memory reads and writes.

Cache-based interaction nicely resolves the speed mismatch between processor and memory, but it also makes the computer system more complex, because it introduces a new problem: cache coherency. In a multiprocessor system each processor has its own cache, yet they all share the same main memory. When the computing tasks of several processors touch the same region of main memory, their caches may end up with inconsistent data. To solve this consistency problem, each processor must follow certain protocols when accessing its cache, performing reads and writes according to the agreement.

Besides adding caches, the processor may apply out-of-order execution optimizations to the input code so that its internal execution units are utilized as fully as possible; after the computation it reorders the results to ensure they match what sequential execution would produce, but it does not guarantee that each statement in the program is computed in the order it appears in the source. Therefore, if one computing task depends on the intermediate result of another, the ordering cannot be assumed to follow the code order. Analogous to the processor's out-of-order execution, the JIT compiler of the Java virtual machine performs a similar instruction-reordering optimization.

1.2 Java Memory Model

The Java Virtual Machine Specification defines a Java memory model to mask the memory-access differences of various hardware and operating systems, so that Java programs achieve consistent memory-access behavior on all platforms. Languages such as C/C++ use the memory model of the physical hardware and operating system directly; because memory models differ across platforms, code often has to be written separately for each platform.

Main memory and working memory

The main goal of the Java memory model is to define access rules for every variable in the program, that is, the low-level details of how the virtual machine stores variables to memory and reads them back. The "variables" here differ from variables in Java source code: they include instance fields, static fields, and the elements that make up arrays, but not local variables and method parameters, because the latter are thread-private and never shared. To allow better execution performance, the Java memory model does not restrict the execution engine to using specific registers or caches to interact with main memory, nor does it limit the JIT compiler's optimizations such as adjusting execution order.

The Java memory model stipulates that all variables are stored in main memory, and that each thread has its own working memory, which holds copies of the main-memory variables the thread uses. All of a thread's operations on variables must take place in its working memory; a thread cannot read or write main memory directly, and variable values are passed between threads only through main memory.

Interactions Between Main Memory and Working Memory

Regarding the concrete protocol between main memory and working memory — that is, the details of how a variable is copied from main memory into working memory and how it is synchronized back from working memory to main memory — the Java memory model defines the following eight operations. The virtual machine implementation must guarantee that each of them is atomic and indivisible.

The eight operations are: lock, unlock, read, load, use, assign, store, and write.

Special Rules for volatile Variables

volatile is the most lightweight synchronization mechanism the Java virtual machine provides. Once a variable is declared volatile, it gains two properties:

The first is that the variable becomes visible to all threads. "Visibility" here means that when one thread modifies the variable's value, the new value is immediately known to the other threads. Ordinary variables cannot do this; they pass data between threads via main memory. For example, thread A modifies an ordinary variable's value and writes it back to main memory; only when thread B reads from main memory after A's write-back has completed does the new value become visible to B.

The second is that it forbids instruction-reordering optimizations. With ordinary variables, the JVM only guarantees that every place in a method that depends on an assignment result will obtain the correct result; it does not guarantee that assignments happen in the same order as in the program code, because within a single method a thread cannot perceive such reordering. This is what the Java memory model calls "within-thread as-if-serial semantics."
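A common use of the visibility guarantee is a stop flag. Here is a minimal sketch (the class and field names are my own, not from the original article):

```java
// Sketch: a volatile flag used to stop a worker thread.
// Without volatile, the worker might never observe the write to `running`.
public class VolatileStopDemo {
    private static volatile boolean running = true;
    private static long iterations = 0;

    public static long run() {
        Thread worker = new Thread(() -> {
            while (running) {        // volatile read: always sees the latest value
                iterations++;
            }
        });
        worker.start();
        try {
            Thread.sleep(50);        // let the worker spin briefly
            running = false;         // volatile write: immediately visible to the worker
            worker.join();           // terminates promptly thanks to visibility
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return iterations;
    }

    public static void main(String[] args) {
        System.out.println("worker looped " + run() + " times");
    }
}
```

Reading `iterations` after `join()` is safe because the thread-termination happens-before rule covers it.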

Special Rules for long and double Variables

The Java memory model requires the eight operations lock, unlock, read, load, assign, use, store, and write to be atomic, but for the 64-bit data types long and double it makes a somewhat loose special provision: it allows the virtual machine to split reads and writes of 64-bit values not declared volatile into two 32-bit operations. In other words, the virtual machine implementation may choose not to guarantee atomicity of the load, store, read, and write operations on 64-bit data. This is the so-called non-atomic treatment of long and double.

If multiple threads share a long or double variable that is not declared volatile and read and modify it at the same time, some threads might read a half-written, wrong value. Fortunately such cases are extremely rare; mainstream commercial virtual machines treat reads and writes of long and double as atomic anyway, so in practice you do not need to declare such a variable volatile just for this.

Atomicity, Visibility, and Ordering

The Java memory model is built around how to handle three characteristics of concurrency: atomicity, visibility, and ordering.

  1. Atomicity: The variable operations guaranteed atomic directly by the Java memory model are read, load, assign, use, store, and write, so reads and writes of basic data types can generally be considered atomic. If a scenario needs a larger-scoped atomicity guarantee, the Java memory model also provides the lock and unlock operations. Although the virtual machine does not expose lock and unlock to users directly, it offers the higher-level bytecode instructions monitorenter and monitorexit, which use these two operations implicitly; in Java source code they correspond to the synchronized keyword, so operations inside synchronized blocks or methods are also atomic.
  2. Visibility: Visibility means that when one thread changes the value of a shared variable, other threads become aware of the change immediately. The Java memory model achieves visibility by synchronizing the new value back to main memory after a variable is modified, and refreshing the value from main memory before the variable is read, using main memory as the transfer medium. This holds for both ordinary and volatile variables; the difference is that volatile's special rules guarantee the new value is synchronized to main memory immediately and is refreshed from main memory immediately before each use. Hence volatile guarantees the visibility of a variable across threads, while ordinary variables do not. Besides volatile, Java has two other keywords that provide visibility: synchronized and final. The visibility of synchronized blocks comes from the rule "before performing unlock on a variable, it must first be synchronized back to main memory (via the store and write operations)"; the visibility of final means that once a final field has been initialized in the constructor, and the constructor has not leaked the "this" reference, other threads can see the final field's value.
  3. Ordering: The natural ordering of Java programs can be summarized as: observed within a thread, all operations are ordered; observed from one thread looking at another, all operations are unordered. The first half refers to "within-thread as-if-serial semantics"; the second half refers to the phenomena of instruction reordering and working-memory-to-main-memory synchronization delay. The Java language provides the volatile and synchronized keywords to guarantee ordering between threads: volatile itself carries the semantics of forbidding instruction reordering, while synchronized derives its ordering from the rule "a variable may be locked by only one thread at a time," which dictates that two synchronized blocks holding the same lock can only be entered serially.

The Happens-Before Principle

If all ordering in the Java memory model had to be guaranteed solely by volatile and synchronized, some operations would become very cumbersome; yet we do not feel this when writing concurrent Java code, because the Java language has a "happens-before" principle. This principle is very important: it is the primary basis for judging whether a data race exists and whether a thread is safe. Relying on it, we can resolve, through a handful of rules, every question of whether two operations might conflict in a concurrent environment.

Happens-before is a partial-order relation between two operations defined in the Java memory model. If operation A happens-before operation B, it means that before B occurs, the effects of A can be observed by B; "effects" include modifying the values of shared variables in memory, sending messages, and calling methods.

Some happens-before relations exist naturally under the Java memory model; they hold without the help of any synchronizer and can be relied upon directly in code. If the relation between two operations is not in the following list and cannot be derived from these rules, the operations have no ordering guarantee and the virtual machine may reorder them freely.

  • Program order rule: within a thread, following the program's control-flow order, code written earlier happens-before code written later. Strictly speaking this is control-flow order rather than source-code order, since branches, loops, and other structures must be taken into account;

  • Monitor lock rule: an unlock operation happens-before a subsequent lock operation on the same lock;

  • volatile variable rule: a write to a volatile variable happens-before a subsequent read of that variable; understanding this rule explains why the instance field in the double-checked-locking (DCL) singleton pattern must be declared volatile;

  • Thread start rule: the start() method of a Thread object happens-before every action of the started thread;

  • Thread termination rule: every operation of a thread happens-before the detection of that thread's termination;

  • Thread interruption rule: a call to thread.interrupt() happens-before the point where the interrupted thread's code detects the interruption;

  • Object finalization rule: the completion of an object's initialization happens-before the start of its finalize() method;

  • Transitivity: if A happens-before B and B happens-before C, then A happens-before C.
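The volatile variable rule is exactly what makes the double-checked-locking singleton mentioned above correct. A minimal sketch (the class name is my own):

```java
// Sketch: double-checked locking (DCL) singleton.
// Without volatile, a thread could observe a non-null but
// partially constructed instance due to instruction reordering.
public class DclSingleton {
    private static volatile DclSingleton instance;

    private DclSingleton() {}

    public static DclSingleton getInstance() {
        if (instance == null) {                    // first check, without locking
            synchronized (DclSingleton.class) {
                if (instance == null) {            // second check, under the lock
                    instance = new DclSingleton(); // volatile write publishes safely
                }
            }
        }
        return instance;
    }
}
```

The volatile write to `instance` happens-before any subsequent volatile read, so another thread can never see a reference to an object whose constructor has not finished.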

1.3 Java Threads

Discussions of Java concurrency usually involve multithreading. In this section we look at how the Java virtual machine implements threads.

Implementing Threads

Mainstream operating systems all provide thread implementations, and the Java language offers a unified way to operate on threads across different hardware and operating-system platforms: each instance of the Thread class that has had start() called and has not yet terminated represents a thread. All of Thread's key methods are declared Native. In the Java API, a Native method often means the method is not, or cannot be, implemented by platform-independent means (it may of course also be Native for efficiency, the most efficient means usually being platform-specific ones).

There are three main ways to implement threads: using kernel threads, using user threads, and a hybrid of user threads plus lightweight processes.

Implementation of Java Threads

Before JDK 1.2, Java threads were implemented with user threads known as "green threads." In JDK 1.2 the threading model was replaced with one based on the operating system's native threads. Therefore, in current JDK versions, the threading model the operating system supports largely determines how Java virtual machine threads are mapped; there is no way to make this uniform across platforms, and the virtual machine specification does not mandate which threading model Java threads must use. The threading model affects only the concurrency scale and the operation cost of threads; to the coding and execution of a Java program, these differences are transparent.

1.4 Java Thread Scheduling

Thread scheduling is the process by which the system assigns processor time to threads. There are two main kinds of scheduling: cooperative thread scheduling and preemptive thread scheduling.

Cooperative Thread Scheduling

In a multithreaded system using cooperative scheduling, a thread's execution time is controlled by the thread itself: after finishing its work, it must actively notify the system to switch to another thread. The greatest benefit of cooperative multithreading is simplicity; since a thread switches only after finishing its own work, switches are known to the thread, and there are generally no thread-synchronization problems. But its drawback is also obvious: thread execution time is uncontrollable. If a badly written thread never tells the operating system to switch, the program will block there forever. Long ago, Windows 3.x used cooperative multitasking for processes and was quite unstable: a single process clinging to the CPU could cause the whole system to crash.

Preemptive thread scheduling

In a multithreaded system using preemptive scheduling, the system allocates execution time to each thread, and thread switching is not decided by the thread itself. Under this scheduling model, thread execution time is controlled by the system, and a single misbehaving thread cannot block the whole process. Java's thread scheduling is preemptive. In contrast with the Windows 3.x example above, the Windows 9x/NT kernels use preemption to implement multitasking: when a process misbehaves, we can still use the Task Manager to kill it, and the system does not crash.

1.5 Thread State Transitions

The Java language defines six thread states; at any point in time, a thread is in exactly one of them:

  • New: a thread that has been created but not yet started is in this state;
  • Runnable: covers the operating-system thread states Running and Ready; a thread in this state may be executing or may be waiting for the CPU to allocate execution time to it;
  • Waiting: a thread in this state is not allocated CPU execution time and waits to be explicitly woken up by another thread; the following three methods put a thread into indefinite waiting:
    • Object.wait() with no timeout argument;
    • Thread.join() with no timeout argument;
    • LockSupport.park().
  • Timed Waiting: a thread in this state is not allocated CPU execution time either, but it does not need to be explicitly woken by another thread; the system wakes it automatically after a certain time. The following methods put a thread into timed waiting:
    • Thread.sleep();
    • Object.wait() with a timeout argument;
    • Thread.join() with a timeout argument;
    • LockSupport.parkNanos();
    • LockSupport.parkUntil().
  • Blocked: the thread is blocked. The difference between "blocked" and "waiting" is that a blocked thread is waiting to acquire an exclusive lock, an event that occurs when another thread gives up that lock, whereas a waiting thread is waiting for a period of time to elapse or for a wake-up action. A thread enters the blocked state while waiting to enter a synchronized region;
  • Terminated: the thread has finished executing.

The six states above convert into one another when specific events occur, as shown in the state-transition diagram below:
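Some of these states can be observed directly through Thread.getState(); a small sketch (names are my own):

```java
// Sketch: observing thread states via Thread.getState().
public class ThreadStateDemo {
    public static Thread.State[] observe() {
        Thread.State[] states = new Thread.State[3];
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(200);               // TIMED_WAITING while sleeping
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        states[0] = t.getState();                // NEW: created but not started
        t.start();
        try {
            Thread.sleep(50);                    // give it time to enter sleep()
            states[1] = t.getState();            // normally TIMED_WAITING here
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        states[2] = t.getState();                // TERMINATED: run() has returned
        return states;
    }

    public static void main(String[] args) {
        for (Thread.State s : observe()) System.out.println(s);
    }
}
```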

2. Thread Safety and Lock Optimization

The theme of this article is efficient concurrency, but efficiency is only meaningful once correctness and safety are guaranteed, so this section begins with how to ensure thread safety under concurrency.

2.1 Java Thread Safety

So what is thread safety? It can be understood simply as this: when multiple threads operate on the same block of memory, the changes to its value are predictable, and concurrent access and modification do not make the stored value uncontrollable.

Thread Safety in the Java Language

If we do not define thread safety as an either-or proposition (either absolutely thread-safe or never thread-safe), we can divide the shared data operated on in the Java language into the following five levels, ordered from strongest to weakest thread safety:

  1. Immutable;

  2. Absolutely thread-safe;

  3. Relative thread-safe;

  4. Thread-compatible;

  5. Thread-hostile.

Implementing Thread Safety

Although whether code is thread-safe has much to do with how it is written, the synchronization and locking mechanisms provided by the virtual machine also play a very important role. Let us look at how thread safety is ensured at the virtual-machine level.

  1. Mutual exclusion synchronization

Mutual exclusion synchronization is a common means of guaranteeing the correctness of concurrency. Synchronization means that when multiple threads access shared data concurrently, the data is used by only one thread at a time; mutual exclusion is a means of achieving synchronization. Java's most basic mutual-exclusion primitive is the synchronized keyword. After compilation, synchronized produces the two bytecode instructions monitorenter and monitorexit before and after the synchronized block; both instructions take a reference parameter specifying the object to lock and unlock. If the synchronized statement in the Java program explicitly names an object, that object's reference is used; if not, the object instance or the Class object is taken as the lock object, depending on whether synchronized modifies an instance method or a class (static) method.

According to the virtual machine specification, when executing monitorenter the thread first tries to acquire the object's lock. If the object is not locked, or the current thread already owns its lock, the lock counter is incremented by 1; correspondingly, executing monitorexit decrements the counter by 1, and when the counter reaches 0 the lock is released. If acquiring the object lock fails, the current thread blocks until the lock is released by the other thread.

Note also that a synchronized block prevents other threads from entering until the thread that entered it has finished. Since Java threads are mapped onto operating-system native threads, blocking or waking a thread requires the operating system's help, which entails a transition from user mode to kernel mode, and such state transitions cost a great deal of processor time. For simple synchronized blocks (such as synchronized getter() and setter() methods), the time spent on state transitions may exceed the time spent in the user code itself. synchronized is therefore a heavyweight operation in Java, and we should use it only when necessary. Of course, the virtual machine itself applies optimizations, such as adding a spin-wait before asking the operating system to block the thread, to avoid frequent user-mode/kernel-mode transitions; we will return to this when discussing lock optimization.
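As a minimal illustration of mutual exclusion with synchronized (names are my own), two threads increment a shared counter; without the lock, increments could be lost:

```java
// Sketch: a counter made thread-safe with synchronized.
// ++ is a read-modify-write sequence, so without the lock
// two threads could interleave and lose increments.
public class SyncCounter {
    private int count = 0;

    public synchronized void increment() { // monitorenter/monitorexit on `this`
        count++;
    }

    public synchronized int get() {
        return count;
    }

    public static int runTwoThreads(int perThread) {
        SyncCounter c = new SyncCounter();
        Runnable task = () -> {
            for (int i = 0; i < perThread; i++) c.increment();
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        try { t1.join(); t2.join(); } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return c.get();
    }

    public static void main(String[] args) {
        System.out.println(runTwoThreads(100_000)); // always 200000
    }
}
```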

  2. Non-blocking synchronization

    The biggest problem with mutual-exclusion synchronization is the performance cost of blocking and waking threads, which is why it is also called blocking synchronization. In terms of how it approaches the problem, mutual exclusion is a pessimistic concurrency strategy: it assumes that without proper synchronization measures (such as locking) problems will certainly occur, so it locks regardless of whether there is actual contention for the shared data (though the virtual machine does optimize away some unnecessary locks). With the development of hardware instruction sets, we have another option: an optimistic concurrency strategy based on conflict detection. In plain terms, we perform the operation first; if no other thread contends for the shared data, the operation succeeds; if other threads contend and a conflict arises, we take other remedial measures. Since many implementations of this optimistic strategy do not need to suspend threads, this kind of synchronization is called non-blocking synchronization.

    The reason this needed hardware instruction-set development is that the two steps, performing the operation and detecting conflicts, must together be atomic.

    And what guarantees that atomicity? Using synchronized again would defeat the purpose, so it can only be done by the hardware: the hardware guarantees that an operation which semantically requires multiple steps can be completed by a single processor instruction. Commonly used instructions of this kind are:

    • Test-and-Set
    • Fetch-and-Increment
    • Swap
    • Compare-and-Swap (CAS)
    • Load-Linked / Store-Conditional (LL/SC)

    The first three have existed in processor instruction sets for a long time; the latter two are newer additions.

    The CAS instruction takes three operands: the memory location (in Java, simply the memory address of a variable, denoted V), the expected old value (denoted A), and the new value (denoted B). When the CAS instruction executes, the processor updates V with the new value B if and only if the value at V matches the expected old value A; otherwise it performs no update. Either way, it returns the old value of V, and the whole sequence is one atomic operation.

    Since JDK 1.5, Java programs can use CAS operations, wrapped by several methods of the sun.misc.Unsafe class such as compareAndSwapInt() and compareAndSwapLong(). The virtual machine gives these methods special treatment: their just-in-time compilation result is a platform-specific processor CAS instruction with no method-call process; they can be regarded as unconditionally inlined.

    Because Unsafe is not a class intended for user programs to call, ordinary code can only use CAS indirectly through other Java APIs, such as the atomic integer classes in the java.util.concurrent (JUC) package, whose compareAndSet() and getAndIncrement() methods use the CAS operations of the Unsafe class.
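    A small sketch of those CAS-backed methods in action (names and values are my own):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: lock-free updates with AtomicInteger, whose methods
// are implemented with CAS rather than mutual exclusion.
public class CasDemo {
    public static int[] demo() {
        AtomicInteger counter = new AtomicInteger(0);
        counter.getAndIncrement();                  // CAS loop: 0 -> 1
        boolean ok = counter.compareAndSet(1, 10);  // expected 1, so succeeds
        boolean ko = counter.compareAndSet(1, 99);  // expected 1, but value is 10
        return new int[] { counter.get(), ok ? 1 : 0, ko ? 1 : 0 };
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // 10 1 0
    }
}
```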

    Attractive as CAS looks, it cannot cover all the usage scenarios of mutual-exclusion synchronization, and it is not semantically perfect. If a variable V held value A when first read, and still holds value A when we check before assigning, can we conclude that no other thread modified it in between? If during that interval it was changed to B and then back to A, the CAS operation will believe it was never changed. This loophole is called the "ABA" problem of CAS operations.

    To address the ABA problem, the JUC package provides AtomicStampedReference, an atomic reference with a stamp; it guarantees the correctness of CAS by versioning the variable's value. This class is rather underwhelming, though: in most cases the ABA problem does not affect the correctness of concurrent programs, and when it does need solving, traditional mutual-exclusion synchronization may well be more efficient than the atomic class.
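    A sketch of how the stamp defeats an A → B → A sequence (the values and names are arbitrary):

```java
import java.util.concurrent.atomic.AtomicStampedReference;

// Sketch: AtomicStampedReference pairs the value with a version stamp,
// so an A -> B -> A sequence is still detected as a change.
public class AbaDemo {
    public static boolean staleCasSucceeds() {
        AtomicStampedReference<String> ref =
                new AtomicStampedReference<>("A", 0);
        int staleStamp = ref.getStamp();          // observed stamp: 0

        // Another thread changes A -> B -> A, bumping the stamp each time.
        ref.compareAndSet("A", "B", 0, 1);
        ref.compareAndSet("B", "A", 1, 2);

        // A plain CAS would succeed here (value is "A" again),
        // but the stale stamp 0 no longer matches the current stamp 2.
        return ref.compareAndSet("A", "C", staleStamp, staleStamp + 1);
    }

    public static void main(String[] args) {
        System.out.println(staleCasSucceeds()); // false
    }
}
```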

  3. No-synchronization schemes

    Synchronization is not the only way to guarantee thread safety. If a method does not involve shared data at all, it naturally needs no synchronization measures. Some code is therefore inherently thread-safe, including the reentrant code and thread-local storage discussed below.

    Reentrant code (Reentrant Code): also known as pure code, this is code whose execution can be interrupted at any point so that another piece of code runs (including a recursive call to itself), and which, after control returns, continues without any error. Reentrant code shares some common characteristics: it does not rely on data stored on the heap or on shared system resources, the state quantities it uses are passed in as parameters, it does not call non-reentrant methods, and so on. A simple criterion: if a method's result is predictable, that is, the same inputs always yield the same output, then it is reentrant and, of course, thread-safe.

    Thread-local storage (Thread Local Storage): this is data private to a thread; ThreadLocal is Java's implementation of thread-local storage.
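    A minimal sketch of ThreadLocal giving each thread its own copy of a variable (names are my own):

```java
// Sketch: each thread sees its own independent copy of a ThreadLocal value,
// so no synchronization is needed for this per-thread state.
public class ThreadLocalDemo {
    private static final ThreadLocal<Integer> LOCAL =
            ThreadLocal.withInitial(() -> 0);

    public static int[] runTwoThreads() {
        int[] results = new int[2];
        Thread t1 = new Thread(() -> {
            LOCAL.set(LOCAL.get() + 1);   // only touches this thread's copy
            results[0] = LOCAL.get();
        });
        Thread t2 = new Thread(() -> {
            LOCAL.set(LOCAL.get() + 10);  // independent of t1's copy
            results[1] = LOCAL.get();
        });
        t1.start(); t2.start();
        try { t1.join(); t2.join(); } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return results;
    }

    public static void main(String[] args) {
        int[] r = runTwoThreads();
        System.out.println(r[0] + " " + r[1]); // 1 11
    }
}
```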

2.2 Lock Optimization

The HotSpot virtual machine development team has spent considerable effort implementing a variety of lock optimizations, such as spin locks and adaptive spinning, lock elimination, lock coarsening, lightweight locks, and biased locks.

Spin Locks and Adaptive Spinning

Spinning came up earlier when we discussed mutual-exclusion synchronization: the biggest performance impact of mutual exclusion is its blocking implementation, since suspending and resuming threads both involve transitions between user mode and kernel mode, and these transitions put great pressure on the system's concurrency performance. In most scenarios, however, shared data stays locked only for a very short time, and suspending and resuming threads for such short intervals is not worthwhile. If the physical machine has more than one processor, allowing two or more threads to execute in parallel, we can tell the thread that requests the lock later to "wait a moment" without giving up its processor time, and see whether the thread holding the lock releases it soon. To make the thread wait, we simply have it execute a busy loop (spin); this is the spin lock.
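Conceptually, a spin lock can be sketched with an AtomicBoolean. This is only an illustration of the idea, not how HotSpot implements its internal spinning, and the names are my own:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch: a minimal spin lock. Instead of blocking, a contending
// thread loops on a CAS until the flag becomes free.
public class SpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    public void lock() {
        // Busy-wait (spin) until we flip false -> true.
        while (!locked.compareAndSet(false, true)) {
            Thread.onSpinWait(); // hint to the CPU that we are spinning (JDK 9+)
        }
    }

    public void unlock() {
        locked.set(false);
    }

    // Demo: protect a shared counter with the spin lock.
    public static int runTwoThreads(int perThread) {
        SpinLock lock = new SpinLock();
        int[] count = {0};
        Runnable task = () -> {
            for (int i = 0; i < perThread; i++) {
                lock.lock();
                try { count[0]++; } finally { lock.unlock(); }
            }
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        try { t1.join(); t2.join(); } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println(runTwoThreads(50_000)); // always 100000
    }
}
```

The volatile semantics of the AtomicBoolean establish the happens-before edges that make the protected increment safe.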

Spin waiting avoids the overhead of thread switching, but it occupies processor time. If the lock is held for a short time, spinning works very well; conversely, if the lock is held for a long time, spinning threads merely burn processor time, producing a negative optimization. Spin waiting must therefore be bounded, yet a fixed bound is not optimal either, so the virtual machine development team designed adaptive spinning: the spin duration is no longer fixed but determined by the previous spin time on the same lock and the state of the lock's owner. If, on the same lock object, a spin wait recently succeeded in acquiring the lock, and the thread holding the lock is running, the virtual machine deems the current spin also likely to succeed and lets it last longer. If spinning rarely succeeds for a given lock, future attempts to acquire that lock may skip spinning altogether. With adaptive spinning, as the program runs and profiling information accumulates, the virtual machine's prediction of lock status grows increasingly accurate.

Lock Elimination

Lock elimination means that the just-in-time compiler, at run time, removes locks from code that requires synchronization syntactically but where it can detect no possible contention for shared data. Lock elimination is based mainly on escape analysis: if the compiler determines that, in a piece of code, none of the data on the heap escapes to where other threads could access it, it can treat that data as if it were on the stack and thread-private, and the synchronization locks naturally become unnecessary.

Lock Coarsening

When coding, we always recommend limiting the scope of a synchronized block to a minimum, synchronizing only where needed, so that the number of operations performed under synchronization is as small as possible and, if there is contention, waiting threads can obtain the lock as soon as possible. Usually this is correct. But if a series of consecutive operations repeatedly locks and unlocks the same object, or the locking even appears inside a loop body, then even without any thread contention, the frequent mutual-exclusion synchronization causes needless performance loss. Taking the lock-inside-a-loop case as an example, when the virtual machine detects this situation it extends (coarsens) the synchronization scope to outside the loop, so the lock is taken only once. This is lock coarsening.
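The classic shape that triggers these optimizations involves StringBuffer, whose append() is a synchronized method: in a loop like the one below, the JIT may coarsen the repeated per-call locks into one (and, since sb never escapes the method, escape analysis may let it eliminate them entirely). The observable behavior is identical either way; the sketch only shows the code pattern involved:

```java
// Sketch: repeated synchronized calls that the JIT may coarsen.
// StringBuffer.append() is a synchronized method, so this loop
// locks and unlocks `sb` on every iteration unless optimized.
public class CoarseningDemo {
    public static String concat(String[] parts) {
        StringBuffer sb = new StringBuffer(); // never escapes this method
        for (String p : parts) {
            sb.append(p); // lock/unlock per call; candidate for coarsening
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(concat(new String[] {"a", "b", "c"})); // abc
    }
}
```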

Lightweight locks and biased locks will not be covered here; if readers are interested, leave a comment and I will present them in a separate article.

Conclusion

With this, the entire JVM series is complete. These articles were essentially compiled from my reading notes, and I hope they are helpful. Due to space constraints, and to my own limited level, the book's full essence could not be presented here one piece at a time. Readers who want to study the Java virtual machine in depth are advised to read Zhou Zhiming's original.

References:

  • "In-depth understanding of the Java Virtual Machine: JVM advanced features and best practices (2nd edition)"

If you like my articles, follow my public account BaronTalk, my Zhihu column, or add a Star on GitHub!
