In-depth understanding of the JVM: thread safety and lock optimization — principles, diagnostics, and tuning

1. Thread safety
Definition of thread safety: when multiple threads access an object, if the calling code does not have to consider how those threads are scheduled and interleaved at runtime, and does not need any additional synchronization or other coordination on the caller's side, yet every call on the object still produces the correct result, then the object is thread-safe.

1.1 Thread safety in the Java language
The discussion of thread safety here is limited to the premise that shared data is actually accessed by multiple threads.

Sorted by "strength of the safety guarantee" from strong to weak, the ways Java code can operate on shared data fall into the following five categories:

Immutable
Absolutely thread-safe
Relatively thread-safe
Thread-compatible
Thread-hostile
1. Immutable
Immutable objects (Immutable) are always thread-safe.

If the shared data is of a primitive type, declaring it with the final keyword guarantees that it cannot change after its definition.

If the shared data is an object, we must ensure that none of the object's behavior can affect its state. The simplest way is to declare all of the object's state variables final, so that once the constructor finishes, the object is immutable.

Types in the Java API that meet the immutability requirement include java.lang.String and some subclasses of java.lang.Number.
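A minimal sketch of the pattern just described (the class and method names here are illustrative, not from the original text): all fields are final and assigned once in the constructor, so instances can be shared freely between threads, and "mutation" returns a new object, the same way String.concat() or String.substring() does.

```java
// Illustrative immutable class: final fields, set once, no setters.
final class ImmutablePoint {
    private final int x;
    private final int y;

    ImmutablePoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    public int getX() { return x; }
    public int getY() { return y; }

    // "Mutation" produces a new object instead of changing this one,
    // so existing references never observe a state change.
    public ImmutablePoint translate(int dx, int dy) {
        return new ImmutablePoint(x + dx, y + dy);
    }
}
```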

2. Absolutely thread-safe
An absolutely thread-safe class fully satisfies the definition of thread safety given above. Most of the classes that the Java API documentation marks as thread-safe are, in fact, not absolutely thread-safe.

3. Relatively thread-safe
Relative thread safety is what we usually mean when we speak of thread safety: each individual operation on the object is guaranteed to be safe, and no extra safeguards are needed for a single call. For certain sequences of consecutive calls, however, additional synchronization on the calling side may be required to guarantee the correctness of the sequence as a whole.

Most thread-safe classes in Java are of this type, e.g. Vector, Hashtable, and the collections wrapped by Collections.synchronizedCollection().
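A small sketch of why Vector is only relatively thread-safe (the helper class and method names are illustrative, not from the original text): each Vector method is synchronized on its own, but a check-then-act sequence spanning two calls is not atomic, so the caller must lock around the sequence itself.

```java
import java.util.Vector;

// Illustrative: individual Vector calls are safe, sequences are not.
class VectorLastDemo {
    // NOT atomic: another thread could remove the last element between
    // isEmpty()/size() and get(), causing ArrayIndexOutOfBoundsException.
    static Object lastUnsafe(Vector<Object> v) {
        if (!v.isEmpty()) {
            return v.get(v.size() - 1);
        }
        return null;
    }

    // Caller-side locking on the vector itself makes the sequence atomic,
    // because Vector's own methods synchronize on the same object.
    static Object lastSafe(Vector<Object> v) {
        synchronized (v) {
            return v.isEmpty() ? null : v.get(v.size() - 1);
        }
    }
}
```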

4. Thread-compatible
Thread-compatible means the object itself is not thread-safe, but the calling side can ensure safe use of the object in a concurrent environment by applying synchronization correctly. Most classes in the Java API are thread-compatible.

5. Thread-hostile
Thread-hostile code cannot be used safely in a multithreaded environment no matter what synchronization measures the calling side takes. Such code is harmful and should be avoided.

An example of thread hostility is the pair Thread.suspend() and Thread.resume(). If two threads simultaneously hold a reference to the same Thread object, one trying to suspend it while the other tries to resume it, then regardless of whether the calls are synchronized, the target thread risks deadlock: if the thread suspended by suspend() is the very thread that was about to execute resume(), deadlock is certain. It is precisely for this reason that both methods have been deprecated by the JDK.

1.2 Implementing thread safety
1.2.1 Mutual exclusion and synchronization
Mutual-exclusion synchronization (Mutual Exclusion & Synchronization) is a common means of guaranteeing correctness under concurrency. Synchronization means that when multiple threads access shared data concurrently, the shared data is used by only one thread (or a limited number, when semaphores are used) at any given time. Mutual exclusion is a means of achieving synchronization: critical sections (Critical Section), mutexes (Mutex), and semaphores (Semaphore) are the main ways of implementing mutual exclusion. In other words, mutual exclusion is the cause, synchronization the effect; mutual exclusion is the method, synchronization the goal.

The synchronized keyword
In Java, the most basic means of mutual-exclusion synchronization is the synchronized keyword.

How the synchronized keyword works:

After compilation, a synchronized block produces a pair of bytecode instructions, monitorenter and monitorexit, placed before and after the synchronized code. Both instructions take a reference-type parameter indicating the object to lock and unlock. If the synchronized statement explicitly specifies an object parameter, that object reference is used; if not, then depending on whether synchronized modifies an instance method or a class (static) method, the corresponding object instance or Class object is used as the lock object.
According to the virtual machine specification, when executing monitorenter the thread first tries to acquire the object's lock. If the object is not locked, or the current thread already owns that object's lock, the lock counter is incremented by 1; when monitorexit executes, the lock counter is decremented by 1, and when the counter reaches 0 the lock is released. If acquiring the object's lock fails, the current thread blocks and waits.
Java threads are mapped onto native operating-system threads, so blocking or waking a thread requires help from the operating system, which entails a transition from user mode to kernel mode; these state transitions cost CPU time. The synchronized keyword is therefore a heavyweight lock, although the virtual machine itself performs some optimizations, such as adding a spin-wait phase before asking the OS to block the thread, to avoid frequently crossing into kernel mode.
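The forms described above can be sketched as follows (class and field names are illustrative, not from the original text). Running `javap -c` on the compiled class shows the monitorenter/monitorexit pair for the block form, while synchronized methods are instead marked with the ACC_SYNCHRONIZED flag.

```java
// Illustrative: the three forms synchronized can take, and which
// object serves as the lock in each case.
class SyncForms {
    private int count;

    // Instance method: the lock object is `this`; compiled with the
    // ACC_SYNCHRONIZED method flag, not explicit monitor instructions.
    public synchronized void incInstance() {
        count++;
    }

    // Static method: the lock object is SyncForms.class.
    public static synchronized void staticWork() {
    }

    // Block form: javap -c shows monitorenter before the body and
    // monitorexit on both the normal and exceptional exit paths.
    public void incBlock() {
        synchronized (this) {
            count++;
        }
    }

    public int getCount() {
        synchronized (this) {
            return count;
        }
    }
}
```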

Synchronization with ReentrantLock
ReentrantLock can be used to write code equivalent to the above; the difference is that one expresses the mutex at the API level, while the other (synchronized) expresses it at the level of the language's native syntax.

ReentrantLock adds some advanced features:

Interruptible waiting: when the thread holding the lock does not release it for a long time, a waiting thread can choose to give up waiting and handle other work instead. This is helpful for synchronized regions that take a long time to execute.
Fair locking: with a fair lock, when multiple threads wait for the same lock, they must acquire it in the chronological order in which they requested it; a non-fair lock makes no such guarantee, and when the lock is released any waiting thread may obtain it. The lock used by synchronized is non-fair, and ReentrantLock is also non-fair by default, but a fair lock can be requested through its constructor.
Binding multiple conditions: one ReentrantLock object can be bound to several Condition objects at the same time. With synchronized, wait() and notify()/notifyAll() on the lock object implement a single implicit condition; if more than one condition must be associated, an extra lock has to be added. ReentrantLock has no such need — just call newCondition() as many times as required.
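The three features listed above can be sketched in one class (class, method, and field names are illustrative, not from the original text): the constructor argument requests fairness, tryLock with a timeout gives up instead of blocking forever, and two Condition objects hang off the same lock.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative use of ReentrantLock's advanced features.
class ReentrantLockFeatures {
    // true requests a fair lock: waiters acquire it in request order
    // (at some throughput cost compared to the default non-fair mode).
    private final ReentrantLock lock = new ReentrantLock(true);

    // Multiple conditions on one lock; synchronized offers only the
    // single implicit condition of the monitor object.
    private final Condition notEmpty = lock.newCondition();
    private final Condition notFull  = lock.newCondition();

    private int items;

    // Timed, interruptible acquisition: returns false instead of
    // blocking indefinitely if the lock stays held too long.
    public boolean tryPut(long timeoutMs) throws InterruptedException {
        if (!lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
            return false;
        }
        try {
            items++;
            notEmpty.signal();
            return true;
        } finally {
            lock.unlock();  // always release in finally
        }
    }

    public int getItems() {
        lock.lock();
        try {
            return items;
        } finally {
            lock.unlock();
        }
    }
}
```

Note the lock/try/finally idiom: unlike synchronized, ReentrantLock is not released automatically when the block exits, so unlock() must sit in a finally clause.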
1.2.2 Non-blocking synchronization
The main problem with mutual-exclusion synchronization is the performance cost of blocking and waking threads, so it is also called blocking synchronization (Blocking Synchronization). In how it approaches the problem, mutual-exclusion synchronization is a pessimistic concurrency strategy: it assumes that failing to take synchronization measures (such as locking) will always cause problems, so it locks regardless of whether there is actually any contention on the shared data (conceptually, that is — in practice the virtual machine optimizes away a large share of unnecessary locks), paying for user-to-kernel mode transitions, lock-counter maintenance, and checks for blocked threads that need to be woken.

With the development of hardware instruction sets, we have another option: an optimistic concurrency strategy based on conflict detection. That is, perform the operation first; if no other thread contends for the shared data, the operation succeeds; if the shared data is contended and a conflict arises, take some other compensating measure (the most common remedy is to keep retrying until success). Many implementations of this optimistic strategy do not need to suspend threads, so this kind of synchronization is called non-blocking synchronization (Non-Blocking Synchronization).

The operation and its conflict detection must together be atomic. This can only be guaranteed by hardware, which ensures that a behavior seemingly requiring multiple operations completes in a single processor instruction. Commonly used instructions of this kind are:

Test-and-Set
Fetch-and-Increment
Swap
Compare-and-Swap (hereafter CAS)
Load-Linked / Store-Conditional (hereafter LL/SC)
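In Java, CAS is exposed through the java.util.concurrent.atomic classes. A minimal sketch of the optimistic retry loop described above (class name illustrative, not from the original text); AtomicInteger's own incrementAndGet() performs essentially this loop internally:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative CAS-based counter: retries on conflict instead of blocking.
class CasCounter {
    private final AtomicInteger value = new AtomicInteger();

    public int increment() {
        for (;;) {
            int current = value.get();       // read the shared value
            int next = current + 1;          // compute the new value
            // CAS: succeeds only if no other thread changed `value`
            // in between; on failure, loop and retry (the "compensating
            // measure" of the optimistic strategy).
            if (value.compareAndSet(current, next)) {
                return next;
            }
        }
    }

    public int get() {
        return value.get();
    }
}
```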
1.2.3 Schemes without synchronization
Thread safety does not necessarily require synchronization; there is no causal relationship between the two. Synchronization is only a means of guaranteeing correctness when shared data is contended. If a method involves no shared data at all, it naturally needs no synchronization measures to be correct, so some code is inherently thread-safe. Two categories:

Reentrant code (Reentrant Code): also called pure code (Pure Code), this is code that can be interrupted at any point of its execution to run another piece of code (including a recursive call to itself), such that after control returns, the original program exhibits no errors. All reentrant code is thread-safe. Reentrant code has some recognizable features: it does not rely on data stored on the heap or on shared system resources, all state it uses is passed in as parameters, and it calls no non-reentrant methods. A simple principle for judging reentrancy: if a method's return value is predictable — the same input always yields the same result — then it meets the reentrancy requirement and is, of course, thread-safe.
Thread-local storage (Thread Local Storage): if the data in a piece of code must be shared with other code, check whether the code sharing that data is guaranteed to execute in the same thread. If it is, the visibility of the shared data can be confined to a single thread, which guarantees that no data contention between threads occurs, without any synchronization.
Applications matching this pattern are common: most architectures built around consumption queues (such as the producer–consumer pattern) try to complete the consumption of each item within a single thread; and the classic Web interaction model of "one thread per request" (thread-per-request) is so widely used that many Web server applications can rely on thread-local storage to solve their thread-safety issues.

In Java, if a variable is accessed by multiple threads, it can be declared "volatile" with the volatile keyword; if a variable is exclusive to a single thread, thread-local storage can be implemented with the java.lang.ThreadLocal class. Each Thread object holds a ThreadLocalMap object that stores a set of key-value pairs, keyed by ThreadLocal.threadLocalHashCode, whose values are the thread-local variables. A ThreadLocal object is the access entry into the current thread's ThreadLocalMap; each ThreadLocal object contains a unique threadLocalHashCode value, through which the corresponding thread-local variable can be retrieved from the map of the current thread.
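A small sketch of ThreadLocal in use (class and method names are illustrative, not from the original text): each thread that calls get() receives its own private copy, supplied on first access by the withInitial factory, so no synchronization is needed even though the field itself is shared.

```java
// Illustrative thread-local storage: one StringBuilder per thread.
class ThreadLocalDemo {
    // Shared field, but get() returns a per-thread instance created
    // lazily by the supplier on each thread's first access.
    private static final ThreadLocal<StringBuilder> BUFFER =
            ThreadLocal.withInitial(StringBuilder::new);

    static String tag(String name) {
        StringBuilder sb = BUFFER.get();  // this thread's private builder
        sb.setLength(0);                  // safe to reuse: no other thread sees it
        sb.append('[')
          .append(Thread.currentThread().getName())
          .append(':')
          .append(name)
          .append(']');
        return sb.toString();
    }
}
```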

2. Lock optimization
The HotSpot virtual machine development team put a great deal of effort into implementing various lock-optimization techniques, such as adaptive spinning, lock elimination, lock coarsening, lightweight locks, and biased locks.

2.1 Spin locks
Problem to solve: the biggest performance impact of mutual-exclusion synchronization is its blocking implementation. Suspending and resuming threads must be carried out in kernel mode, and these operations put great pressure on the system's concurrent performance.

Solution: to make a thread wait, simply have it execute a busy loop (spin); this technique is the spin lock.

Spin waiting cannot replace blocking. Leaving aside its requirement on the number of processors, spin waiting itself avoids the overhead of thread switching but does consume processor time. So if the lock is held only briefly, spin waiting works very well; conversely, if the lock is held for a long time, spinning threads only waste processor resources without doing any useful work, causing a net performance loss. Spin waiting must therefore be bounded: if a spin exceeds the limit without obtaining the lock, the thread should be suspended in the traditional way. The default spin count is 10, and the user can change it with the parameter -XX:PreBlockSpin.
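A user-level sketch of the spin idea (not the JVM's internal spin lock — just an illustration of the busy-wait trade-off described above; class name illustrative). Waiters loop on a CAS instead of being suspended by the OS; Thread.onSpinWait() (Java 9+) merely hints to the CPU that this is a spin loop.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative user-level spin lock built on CAS.
class SpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    public void lock() {
        // Busy-wait: cheap if the lock is held briefly, wasteful
        // otherwise — exactly the trade-off described in the text.
        while (!locked.compareAndSet(false, true)) {
            Thread.onSpinWait();  // CPU hint, Java 9+
        }
    }

    public void unlock() {
        locked.set(false);
    }
}
```

A production version would bound the spin and fall back to parking the thread, as the text describes the JVM doing.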

2.2 Adaptive spin locks
Problem with spin locks: because the spin duration is fixed for every lock, it is not always appropriate for the system.

JDK 1.6 introduced adaptive spinning. The spin time is determined by the previous spin time on the same lock and the state of the lock's owner. If a spin wait on a given lock has just succeeded in acquiring it, and the thread holding the lock is running, the virtual machine considers the current spin likely to succeed as well and allows it to last relatively longer. If, for a given lock, spinning has rarely succeeded, later acquisitions of that lock may omit the spin phase entirely to avoid wasting processor resources. With adaptive spinning, as the program runs and profiling information accumulates, the virtual machine's predictions of each lock's state become more accurate, and the virtual machine grows ever "smarter".

2.3 Lock elimination
Lock elimination means that the virtual machine's just-in-time compiler removes locks on code that requires synchronization as written, but where contention on the shared data is detected to be impossible. Lock elimination is mainly supported by escape analysis: if it can be determined that, in a piece of code, no data on the heap escapes and becomes accessible to other threads, that data can be treated as if it were on the stack and thread-private, and locking it is naturally unnecessary.

Much synchronization is not added explicitly by the programmer; synchronized code is more prevalent in Java programs than we might imagine.

public String concatString(String s1, String s2, String s3) {
    return s1 + s2 + s3;
}
Because String is an immutable class, string concatenation always proceeds by generating new String objects, so the javac compiler automatically optimizes String concatenation. Before JDK 1.5 it was converted into append() operations on a StringBuffer object; in JDK 1.5 and later it is converted into consecutive append() operations on a StringBuilder object. The code above may become the equivalent of the following.

public String concatString(String s1, String s2, String s3) {
    StringBuffer sb = new StringBuffer();
    sb.append(s1);
    sb.append(s2);
    sb.append(s3);
    return sb.toString();
}

(StringBuffer is used here because its append() method is synchronized, which is what the lock-elimination example relies on; the unsynchronized StringBuilder is what javac actually emits in JDK 5+.) Each StringBuffer.append() method contains a synchronized block whose lock object is sb. The virtual machine, observing the variable sb, quickly discovers that its dynamic scope is confined to the inside of concatString(): no reference to sb ever "escapes" from concatString(), and no other thread can access it. So although locks are present, they can be safely eliminated, and after just-in-time compilation this code ignores all the synchronization and executes directly.

2.4 Lock coarsening
When writing code, we generally recommend keeping the scope of each synchronized block as small as possible — synchronizing only over the actual extent of the shared-data operations — so that if lock contention exists, waiting threads can obtain the lock as soon as possible.

But if a series of consecutive operations repeatedly locks and unlocks the same object, or the locking even appears inside a loop body, then even without any thread contention, the frequent mutual-exclusion synchronization causes unnecessary performance loss.

The consecutive append() calls in the code above are such a case. If the virtual machine detects that a string of piecemeal operations all lock the same object, it will coarsen (extend) the scope of the lock to cover the whole sequence of operations; in the code above, for example, the lock is extended to before the first append() and after the last append(), so it needs to be taken only once.

2.5 Lightweight locks
Problem to solve: traditional heavyweight locks use operating-system mutexes for synchronization, and every lock and unlock operation requires a transition from user mode to kernel mode, which imposes a performance cost on the whole system.

How lightweight locks work:
To understand lightweight locks and biased locks, one must start from the memory layout of the HotSpot virtual machine's object header (Object Header). The HotSpot object header is divided into two parts. The first part stores the object's own runtime data, such as its hash code (HashCode) and GC generational age (GC Age); this part is 32 bits long on a 32-bit virtual machine and 64 bits on a 64-bit one, and is officially called the "Mark Word". The other part stores a pointer to the object's type data in the method area. On a 32-bit virtual machine, 25 bits store the object's hash code, 4 bits the generational age, and 2 bits the lock flag; the rest is omitted here.


The procedure:

When code is about to enter a synchronized block, if the synchronization object is not locked (lock flag in the "01" state), the virtual machine first creates a space called the Lock Record in the current thread's stack frame, to store a copy of the lock object's current Mark Word (officially this copy carries a "Displaced" prefix, i.e. the Displaced Mark Word).
The virtual machine then attempts, using a CAS operation, to update the object's Mark Word to a pointer to the Lock Record. If the update succeeds, the thread owns the object's lock, and the lock flag in the object's Mark Word (its last 2 bits) changes to "00", meaning the object is in the lightweight-locked state.
If the update fails, the virtual machine first checks whether the object's Mark Word already points into the current thread's stack frame. If so, the current thread already owns this object's lock and can simply proceed into the synchronized block; otherwise, the lock object has been preempted by another thread.
If two or more threads contend for the same lock, the lightweight lock is no longer effective and must inflate into a heavyweight lock: the lock flag changes to "10", the Mark Word stores a pointer to the heavyweight lock (the mutex), and threads waiting for the lock subsequently enter the blocked state.


The basis on which lightweight locks improve performance is the empirical observation that "for the great majority of locks, there is no contention during the entire synchronization period." When there is no contention, the lightweight lock uses CAS operations and avoids the overhead of a mutex. If two or more threads contend for the same lightweight lock, it is no longer effective and must inflate into a heavyweight lock, with the lock flag changing to "10".

2.6 Biased locks
Problem to solve: lightweight locks use CAS in the uncontended case, but every lock and unlock operation still requires a CAS primitive. When an object is only ever accessed by a single thread, even that causes some performance overhead.

Biased locking, introduced in JDK 1.6, is a lock optimization designed to eliminate synchronization primitives in the uncontended case and further improve program performance. If the lightweight lock uses CAS to eliminate the mutex in the uncontended case, the biased lock eliminates the entire synchronization in the uncontended case — not even the CAS operations are performed.

The procedure:

When the lock object is acquired by a thread for the first time, the virtual machine sets the flag in the object header to "01", i.e. biased mode. At the same time it uses a CAS operation to record the ID of the acquiring thread in the object's Mark Word. If the CAS succeeds, then every time the thread holding the biased lock subsequently enters the related synchronized block, the virtual machine performs no synchronization operations at all.
When another thread attempts to acquire the lock, the biased mode ends. Depending on whether the lock object is currently locked, the bias is revoked (Revoke Bias) and the object reverts to the unlocked (flag "01") or lightweight-locked (flag "00") state, and subsequent synchronization proceeds as described above for lightweight locks.


Biased locking can improve performance for synchronization that experiences no contention. It is a trade-off (Trade Off) optimization: it is not necessarily beneficial to the running program. If most of the locks in a program are always accessed by several different threads, the bias is superfluous. Provided the specific situation is analyzed first, disabling biased locking with the parameter -XX:-UseBiasedLocking can sometimes actually improve performance.
----------------
Disclaimer: this article is an original article by CSDN blogger "MasterT-J", published under the CC 4.0 BY-SA copyright agreement; when reproducing it, please attach the original source link and this statement.
Original link: https://blog.csdn.net/qq_21125183/article/details/85174651

Origin www.cnblogs.com/spark9988/p/11521567.html