Java Concurrent Programming: Performance in Detail

I. INTRODUCTION

This article focuses on the performance of multithreaded applications: techniques for reducing lock contention, and how to implement them in code.

II. PERFORMANCE

It is well known that multithreading can improve a program's performance. The root cause lies in our multi-core CPUs, or machines with multiple CPUs: each core can work on a task on its own, so splitting a large task into a series of small tasks that can run independently of each other improves the program's overall performance. As an example, consider a program that resizes all images under a folder on the hard drive. A single-threaded version can only walk through the image files one by one and perform the modification; even if the machine has several CPU cores, only one of them is used. With multiple threads, we can instead have a producer thread scan the file system, add each image to a queue, and let several worker threads take tasks from that queue and execute them. If the number of worker threads equals the number of CPU cores, we ensure that every core has work to do until the whole task is completed.
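A minimal sketch of this producer/worker design, using a BlockingQueue and a fixed thread pool sized to the number of cores. The class and method names are illustrative, and resizeImage() and the .jpg filter are placeholders for the real work:

```java
import java.io.File;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

public class ImageResizer {

    // Sentinel object telling a worker that no more files will arrive.
    private static final File POISON = new File("");

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<File> queue = new LinkedBlockingQueue<>();
        int workers = Runtime.getRuntime().availableProcessors(); // one thread per core
        ExecutorService pool = Executors.newFixedThreadPool(workers);

        // Worker threads: take images from the queue until the sentinel arrives.
        for (int i = 0; i < workers; i++) {
            pool.execute(() -> {
                try {
                    for (File f = queue.take(); f != POISON; f = queue.take()) {
                        resizeImage(f);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // Producer: scan the folder and enqueue every image file.
        File[] files = new File(args[0]).listFiles((dir, name) -> name.endsWith(".jpg"));
        if (files != null) {
            for (File f : files) {
                queue.put(f);
            }
        }
        for (int i = 0; i < workers; i++) {
            queue.put(POISON); // one sentinel per worker
        }
        pool.shutdown();
    }

    private static void resizeImage(File file) {
        // Placeholder for the actual resizing logic.
    }
}
```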

Multithreading can also improve the overall performance of programs that spend much of their time waiting for IO. Suppose we want to write a program that crawls all HTML pages of a website and stores them on the local disk. The program starts from one page, parses it for all links that point to the same site, then crawls those links in turn, and so on. Since we have to wait for some time after sending a request to a remote site before all the data of a page arrives, this task is well suited to multiple threads: one or a few threads parse the HTML pages that have already been received and put the links they find into a queue, while all the other threads are responsible for requesting pages.
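A minimal sketch of such a crawler, assuming Java 11+ for the HttpClient API. Since each fetcher spends most of its time blocked on the network, it pays off to use more threads than cores here. extractLinks() is a hypothetical placeholder, and a real crawler would also track visited URLs to avoid fetching the same page twice:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

public class Crawler {

    private final BlockingQueue<String> urls = new LinkedBlockingQueue<>();
    private final HttpClient client = HttpClient.newHttpClient();

    public void start(String seedUrl, int fetchers) {
        urls.add(seedUrl);
        ExecutorService pool = Executors.newFixedThreadPool(fetchers);
        for (int i = 0; i < fetchers; i++) {
            pool.execute(() -> {
                try {
                    while (!Thread.currentThread().isInterrupted()) {
                        String url = urls.take(); // blocks until work is available
                        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).build();
                        String html = client.send(request, HttpResponse.BodyHandlers.ofString()).body();
                        urls.addAll(extractLinks(html)); // feed newly found links back in
                    }
                } catch (Exception e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
    }

    private List<String> extractLinks(String html) {
        // Placeholder: real code would parse the page for same-site links
        // and skip URLs that have already been visited.
        return List.of();
    }
}
```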

High performance means doing as much as possible within a given window of time; that is certainly the classic interpretation of the term. At the same time, though, threads can greatly improve a program's responsiveness. Imagine a GUI application with an input field and, below it, a button labeled "Process". When the user presses the button, the application has to re-render the button's state (it appears pressed, and is restored when the mouse button is released) and start processing the user's input. If that processing is time-consuming, a single-threaded program cannot keep responding to the user's other actions while it runs; only by handing the work to a separate thread does the interface stay responsive.
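A minimal sketch of this idea, assuming a Swing UI: SwingWorker moves the slow work off the event dispatch thread so the interface keeps responding, and Thread.sleep stands in for the real processing:

```java
import javax.swing.JButton;
import javax.swing.JFrame;
import javax.swing.SwingUtilities;
import javax.swing.SwingWorker;

public class ResponsiveUi {

    public static void main(String[] args) {
        SwingUtilities.invokeLater(() -> {
            JFrame frame = new JFrame("Demo");
            JButton process = new JButton("Process");
            process.addActionListener(event -> {
                process.setEnabled(false); // the button re-renders immediately
                new SwingWorker<Void, Void>() {
                    @Override
                    protected Void doInBackground() throws Exception {
                        Thread.sleep(3000); // stand-in for the time-consuming processing
                        return null;
                    }

                    @Override
                    protected void done() {
                        process.setEnabled(true); // runs back on the event dispatch thread
                    }
                }.execute();
            });
            frame.add(process);
            frame.pack();
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.setVisible(true);
        });
    }
}
```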

Scalability means that a program has the ability to obtain higher performance when computing resources are added. Imagine we need to resize a very large number of images. Because the number of CPU cores on our machine is limited, increasing the number of threads does not always bring a corresponding increase in performance. On the contrary, since the scheduler is kept busy creating and closing ever more threads, which itself consumes CPU resources, adding threads may actually reduce performance.

1. The Impact of Threads on Performance

So far we have made the point that adding more threads can improve a program's performance and responsiveness. On the other hand, these benefits do not come easily: there is a price to pay, and using threads has its own impact on performance.

First of all, there is the cost of creating threads in the first place. When a thread is created, the JVM has to request the corresponding resources from the underlying operating system and initialize data structures in the scheduler that determine the order in which threads execute.

If the number of threads equals the number of CPU cores, each thread runs on its own core and may not be interrupted very often. In reality, though, the operating system needs the CPU for its own operations while your program runs, so even in this case your threads will occasionally be interrupted and have to wait before resuming. The situation gets worse when the number of threads exceeds the number of CPU cores. The scheduler will then interrupt some threads to let others run, and on every such switch the current state of the running thread has to be saved so that its data can be restored the next time it runs. Not only that, the scheduler has to update its own internal data structures, which also consumes CPU cycles. All of this means that every context switch between threads costs CPU resources, adding overhead that does not exist in the single-threaded case.

Another cost of multithreaded programs comes from synchronizing access to shared data. We can use the synchronized keyword for synchronized protection, and the volatile keyword to share data between threads. When more than one thread wants to access a shared data structure, contention occurs, and the JVM has to decide which thread goes first and which goes after. If the thread chosen to run next is not the currently running one, a thread switch takes place, and the current thread has to wait until it successfully acquires the lock object. The JVM can decide for itself how to implement this waiting: if it expects the lock to be acquired in a relatively short time, it can wait aggressively, for example by repeatedly trying to acquire the lock until it succeeds. This can be more efficient, because compared with a context switch, spinning briefly is faster. Moving a waiting thread back into the run queue also brings additional cost.

Therefore, we should try hard to avoid the context switches caused by lock contention.

The following sections illustrate how this contention occurs and how it can be reduced.

2. Lock Contention

When two or more threads compete for a lock, the contention brings additional overhead, because it forces the scheduler either to put a waiting thread into aggressive spinning, or to suspend it, which costs two context switches. In some cases the consequences of lock contention can be mitigated by the following methods:

1. Reduce the scope of the lock;

2. Reduce the frequency with which locks have to be acquired;

3. Use hardware-supported optimistic operations instead of synchronization;

4. Use synchronized as little as possible;

5. Use object caching less.

2.1 Reduce the Scope of Synchronization

The first method can be applied whenever a lock is held longer than necessary. Often one or more lines of code can be moved out of the synchronized block to shorten the time the current thread holds the lock. The less code runs inside the synchronized block, the sooner the current thread releases the lock and the earlier other threads can acquire it. This is consistent with Amdahl's Law, because it reduces the portion of the code that has to execute serially.
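A minimal sketch of the idea (the class and its names are illustrative, not from the original article): both methods below do the same thing, but the second holds the lock only for the update of the shared list, not for the formatting work that touches no shared state:

```java
import java.util.ArrayList;
import java.util.List;

public class RequestLog {

    private final List<String> entries = new ArrayList<>();

    // Coarse version: the lock is held for the whole method, including
    // the string formatting, which touches no shared state at all.
    public synchronized void logCoarse(String task, long millis) {
        String entry = String.format("%s took %d ms", task, millis);
        entries.add(entry);
    }

    // Narrow version: same behavior and same monitor (this), but the lock
    // is held only while the shared list is actually being updated.
    public void logNarrow(String task, long millis) {
        String entry = String.format("%s took %d ms", task, millis);
        synchronized (this) {
            entries.add(entry);
        }
    }
}
```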

2.2 Lock Splitting

Another method for reducing lock contention is to split one lock into several smaller ones, each protecting a different part of the data. This can be useful if your program uses a single lock to protect a number of different objects. Suppose we want to gather statistics about the data flowing through the program and implement a simple counter class that holds a primitive counter variable (of type long) for each statistic. Because our program is multithreaded, access to these variables has to be synchronized, since the updates come from different threads. The easiest way to achieve this is to add the synchronized keyword to every method that accesses one of the variables, but then every update competes for one and the same lock; splitting the lock means giving each variable a lock of its own.
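A minimal sketch of lock splitting for such a statistics class (the names are illustrative): each counter is guarded by its own lock object, so threads updating different statistics no longer contend with each other:

```java
public class Statistics {

    private final Object addedLock = new Object();
    private final Object removedLock = new Object();

    private long itemsAdded;
    private long itemsRemoved;

    // Each counter has its own lock, so a thread recording an
    // addition never blocks a thread recording a removal.
    public void incrementAdded() {
        synchronized (addedLock) {
            itemsAdded++;
        }
    }

    public void incrementRemoved() {
        synchronized (removedLock) {
            itemsRemoved++;
        }
    }

    // Reads must use the same lock as the writes they observe.
    public long getItemsAdded() {
        synchronized (addedLock) {
            return itemsAdded;
        }
    }
}
```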

2.3 Lock Striping

The example above showed how to split a single lock into several separate locks, so that each thread acquires only the lock for the object it wants to modify. On the other hand, this approach increases the program's complexity, and if implemented incorrectly it can also cause deadlocks.

Lock striping is a method similar to lock splitting, but instead of using separate locks to protect different pieces of code or different objects, striping uses separate locks to protect different ranges of values. The ConcurrentHashMap class from the JDK's java.util.concurrent package uses this idea to improve the performance of programs that depend heavily on HashMap. In its classic implementation (up to Java 7), ConcurrentHashMap internally used 16 different locks rather than a single synchronized wrapper around a HashMap, each of the 16 locks being responsible for synchronizing access to one sixteenth of the buckets. As a result, threads inserting keys that fall into different segments are protected by different locks. The flip side is that some operations now have to acquire several locks instead of one: to copy the entire Map, for example, all 16 locks must be acquired.
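A minimal sketch of the striping idea, loosely modeled on the segmented design described above rather than on ConcurrentHashMap's actual source: the stripe is chosen from the key's hash code, so an update locks only one sixteenth of the map:

```java
import java.util.ArrayList;
import java.util.List;

public class StripedMap<K, V> {

    private static final int STRIPES = 16;

    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    @SuppressWarnings("unchecked")
    private final List<Entry<K, V>>[] buckets = new List[STRIPES];
    private final Object[] locks = new Object[STRIPES];

    public StripedMap() {
        for (int i = 0; i < STRIPES; i++) {
            buckets[i] = new ArrayList<>();
            locks[i] = new Object();
        }
    }

    private int stripeFor(Object key) {
        return (key.hashCode() & 0x7fffffff) % STRIPES;
    }

    // Only the stripe the key hashes to is locked, so inserts into
    // different stripes proceed in parallel.
    public void put(K key, V value) {
        int i = stripeFor(key);
        synchronized (locks[i]) {
            for (Entry<K, V> e : buckets[i]) {
                if (e.key.equals(key)) {
                    e.value = value;
                    return;
                }
            }
            buckets[i].add(new Entry<>(key, value));
        }
    }

    // An operation spanning the whole map must visit every stripe. This
    // version locks them one at a time, so the count is only approximate
    // while writers are active; an exact snapshot (like copying the whole
    // map) would have to hold all 16 locks at once.
    public int size() {
        int total = 0;
        for (int i = 0; i < STRIPES; i++) {
            synchronized (locks[i]) {
                total += buckets[i].size();
            }
        }
        return total;
    }
}
```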

2.4 Atomic Operations

A further way to reduce lock contention is to use atomic operations. The java.util.concurrent.atomic package provides classes that wrap atomic operations on common primitive data types. The atomic classes are implemented on top of the processor's compare-and-swap (CAS) feature: a CAS operation performs the update only if the value currently in place is still equal to the expected old value handed to the operation.

This principle can be used to increment a variable optimistically. If we know the variable's current value, we attempt the increment with a CAS operation. If another thread has modified the variable in the meantime, the so-called current value our thread supplied no longer matches the real value, so the operation fails; we then read the current value again and retry, repeating until we succeed. Although this looping may waste some CPU cycles, the benefit is that no other form of synchronization is needed.
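A minimal sketch of such a CAS retry loop, using AtomicLong from java.util.concurrent.atomic; this is essentially what AtomicLong.incrementAndGet() does for you internally:

```java
import java.util.concurrent.atomic.AtomicLong;

public class CasCounter {

    private final AtomicLong value = new AtomicLong();

    public long increment() {
        while (true) {
            long current = value.get();       // read the current value
            long next = current + 1;
            if (value.compareAndSet(current, next)) {
                return next;                  // CAS succeeded: nobody interfered
            }
            // Another thread changed the value in the meantime:
            // loop, reread and try again. No lock is ever taken.
        }
    }
}
```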

2.5 Avoid Hot Spots

A typical List implementation keeps track of the number of elements it contains by maintaining an internal count variable, whose value changes every time an element is added to or removed from the list. In a single-threaded application this is a perfectly sensible approach: each call to size() can simply return the last computed value. If the List did not maintain this counter internally, every call to size() would have to re-iterate over the list to count its elements.

This optimization, used by many data structures, becomes a problem in a multithreaded environment. Suppose we share a List among multiple threads that concurrently add and remove elements and query its length. The internal count variable is now a shared resource, and all access to it has to be synchronized. The count variable has become a hot spot of the whole List implementation.
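One standard JDK mitigation for exactly this kind of hot-spot counter, not mentioned in the original text but worth knowing, is java.util.concurrent.atomic.LongAdder: it spreads updates across several internal cells and only combines them when the total is requested. A minimal sketch (the class is illustrative, not a real List implementation):

```java
import java.util.concurrent.atomic.LongAdder;

public class ElementCounter {

    private final LongAdder size = new LongAdder();

    public void elementAdded() {
        size.increment(); // updates one of several internal cells, reducing contention
    }

    public void elementRemoved() {
        size.decrement();
    }

    public long size() {
        return size.sum(); // combines the cells; may be slightly stale while writers run
    }
}
```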

The optimizations described in this article demonstrate once again that every optimization has to be checked against careful observation of the real, running application. Premature optimizations may look perfectly reasonable on the surface but can in fact turn into performance bottlenecks themselves.
