[Base] Java Concurrency: Safety, Liveness, and Performance Issues

Foreword

Multi-threading in Java is a double-edged sword: used well, it makes our programs more efficient, but once concurrency problems arise, our programs can behave very badly. Concurrent programming requires attention to three kinds of issues: safety, liveness, and performance.

Safety Issues

We often say that a method is thread-safe or a class is thread-safe, but what exactly does thread safety mean?

Giving a precise definition of thread safety turns out to be complicated: the more formal the definition, the harder it is to understand. In any case, the core concept in any definition of thread safety is correctness, which can simply be understood as the program behaving as we expect.
Correctness means that a class behaves exactly according to its specification. Thread safety can then be understood as follows: a class is thread-safe if it consistently exhibits correct behavior when accessed by multiple threads.

To write thread-safe programs, we must avoid the three main sources of concurrency problems: atomicity problems, visibility problems, and ordering problems (a previous article introduced ways to work around each of them). Of course, not all code needs to be analyzed against these three problems; only when there is shared data that can change, that is, when multiple threads can read and write the same data at the same time, do we need to synchronize access to the shared variables to guarantee thread safety.

This also implies that if data is not shared, or the state of shared data never changes, thread safety is guaranteed as well.

In summary, we can design thread-safe programs from the following three angles:

  1. Do not share variables between threads.
  2. Make shared variables immutable.
  3. Synchronize access to shared variables.

The synchronization mechanism introduced earlier is mainly the synchronized keyword, which coordinates threads' access to variables and provides an exclusive lock. Besides the built-in synchronized lock, Java's synchronization toolbox also includes volatile variables, explicit locks (Explicit Lock), and atomic variables. Techniques based on points 1 and 2 include thread-local storage (Thread Local Storage, TLS), the immutability pattern, and so on (to be introduced later).
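As a small illustration of point 1 (not sharing mutable state between threads), here is a minimal sketch using ThreadLocal; the SimpleDateFormat used here is just a hypothetical example of non-thread-safe state, and each thread gets its own private copy of it:

import java.text.SimpleDateFormat;
import java.util.Date;

public class SafeFormatter {
    // Each thread gets its own SimpleDateFormat, so the mutable formatter is never shared.
    private static final ThreadLocal<SimpleDateFormat> FORMATTER =
            ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd"));

    public static String format(Date date) {
        return FORMATTER.get().format(date);
    }
}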

Data Race

When multiple threads access the same data and at least one of them writes it, and we do not use any synchronization mechanism to coordinate access, concurrency problems can occur. This situation is called a data race (Data Race).

The following example shows how a data race can occur.

public class Test {
    private long count = 0;
    void add10K() {
        int idx = 0;
        while(idx++ < 10000) {
            count += 1;   // not atomic: read, add, then write back
        }
    }
}
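A minimal harness (not part of the original example, and assuming Test also exposes a simple getCount() accessor for the demo) makes the race visible; on most runs it prints a value smaller than 20000:

public class TestRunner {
    public static void main(String[] args) throws InterruptedException {
        Test test = new Test();
        Thread t1 = new Thread(test::add10K);
        Thread t2 = new Thread(test::add10K);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        // Expected 20000, but usually prints less because count += 1 is not atomic.
        System.out.println(test.getCount()); // getCount() is an assumed accessor for this demo
    }
}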

When multiple threads call add10K() concurrently, a data race occurs on count. We might try to use the synchronized keyword to coordinate access and prevent the data race:

public class Test {
    private long count = 0;
    synchronized long get(){
        return count;
    }
    synchronized void set(long v){
        count = v;
    }
    void add10K() {
        int idx = 0;
        while(idx++ < 10000) {
            set(get()+1);      
        }
    }
}

Race Conditions

But even now the add10K() method is not thread-safe.
Suppose count = 0 and two threads call get() at almost the same time: both calls return 0, both threads compute get() + 1 = 1, and both then write 1 back to memory. We expected 2, but the result is 1. (How can both threads read the same value when get() is synchronized? I admit I was briefly "blocked" on this myself, ha ha; that is what staying up late to write blog posts does. The reason is that an argument expression is evaluated before the enclosing call: get() runs first, and only after it returns is set() invoked with the result. So once one thread has returned from get() and released the lock, another thread can immediately acquire the lock and call get() before the first thread ever calls set(), and both end up reading the same value.)

This situation is called a race condition (Race Condition). A race condition means that the result of the program depends on the order in which threads happen to execute.
In the example above, if the two threads read count at the same time, the result is 1; if they execute strictly one after the other, the result is 2. In a concurrent environment the execution order of threads is nondeterministic, so if a program has a race condition, its result is nondeterministic, and nondeterministic results are a big problem.

When we discussed the sources of concurrency bugs earlier, we also introduced race conditions: incorrect results caused by unlucky timing. To avoid race conditions, while one thread is modifying a variable we must somehow prevent other threads from using it, so that other threads can only read or modify the state before the modification starts or after it completes, never while the modification is in progress.

To solve the race condition in this example, we introduce a lock to guarantee exactly that: other threads can read or modify the state only before or after the modification, not during it.

public class Test {
    private long count = 0;
    synchronized long get(){
        return count;
    }
    synchronized void set(long v){
        count = v;
    }
    void add10K() {
        int idx = 0;
        while(idx++ < 10000) {
            // Hold the lock across the whole read-modify-write so that no other
            // thread can interleave between get() and set().
            synchronized(this){
                set(get()+1);
            }
        }
    }
}

So, when faced with data races and race conditions, we can use locking to ensure thread safety!

Liveness Issues

Safety means "nothing bad ever happens," while liveness is concerned with a different goal: "something good eventually happens." A liveness problem occurs when an operation can no longer make progress.
In serial programs, one form of liveness failure is an accidental infinite loop, after which the code following the loop can never execute. Threads bring several additional liveness problems, such as the deadlock we covered before, as well as livelock and starvation, which we introduce here.

Starvation

Starvation means that a thread cannot obtain the resources it needs and therefore cannot make progress.

The most commonly starved resource is CPU clock cycles. In a Java application, improper use of thread priorities, or holding a lock while executing a construct that never finishes (such as an infinite loop or an indefinite wait for a resource), can also cause starvation, because other threads that need that lock can never acquire it.

Usually we should avoid changing thread priorities; in most concurrent applications the default priorities are fine. As soon as you change thread priorities, behavior becomes platform-dependent and you risk starvation (for example, a high-priority thread may always win the resources while a low-priority thread never gets any).
When a program calls Thread.sleep or Thread.yield in odd places, it is usually an attempt to paper over priority or responsiveness problems by trying to give lower-priority threads more execution time.

The essence of starvation can be summed up by an old saying attributed to Confucius: the problem is not scarcity but unequal distribution.

There are three options for solving starvation:

  1. Ensure sufficient resources.
  2. Distribute resources fairly.
  3. Avoid holding a lock for a long time.

Of these three approaches, schemes 1 and 3 apply to fairly limited scenarios, because in many cases the scarcity of a resource cannot be solved and the time a thread holds a lock is hard to shorten, so scheme 2 applies a little more often. In concurrent programming, we can use fair locks to distribute resources fairly. A fair lock is essentially a FIFO scheme: waiting threads form a queue, and the thread at the front of the queue acquires the resource first.
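In Java this can be done with java.util.concurrent.locks.ReentrantLock, whose constructor takes a fairness flag; a minimal sketch:

import java.util.concurrent.locks.ReentrantLock;

public class FairResource {
    // true = fair mode: the longest-waiting thread acquires the lock first (FIFO)
    private final ReentrantLock lock = new ReentrantLock(true);

    public void use() {
        lock.lock();
        try {
            // access the scarce resource here
        } finally {
            lock.unlock();
        }
    }
}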

Livelock

Livelock (Livelock) is another form of liveness problem. It is very similar to deadlock, except that it does not block threads. Although a livelocked thread is not blocked, it still cannot make progress, because it keeps repeating the same operation and keeps failing.

Livelock usually occurs in applications that process transactional messages: when a message cannot be processed successfully, the message-handling mechanism rolls back the whole transaction and puts the message back at the head of the queue. If a bug in the handler makes it fail on a particular message, then every time that message is taken from the queue and passed to the handler, the transaction is rolled back. Since the message goes back to the head of the queue, the handler is called over and over and returns the same result each time (such a message is sometimes called a poison message, Poison Message). The message-processing thread is never blocked, yet it never makes progress either. This form of livelock usually comes from over-eager error-recovery code that mistakenly treats an unrecoverable error as a recoverable one.

Livelock also occurs when several cooperating threads respond to each other by changing their own state in a way that prevents any of them from making progress. It is like two overly polite people who meet in a narrow corridor: to avoid a collision, each steps aside for the other, and they end up in each other's way again. They repeat this dance indefinitely, producing a livelock.

To solve livelock, we introduce randomness into the retry mechanism: when yielding, each party waits a random amount of time before trying again, so they stop colliding and proceed one after the other. The binary exponential backoff algorithm in the Ethernet protocol is a good example of how introducing randomness reduces conflicts and repeated failures. In concurrent applications, backing off and waiting for a random length of time can effectively prevent livelock.
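A minimal sketch of retrying with a random back-off; tryOperation() here is a hypothetical operation that may fail and needs to be retried:

import java.util.concurrent.ThreadLocalRandom;

public class RandomBackoffRetry {
    // Hypothetical operation that sometimes fails (stands in for yielding to another party).
    private boolean tryOperation() {
        return ThreadLocalRandom.current().nextInt(4) == 0;
    }

    public void runWithBackoff() throws InterruptedException {
        while (!tryOperation()) {
            // Wait a random amount of time before retrying, so cooperating threads
            // stop retrying in lockstep and repeatedly colliding.
            Thread.sleep(ThreadLocalRandom.current().nextInt(1, 100));
        }
    }
}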

Performance Issues

Closely related to liveness are performance issues. Liveness means that something good eventually happens, but that is not good enough, since we usually want the good thing to happen as quickly as possible. Performance problems have many facets: long service time, poor responsiveness, low throughput, excessive resource consumption, or reduced scalability. Multi-threaded programs not only share the same performance problems as single-threaded ones; they also suffer from additional performance costs introduced by the use of threads.

The purpose of using multiple threads is to improve overall performance, but compared with the single-threaded approach, using threads always introduces some overhead: coordination between threads (locking, signalling, memory synchronization, and so on), extra context switching, thread creation and destruction, and thread scheduling. If threads are used excessively, these costs can exceed the gains in throughput, responsiveness, or computing power, and a badly designed concurrent program can even perform worse than a sequential program doing the same work.
To obtain better performance through concurrency we need to do two things: use the available processing resources more effectively, and make it possible to exploit new processing resources as they become available.

Below we describe how to evaluate performance, what overhead multi-threading introduces, and how to reduce that overhead.

Performance and scalability

An application's performance can be measured with many metrics, such as service time, latency, throughput, efficiency, scalability, and capacity. Some of these (service time, latency) measure how fast the program runs, that is, how quickly a given unit of work can be completed. Others (throughput, capacity) measure the program's processing power, that is, how much work can be completed with a given amount of computing resources.

Scalability refers to the ability of a program to increase its throughput or processing capacity accordingly when computing resources (such as CPUs, memory, storage, or I/O bandwidth) are added. When tuning for scalability, the goal is to parallelize the computation so that more computing resources can be used to do more work. Traditional performance tuning, by contrast, aims to do the same work with less effort, for example by caching previously computed results.

Amdahl's Law

Most concurrent programs consist of a mix of parallel work and serial work.
Amdahl's law describes the maximum theoretical speedup a program can achieve as computing resources increase; this value depends on the proportion of parallelizable and serial components in the program. Put simply, Amdahl's law characterizes how much additional processors can improve efficiency.
If F is the fraction of the work that must be executed serially, then according to Amdahl's law, on a machine with N processors the maximum speedup is:

\[Speedup \leq \frac{1}{F + \frac{1-F}{N}}\]

As N approaches infinity, the maximum speedup tends to \(\frac{1}{F}\). Therefore, if 50% of a program must execute serially, the maximum speedup is only 2, no matter how many threads are available; whatever technology we use, we can obtain at most a 2x improvement.
Amdahl's law quantifies the cost of serialization. On a system with 10 processors, if 10% of the program must execute serially, the maximum speedup is about 5.3 (53% utilization); on a system with 100 processors, the speedup reaches only about 9.2 (9.2% utilization). And even with an unlimited number of processors, the speedup never reaches 10.
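Plugging the numbers above into the formula (a quick arithmetic check):

\[F = 0.1,\ N = 10:\quad Speedup \leq \frac{1}{0.1 + \frac{0.9}{10}} \approx 5.3\]

\[F = 0.1,\ N = 100:\quad Speedup \leq \frac{1}{0.1 + \frac{0.9}{100}} \approx 9.2\]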

If we can accurately estimate the fraction of execution that must be serial, Amdahl's law lets us quantify the speedup we can expect as more computing resources become available.

Overhead Introduced by Threads

Scheduling and coordinating multiple threads incurs performance overhead, so we must make sure that the performance gains from parallelism exceed the cost of concurrency; otherwise the concurrent design is a failure. Below are the main costs that concurrency introduces.

Context switching

If the main thread is the only thread, it will essentially never be scheduled out. If there are more runnable threads than CPUs, the operating system will eventually preempt a running thread so that another thread can use the CPU. This causes a context switch: the execution context of the currently running thread is saved, and the execution context of the newly scheduled thread is restored.

Context switching is not free: thread scheduling requires the operating system and the JVM to manipulate shared data structures, so the cost of a context switch includes both JVM and OS overhead. Moreover, when a new thread is switched in, the data it needs is unlikely to be in the local processor's cache, so a context switch causes a burst of cache misses (a loss of locality), which is why a freshly scheduled thread runs a little more slowly at first.
This is also why schedulers give every runnable thread a minimum execution quantum even when many other threads are waiting: it amortizes the cost of the context switch over a longer stretch of uninterrupted execution and thus improves overall throughput (at some cost to responsiveness).

Frequent blocking also causes context switches, increasing scheduling overhead and reducing throughput: when a thread blocks because it cannot acquire a contended lock, the JVM usually suspends the thread and allows it to be switched out.

The actual cost of a context switch varies from platform to platform, but as a rule of thumb, on most general-purpose processors a context switch costs roughly 5,000 to 10,000 clock cycles, that is, a few microseconds.

Memory Synchronization

The performance overhead of synchronization has several aspects. The visibility guarantees provided by synchronized and volatile may be implemented with special instructions called memory fences (the memory barriers we introduced in a previous article). Memory fences can flush or invalidate caches, flush hardware write buffers, and stall execution pipelines. They can also have indirect performance effects, because they inhibit some compiler optimizations: most operations cannot be reordered across a memory fence.

When assessing the performance impact of synchronization, it is important to distinguish between contended and uncontended synchronization. Modern JVMs can optimize away locking that can never be contended, eliminating unnecessary synchronization overhead. For example:

synchronized(new Object()){...}

The JVM can remove this lock entirely through escape analysis, because the lock object can never escape to another thread.
We should therefore focus our optimization effort on the places where lock contention actually occurs.

Synchronization in one thread can also affect the performance of other threads: synchronization increases traffic on the shared memory bus, the bus has limited bandwidth, and all processors share it. If multiple threads compete for synchronization bandwidth, every thread that uses synchronization suffers.

Blocking

Uncontended synchronization can be handled entirely within the JVM, while contended synchronization may require operating-system involvement, which increases the cost. When lock contention occurs, the losing thread must be blocked. The JVM can implement blocking either by spin-waiting (Spin-Waiting, repeatedly trying to acquire the lock until it succeeds) or by suspending the blocked thread through the operating system. Which is more efficient depends on the context-switch overhead and the time until the lock becomes available: spin-waiting is preferable when the wait is short, suspension when the wait is long. Some JVMs choose adaptively based on profiling of past wait times, but most simply suspend a thread that is waiting for a lock.
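For intuition, here is a minimal spin-lock sketch built on an atomic variable; this only illustrates the idea of spin-waiting and is not the JVM's internal implementation:

import java.util.concurrent.atomic.AtomicBoolean;

public class SpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    public void lock() {
        // Spin-wait: keep retrying the CAS until it succeeds instead of suspending the thread.
        while (!locked.compareAndSet(false, true)) {
            Thread.onSpinWait(); // CPU hint available since Java 9; the loop works without it too
        }
    }

    public void unlock() {
        locked.set(false);
    }
}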

Suspending a blocked thread involves two additional context switches, plus all the associated operating-system and cache activity.

Reducing Lock Contention

Serialization hurts scalability, and context switching hurts performance; a contended lock causes both, so reducing lock contention improves both performance and scalability.
Access to a resource guarded by an exclusive lock is serialized: only one thread can access it at a time. If contention occurs on such a lock, it limits the scalability of the code.
In concurrent programs, the principal threat to scalability is the exclusive resource lock.

Two factors influence the likelihood of contention on a lock: how often the lock is requested and how long it is held each time (an application of Little's law).
If the product of the two is small, most attempts to acquire the lock will be uncontended, and contention on the lock will not seriously hurt scalability.
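In Little's law terms (a minimal steady-state formulation):

\[L = \lambda \cdot W\]

where \(\lambda\) is the rate at which the lock is requested, \(W\) is the average time each request holds (or waits for) the lock, and \(L\) is the average number of threads contending for the lock; keeping the product \(\lambda W\) small keeps contention low.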

Below are some ways to reduce lock contention.

Narrow the scope of the lock

An effective way to reduce the likelihood of contention is to hold locks for as short a time as possible: move code that does not need the lock out of the synchronized block, especially expensive operations and operations that may block (such as I/O).
Although shrinking synchronized blocks improves scalability, a synchronized block cannot be made arbitrarily small, because operations that must happen atomically with respect to each other have to stay in the same synchronized block.
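A small before-and-after sketch, adapted from the AttributeStore example in "Java Concurrency in Practice" (the attribute map and key format are illustrative):

import java.util.HashMap;
import java.util.Map;

public class AttributeStore {
    private final Map<String, String> attributes = new HashMap<>();

    // Before: the whole method is synchronized, so key construction and regex
    // matching are done while holding the lock.
    public synchronized boolean userLocationMatchesSlow(String name, String regexp) {
        String key = "users." + name + ".location";
        String location = attributes.get(key);
        return location != null && location.matches(regexp);
    }

    // After: only the map lookup needs the lock; everything else is moved outside.
    public boolean userLocationMatches(String name, String regexp) {
        String key = "users." + name + ".location";
        String location;
        synchronized (this) {
            location = attributes.get(key);
        }
        return location != null && location.matches(regexp);
    }
}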

Reducing lock granularity

Another way to reduce the time a lock is held is to reduce how often threads request it (thereby reducing the likelihood of contention). This can be achieved with lock splitting and lock striping: instead of guarding several independent state variables with a single lock, use several independent locks, each guarding one of the variables. These techniques reduce the granularity of locking and can achieve higher scalability, but note that the more locks are used, the greater the risk of deadlock.

Lock splitting

If a single lock guards multiple independent state variables, it can be split into multiple locks, each guarding only one of the variables. This reduces how often each individual lock is requested and improves scalability.

For example, we can apply lock splitting to the following class. (Example from "Java Concurrency in Practice".)

@ThreadSafe   // this annotation indicates that the class is thread-safe
public class ServerStatus {
    // @GuardedBy(xxx) indicates that the state variable is guarded by the xxx lock
    @GuardedBy("this") public final Set<String> users;
    @GuardedBy("this") public final Set<String> queries;

    public ServerStatus() {
        users = new HashSet<String>();
        queries = new HashSet<String>();
    }

    public synchronized void addUser(String u) {
        users.add(u);
    }

    public synchronized void addQuery(String q) {
        queries.add(q);
    }

    public synchronized void removeUser(String u) {
        users.remove(u);
    }

    public synchronized void removeQuery(String q) {
        queries.remove(q);
    }
}

The class above represents part of the monitoring interface of a database server: it maintains the set of currently logged-in users and the set of queries currently being executed. When a user logs in or out, or a query begins or ends, the corresponding add or remove method is called to update the ServerStatus object. The two kinds of information are completely independent, so we can try to improve performance by splitting the lock.

@ThreadSafe
public class ServerStatus{
    @GuardedBy("users") public final Set<String> users;
    @GuardedBy("queries") public final Set<String> queries;

    public ServerStatus() {
        users = new HashSet<String>();
        queries = new HashSet<String>();
    }

    public void addUser(String u) {
        synchronized (users) {
            users.add(u);
        }
    }

    public void addQuery(String q) {
        synchronized (queries) {
            queries.add(q);
        }
    }

    public void removeUser(String u) {
        synchronized (users) {
            users.remove(u);
        }
    }

    public void removeQuery(String q) {
        synchronized (queries) {
            queries.remove(q);
        }
    }
}

We have split the single lock of the original ServerStatus so that each state variable is now guarded by its own finer-grained lock. This reduces lock contention and improves performance.

Lock striping

When a heavily contended lock is split into two locks, both of those locks may still be heavily contended, and in the lock-splitting example above the locks cannot be split any further.

In some cases, lock splitting can be extended further to split a lock over a set of independent objects; this is called lock striping.

For example, the implementation of ConcurrentHashMap (prior to Java 8) uses an array of 16 locks, each of which guards \(\frac{1}{16}\) of the hash buckets, where the N-th bucket is guarded by lock number (N mod 16).
Assuming the hash function spreads keys reasonably uniformly, this reduces the demand on any single lock to roughly \(\frac{1}{16}\) of what it would otherwise be. It is this technique that allows ConcurrentHashMap to support up to 16 concurrent writers.

Lock striping has a disadvantage: when exclusive access to the whole collection is needed, acquiring all of the locks is harder and more expensive. For example, when ConcurrentHashMap needs to expand the map and rehash the keys into a larger set of buckets, it has to acquire all of the segment locks.

The following code shows a hash-based Map that uses lock striping. It has N_LOCKS locks, each guarding a subset of the hash buckets. Most methods, such as get(), need to acquire only a single lock; a few methods, such as clear(), need to acquire all of the locks, though not necessarily at the same time. (Example from "Java Concurrency in Practice".)

@ThreadSafe
public class StripedMap {
    // Synchronization policy: buckets[n] guarded by locks[n%N_LOCKS]
    private static final int N_LOCKS = 16;
    private final Node[] buckets;
    private final Object[] locks;

    private static class Node {
        Node next;
        Object key;
        Object value;
    }

    public StripedMap(int numBuckets) {
        buckets = new Node[numBuckets];
        locks = new Object[N_LOCKS];
        for (int i = 0; i < N_LOCKS; i++)
            locks[i] = new Object();
    }

    private final int hash(Object key) {
        return Math.abs(key.hashCode() % buckets.length);
    }

    public Object get(Object key) {
        int hash = hash(key);
        synchronized (locks[hash % N_LOCKS]) {
            for (Node m = buckets[hash]; m != null; m = m.next)
                if (m.key.equals(key))
                    return m.value;
        }
        return null;
    }

    public void clear() {
        for (int i = 0; i < buckets.length; i++) {
            synchronized (locks[i % N_LOCKS]) {
                buckets[i] = null;
            }
        }
    }
}

Giving up exclusive locks

Besides narrowing lock scope and reducing the granularity of lock requests, there is a third technique for reducing the impact of locks: giving up exclusive locks altogether.
Shared state can instead be managed with lock-free algorithms or data structures, or with more permissive coordination mechanisms such as concurrent containers, read-write locks, immutable objects, and atomic variables.
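As a tiny illustration, and tying back to the earlier add10K() example, here is a sketch that replaces the exclusive lock with an atomic variable (java.util.concurrent.atomic.AtomicLong):

import java.util.concurrent.atomic.AtomicLong;

public class AtomicCounter {
    private final AtomicLong count = new AtomicLong(0);

    void add10K() {
        int idx = 0;
        while (idx++ < 10000) {
            count.incrementAndGet(); // atomic read-modify-write, no exclusive lock needed
        }
    }

    long get() {
        return count.get();
    }
}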

These alternatives are also planned for later posts.

Summary

Combining this with what we covered earlier, we can now view concurrent programming from both the micro and the macro level. At the micro level, when designing concurrent programs we must consider atomicity, visibility, and ordering. Zooming out to the macro level, we must design with thread safety, liveness, and performance in mind. The premise of any performance optimization is that thread safety is preserved; if an optimization introduces concurrency problems, the result will be contrary to our expectations.

References:
[1] GeekTime column "Java Concurrency in Practice" (Java并发编程实战).
[2] Brian Goetz, Tim Peierls, et al. Java Concurrency in Practice [M]. Beijing: China Machine Press, 2016.
