Problems that arise when using parallelStream to improve efficiency

1. Foreword
I recently took over a task that required calling a third-party interface for tens of millions of records, so the performance requirements were very high. My first reaction was to use parallelStream to process the data while calling the third-party interface. Surely that would solve it? It sounded great in theory, but in practice I got thoroughly beaten...

2. Scenario
Let’s look at the code first

   

List<Student> studentList = mapper.getinfo(createDate.toString("yyyy-MM-dd"));

List<Student> result = new ArrayList<>();
studentList.parallelStream().forEach(o -> {
    // data processing
    result.add(o); // ArrayList is not thread-safe; this is where things go wrong
});

The code looks fine, but when it runs, the result list contains fewer elements than expected, some of its elements are null, and occasionally an ArrayIndexOutOfBoundsException is thrown:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:598)
    at java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:677)
    at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:735)
    at java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160)
    at java.util.stream.ForEachOps$ForEachOp$OfInt.evaluateParallel(ForEachOps.java:189)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
    at java.util.stream.IntPipeline.forEach(IntPipeline.java:404)
    at jit.wxs.disruptor.stream.StreamTest.main(StreamTest.java:15)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 15
    at java.util.ArrayList.add(ArrayList.java:463)
    at java.util.stream.ForEachOps$ForEachOp$OfInt.accept(ForEachOps.java:205)
    at java.util.stream.IntPipeline$3$1.accept(IntPipeline.java:233)
    at java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:110)
    at java.util.Spliterator$OfInt.forEachRemaining(Spliterator.java:693)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291)
    at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
    at java.util.concurrent.ForkJoinPool$WorkQueue.execLocalTasks(ForkJoinPool.java:1040)
    at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1058)
    at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
    at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
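
The symptoms can be reproduced without a database or a third-party interface. The following is a minimal sketch, not the original project code: the class and method names are made up for illustration, and plain integers stand in for the Student records. Because this is a race, the unsafe result changes from run to run and may occasionally even look correct; the exact symptoms can also differ between JDK versions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.IntStream;

class ParallelAddDemo {

    // Adds n integers to a plain ArrayList from a parallel stream.
    // Returns the final size, or -1 if the race surfaced as an exception
    // (typically ArrayIndexOutOfBoundsException).
    static int unsafeAdd(int n) {
        List<Integer> result = new ArrayList<>();
        try {
            IntStream.range(0, n).parallel().forEach(result::add);
        } catch (RuntimeException e) {
            return -1;
        }
        return result.size();
    }

    // Same loop, but the list is wrapped so that every add() is synchronized.
    static int safeAdd(int n) {
        List<Integer> result = Collections.synchronizedList(new ArrayList<>());
        IntStream.range(0, n).parallel().forEach(result::add);
        return result.size();
    }

    public static void main(String[] args) {
        int n = 100_000;
        System.out.println("expected: " + n);
        // Varies between runs; -1 means an exception was thrown.
        System.out.println("unsafe:   " + unsafeAdd(n));
        System.out.println("safe:     " + safeAdd(n));
    }
}
```

The unsafe size can never exceed n, because every add() sets size to at most one more than the value it read; it can only fall short.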

3. Analysis

The cause is simple. A parallel stream runs its work on a ForkJoinPool (the common pool by default), so the lambda passed to forEach is executed by several threads at once, and ArrayList is not thread-safe. The following sections analyze the cause of each symptom in turn.
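
To see which threads actually execute the lambda, the thread names can be collected into a thread-safe set. This is only a small sketch with a made-up class name; the names printed depend on the machine and on how the work happens to be split.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.IntStream;

class WorkerThreadsDemo {

    // Runs a parallel stream and records the name of every thread
    // that executed the lambda.
    static Set<String> threadNames(int n) {
        Set<String> names = ConcurrentHashMap.newKeySet(); // thread-safe set
        IntStream.range(0, n).parallel()
                 .forEach(i -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        // Typically prints the calling thread plus several
        // ForkJoinPool.commonPool-worker-N threads.
        threadNames(10_000).forEach(System.out::println);
    }
}
```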

3.1 Lost elements

// java.util.ArrayList#add(E)
public boolean add(E e) {
  ensureCapacityInternal(size + 1);  // Increments modCount!!
  elementData[size++] = e;
  return true;
}

Elements are lost because of elementData[size++] = e in the add() method of ArrayList. This line is not an atomic operation; it can be broken down into three steps:

(1) Read the current value of size.
(2) Write size + 1 back to size.
(3) Store e at the index read in step (1).

This is a classic lost update. If thread A and thread B both read the same value of size before either of them writes it back, both store their element in the same slot and both set size to the same new value. One element overwrites the other, and size grows by 1 instead of 2. This explains why the result list ends up shorter than the source list (elements are missing).
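
The interleaving can be replayed deterministically in a single thread. The sketch below is a hand-written model rather than the real ArrayList: the class name and the indexA/indexB variables are made up, and each statement is labelled with the thread that would execute it.

```java
class LostUpdateDemo {

    static Object[] elementData = new Object[10];
    static int size = 0;

    // Replays an interleaving of two add() calls that loses an element.
    static int replay() {
        elementData = new Object[10];
        size = 0;

        int indexA = size;          // thread A reads size (0)
        int indexB = size;          // thread B reads size (0) before A writes it back

        size = indexA + 1;          // thread A: size becomes 1
        elementData[indexA] = "A";  // thread A stores at index 0

        size = indexB + 1;          // thread B: size becomes 1 again, A's increment is lost
        elementData[indexB] = "B";  // thread B overwrites index 0

        return size;
    }

    public static void main(String[] args) {
        System.out.println("size after two adds: " + replay());   // prints 1
        System.out.println("elementData[0]: " + elementData[0]);  // prints B
    }
}
```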

3.2 null elements

Null elements have a similar cause: add() is not atomic, and here the race also involves the array expansion in ensureCapacityInternal. Suppose the backing array has a capacity of 10 and the current size is 9.

Thread 1 passes ensureCapacityInternal(10) without expansion, loads the reference to the current array, reads size = 9 and updates size to 10. Its time slice then runs out before it stores its element.

Thread 2 calls ensureCapacityInternal(11), which needs more room. It copies the old array into a larger one (index 9 is still empty at this point), replaces elementData with the new array, and stores its element at index 10.

Thread 1 resumes and stores its element at index 9 of the old array, which nothing references any more. In the new array, index 9 has been counted by size but was never written, so elementData[9] stays null. [Note: the element of thread 1 is lost here as well.]
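
One interleaving that leaves a null slot can be replayed deterministically in a single thread. The sketch below is a hand-written model, not the real ArrayList, with made-up names and a tiny array for readability. It assumes that one thread expands the array while another thread is paused between updating size and storing its element.

```java
import java.util.Arrays;

class NullSlotDemo {

    static Object[] elementData;
    static int size;

    // Replays an interleaving of two add() calls that leaves a null slot.
    static Object[] replay() {
        elementData = new Object[] {"x", null}; // capacity 2, one element stored
        size = 1;

        // Thread B: the capacity check passes (size + 1 == 2 fits), no expansion.
        Object[] arrayB = elementData; // B loads the array reference
        int indexB = size;             // B reads size (1)
        size = indexB + 1;             // B sets size to 2, then is paused before storing

        // Thread A: needs room for a third element, so it expands the array.
        elementData = Arrays.copyOf(elementData, 3); // copies {"x", null}
        int indexA = size;             // A reads size (2)
        size = indexA + 1;             // A sets size to 3
        elementData[indexA] = "A";     // A stores at index 2 of the new array

        // Thread B resumes and stores into the old array, which is no longer used.
        arrayB[indexB] = "B";

        return Arrays.copyOf(elementData, size);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(replay())); // prints [x, null, A]
    }
}
```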


3.3 Array subscript out of bounds

The out-of-bounds exception occurs at the boundary just before the array would be expanded. Assume the array has room for exactly one more element, and two threads call ensureCapacityInternal(size + 1) at the same time. Both read the same size, so for both of them size + 1 still fits and no expansion takes place.

After leaving ensureCapacityInternal, both threads execute elementData[size++] = e. Thread B finishes first and increments size, which now equals the capacity of the array. If thread A then reads the updated size, it uses an index equal to the capacity, one past the last valid slot, and the store throws ArrayIndexOutOfBoundsException. This is consistent with the stack trace above: the failing index 15 equals the capacity of a default ArrayList after its first expansion from 10 to 15.
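
The same boundary case can be replayed step by step in a single thread. As before, this is a hand-written model with made-up names, not the real ArrayList; the capacity check is reduced to a simple comparison.

```java
class OutOfBoundsDemo {

    static Object[] elementData;
    static int size;

    // Returns the index at which the store failed, or -1 if nothing was thrown.
    static int replay() {
        elementData = new Object[10];
        size = 9; // room for exactly one more element

        // Both threads run the capacity check with the same value of size.
        boolean fitsA = size + 1 <= elementData.length; // true, no expansion
        boolean fitsB = size + 1 <= elementData.length; // true, no expansion
        if (!fitsA || !fitsB) {
            throw new IllegalStateException("unexpected expansion");
        }

        // Thread B completes its add first.
        int indexB = size;
        size = indexB + 1;          // size is now 10
        elementData[indexB] = "B";

        // Thread A reads the updated size and uses it as the index.
        int indexA = size;          // 10, equal to the capacity
        size = indexA + 1;
        try {
            elementData[indexA] = "A";
        } catch (ArrayIndexOutOfBoundsException e) {
            return indexA;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("store failed at index " + replay()); // prints 10
    }
}
```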

4. Solution

The fix is also simple. There are two approaches. The first is to make the result collection thread-safe:

List<Integer> list = new CopyOnWriteArrayList<>();
// or
List<Integer> list = Collections.synchronizedList(new ArrayList<>());
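
The text above does not show the second approach. A common alternative, offered here as an assumption about what was meant, is to avoid the shared list altogether and let the stream build the result with collect(). In the sketch below, the class name and the process method are hypothetical stand-ins for the per-element data processing, and integers replace the Student records so that the example is self-contained.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class CollectDemo {

    // Hypothetical stand-in for the per-element processing.
    static Integer process(Integer value) {
        return value * 2;
    }

    // No shared mutable list: each worker fills its own partial result
    // and the stream merges them in encounter order.
    static List<Integer> processAll(List<Integer> input) {
        return input.parallelStream()
                    .map(CollectDemo::process)
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> input = IntStream.range(0, 100_000).boxed()
                                       .collect(Collectors.toList());
        List<Integer> result = processAll(input);
        System.out.println(result.size()); // prints 100000
    }
}
```

For a result set in the tens of millions, the choice between the thread-safe lists matters too: CopyOnWriteArrayList copies its backing array on every add, so Collections.synchronizedList or collect() is usually the cheaper option for write-heavy workloads.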


 


Origin blog.csdn.net/qq_38623939/article/details/131379010