Foreword
Today I came across an interview question:
Given ten million numbers, how do you sum them efficiently?
Seeing "efficient summation", my first thought was the LongAdder class provided by JDK 1.8.
Its design idea is to sum in segments and then aggregate. That is, start multiple threads, let each one sum one part of the data, and add the partial sums together once every thread has finished.
With the idea in place, let's happily start coding.
Test environment
- Windows 10
- 4-core, 4-thread CPU
- JDK 1.8
- com.google.guava.guava-25.1-jre.jar
- lombok
Examples
Since the question does not say what kind of numbers the ten million are, I tentatively use random int values. To compare efficiency, I implemented both a single-threaded version and a multithreaded version, to see how much faster multithreading really is.
Single-threaded version
Accumulating ten million numbers on a single thread is simple, so here is the code directly.
/**
 * Accumulate on a single thread.
 * @param arr ten million random numbers
 */
public static int singleThreadSum(int[] arr) {
    long start = System.currentTimeMillis();
    int sum = 0;
    int length = arr.length;
    for (int i = 0; i < length; i++) {
        sum += arr[i];
    }
    long end = System.currentTimeMillis();
    log.info("Single-threaded result: {}, elapsed: {} s", sum, (end - start) / 1000.0);
    return sum;
}
Multithreaded version
The multithreaded version involves a thread pool (to run multiple threads), CountDownLatch (so the main thread can wait for the worker threads to finish) and other utilities, so it is slightly more complicated.
// Number of elements summed by each task
private static final int SIZE_PER_TASK = 200000;
// Thread pool
private static ThreadPoolExecutor executor = null;
static {
    // Core pool size: number of CPUs + 1
    int corePoolSize = Runtime.getRuntime().availableProcessors() + 1;
    executor = new ThreadPoolExecutor(corePoolSize, corePoolSize, 3, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());
}
/**
 * Accumulate using multiple threads.
 *
 * @param arr ten million random numbers
 * @throws InterruptedException
 */
public static int concurrencySum(int[] arr) throws InterruptedException {
    long start = System.currentTimeMillis();
    LongAdder sum = new LongAdder();
    // Split into tasks.
    // Note: Arrays.asList on an int[] yields a List<int[]> with a single element
    // (the whole array), so this partition holds exactly one task covering all of arr.
    List<List<int[]>> taskList = Lists.partition(Arrays.asList(arr), SIZE_PER_TASK);
    // Total number of tasks
    final int taskSize = taskList.size();
    final CountDownLatch latch = new CountDownLatch(taskSize);
    for (int i = 0; i < taskSize; i++) {
        int[] task = taskList.get(i).get(0);
        executor.submit(() -> {
            try {
                for (int num : task) {
                    // Add every number in the task to the shared adder
                    sum.add(num);
                }
            } finally {
                // Decrement the counter once the task has finished
                latch.countDown();
            }
        });
    }
    // The main thread waits for all worker tasks to finish
    latch.await();
    long end = System.currentTimeMillis();
    log.info("Multithreaded result: {}, elapsed: {} s", sum, (end - start) / 1000.0);
    // Shut down the thread pool
    executor.shutdown();
    return sum.intValue();
}
The code is commented in detail, so I won't go through it again here.
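One detail of the splitting step is worth checking: `Arrays.asList` applied to an `int[]` produces a `List<int[]>` with a single element (the array itself), so partitioning that list gives one task rather than fifty. Below is a minimal sketch of splitting by index range instead. The class `RangePartition` and its `ranges` method are hypothetical names used only for illustration; they are not part of the original source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the original EfficientSum source.
class RangePartition {

    /** Splits [0, length) into consecutive half-open ranges {from, to} of at most chunkSize elements. */
    static List<int[]> ranges(int length, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<int[]> result = new ArrayList<>();
        for (int from = 0; from < length; from += chunkSize) {
            int to = Math.min(length, from + chunkSize);
            result.add(new int[] {from, to});
        }
        return result;
    }

    public static void main(String[] args) {
        int[] arr = new int[10_000_000];
        // Arrays.asList treats the int[] as one element, not ten million.
        System.out.println("asList size: " + Arrays.asList(arr).size());
        // Index ranges give one entry per intended task.
        System.out.println("range count: " + ranges(arr.length, 200_000).size());
    }
}
```

Each `{from, to}` pair can then be handed to a task that reads `arr[from]` up to `arr[to - 1]` directly, which also avoids copying or boxing the data.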
main method
The main method is simple: it generates 10 million random numbers and then calls the two methods.
// Number of values to sum
private static final int SUM_COUNT = 10000000;
public static void main(String[] args) throws InterruptedException {
    Random random = new Random();
    int[] arr = new int[SUM_COUNT];
    for (int i = 0; i < SUM_COUNT; i++) {
        arr[i] = random.nextInt(200);
    }
    // Multithreaded version
    concurrencySum(arr);
    // Single-threaded version
    singleThreadSum(arr);
}
Note the call random.nextInt(200) inside the loop.
Why 200?
Because 10,000,000 × 200 = 2,000,000,000 < Integer.MAX_VALUE (2,147,483,647), so the accumulated result cannot overflow an int.
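The bound can be checked with a few lines of arithmetic. The sketch below is illustrative only: the class name `OverflowCheck` and its methods are hypothetical. It uses `Math.addExact`, which throws an `ArithmeticException` instead of silently wrapping around, as one way to guard an int sum when the input range is not known in advance.

```java
// Hypothetical helper, not part of the original EfficientSum source.
class OverflowCheck {
    static final int COUNT = 10_000_000;
    static final int BOUND = 200; // random.nextInt(200) yields values in 0..199

    /** Largest sum the test data could possibly produce, computed in long. */
    static long worstCaseSum() {
        return (long) COUNT * (BOUND - 1);
    }

    /** Sums into an int but fails loudly instead of wrapping around. */
    static int checkedSum(int[] arr) {
        int sum = 0;
        for (int v : arr) {
            sum = Math.addExact(sum, v); // throws ArithmeticException on overflow
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("worst case: " + worstCaseSum());
        System.out.println("fits in int: " + (worstCaseSum() <= Integer.MAX_VALUE));
    }
}
```

Since `nextInt(200)` never returns 200 itself, the true worst case is 10,000,000 × 199, which is slightly below the bound used in the text.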
Finally it was time to test the efficiency and find out what the code is really made of.
Full of confidence, I clicked Run and got the following results:
22:13:31.068 [main] INFO com.sicimike.concurrency.EfficientSum - Multithreaded result: 995523090, elapsed: 0.133 s
22:13:31.079 [main] INFO com.sicimike.concurrency.EfficientSum - Single-threaded result: 995523090, elapsed: 0.006 s
Maybe I ran it the wrong way...
But after many runs, including runs with adjusted thread pool parameters, the results were always painful to look at.
The multithreaded version was stable at around 0.130 seconds, while the single-threaded version was stable at around 0.006 seconds.
Multithreading improvements
The earlier multithreaded version used the LongAdder class. LongAdder relies heavily on CAS operations under the hood, and when threads compete intensely its efficiency drops to varying degrees. The improved multithreaded version therefore drops LongAdder in favor of an approach better suited to the current scenario.
/**
 * Accumulate using multiple threads (improved version).
 *
 * @param arr ten million random numbers
 * @throws InterruptedException
 */
public static int concurrencySum(int[] arr) throws InterruptedException {
    long start = System.currentTimeMillis();
    int sum = 0;
    // Split into tasks.
    // Note: Arrays.asList on an int[] yields a List<int[]> with a single element
    // (the whole array), so this partition holds exactly one task covering all of arr.
    List<List<int[]>> taskList = Lists.partition(Arrays.asList(arr), SIZE_PER_TASK);
    // Total number of tasks
    final int taskSize = taskList.size();
    final CountDownLatch latch = new CountDownLatch(taskSize);
    // Plays the role of Cell[] in LongAdder
    int[] result = new int[taskSize];
    for (int i = 0; i < taskSize; i++) {
        int[] task = taskList.get(i).get(0);
        final int index = i;
        executor.submit(() -> {
            try {
                for (int num : task) {
                    // Each worker accumulates into its own slot;
                    // every slot of result holds the sum of one task
                    result[index] += num;
                }
            } finally {
                latch.countDown();
            }
        });
    }
    // Wait for all worker tasks to finish
    latch.await();
    for (int i : result) {
        // Adding up the per-task results gives the final result
        sum += i;
    }
    long end = System.currentTimeMillis();
    log.info("Multithreaded result: {}, elapsed: {} s", sum, (end - start) / 1000.0);
    // Shut down the thread pool
    executor.shutdown();
    return sum;
}
Running the improved method gave the following results:
22:46:05.085 [main] INFO com.sicimike.concurrency.EfficientSum - Multithreaded result: 994958790, elapsed: 0.049 s
22:46:05.094 [main] INFO com.sicimike.concurrency.EfficientSum - Single-threaded result: 994958790, elapsed: 0.006 s
After many runs, including runs with adjusted thread pool parameters, the results were stable.
The multithreaded version was stable at around 0.049 seconds, and the single-threaded version at around 0.006 seconds.
Going from 0.133 seconds to 0.049 seconds makes the multithreaded version about 2.7 times as fast, an efficiency gain of roughly 170%.
Reflection
The improved code not only fails to answer why the single-threaded version is faster than the multithreaded one, it also raises a new question:
Why is a casually introduced array faster than Doug Lea's LongAdder?
Because LongAdder is a general-purpose utility class that balances time against space, so it performs well across a wide range of scenarios. In this example, the result array is as long as the number of tasks the ten million numbers are split into, and each task stores its result in its own array slot. There is no contention, at the cost of more space, so it is more time-efficient. This is the idea of trading space for time.
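The same "one slot per task" idea can be combined with LongAdder. The sketch below, under the assumption that tasks are split by index range, lets each task sum into a local variable and touch the shared adder only once, so the number of shared updates equals the number of tasks rather than the number of elements. The class name `LocalThenAdder` and the `chunkSize` parameter are hypothetical and not taken from the original source.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch, not part of the original EfficientSum source.
class LocalThenAdder {

    static long sum(int[] arr, int chunkSize) {
        LongAdder total = new LongAdder();
        int taskCount = (arr.length + chunkSize - 1) / chunkSize; // ceiling division
        CountDownLatch latch = new CountDownLatch(taskCount);
        ExecutorService pool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
            for (int t = 0; t < taskCount; t++) {
                final int start = t * chunkSize;
                final int end = Math.min(arr.length, start + chunkSize);
                pool.execute(() -> {
                    try {
                        long local = 0; // private to this task, no sharing
                        for (int i = start; i < end; i++) {
                            local += arr[i];
                        }
                        total.add(local); // a single shared update per task
                    } finally {
                        latch.countDown();
                    }
                });
            }
            latch.await(); // wait until every task has published its partial sum
            return total.sum();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] data = new int[1_000_000];
        java.util.Arrays.fill(data, 3);
        System.out.println(sum(data, 200_000));
    }
}
```

Accumulating into a `long` also removes the need to reason about int overflow for larger inputs.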
As for why single-threaded is faster than multithreaded, that is not hard to explain. A single thread has no context switches, and the accumulation is so simple that each task runs for a very short time, so it is normal for the single-threaded version to be faster.
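Part of the gap may also come from how the time is measured: each method is timed once with `System.currentTimeMillis()`, so the first call also pays for class loading, thread start-up and JIT compilation. The following is a rough sketch, assuming a few untimed warm-up runs are acceptable, of timing a summation function more fairly. The class `TimingSketch` and the method `bestNanos` are hypothetical names; for serious measurements a dedicated harness such as JMH is the usual choice.

```java
import java.util.function.ToLongFunction;

// Hypothetical helper, not part of the original EfficientSum source.
class TimingSketch {
    // Results are written here so the JIT cannot discard the measured work.
    static volatile long sink;

    /** Runs fn untimed `warmups` times, then returns the fastest of `runs` timed runs in nanoseconds. */
    static long bestNanos(ToLongFunction<int[]> fn, int[] data, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            sink = fn.applyAsLong(data);
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long t0 = System.nanoTime();
            sink = fn.applyAsLong(data);
            best = Math.min(best, System.nanoTime() - t0);
        }
        return best;
    }

    public static void main(String[] args) {
        int[] data = new int[10_000_000];
        java.util.Arrays.fill(data, 1);
        long nanos = bestNanos(arr -> {
            long s = 0;
            for (int v : arr) {
                s += v;
            }
            return s;
        }, data, 5, 10);
        System.out.println("best single-threaded run: " + nanos / 1_000_000.0 + " ms");
    }
}
```

The same `bestNanos` call can wrap any of the summation methods in this article, which makes the comparison less sensitive to which method happens to run first.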
stream approach
stream is an API provided by JDK 1.8, and a plain (sequential) stream runs on a single thread. I won't cover its usage here; readers can look it up themselves. It is included below mainly for comparison with the parallel stream.
public static int streamSum(List<Integer> list) {
    long start = System.currentTimeMillis();
    int sum = list.stream().mapToInt(num -> num).sum();
    long end = System.currentTimeMillis();
    log.info("stream result: {}, elapsed: {} s", sum, (end - start) / 1000.0);
    return sum;
}
parallelStream approach
As the name suggests, parallelStream is the parallel version of stream.
public static int parallelStreamSum(List<Integer> list) {
    long start = System.currentTimeMillis();
    int sum = list.parallelStream().mapToInt(num -> num).sum();
    long end = System.currentTimeMillis();
    log.info("parallel stream result: {}, elapsed: {} s", sum, (end - start) / 1000.0);
    return sum;
}
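Both stream methods above work on a `List<Integer>`, so every element is unboxed by `mapToInt` before it is added. As a sketch of an alternative, assuming the data is available as an `int[]` (as it is in the other methods), `Arrays.stream` gives a primitive `IntStream` directly. The class name `PrimitiveStreamSum` is hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the original EfficientSum source.
class PrimitiveStreamSum {

    /** Sequential sum over the primitive array, widened to long. */
    static long sequential(int[] arr) {
        return Arrays.stream(arr).asLongStream().sum();
    }

    /** Parallel sum; the work runs on the common ForkJoinPool. */
    static long parallel(int[] arr) {
        return Arrays.stream(arr).parallel().asLongStream().sum();
    }

    public static void main(String[] args) {
        int[] data = new int[10_000_000];
        Arrays.fill(data, 1);
        System.out.println(sequential(data));
        System.out.println(parallel(data));
    }
}
```

Whether this is faster than the boxed version on a given machine is something to measure rather than assume.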
ForkJoin approach
The Fork/Join framework, introduced in JDK 1.7, splits a task into subtasks, computes them, and merges their results.
When we need to run a large number of small tasks, experienced Java developers use a thread pool to execute them efficiently. Some tasks, however, such as sorting an array of more than 10 million elements, can be executed concurrently in principle, but breaking them down into smaller tasks has to happen dynamically while the task is running. A large task is split into small tasks, the small tasks can be split further into even smaller ones, and finally the results are merged into the final result. This is the Fork/Join model.
Using the Fork/Join framework involves roughly two parts: implementing the ForkJoin task and executing the task.
Implementing the ForkJoin task
Define a class that extends RecursiveTask (with a return value) or RecursiveAction (no return value), and implement the compute method.
/**
 * ForkJoin task, implemented as a static nested class.
 */
static class SicForkJoinTask extends RecursiveTask<Integer> {
    // Start of this subtask's range (inclusive)
    private final int left;
    // End of this subtask's range (exclusive)
    private final int right;
    private final int[] arr;
    @Override
    protected Integer compute() {
        if (right - left < SIZE_PER_TASK) {
            // The task is small enough: compute it directly
            int sum = 0;
            for (int i = left; i < right; i++) {
                sum += arr[i];
            }
            return sum;
        }
        // Otherwise keep splitting the task
        int middle = left + (right - left) / 2;
        SicForkJoinTask leftTask = new SicForkJoinTask(arr, left, middle);
        SicForkJoinTask rightTask = new SicForkJoinTask(arr, middle, right);
        invokeAll(leftTask, rightTask);
        Integer leftResult = leftTask.join();
        Integer rightResult = rightTask.join();
        return leftResult + rightResult;
    }
    public SicForkJoinTask(int[] arr, int left, int right) {
        this.arr = arr;
        this.left = left;
        this.right = right;
    }
}
Executing the task
Execute the ForkJoin task through the invoke method of ForkJoinPool.
// ForkJoin thread pool
private static final ForkJoinPool forkJoinPool = new ForkJoinPool();
public static int forkJoinSum(int[] arr) {
    long start = System.currentTimeMillis();
    // Execute the ForkJoin task over the whole array
    Integer sum = forkJoinPool.invoke(new SicForkJoinTask(arr, 0, arr.length));
    long end = System.currentTimeMillis();
    log.info("forkjoin result: {}, elapsed: {} s", sum, (end - start) / 1000.0);
    return sum;
}
main method
public static void main(String[] args) throws InterruptedException {
    Random random = new Random();
    int[] arr = new int[SUM_COUNT];
    List<Integer> list = new ArrayList<>(SUM_COUNT);
    int currNum = 0;
    for (int i = 0; i < SUM_COUNT; i++) {
        currNum = random.nextInt(200);
        arr[i] = currNum;
        list.add(currNum);
    }
    // Single thread
    singleThreadSum(arr);
    // Executor thread pool
    concurrencySum(arr);
    // stream
    streamSum(list);
    // Parallel stream
    parallelStreamSum(list);
    // ForkJoin thread pool
    forkJoinSum(arr);
}
Results
23:19:21.207 [main] INFO com.sicimike.concurrency.EfficientSum - Single-threaded result: 994917205, elapsed: 0.006 s
23:19:21.274 [main] INFO com.sicimike.concurrency.EfficientSum - Multithreaded result: 994917205, elapsed: 0.062 s
23:19:21.292 [main] INFO com.sicimike.concurrency.EfficientSum - stream result: 994917205, elapsed: 0.018 s
23:19:21.309 [main] INFO com.sicimike.concurrency.EfficientSum - parallel stream result: 994917205, elapsed: 0.017 s
23:19:21.321 [main] INFO com.sicimike.concurrency.EfficientSum - forkjoin result: 994917205, elapsed: 0.012 s
Source
Code address: EfficientSum.java
Interested readers can download the source, adjust the various parameters and run it themselves. Your results will not necessarily match mine.
Summary
I wrote quite a few versions of the code, yet the original question is still unresolved. Some might say: blogger, you have led us on a wild goose chase.
Indeed, I have not come up with a better approach, but thinking a few of the questions in this article through clearly should be worth more than the interview question itself.
If any reader has a better way to optimize this, please let me know.