[Java multithreading] The evolution of parallel task processing in Java

1. Introduction

Parallelism means running multiple threads at the same time to improve the overall processing speed of a system.

  • Why can multiple threads improve processing speed? Because computers today generally have multi-core processors, and we need to make full use of the CPU resources. From a higher-level view, every machine can be a processing node, with multiple machines processing in parallel. Parallel processing can be said to be everywhere.

This article mainly discusses the ways Java implements parallel processing.

2. Parallelism is everywhere

Take Java's garbage collectors: with each new version, GC pauses have become shorter, moving from Serial to CMS and now G1, and Java has gradually been shedding its reputation for being slow. Message queues have moved from the early ActiveMQ to today's Kafka and RocketMQ, which introduced the concept of partitions to increase message parallelism. Once a single database table grows beyond a certain size, access becomes very slow, so we split the table and bring in database middleware. You may think of Redis itself as single-threaded, but the Redis Cluster solution introduces the concept of slots. More generally, many of our business systems are deployed as multiple instances, with load-balancing middleware distributing requests among them. There are other examples, but I will not list them all here.

Java Garbage Collector-Serial, Parallel, CMS, G1 Collector Overview
JVM Garbage Collector-Comparison of Serial, Parallel, CMS and G1

3. How to parallelize

I think the core of parallelism is splitting: turn a large task into small tasks, then process them at the same time, whether on a multi-core CPU or on multiple nodes. With each version update, Java has made parallel processing more convenient for developers: from the original Thread, to thread pools, to the fork/join framework, and finally to parallel streams.
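
The splitting step itself can be sketched as a small helper that cuts an index range into fixed-size segments. The class name SegmentSplit and the method split are hypothetical names chosen for illustration, not part of the JDK; the sketch also keeps a final, shorter segment when the length is not an exact multiple of the segment size.

```java
import java.util.ArrayList;
import java.util.List;

class SegmentSplit {

    // Hypothetical helper: returns [start, end) index pairs that cover 0..length.
    static List<int[]> split(int length, int segmentSize) {
        if (segmentSize <= 0) {
            throw new IllegalArgumentException("segmentSize must be positive");
        }
        List<int[]> segments = new ArrayList<>();
        for (int start = 0; start < length; start += segmentSize) {
            // The last segment may be shorter than segmentSize.
            int end = Math.min(start + segmentSize, length);
            segments.add(new int[] {start, end});
        }
        return segments;
    }

    public static void main(String[] args) {
        // Prints [0, 10), [10, 20) and [20, 25) on separate lines.
        for (int[] segment : split(25, 10)) {
            System.out.println("[" + segment[0] + ", " + segment[1] + ")");
        }
    }
}
```

Each pair can then be handed to a thread, a pool task, or a fork/join subtask; only the way the segments are executed changes between the approaches.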

Let's use a simple summation example to see how each approach handles parallel processing.

3.1. Single-threaded processing

First, look at the simplest approach, single-threaded processing, which performs the summation directly on the main thread:

import java.util.stream.LongStream;

public class SingleThread {

    public static void main(String[] args) {
        // Generate an array covering the given range
        long[] numbers = LongStream.rangeClosed(1, 10_000_000).toArray();
        long sum = 0;
        for (int i = 0; i < numbers.length; i++) {
            sum += numbers[i];
        }
        System.out.println("sum  = " + sum);
    }
}

Summation is a compute-intensive task, but in the multi-core era, using only a single thread means using only one CPU while the other CPUs sit idle, which wastes resources.

3.2. Thread approach

We split the task into multiple small tasks, then start one thread per small task so that the work is processed in segments, as follows:

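import java.util.ArrayList;
import java.util.List;
import java.util.stream.LongStream;

public class ThreadTest {

    // Segment threshold: the number of elements each thread processes
    public static final int threshold = 10_000;
    // The numbers to be summed
    public static long[] numbers;
    // The accumulated result
    private static long allSum;

    public static void main(String[] args) throws Exception {
        // Generate the numbers to be summed
        numbers = LongStream.rangeClosed(1, 10_000_000).toArray();

        // Number of threads = total number of elements / elements per thread
        int taskSize = numbers.length / threshold;

        // Create and start one thread per segment
        List<Thread> threads = new ArrayList<>();
        for (int i = 1; i <= taskSize; i++) {
            final int key = i;
            Thread thread = new Thread(new Runnable() {
                @Override
                public void run() {
                    // Each thread handles one segment of the array:
                    // start = (key - 1) * threshold, end = key * threshold,
                    // similar to a paging formula
                    sumAll(segmentSum((key - 1) * threshold, key * threshold));
                }
            });
            threads.add(thread);
            thread.start();
        }

        // The main thread cannot tell on its own when the workers are done,
        // so wait for each one explicitly instead of sleeping for a fixed time
        for (Thread thread : threads) {
            thread.join();
        }
        System.out.println("allSum = " + getAllSum());
    }

    // Add the partial sum computed by one thread to the total
    private static synchronized long sumAll(long threadSum) {
        return allSum += threadSum;
    }

    // Get the total
    public static synchronized long getAllSum() {
        return allSum;
    }

    /**
     * Sums one segment of the array.
     * @param start start index (inclusive)
     * @param end   end index (exclusive)
     * @return the sum of the segment
     */
    private static long segmentSum(int start, int end) {
        long sum = 0;
        for (int i = start; i < end; i++) {
            sum += numbers[i];
        }
        return sum;
    }
}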

In the code above, a large task is split into small tasks, and the segment threshold determines both the number of threads to create and the number of elements each segment processes. The problem with this approach is that it creates too many threads while the number of CPUs is limited. More importantly, summation is a compute-intensive task, so starting too many threads only causes more thread context switching. Each thread also terminates after processing a single task, which wastes resources as well. In addition, the main thread does not know when the subtasks have finished, so additional handling is required. For these reasons Java later introduced thread pools.
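
One way to give the main thread an explicit completion signal is CountDownLatch from java.util.concurrent. The following is a minimal sketch of that idea rather than the article's own code: the class name LatchSum, the method sum, and the choice of four worker threads in main are assumptions made for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.LongStream;

class LatchSum {

    // Hypothetical helper: sums the array with a fixed number of plain threads.
    static long sum(long[] numbers, int workers) throws InterruptedException {
        if (workers <= 0) {
            throw new IllegalArgumentException("workers must be positive");
        }
        AtomicLong total = new AtomicLong();
        CountDownLatch done = new CountDownLatch(workers);
        int chunk = (numbers.length + workers - 1) / workers; // ceiling division

        for (int w = 0; w < workers; w++) {
            final int start = Math.min(w * chunk, numbers.length);
            final int end = Math.min(start + chunk, numbers.length);
            new Thread(() -> {
                try {
                    long partial = 0;
                    for (int i = start; i < end; i++) {
                        partial += numbers[i];
                    }
                    total.addAndGet(partial);
                } finally {
                    done.countDown(); // signal completion even if the loop throws
                }
            }).start();
        }

        done.await(); // blocks until every worker has counted down
        return total.get();
    }

    public static void main(String[] args) throws InterruptedException {
        long[] numbers = LongStream.rangeClosed(1, 10_000_000).toArray();
        System.out.println("sum = " + sum(numbers, 4));
    }
}
```

Fixing the number of workers instead of deriving it from a per-thread element count also keeps the thread count close to the number of CPUs, which addresses the context-switching concern raised above.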

3.3. Thread pool approach

Java 1.5 introduced the concurrency package java.util.concurrent, which includes the thread pool ThreadPoolExecutor. The relevant code is as follows:

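import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.LongStream;

public class ExecutorServiceTest {

    // Segment threshold: the number of elements each task processes
    public static final int threshold = 10_000;
    // The numbers to be summed
    public static long[] numbers;

    public static void main(String[] args) throws Exception {
        // Generate the numbers to be summed
        numbers = LongStream.rangeClosed(1, 10_000_000).toArray();

        // Create a fixed-size thread pool: core size = maximum size = CPU cores + 1
        ExecutorService executor = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors() + 1);

        // A CompletionService can be seen as a combination of an Executor and a
        // BlockingQueue: tasks are submitted to it, and the results of completed
        // tasks are retrieved with a queue-like take().
        // ExecutorCompletionService is one implementation.
        CompletionService<Long> completionService = new ExecutorCompletionService<Long>(executor);

        // Number of tasks = total number of elements / elements per task
        int taskSize = numbers.length / threshold;

        // Submit one task per segment
        for (int i = 1; i <= taskSize; i++) {
            final int key = i;
            completionService.submit(new Callable<Long>() {
                @Override
                public Long call() throws Exception {
                    // Each task handles one segment of the array:
                    // start = (key - 1) * threshold, end = key * threshold,
                    // similar to a paging formula
                    return segmentSum((key - 1) * threshold, key * threshold);
                }
            });
        }

        long sumValue = 0;
        for (int i = 0; i < taskSize; i++) {
            // Retrieve and remove the Future of the next completed task,
            // waiting if none has completed yet
            sumValue += completionService.take().get();
        }

        // All tasks have completed: print the result and shut down the pool
        System.out.println("sumValue = " + sumValue);
        executor.shutdown();
    }

    /**
     * Sums one segment of the array.
     * @param start start index (inclusive)
     * @param end   end index (exclusive)
     * @return the sum of the segment
     */
    private static long segmentSum(int start, int end) {
        long sum = 0;
        for (int i = start; i < end; i++) {
            sum += numbers[i];
        }
        return sum;
    }
}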

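As analyzed above, for compute-intensive work more threads is not necessarily better, so the pool here is created with CPU count + 1 threads. This is a widely used rule of thumb for compute-intensive tasks that comes from practical testing, not a value the JDK picks by default. A thread pool, as the name implies, reuses existing threads.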

  • At the same time, CompletionService is used to aggregate the results of the subtasks.
  • Used properly, a thread pool can already process tasks fully in parallel, but the code is somewhat cumbersome to write. For this reason Java 1.7 introduced the fork/join framework.
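
As a point of comparison, the same aggregation can be sketched with ExecutorService.invokeAll, which submits a batch of Callable tasks and returns their Future objects once all of them have completed. This is an alternative to the CompletionService version, not something the original code uses; the class name InvokeAllSum and the method sum are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.LongStream;

class InvokeAllSum {

    // Hypothetical helper: sums the array in segments on a fixed-size pool.
    static long sum(long[] numbers, int segmentSize)
            throws InterruptedException, ExecutionException {
        if (segmentSize <= 0) {
            throw new IllegalArgumentException("segmentSize must be positive");
        }
        ExecutorService executor = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors() + 1);
        try {
            // Build one Callable per segment; the last segment may be shorter.
            List<Callable<Long>> tasks = new ArrayList<>();
            for (int start = 0; start < numbers.length; start += segmentSize) {
                final int from = start;
                final int to = Math.min(start + segmentSize, numbers.length);
                tasks.add(() -> {
                    long partial = 0;
                    for (int i = from; i < to; i++) {
                        partial += numbers[i];
                    }
                    return partial;
                });
            }
            // invokeAll blocks until every task has completed.
            long total = 0;
            for (Future<Long> future : executor.invokeAll(tasks)) {
                total += future.get();
            }
            return total;
        } finally {
            executor.shutdown(); // release the pool threads even on failure
        }
    }

    public static void main(String[] args) throws Exception {
        long[] numbers = LongStream.rangeClosed(1, 10_000_000).toArray();
        System.out.println("sum = " + sum(numbers, 10_000));
    }
}
```

The trade-off is that invokeAll waits for the whole batch, whereas CompletionService hands back each result as soon as its task finishes; for a plain sum the difference does not matter.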

3.4. fork/join framework

The purpose of the fork/join framework is to recursively split a task that can be parallelized into smaller tasks, then merge the results of the subtasks to produce the overall result. The relevant code is as follows:

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.RecursiveTask;
import java.util.stream.LongStream;

public class ForkJoinTest extends RecursiveTask<Long> {

    // Segment threshold: the largest range a task sums directly
    public static final int threshold = 10_000;
    // The numbers to be summed
    private final long[] numbers;
    // Start index of the current task's range
    private final int start;
    // End index of the current task's range
    private final int end;

    // Constructor: sets the numbers to be summed, the start index and the end index
    private ForkJoinTest(long[] numbers, int start, int end) {
        this.numbers = numbers;
        this.start = start;
        this.end = end;
    }

    public static void main(String[] args) {
        // The numbers to be summed
        long[] numbers = LongStream.rangeClosed(1, 10_000_000).toArray();

        // Create a ForkJoinPool whose parallelism defaults to
        // Runtime.getRuntime().availableProcessors()
        ForkJoinPool forkJoinPool = new ForkJoinPool();

        // Alternative: submit the task and read the result from a Future
        //Future<Long> future = forkJoinPool.submit(new ForkJoinTest(numbers, 0, numbers.length));
        //System.out.println("computed sum = " + future.get());

        // Create the fork/join task
        ForkJoinTask<Long> task = new ForkJoinTest(numbers, 0, numbers.length);
        Long sumAll = forkJoinPool.invoke(task);
        System.out.println("computed sum = " + sumAll);

        // Shut down the pool
        forkJoinPool.shutdown();
    }

    @Override
    protected Long compute() {
        // Number of elements in this task's range
        int length = end - start;
        // When the range is no larger than the threshold, sum it directly
        if (length <= threshold) {
            long sum = 0;
            for (int i = start; i < end; i++) {
                sum += numbers[i];
            }
            return sum;
        }

        System.err.println("===== splitting task =====");
        // Split the large task in the middle into two smaller tasks
        int middle = (start + end) / 2;

        // Task decomposition: one task per half
        ForkJoinTest leftTask = new ForkJoinTest(numbers, start, middle);
        ForkJoinTest rightTask = new ForkJoinTest(numbers, middle, end);

        // Run the two smaller tasks in parallel
        leftTask.fork();
        rightTask.fork();

        // Note: join() blocks, so call it only after both subtasks have been started
        // Merge the results of the two smaller tasks
        return leftTask.join() + rightTask.join();
    }
}


ForkJoinPool is an implementation of the ExecutorService interface; it assigns subtasks to the worker threads in the pool. To submit a task that returns a result to this pool, you need to create a subclass of RecursiveTask<R>.

  • The general logic is to split with fork() and merge the results with join(). Java provides the framework and we only need to fill in the logic, which is more convenient.
  • Is there a simpler way, one that spares us the manual splitting and does the splitting and merging automatically? Yes: the stream concept introduced in Java 1.8.
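
A common variant of the compute() step forks only one half and computes the other half in the current thread, which saves one task hand-off per split; the Fibonacci example in the RecursiveTask Javadoc has the same shape. The sketch below uses hypothetical names (ForkOnceSum, THRESHOLD) and, as an assumption for brevity, runs on the shared pool returned by ForkJoinPool.commonPool() instead of creating its own pool.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.stream.LongStream;

class ForkOnceSum extends RecursiveTask<Long> {

    // Hypothetical threshold: ranges of this size or smaller are summed directly.
    static final int THRESHOLD = 10_000;

    private final long[] numbers;
    private final int start; // inclusive
    private final int end;   // exclusive

    ForkOnceSum(long[] numbers, int start, int end) {
        this.numbers = numbers;
        this.start = start;
        this.end = end;
    }

    @Override
    protected Long compute() {
        if (end - start <= THRESHOLD) {
            long sum = 0;
            for (int i = start; i < end; i++) {
                sum += numbers[i];
            }
            return sum;
        }
        int middle = start + (end - start) / 2;
        ForkOnceSum left = new ForkOnceSum(numbers, start, middle);
        ForkOnceSum right = new ForkOnceSum(numbers, middle, end);

        left.fork();                     // schedule the left half asynchronously
        long rightSum = right.compute(); // compute the right half in this thread
        return left.join() + rightSum;   // wait for the left half and merge
    }

    public static void main(String[] args) {
        long[] numbers = LongStream.rangeClosed(1, 10_000_000).toArray();
        long sum = ForkJoinPool.commonPool()
                .invoke(new ForkOnceSum(numbers, 0, numbers.length));
        System.out.println("sum = " + sum);
    }
}
```

The result is the same as with forking both halves; the difference is only that the calling worker stays busy with real work instead of immediately blocking in join().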

3.5. Parallel stream approach

Java 8 introduced the concept of streams, which lets us make better use of parallelism. The code using streams is as follows:

import java.util.stream.LongStream;

public class StreamTest {

    public static void main(String[] args) {
        // Parallel stream: multiple threads run at the same time
        System.out.println("sum = " + parallelRangedSum(10_000_000));
        // Sequential stream: runs on the main thread, single-threaded
        System.out.println("sum = " + sequentialRangedSum(10_000_000));
    }

    // Parallel stream
    public static long parallelRangedSum(long n) {
        return LongStream.rangeClosed(1, n).parallel().reduce(0L, Long::sum);
    }

    // Sequential stream
    public static long sequentialRangedSum(long n) {
        return LongStream.rangeClosed(1, n).sequential().reduce(0L, Long::sum);
    }
}

Isn't the code above simple? Developers do not need to split the task manually or use synchronization mechanisms at all, and the task is still processed in parallel. You only need to call parallel() on the stream, and the system splits the task automatically. Of course, the premise is that there is no shared mutable state.

  • Internally, parallel streams also use the fork/join framework.
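
The caveat about shared mutable state can be made concrete with a small sketch. The class name SharedStateDemo and both method names are hypothetical. The first method updates a shared array element from forEach, which is a data race on a parallel stream and may produce a wrong total; the second leaves the combining of partial results to the stream itself.

```java
import java.util.stream.LongStream;

class SharedStateDemo {

    // Unsafe: worker threads perform an unsynchronized read-modify-write
    // on total[0], so updates can be lost and the result may vary per run.
    static long racySum(long n) {
        long[] total = {0};
        LongStream.rangeClosed(1, n).parallel().forEach(i -> total[0] += i);
        return total[0];
    }

    // Safe: no shared state; sum() is a reduction the stream can split
    // and merge on its own.
    static long safeSum(long n) {
        return LongStream.rangeClosed(1, n).parallel().sum();
    }

    public static void main(String[] args) {
        long n = 10_000_000;
        System.out.println("expected = " + n * (n + 1) / 2);
        System.out.println("safe     = " + safeSum(n));
        System.out.println("racy     = " + racySum(n)); // may differ from expected
    }
}
```

If a side effect really is needed, a thread-safe accumulator such as AtomicLong or LongAdder avoids the lost updates, at the cost of contention that a plain reduction does not have.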

Conclusion
This article uses a summation example to introduce parallel processing. As you can see, Java has kept working to make parallel processing more convenient.

Origin blog.csdn.net/qq877728715/article/details/113987179