Java does not use all available CPUs

BKE :

I have a long running calculation which I need to carry out for a long list of inputs. The calculations are independent, so I would like to distribute them to several CPUs. I am using Java 8.

The skeleton of the code looks like this:

ExecutorService executorService = Executors.newFixedThreadPool(numThreads);

MyService myService = new MyService(executorService);

List<MyResult> results =
            myInputList.stream()
                     .map(myService::getResultFuture)
                     .map(CompletableFuture::join)
                     .collect(Collectors.toList());

executorService.shutdown();

The main function responsible for the calculation looks like this:

CompletableFuture<MyResult> getResultFuture(MyInput input) {
    return CompletableFuture.supplyAsync(() -> longCalc(input), executor);
}

The long running calculation is stateless and does not do any IO.

I would expect this code to use all available CPUs, but it does not. For example, on a machine with 72 CPUs and numThreads=72 (or even, e.g., numThreads=500), CPU usage is at most 500-1000%, as shown by htop:

[Screenshot: htop output]

According to the thread dump, many of the calculation threads are waiting, e.g.:

"pool-1-thread-34" #55 prio=5 os_prio=0 tid=0x00007fe858597890 nid=0xd66 waiting on condition [0x00007fe7f9cdd000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x0000000381815f20> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
    - None

All calculation threads were waiting for the same lock. At the time of the dump, only 5 calculation threads were RUNNABLE, the rest were WAITING.

What can be the reason for the locks, and why am I not able to use all CPUs?

Holger :

You are submitting jobs and calling join() right afterwards, waiting for the completion of the asynchronous job.

Intermediate stream operations are executed lazily and element-wise, which means that the step .map(CompletableFuture::join) runs on one element at a time (and, since this is a sequential stream, in the same thread that submits the jobs), without making sure that all elements have gone through the submission step first. The submitting thread therefore blocks until each single calculation has completed before it submits the next one.
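The element-wise behaviour can be made visible with a small sketch that replaces the real submission and join with log entries. The class name StreamOrderDemo and the log strings are made up for illustration; only the order of the entries matters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class StreamOrderDemo {
    // Records the order in which the two map steps see each element.
    // "submit" and "join" are stand-ins for getResultFuture and join().
    static List<String> trace() {
        List<String> log = new ArrayList<>();
        Arrays.asList(1, 2, 3).stream()
              .map(i -> { log.add("submit " + i); return i; })
              .map(i -> { log.add("join " + i); return i; })
              .collect(Collectors.toList());
        return log;
    }

    public static void main(String[] args) {
        // prints [submit 1, join 1, submit 2, join 2, submit 3, join 3]
        System.out.println(trace());
    }
}
```

Each element passes through both steps before the next element is touched, so with real futures the second job would not even be submitted until the first one has finished.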

You have to enforce a submission of all jobs before starting to call join() on them:

List<MyResult> results =
    myInputList.stream()
               .map(myService::getResultFuture)
               .collect(Collectors.toList()).stream()
               .map(CompletableFuture::join)
               .collect(Collectors.toList());
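To check the difference between the two pipelines, here is a rough, self-contained sketch that counts how many tasks run at the same time. Everything in it (the class PeakConcurrencyDemo, the pool of 4 threads, and the 200 ms sleep standing in for longCalc) is an assumption for the demo, not part of the original code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class PeakConcurrencyDemo {
    // Runs four dummy tasks and returns the highest number that were active at once.
    static int peak(boolean submitAllFirst) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try {
            List<Integer> inputs = Arrays.asList(1, 2, 3, 4);
            Stream<CompletableFuture<Integer>> submitted = inputs.stream()
                    .map(i -> CompletableFuture.supplyAsync(() -> work(i, active, peak), pool));
            if (submitAllFirst) {
                // Collecting forces every job to be submitted before any join().
                submitted = submitted.collect(Collectors.toList()).stream();
            }
            submitted.map(CompletableFuture::join).collect(Collectors.toList());
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }

    // Stand-in for longCalc: tracks concurrency, then sleeps instead of computing.
    static Integer work(int i, AtomicInteger active, AtomicInteger peak) {
        peak.accumulateAndGet(active.incrementAndGet(), Math::max);
        try {
            Thread.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        active.decrementAndGet();
        return i * i;
    }

    public static void main(String[] args) {
        System.out.println("interleaved submit/join: " + peak(false));
        System.out.println("submit all, then join:   " + peak(true));
    }
}
```

The interleaved variant should report a peak of 1, because each job finishes before the next is submitted. The variant with the intermediate collect should report a higher number, typically 4 with this pool size.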

If you can express whatever you want to do with the results list as an action to be invoked when everything is done, you can implement the operation in a way that does not block threads with join():

List<CompletableFuture<MyResult>> futures = myInputList.stream()
    .map(myService::getResultFuture)
    .collect(Collectors.toList());
CompletableFuture.allOf(futures.toArray(new CompletableFuture<?>[0]))
    .thenRun(() -> {
        List<MyResult> results = futures.stream()
            .map(CompletableFuture::join)
            .collect(Collectors.toList());
        // perform action with results
    });

It still calls join() to retrieve the result, but at this point, all futures have been completed so the caller won’t get blocked.
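If the results are needed as a value rather than inside a callback, the same idea can be wrapped in a small helper that returns a future of the whole list. This is only a sketch: the helper name sequence and the class AllOfDemo are hypothetical, and the example futures simply supply constants.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

class AllOfDemo {
    // Hypothetical helper: turns a list of futures into one future of a list.
    // The join() calls run only after allOf has completed, so they do not block.
    static <T> CompletableFuture<List<T>> sequence(List<CompletableFuture<T>> futures) {
        return CompletableFuture.allOf(futures.toArray(new CompletableFuture<?>[0]))
                .thenApply(ignored -> futures.stream()
                        .map(CompletableFuture::join)
                        .collect(Collectors.toList()));
    }

    public static void main(String[] args) {
        List<CompletableFuture<Integer>> futures = Arrays.asList(
                CompletableFuture.supplyAsync(() -> 1),
                CompletableFuture.supplyAsync(() -> 2),
                CompletableFuture.supplyAsync(() -> 3));
        // The final join() only keeps the main thread alive until the output is printed.
        sequence(futures).thenAccept(System.out::println).join();
    }
}
```

The list in the combined future keeps the order of the input futures, regardless of the order in which the individual calculations finish. Note that if any of the futures completes exceptionally, the combined future does as well.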
