An analysis of Runtime.availableProcessors()

A recent article on Java and Docker noted that Java 10 adds optimizations specifically for running in Docker. Java support inside Docker containers has long been awkward. Docker relies on cgroups to isolate resources at the process level, so although we set resource limits on the container through Docker, the JVM is not aware of them. For example, the host may have 8 cores and 16 GB of memory while the container is limited to 2 cores and 4 GB, yet code running in the container may still read 8 cores and 16 GB. We often read machine resources for performance tuning, such as setting the core and maximum number of threads in a thread pool. For programs that do this, running in Docker can cause a performance loss. Fortunately, Java 10 adds support for container limits, and there is also a compatible solution for JDK 8.
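As a minimal sketch of the tuning pattern described above, the following reads the processor count and heap limit that the JVM reports and sizes a fixed thread pool from them. The class name `ContainerResources` and the helper `poolSizeFor` are hypothetical, and the "one thread per reported core" rule is only an illustrative assumption. On a JVM that is not aware of container limits, the printed values may reflect the host rather than the container.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ContainerResources {

    // Hypothetical sizing rule: one worker per reported core, never fewer than one.
    static int poolSizeFor(int reportedCores) {
        return Math.max(1, reportedCores);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        int cores = rt.availableProcessors();             // what the JVM believes it can use
        long maxHeapMb = rt.maxMemory() / (1024 * 1024);  // upper bound on the heap, in MB

        System.out.println("reported cores: " + cores);
        System.out.println("max heap (MB): " + maxHeapMb);

        ExecutorService pool = Executors.newFixedThreadPool(poolSizeFor(cores));
        pool.shutdown(); // nothing is submitted; the pool only illustrates the sizing
    }
}
```

Running this once on the host and once inside a limited container is a quick way to check whether the JVM in use sees the container limits.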

This reminded me that, while optimizing a program at work recently, I found that availableProcessors seemed to carry a large performance cost, so I looked into it in detail and ran some tests.

What does availableProcessors do?

    /**
     * Returns the number of processors available to the Java virtual machine.
     *
     * <p> This value may change during a particular invocation of the virtual
     * machine.  Applications that are sensitive to the number of available
     * processors should therefore occasionally poll this property and adjust
     * their resource usage appropriately. </p>
     *
     * @return  the maximum number of processors available to the virtual
     *          machine; never smaller than one
     * @since 1.4
     */
    public native int availableProcessors();

This is what the JDK documentation says: the method returns the number of processors available to the JVM. It then adds a note: this value may change during a particular invocation of the virtual machine. Our intuitive impression of this function is usually that it returns the number of CPUs in the machine, which should be a constant. In light of the documentation, that impression may be a significant misunderstanding. This left me with two questions:

  • 1. What does "the number of cores available to the JVM" mean?
  • 2. Why can the return value change, and how is it implemented?

Number of cores available to the JVM

This one is easy to understand: as the name suggests, it is the number of CPU cores the JVM can use to do its work. On a multi-core server, several applications may be installed. The JVM is only one of them, and some CPUs may be taken by the other applications.

Why can the return value change? How is it implemented?

That the return value can change is also fairly easy to understand: several applications share the CPUs of a multi-core server, so the number available to the JVM may differ from one moment to the next. How does Java implement this? Reading the JDK 8 source shows that the Linux and Windows implementations differ considerably.

Linux implementation
int os::active_processor_count() {
  // Linux doesn't yet have a (official) notion of processor sets,
  // so just return the number of online processors.
  int online_cpus = ::sysconf(_SC_NPROCESSORS_ONLN);
  assert(online_cpus > 0 && online_cpus <= processor_count(), "sanity check");
  return online_cpus;
}

The Linux implementation is lazy: it simply reads the system parameter _SC_NPROCESSORS_ONLN through sysconf.

Windows implementation
int os::active_processor_count() {
  DWORD_PTR lpProcessAffinityMask = 0;
  DWORD_PTR lpSystemAffinityMask = 0;
  int proc_count = processor_count();
  if (proc_count <= sizeof(UINT_PTR) * BitsPerByte &&
      GetProcessAffinityMask(GetCurrentProcess(), &lpProcessAffinityMask, &lpSystemAffinityMask)) {
    // Nof active processors is number of bits in process affinity mask
    int bitcount = 0;
    while (lpProcessAffinityMask != 0) {
      lpProcessAffinityMask = lpProcessAffinityMask & (lpProcessAffinityMask-1);
      bitcount++;
    }
    return bitcount;
  } else {
    return proc_count;
  }
}

The Windows implementation is more complex. It does not simply return the number of CPUs: it also reads the process affinity mask to determine which CPUs the process is allowed to run on. The while loop counts the set bits in that mask, so every call does some CPU work in addition to the GetProcessAffinityMask call.
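To make the loop easier to follow, here is a small Java sketch of the same bit-counting technique. The class name `AffinityBitCount`, the method `countSetBits`, and the sample mask are hypothetical and only for illustration. Each iteration clears the lowest set bit, so the loop runs once per set bit in the mask rather than once per bit position.

```java
class AffinityBitCount {

    // Mirrors the loop in the Windows implementation: mask & (mask - 1)
    // clears the lowest set bit, so the loop runs once per set bit.
    static int countSetBits(long mask) {
        int bitCount = 0;
        while (mask != 0) {
            mask = mask & (mask - 1);
            bitCount++;
        }
        return bitCount;
    }

    public static void main(String[] args) {
        long mask = 0b1011L; // hypothetical mask: CPUs 0, 1 and 3 are allowed
        System.out.println(countSetBits(mask));   // prints 3
        System.out.println(Long.bitCount(mask));  // prints 3, the JDK's built-in equivalent
    }
}
```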

Performance Testing

The analysis above shows that the call does real work on every invocation, so how does it perform on different operating systems? I measured the function both under normal conditions and with the CPU fully loaded. Each test run calls the function one million times, and the time is averaged over 10 runs. The code is as follows:

import java.util.Arrays;

public class RuntimeDemo {

    private static final int EXEC_TIMES = 1_000_000; // calls per timed run
    private static final int TEST_TIME = 10;         // number of timed runs

    public static void main(String[] args) throws Exception{
        int[] arr = new int[TEST_TIME];
        for(int i = 0; i < TEST_TIME; i++){
            long start = System.currentTimeMillis();
            for(int j = 0; j < EXEC_TIMES; j++){
                Runtime.getRuntime().availableProcessors();
            }
            long end = System.currentTimeMillis();
            arr[i] = (int)(end-start);
        }

        double avg = Arrays.stream(arr).average().orElse(0);
        System.out.println("avg spend time:" + avg + "ms");

    }
}

The code used to put the CPU under full load is as follows:

public class CpuIntesive {

    private static final int THREAD_COUNT = 16;

    public static void main(String[] args) {
        for(int i = 0; i < THREAD_COUNT; i++){
            new Thread(()->{
                long count = 100_000_000_000L;
                long index=0;
                long sum = 0;
                while(index < count){
                    sum = sum + index;
                    index++;
                }
            }).start();
        }
    }
}
The test results are as follows:

System    Configuration    Test condition    Average time per 1,000,000 calls
Windows   2 cores, 8 GB    Normal            1425.2 ms
Windows   2 cores, 8 GB    CPU full load     6113.1 ms
macOS     4 cores, 8 GB    Normal            69.4 ms
macOS     4 cores, 8 GB    CPU full load     322.8 ms

The two machines have quite different configurations, so comparing their numbers directly means little, but the tests still support the following conclusions:

  • Performance differs considerably between Windows and the Unix-like system tested (macOS), and it depends on the implementation
  • Heavy CPU load has a large effect on the function, slowing it by roughly 4 to 5 times in both tests
  • Overall, the function's performance is quite acceptable: even the slowest case, Windows under full CPU load, costs only about 6 µs per call, and on macOS a call takes tens to hundreds of nanoseconds
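The per-call figures follow from dividing each measured average by the one million calls in a run. A short sketch of that arithmetic, using the averages reported above (the class name `PerCallCost` and the method `nanosPerCall` are hypothetical):

```java
class PerCallCost {

    // Converts the average time of one timed run (in ms) into the cost of a single call (in ns).
    static double nanosPerCall(double runMillis, int callsPerRun) {
        return runMillis * 1_000_000.0 / callsPerRun;
    }

    public static void main(String[] args) {
        int calls = 1_000_000; // calls per timed run in the benchmark
        System.out.println("Windows, normal:    " + nanosPerCall(1425.2, calls) + " ns");
        System.out.println("Windows, full load: " + nanosPerCall(6113.1, calls) + " ns");
        System.out.println("macOS, normal:      " + nanosPerCall(69.4, calls) + " ns");
        System.out.println("macOS, full load:   " + nanosPerCall(322.8, calls) + " ns");
    }
}
```

With one million calls per run, the average in milliseconds is numerically the per-call cost in nanoseconds, so 6113.1 ms corresponds to about 6.1 µs per call.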

Summary

  • In day-to-day work there is little need to worry about the performance cost of calling this function
  • For general use, the value can be stored in a static variable; for programs that are sensitive to the CPU count, it can be refreshed periodically, much like a caching strategy
  • The performance problem I saw at work was probably not caused by this function, and something else is likely responsible
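The caching idea in the second point could look roughly like the sketch below. It is only one possible approach: the class name `CachedProcessorCount`, its methods, and the 10-second refresh interval are all assumptions made for illustration.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class CachedProcessorCount {

    // Cached value; volatile so that readers on other threads see each refresh.
    private static volatile int cached = Runtime.getRuntime().availableProcessors();

    // Single daemon thread, so the refresher never keeps the JVM alive.
    private static final ScheduledExecutorService REFRESHER =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "cpu-count-refresher");
                t.setDaemon(true);
                return t;
            });

    static {
        // The 10-second interval is an arbitrary assumption; tune it to the application.
        REFRESHER.scheduleAtFixedRate(CachedProcessorCount::refresh, 10, 10, TimeUnit.SECONDS);
    }

    // Re-reads the value from the JVM and publishes it to readers.
    static void refresh() {
        cached = Runtime.getRuntime().availableProcessors();
    }

    // Cheap read: a volatile load instead of a native call.
    static int get() {
        return cached;
    }

    public static void main(String[] args) {
        System.out.println("cached cores: " + CachedProcessorCount.get());
    }
}
```

Hot paths call `CachedProcessorCount.get()`, while the native call runs only once per refresh interval. If the value never needs to change during the program's lifetime, a plain `static final int` is simpler.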

Thanks

Origin juejin.im/post/5d7e3f93f265da03c23f04ce