Programmer's Notes | 4 issues to be aware of when writing high-performance Java code

1. Concurrency

Typical symptom: unable to create new native thread ……

Question 1: How much memory is consumed to create a thread in Java?

Each thread has its own stack memory, while heap memory is shared by all threads.

Question 2: How many threads can a machine create?

It depends on the CPU, memory, operating system, JVM, and application server.

Let's write some sample code to compare using a thread pool with creating threads directly:

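// Difference between using a thread pool and creating threads directly
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ThreadPool {
  
     public static int times = 100; // try 100, 1000, 10000
  
     public static ArrayBlockingQueue<Runnable> arrayWorkQueue = new ArrayBlockingQueue<>(1000);
     public static ExecutorService threadPool = new ThreadPoolExecutor(5, // corePoolSize: number of core threads in the pool
             10, // maximumPoolSize
             60, // keepAliveTime for idle non-core threads
             TimeUnit.SECONDS,
             arrayWorkQueue,
             // when the queue is full, the oldest queued task is silently dropped
             new ThreadPoolExecutor.DiscardOldestPolicy()
     );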
  
     public static void useThreadPool() {
         Long start = System.currentTimeMillis();
         for (int i = 0; i < times; i++) {
             threadPool.execute(new Runnable() {
                 public void run() {
                     System.out.println("Say something...");
                 }
             });
         }
         threadPool.shutdown();
         while (true) {
             if (threadPool.isTerminated()) {
                 Long end = System.currentTimeMillis();
                 System.out.println(end - start);
                 break;
             }
         }
     }
  
     public static void createNewThread() {
         Long start = System.currentTimeMillis();
         for (int i = 0; i < times; i++) {
  
             new Thread() {
                 public void run() {
                     System.out.println("Say something...");
                 }
             }.start();
         }
         Long end = System.currentTimeMillis();
         System.out.println(end - start);
     }
  
     public static void main(String args[]) {
         createNewThread();
         //useThreadPool();
     }
 }

Run the test with different numbers of tasks and compare the elapsed time with and without a thread pool:

Number of tasks	Without thread pool	With thread pool
100	16 ms	5 ms
1000	90 ms	28 ms
10000	1329 ms	164 ms

Conclusion: don't create threads with new Thread() directly; use a thread pool.

Disadvantages of creating threads directly:

Creating a new thread every time is expensive.
The threads are unmanaged and uncoordinated. It is easy to create an unbounded number of them, leading to OOM errors and crashes.

1.1 Issues to be aware of when using thread pools

To avoid deadlocks, prefer CAS (compare-and-swap) where possible

Let's write an implementation example of optimistic locking:

public class CASLock {
  
     public static int money = 2000;
  
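     // Simulated compare-and-set: apply the update only if money still equals oldm.
     // The check and the update are not atomic here; this only illustrates the retry idea.
     public static boolean add2(int oldm, int newm) {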
         try {
             Thread.sleep(2000);
         } catch (InterruptedException e) {
             e.printStackTrace();
         }
         if (money == oldm) {
             money = money + newm;
             return true;
         }
         return false;
     }
  
     public synchronized static void add1(int newm) {
         try {
             Thread.sleep(3000);
         } catch (InterruptedException e) {
             e.printStackTrace();
         }
         money = money + newm;
     }
  
     public static void add(int newm) {
         try {
             Thread.sleep(3000);
         } catch (InterruptedException e) {
             e.printStackTrace();
         }
         money = money + newm;
     }
  
     public static void main(String args[]) {
         Thread one = new Thread() {
             public void run() {
                 //add(5000)
                 while (true) {
                     if (add2(money, 5000)) {
                         break;
                     }
                 }
             }
         };
         Thread two = new Thread() {
             public void run() {
                 //add(7000)
                 while (true) {
                     if (add2(money, 7000)) {
                         break;
                     }
                 }
             }
         };
         one.start();
         two.start();
         try {
             one.join();
             two.join();
         } catch (InterruptedException e) {
             e.printStackTrace();
         }
         System.out.println(money);
     }
 }
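
The example above imitates compare-and-swap with a plain field, so the check and the update can still interleave between threads. As a minimal sketch of the same retry idea on top of a real atomic primitive, the JDK's AtomicInteger.compareAndSet can be used. The class name AtomicMoney and its methods are illustrative assumptions, not part of the original code:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: class and method names are illustrative, not from the article.
class AtomicMoney {

    private static final AtomicInteger money = new AtomicInteger(2000);

    // Optimistic update: read, compute, then compareAndSet.
    // If another thread changed the value in between, compareAndSet fails and we retry.
    static int add(int delta) {
        while (true) {
            int current = money.get();
            int updated = current + delta;
            if (money.compareAndSet(current, updated)) {
                return updated;
            }
        }
    }

    // Resets the balance, runs two concurrent updates, and returns the final value.
    static int runDemo() {
        money.set(2000);
        Thread one = new Thread(() -> add(5000));
        Thread two = new Thread(() -> add(7000));
        one.start();
        two.start();
        try {
            one.join();
            two.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return money.get();
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // 2000 + 5000 + 7000 = 14000
    }
}
```

No thread ever blocks while holding a lock here, which is why this style cannot deadlock; the cost is that a thread may have to retry under contention.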

Be careful when using ThreadLocal

ThreadLocalMap uses a weak reference to the ThreadLocal as its key. If no strong reference to a ThreadLocal exists outside the map, the ThreadLocal is reclaimed at the next GC. The ThreadLocalMap then contains an Entry whose key is null, and there is no way to access the value of that Entry. If the current thread stays alive for a long time, a strong reference chain to the value persists: Thread Ref -> Thread -> ThreadLocalMap -> Entry -> value. The value can therefore never be reclaimed, which causes a memory leak.

Let's write an example of the correct use of ThreadLocal:

// ThreadLocal usage example
import java.util.Random;

public class ThreadLocalApp {
  
     public static final ThreadLocal<int[]> threadLocal = new ThreadLocal<>();
  
     public static void muti2() {
         int i[] = (int[]) threadLocal.get();
         i[1] = i[0] * 2;
         threadLocal.set(i);
     }
  
     public static void muti3() {
         int i[] = (int[]) threadLocal.get();
         i[2] = i[1] * 3;
         threadLocal.set(i);
     }
  
     public static void muti5() {
         int i[] = (int[]) threadLocal.get();
         i[3] = i[2] * 5;
         threadLocal.set(i);
     }
  
     public static void main(String args[]) {
         for (int i = 0; i < 5; i++) {
             new Thread() {
                 public void run() {
                     int start = new Random().nextInt(10);
                     int end[] = {0, 0, 0, 0};
                     end[0] = start;
                     threadLocal.set(end);
                     ThreadLocalApp.muti2();
                     ThreadLocalApp.muti3();
                     ThreadLocalApp.muti5();
                     //int end = (int) threadLocal.get();
                     System.out.println(end[0] + "  " + end[1] + "  " + end[2] + "  " + end[3]);
                     threadLocal.remove();
                 }
             }.start();
         }
     }
 }
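
The leak risk is greatest with pooled threads, because a pooled thread keeps running long after the task that set the value has finished. A minimal sketch of calling remove() in a finally block on a pooled thread follows; the class ThreadLocalCleanup and its method names are assumptions made for illustration:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch only: names are illustrative. Shows remove() in finally on a pooled thread.
class ThreadLocalCleanup {

    private static final ThreadLocal<int[]> CONTEXT = new ThreadLocal<>();

    // Sets the thread-local value, uses it, and always removes it afterwards.
    static int doubled(int input) {
        CONTEXT.set(new int[] {input});
        try {
            return CONTEXT.get()[0] * 2;
        } finally {
            CONTEXT.remove(); // the pooled thread outlives the task, so clean up here
        }
    }

    // Runs a task on a single pooled thread, then checks from a second task
    // on the same thread whether a stale value was left behind.
    static String demo() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<Integer> task = () -> doubled(21);
            Callable<Boolean> staleCheck = () -> CONTEXT.get() != null;
            int result = pool.submit(task).get();
            boolean stale = pool.submit(staleCheck).get();
            return result + " " + stale;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "42 false"
    }
}
```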

1.2 Thread interaction - problems caused by code that is not thread-safe

The classic problem: a HashMap infinite loop that drives CPU usage to 100%

We simulate an example of a HashMap infinite loop:

// HashMap infinite loop example
import java.util.HashMap;

public class HashMapDeadLoop {
  
     private HashMap<Integer, Integer> hash = new HashMap<>();
  
     public HashMapDeadLoop() {
         Thread t1 = new Thread() {
             public void run() {
                 for (int i = 0; i < 100000; i++) {
                     hash.put(new Integer(i), i);
                 }
                 System.out.println("t1 over");
             }
         };
  
         Thread t2 = new Thread() {
             public void run() {
                 for (int i = 0; i < 100000; i++) {
                     hash.put(new Integer(i), i);
                 }
                 System.out.println("t2 over");
             }
         };
         t1.start();
         t2.start();
     }
  
     public static void main(String[] args) {
         for (int i = 0; i < 1000; i++) {
             new HashMapDeadLoop();
         }
         System.out.println("end");
     }
 }
Reference: https://coolshell.cn/articles/9606.html

After the HashMap infinite loop occurs, we can observe the following information in the thread stack:

// Thread stack produced by the HashMap infinite loop
"Thread-281" #291 prio=5 os_prio=31 tid=0x00007f9f5f8de000 nid=0x5a37 runnable [0x0000700006349000]
   java.lang.Thread.State: RUNNABLE
       at java.util.HashMap$TreeNode.split(HashMap.java:2134)
       at java.util.HashMap.resize(HashMap.java:713)
       at java.util.HashMap.putVal(HashMap.java:662)
       at java.util.HashMap.put(HashMap.java:611)
       at com.example.demo.HashMapDeadLoop$2.run(HashMapDeadLoop.java:26)

Application hangs caused by deadlock, for example the Spring 3.1 deadlock problem

We simulate a deadlock with the following example:

// Deadlock example
public class DeadLock {
     public static Integer i1 = 2000;
     public static Integer i2 = 3000;
     public static synchronized Integer getI2() {
         try {
             Thread.sleep(3000);
         } catch (InterruptedException e) {
             e.printStackTrace();
         }
         return i2;
     }
     public static void main(String args[]) {
         Thread one = new Thread() {
             public void run() {
                 synchronized (i1) {
                     try {
                         Thread.sleep(3000);
                     } catch (InterruptedException e) {
                         e.printStackTrace();
                     }
                     synchronized (i2) {
                         System.out.println(i1 + i2);
                     }
                 }
             }
         };
         one.start();
         Thread two = new Thread() {
             public void run() {
                 synchronized (i2) {
                     try {
                         Thread.sleep(3000);
                     } catch (InterruptedException e) {
                         e.printStackTrace();
                     }
                     synchronized (i1) {
                         System.out.println(i1 + i2);
                     }
                 }
             }
         };
         two.start();
     }
 }

After the deadlock occurs, we can observe the following information in the thread stack:

// Thread stack captured during the deadlock
"Thread-1":
       at com.example.demo.DeadLock$2.run(DeadLock.java:47)
       - waiting to lock  (a java.lang.Integer)
       - locked  (a java.lang.Integer)
"Thread-0":
       at com.example.demo.DeadLock$1.run(DeadLock.java:31)
       - waiting to lock  (a java.lang.Integer)
       - locked  (a java.lang.Integer)
Found 1 deadlock.
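
The deadlock above occurs because the two threads take the same two locks in opposite order. A minimal sketch of one standard way to avoid it is to acquire the locks in the same fixed order everywhere. The class OrderedLocks, the dedicated lock objects, and the method names are assumptions for illustration:

```java
// Sketch only: names are illustrative. Both threads take the locks in the same order.
class OrderedLocks {

    // Dedicated lock objects instead of boxed Integer values.
    private static final Object LOCK_1 = new Object();
    private static final Object LOCK_2 = new Object();
    private static int sum = 0;

    // Every caller takes LOCK_1 before LOCK_2, so no thread can hold
    // LOCK_2 while waiting for LOCK_1, and the circular wait cannot form.
    static void addUnderBothLocks(int value) {
        synchronized (LOCK_1) {
            synchronized (LOCK_2) {
                sum += value;
            }
        }
    }

    // Runs two threads that both need both locks and returns the final sum.
    static int runDemo() {
        sum = 0;
        Thread one = new Thread(() -> addUnderBothLocks(2000));
        Thread two = new Thread(() -> addUnderBothLocks(3000));
        one.start();
        two.start();
        try {
            one.join();
            two.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // 2000 + 3000 = 5000
    }
}
```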

1.3 JUC-based optimization example

To optimize a counter, we implement it in three different ways (synchronized, ReentrantLock, and AtomicInteger) and compare their performance.

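// Sample code
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class SynchronizedTest {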
  
     public static int threadNum = 100;
     public static int loopTimes = 10000000;
  
     public static void userSyn() {
         // Shared counter and worker threads
         Syn syn = new Syn();
         Thread[] threads = new Thread[threadNum];
         // Record the start time
         long l = System.currentTimeMillis();
         for (int i = 0; i < threadNum; i++) {
             threads[i] = new Thread(new Runnable() {
                 @Override
                 public void run() {
                     for (int j = 0; j < loopTimes; j++) {
                         //syn.increaseLock();
                         syn.increase();
                     }
                 }
             });
             threads[i].start();
         }
         // Wait for all threads to finish
         try {
             for (int i = 0; i < threadNum; i++)
                 threads[i].join();
         } catch (InterruptedException e) {
             e.printStackTrace();
         }
         System.out.println("userSyn" + "-" + syn + " : " + (System.currentTimeMillis() - l) + "ms");
     }
  
     public static void useRea() {
         // Shared counter and worker threads
         Syn syn = new Syn();
         Thread[] threads = new Thread[threadNum];
         // Record the start time
         long l = System.currentTimeMillis();
         for (int i = 0; i < threadNum; i++) {
             threads[i] = new Thread(new Runnable() {
                 @Override
                 public void run() {
                     for (int j = 0; j < loopTimes; j++) {
                         syn.increaseLock();
                         //syn.increase();
                     }
                 }
             });
             threads[i].start();
         }
         // Wait for all threads to finish
         try {
             for (int i = 0; i < threadNum; i++)
                 threads[i].join();
         } catch (InterruptedException e) {
             e.printStackTrace();
         }
         System.out.println("userRea" + "-" + syn + " : " + (System.currentTimeMillis() - l) + "ms");
     }
    public static void useAto() {
         // Worker threads
         Thread[] threads = new Thread[threadNum];
         // Record the start time
         long l = System.currentTimeMillis();
         for (int i = 0; i < threadNum; i++) {
             threads[i] = new Thread(new Runnable() {
                 @Override
                 public void run() {
                     for (int j = 0; j < loopTimes; j++) {
                         Syn.ai.incrementAndGet();
                     }
                 }
             });
             threads[i].start();
         }
         // Wait for all threads to finish
         try {
             for (int i = 0; i < threadNum; i++)
                 threads[i].join();
         } catch (InterruptedException e) {
             e.printStackTrace();
         }
         System.out.println("userAto" + "-" + Syn.ai + " : " + (System.currentTimeMillis() - l) + "ms");
     }
  
     public static void main(String[] args) {
         SynchronizedTest.userSyn();
         SynchronizedTest.useRea();
         SynchronizedTest.useAto();
     }
 }
  
 class Syn {
     private int count = 0;
     public final static AtomicInteger ai = new AtomicInteger(0);
  
     private Lock lock = new ReentrantLock();
  
     public synchronized void increase() {
         count++;
     }
  
     public void increaseLock() {
         lock.lock();
         try {
             count++;
         } finally {
             lock.unlock(); // always release the lock, even if the body throws
         }
     }
  
     @Override
     public String toString() {
         return String.valueOf(count);
     }
 }

Conclusion: under high concurrency with a large number of iterations, ReentrantLock is more efficient than synchronized, and the atomic class performs best of the three.

2. Communication

2.1 Efficiency of database connection pools

Always close the connection in a finally block.
Always release the connection back to the pool in a finally block.
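
As a minimal sketch of this rule, the following example closes a connection in finally even though the query fails. FakeConnection is a hypothetical stand-in invented for this illustration, not a real JDBC or pool class:

```java
// Sketch only: FakeConnection is a hypothetical stand-in, not a real JDBC class.
class ConnectionRelease {

    static final class FakeConnection {
        private boolean closed;

        // Simulates a query that always fails.
        void query(String sql) {
            throw new IllegalStateException("simulated failure for: " + sql);
        }

        void close() {
            closed = true; // a real pool would take the connection back here
        }

        boolean isClosed() {
            return closed;
        }
    }

    // Returns true if the connection was closed even though the query failed.
    static boolean closedAfterFailure() {
        FakeConnection conn = new FakeConnection();
        try {
            conn.query("SELECT 1");
        } catch (IllegalStateException e) {
            // a real application would log or rethrow here
        } finally {
            conn.close(); // runs on both the success path and the failure path
        }
        return conn.isClosed();
    }

    public static void main(String[] args) {
        System.out.println(closedAfterFailure()); // prints "true"
    }
}
```

For resources that implement AutoCloseable, a try-with-resources statement gives the same guarantee with less code.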

2.2 OIO / NIO / AIO

	OIO	NIO	AIO
Type	Blocking	Non-blocking	Non-blocking
Difficulty of use	Simple	Complex	Complex
Reliability	Poor	High	High
Throughput	Low	High	High

Conclusion: when performance requirements are strict, use NIO for communication wherever possible.

2.3 TIME_WAIT (client) and CLOSE_WAIT (server) problems

Symptom: frequent request failures

Check the connection status: netstat -n | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'

TIME_WAIT: this side closed the connection actively; it can be mitigated by tuning kernel parameters.
CLOSE_WAIT: this side was closed passively.
ESTABLISHED: communication is in progress.

Solution: force the connection closed after the second stage completes.

2.4 Serial connections, persistent connections (long connections), pipelined connections

Conclusion:

Pipelined connections perform best. Persistent connections improve on serial connections by saving the time spent opening and closing connections.

Restrictions on using pipelined connections:

1. An HTTP client should not pipeline unless it can confirm that the connection is persistent (so pipelining is usually used server to server rather than by end-user clients);

2. Responses must be returned in the same order as the requests;

3. Only idempotent operations may be sent over pipelined connections.

3. Database operations

Queries must be backed by an index (pay special attention to queries by time).

Single operations versus batch operations

Note: many programmers habitually use single-row operations when writing code, but when performance matters, batch operations are required.
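
To illustrate why batching helps, here is a minimal sketch that groups rows into batches so that each batch costs one round trip instead of one per row. The class BatchSketch and its helpers are hypothetical; in real JDBC code the batches would typically be sent with PreparedStatement.addBatch() and executeBatch():

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: names are illustrative. Counts round trips for batched writes.
class BatchSketch {

    // Splits rows into consecutive chunks of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < rows.size(); from += batchSize) {
            int to = Math.min(from + batchSize, rows.size());
            batches.add(new ArrayList<>(rows.subList(from, to)));
        }
        return batches;
    }

    // Number of round trips needed if every batch is sent in one call.
    static int roundTrips(int rowCount, int batchSize) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < rowCount; i++) {
            rows.add(i);
        }
        return partition(rows, batchSize).size();
    }

    public static void main(String[] args) {
        System.out.println(roundTrips(1000, 1));   // 1000 single operations
        System.out.println(roundTrips(1000, 100)); // 10 batch operations
    }
}
```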

4. JVM

4.1 General steps for handling high CPU usage

Use top to find the process that consumes the most CPU.
Use top -H -p <pid> to find the thread in that process that consumes the most CPU.
Record the ID of the thread that consumes the most CPU.
Use printf %x to convert the thread ID to hexadecimal.
Use jstack to dump the stack information of the process.
Look up the hexadecimal thread ID in the jstack output to find what that thread is doing.
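
The hexadecimal conversion is needed because top -H shows thread IDs in decimal, while jstack prints each thread's native ID as a hexadecimal nid (for example nid=0x5a37 in the stack shown earlier). A minimal Java sketch of the same conversion that printf %x performs follows; the class and method names are illustrative:

```java
// Sketch only: converts a decimal thread ID (as shown by top -H)
// into the hexadecimal nid form that appears in jstack output.
class ThreadIdHex {

    static String toNid(long osThreadId) {
        return "0x" + Long.toHexString(osThreadId);
    }

    public static void main(String[] args) {
        System.out.println(toNid(23095)); // prints "0x5a37"
    }
}
```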

4.2 General steps for handling high memory usage (OOM)

Use the jstat command to check how many full GCs (FGC) have occurred and how long they took.
Run jmap -heap repeatedly and watch the old-generation occupancy; the more it changes, the more likely the program has a problem.
Run jmap -histo:live repeatedly, export the output to files, and compare the loaded objects. The differences usually point to where the problem is.
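
GC counts and times can also be read from inside the process. The sketch below uses the JDK's GarbageCollectorMXBean for a rough in-process view that is comparable to, but not identical with, what jstat reports. Collector names vary by JVM and GC algorithm, and the class name GcStats is an assumption for illustration:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Sketch only: prints per-collector GC counts and accumulated times.
class GcStats {

    // Sums collection counts over all collectors; a count of -1 means
    // "undefined for this collector" and is skipped.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": count=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        System.out.println("total collections: " + totalCollections());
    }
}
```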

4.3 High single-core CPU usage caused by GC

If a single CPU core shows high usage, check GC first.

4.4 Common causes of high sy (system) CPU usage

Frequent thread context switching
Too many threads
Heavy lock contention

4.5 High iowait

If the CPU's iowait is high, troubleshoot the parts of the program that perform IO, for example by converting OIO to NIO.

4.6 Jitter problem

Cause: compiling bytecode to machine code consumes CPU time slices. When a large share of the CPU is spent on this while bytecode is executing, CPU usage stays high for a long time;

Symptom: the "C2 CompilerThread1" daemon and "C2 CompilerThread0" daemon threads show the highest CPU usage;

Solution: make sure the compilation threads get a sufficient share of the CPU.

Author: Liang Xin

Source: CreditEase Institute of Technology