One article to understand the JVM and performance-optimization knowledge every programmer must know

Table of contents

JVM and performance optimization

1. Java memory area

History of Virtual Machines

A look at the future of Java technology

runtime data area

  • The role of each region

    • program counter

      A line-number indicator for the bytecode being executed by the current thread. It occupies only a small amount of memory and is private to each thread.

    • the stack

      Each thread has its own private stack. As the thread runs, each method call is packaged into a stack frame that stores the local variable table, operand stack, dynamic link, method return address, and other information, and is pushed onto the stack. At any moment, the method currently executing corresponds to the frame at the top of the virtual machine stack; a method's execution maps to the pushing and popping of its frame.
      The stack size is 1 MB per thread by default and can be adjusted with the -Xss parameter, for example -Xss256k
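The effect of the per-thread stack limit can be probed with a small sketch (class and message names are illustrative, not from the original text): each recursive call pushes one frame, so running this with a smaller -Xss (e.g. -Xss256k) makes the StackOverflowError arrive at a shallower depth.

```java
public class StackDepth {
    private static int depth = 0;

    private static void recurse() {
        depth++;       // each call consumes one stack frame
        recurse();
    }

    /** Returns the recursion depth reached before the thread stack overflowed. */
    public static int probeDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // expected: the thread stack (default ~1 MB, tunable with -Xss) is exhausted
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("overflow at depth " + probeDepth());
    }
}
```

The exact depth varies by JVM and platform; only the relative change under different -Xss values is meaningful.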

    • heap

      Almost all objects are allocated here, and it is also the main area where garbage collection takes place. It can be adjusted with the following parameters:
      -Xms: minimum (initial) heap size;
      -Xmx: maximum heap size;
      -Xmn: size of the new generation;
      -XX:NewSize: minimum size of the new generation;
      -XX:MaxNewSize: maximum size of the new generation;
      for example -Xmx256m
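These flags can be verified from inside a program. A small illustrative sketch (the class name is hypothetical) that prints the heap limits the JVM actually picked up; run it with e.g. -Xms64m -Xmx256m to see the flags take effect:

```java
public class HeapInfo {
    /** Maximum heap size in MB as the JVM sees it (governed by -Xmx). */
    public static long maxHeapMB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("max heap:   " + maxHeapMB() + " MB");                       // -Xmx
        System.out.println("total heap: " + rt.totalMemory() / (1024 * 1024) + " MB");  // currently committed
        System.out.println("free heap:  " + rt.freeMemory() / (1024 * 1024) + " MB");
    }
}
```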

    • method area

      Stores class information loaded by the virtual machine, constants ("zdy", "123", etc.), static variables, and similar data. It can be adjusted with the following parameters:
      JDK 1.7 and earlier: -XX:PermSize; -XX:MaxPermSize
      JDK 1.8 and later: -XX:MetaspaceSize; -XX:MaxMetaspaceSize
      Since JDK 1.8, the size is by default limited only by the machine's total memory.
      For example: -XX:MaxMetaspaceSize=3M

    • runtime constant pool

      • Changes in the memory area across versions

        • 1.6

          The runtime constant pool is in the method area

        • 1.7

          The runtime constant pool is on the heap

        • 1.8

          The runtime constant pool is in the metaspace

  • direct memory

    Direct memory is not part of the runtime data area, nor is it a memory region defined in the Java Virtual Machine Specification. If NIO is used, this area is used heavily, since it can be referenced and operated on directly through DirectByteBuffer objects in the Java heap. This memory is not limited by the size of the Java heap, but by the machine's total memory; it can be set with -XX:MaxDirectMemorySize (the default equals the maximum heap size), so OOM exceptions can also occur here.
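A minimal sketch contrasting a direct buffer with an ordinary heap buffer via the standard java.nio API (the class and method names here are illustrative):

```java
import java.nio.ByteBuffer;

public class DirectBufferDemo {
    /** Allocates a 1 KB buffer and reports whether it lives outside the Java heap. */
    public static boolean allocate(boolean direct) {
        ByteBuffer buf = direct
                ? ByteBuffer.allocateDirect(1024)  // native memory, capped by -XX:MaxDirectMemorySize
                : ByteBuffer.allocate(1024);       // ordinary object on the Java heap
        return buf.isDirect();
    }

    public static void main(String[] args) {
        System.out.println("direct buffer isDirect: " + allocate(true));
        System.out.println("heap buffer isDirect:   " + allocate(false));
    }
}
```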

Looking at the heap and stack from the perspective of threads

In-depth analysis of heap and stack


  • Function
  1. The stack stores the method-call process in the form of stack frames, holding variables of primitive types (int, short, long, byte, float, double, boolean, char) and object reference variables created during a method call. Their memory is allocated on the stack, and they are released automatically when they go out of scope.
  2. Heap memory stores Java objects. Whether a variable is a member variable, local variable, or class variable, the object it points to is stored in heap memory.
  • thread exclusive or shared
  1. Stack memory belongs to a single thread: each thread has its own stack, and the variables stored there are visible only to that thread. Stack memory can be understood as the thread's private memory.
  2. Objects in heap memory are visible to all threads and can be accessed by all threads.
  • size of space

Stack memory is much smaller than heap memory
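A small illustrative sketch of this division (class and variable names are hypothetical): the primitive value and the array reference live in the current frame's local variable table, while the array object itself lives on the heap.

```java
public class StackVsHeap {
    public static int demo() {
        int n = 42;              // primitive: the value is stored in this frame's local variable table
        int[] data = new int[3]; // 'data' (the reference) is on the stack; the array object is on the heap
        data[0] = n;
        return data[0];          // when demo() returns, the frame is popped; the array
    }                            // becomes garbage once no reference to it remains

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```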

method stack

  • stack frame

    A method call allocates a stack frame on the stack

  • stack allocation

    An optimization provided by the virtual machine. The basic idea is that a thread-private object can be allocated on the stack instead of on the heap. The advantage is that the object is destroyed automatically when the method returns, requiring no garbage collection, which improves performance.
    On-stack allocation depends on escape analysis, whose purpose is to determine whether an object's scope escapes the method body. Note that any object that can be shared between multiple threads is necessarily an escaping object.

    • The effect of allocation on the stack
    public void test(int x, int y) {
        // hypothetical User class; u never escapes this method, so it is a
        // candidate for on-stack allocation / scalar replacement
        String s = "";
        User u = new User();
    }
    

    Allocating the same User object instance 100,000,000 times takes about 6 ms with on-stack allocation enabled and about 3 s with it disabled.
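A sketch of a benchmark in the spirit of the experiment above, assuming a trivial User class (both the class and the `run` method are hypothetical). Run it on HotSpot with -XX:+DoEscapeAnalysis and then -XX:-DoEscapeAnalysis to compare; absolute timings will differ by machine.

```java
public class StackAllocTest {
    static class User {
        int id;
    }

    private static void alloc() {
        User u = new User(); // never escapes alloc(): a candidate for scalar replacement
        u.id = 1;
    }

    /** Runs the allocation loop and returns the elapsed time in milliseconds. */
    public static long run(int iterations) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < iterations; i++) {
            alloc();
        }
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        System.out.println(run(100_000_000) + " ms");
    }
}
```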

Objects in the virtual machine

  • allocation process

    When the virtual machine encounters a new instruction, it first executes the corresponding class loading process, and
    then the virtual machine will allocate memory for the new object. The task of allocating space for an object is equivalent to dividing a certain size of memory from the Java heap.
    If memory in the Java heap is perfectly regular, with all used memory on one side, free memory on the other, and a pointer in the middle marking the boundary, then allocating memory is simply a matter of moving that pointer toward the free side by a distance equal to the object's size. This allocation method is called "pointer collision" (bump-the-pointer).
    If the memory in the Java heap is not regular, and the used memory and free memory are interleaved, then there is no way to simply collide pointers. The virtual machine must maintain a list to record which memory blocks are available. When allocating, find a large enough space from the list to allocate to the object instance, and update the records on the list. This allocation method is called "free list".
    Which allocation method to choose is determined by whether the Java heap is regular, and whether the Java heap is regular is determined by whether the garbage collector used has a compacting function.
    Besides how to divide the available space, another issue must be considered: object creation is extremely frequent in a virtual machine, and even just moving a pointer is not thread-safe under concurrency. It may happen that memory is being allocated for object A, the pointer has not yet been updated, and object B allocates using the original pointer at the same time.
    There are two solutions to this problem. One is to synchronize the action of allocating memory space—in fact, the virtual machine uses CAS with failed retries to ensure the atomicity of update operations;
    The other is to divide memory allocation among threads: each thread pre-allocates a small piece of private memory in the Java heap, the Thread Local Allocation Buffer (TLAB). If the virtual machine parameter -XX:+UseTLAB is set, each thread also applies for a buffer of a specified size when it is initialized, used only by that thread, so every thread has its own buffer. A thread that needs memory allocates from its own buffer, so there is no contention, which greatly improves allocation efficiency. When a buffer's capacity runs out, the thread applies for a new piece from the Eden area.
    The purpose of TLAB is to allow each Java application thread to use its own dedicated allocation pointer to allocate space when allocating memory space for new objects, reducing synchronization overhead.
    TLAB only allows each thread to have a private allocation pointer, but the underlying memory space for storing objects is still accessible to all threads, but other threads cannot allocate in this area. When a TLAB is full (the allocation pointer top hits the allocation limit end), a new TLAB is applied for.
    After memory allocation is complete, the virtual machine initializes the allocated space to zero values (e.g. 0 for int, false for boolean). This step ensures that an object's instance fields can be used in Java code without explicit initial values: the program sees the zero value corresponding to each field's type.
    Next, the virtual machine makes the necessary settings on the object: which class it is an instance of, how to find the class's metadata, the object's hash code, its GC generation age, and so on. This information is stored in the object header.
    Once the above work is done, a new object has been created from the virtual machine's point of view; but from the Java program's point of view, object creation has only just begun and all fields are still zero. Generally, after the new instruction, the object is initialized according to the programmer's wishes (the constructor runs), and only then is a truly usable object complete.
    Memory layout of objects
    In the HotSpot virtual machine, the layout of objects stored in memory can be divided into three areas: object header (Header), instance data (Instance Data) and alignment padding (Padding).
    The object header includes two parts of information. The first part is used to store the runtime data of the object itself, such as hash code (HashCode), GC generation age, lock status flag, lock held by the thread, biased thread ID, biased timestamp, etc. .
    The other part of the object header is the type pointer, which is the pointer of the object to its class metadata. The virtual machine uses this pointer to determine which class the object is an instance of.
    The third part of alignment padding does not necessarily exist and has no special meaning, it just acts as a placeholder. Because the automatic memory management system of HotSpot VM requires that the size of the object must be an integer multiple of 8 bytes. When other data parts of the object are not aligned, it needs to be completed by alignment padding.

  • memory layout

  • Object access location

    The purpose of creating an object is to use the object. Our Java program needs to manipulate the specific object on the heap through the reference data on the stack. At present, there are two mainstream access methods, using handles and direct pointers.
    If you use a handle to access, then a piece of memory will be allocated in the Java heap as a handle pool. The reference stores the handle address of the object, and the handle contains the specific address information of the object instance data and type data.
    If you use a direct pointer to access, what is stored in the reference is directly the address of the object.
    Both access methods have advantages. The biggest advantage of handle access is that the reference stores a stable handle address: when the object is moved (moving objects during garbage collection is very common), only the instance-data pointer inside the handle changes, and the reference itself does not need to be modified.
    The biggest advantage of direct-pointer access is speed: it saves the overhead of one extra pointer dereference. Since object access is extremely frequent in Java, this overhead accumulates into a considerable execution cost.
    For Sun HotSpot, it uses direct pointer access for object access.

Heap parameter setting and memory overflow combat

  • Java heap overflow
  • new generation configuration
  • Method area and runtime constant pool overflow
  • Virtual machine stack and native method stack overflow
  • Native direct memory overflow

2. Garbage collector and memory allocation strategy

GC overview

Judging the survival of the object

  • reference counting

Fast, convenient, and simple to implement. Disadvantage: when objects reference each other in a cycle, reference counting cannot determine that they are dead, so they can never be reclaimed.

  • accessibility analysis

Determines whether an object is alive. The basic idea of this algorithm is to take a series of objects called "GC Roots" as starting points and search downward from these nodes; the path traveled by the search is called the reference chain (Reference Chain). When an object is not connected to any GC Root by a reference chain, it is proven unreachable and eligible for collection.
The objects used as GC Roots include the following types:
 1. Objects referenced in the virtual machine stack (local variable table in the stack frame).
 2. Objects referenced by class static properties in the method area.
 3. Objects referenced by constants in the method area.
 4. The object referenced by JNI (generally speaking, Native method) in the local method stack.

Differentiate between strong and weak references

  • strong reference

General Object obj = new Object() is a strong reference.

  • Soft reference SoftReference

Some useful but not necessary objects associated with soft references will be recycled before the system OOM occurs.

  • Weak reference WeakReference

Some are useful (lower than soft references) but not necessary. Objects associated with weak references can only survive until the next garbage collection. When GC occurs, regardless of whether there is enough memory, it will be recycled.

  • Virtual reference PhantomReference

Phantom references, the weakest kind. A phantom reference cannot be used to obtain the object; its only purpose is to receive a notification when the object is garbage collected.
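A minimal sketch of weak-reference behavior (the class name is illustrative). Note that System.gc() is only a request, not a guarantee; on stock HotSpot it does clear an otherwise-unreachable weak referent in practice, but the specification does not promise it.

```java
import java.lang.ref.WeakReference;

public class WeakRefDemo {
    /** Drops the only strong reference, requests a GC, and reports whether the weak ref was cleared. */
    public static boolean weakRefCleared() {
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);
        strong = null;   // the weak reference is now the only path to the object
        System.gc();     // a request, not a guarantee; HotSpot clears weak refs here in practice
        return weak.get() == null;
    }

    public static void main(String[] args) {
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);
        System.out.println("reachable while strongly referenced: " + (weak.get() != null));
        System.out.println("cleared after GC: " + weakRefCleared());
    }
}
```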

Note
Soft reference SoftReference and weak reference WeakReference can be used in the case of tight memory resources and the creation of data caches that are not very important. When the system memory is insufficient, the content in the cache can be released.
For example, a program is used to process pictures provided by users. If all the pictures are read into the memory, although the pictures can be opened quickly, the memory space will be huge, and some less used pictures waste memory space and need to be manually removed from the memory. If every time a picture is opened, it is read from the disk file to the memory and then displayed. Although the memory usage is small, some frequently used pictures need to access the disk every time they are opened, and the cost is huge. At this time, you can use soft references to build caches.
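A sketch of such a cache, assuming a hypothetical SoftCache class with a placeholder loadFromDisk(): entries may be reclaimed by the GC under memory pressure, so every lookup must handle a cleared reference and reload.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

public class SoftCache {
    private final Map<String, SoftReference<byte[]>> cache = new HashMap<>();

    /** Returns the cached image, reloading it if the entry was reclaimed or never cached. */
    public byte[] get(String path) {
        SoftReference<byte[]> ref = cache.get(path);
        byte[] img = (ref == null) ? null : ref.get();
        if (img == null) {                           // missing or cleared by the GC: reload
            img = loadFromDisk(path);
            cache.put(path, new SoftReference<>(img));
        }
        return img;
    }

    private byte[] loadFromDisk(String path) {
        return new byte[16];                         // stand-in for real file I/O
    }
}
```

Before an OutOfMemoryError would occur, the collector clears softly reachable entries, so the cache shrinks automatically instead of exhausting the heap.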

GC algorithm

  • mark-sweep algorithm

    The algorithm is divided into two stages of "marking" and "clearing": first mark all objects that need to be recycled, and recycle all marked objects uniformly after the marking is completed.
    Its main problem is insufficient space. After the mark is cleared, a large number of discontinuous memory fragments will be generated. Too much space fragmentation may cause that when the program needs to allocate large objects in the future, it cannot find enough continuous memory and has to be triggered in advance. Another garbage collection action.

  • copy algorithm

    Divides the available memory into two equal-sized halves and uses only one at a time. When that half is used up, surviving objects are copied to the other half and the used half is cleaned up in one pass. This reclaims half of the memory at a time, and allocation never has to deal with fragmentation: memory is allocated sequentially, which is simple to implement and efficient to run. The cost of this algorithm is that it halves the usable memory.

  • Mark-compact algorithm

    First mark all the objects that need to be recycled. After the marking is completed, the next step is not to directly clean up the recyclable objects, but to move all surviving objects to one end, and then directly clean up the memory outside the end boundary.

Generational collection

The current garbage collection of commercial virtual machines adopts the "Generational Collection" (Generational Collection) algorithm. This algorithm does not have any new ideas, but divides the memory into several blocks according to the different life cycles of objects. Generally, the Java heap is divided into the new generation and the old generation, so that the most appropriate collection algorithm can be adopted according to the characteristics of each age.
Research shows that 98% of objects in the new generation "die young", so there is no need to divide the space 1:1. Instead, memory is divided into one larger Eden space and two smaller Survivor spaces; each collection uses Eden plus one of the Survivors. During collection, surviving objects in Eden and the in-use Survivor are copied to the other Survivor in one pass, and Eden and the just-used Survivor are then cleaned up. The HotSpot virtual machine's default Eden-to-Survivor ratio is 8:1, so the usable memory in each new generation is 90% (80% + 10%) of its total capacity, with only 10% "wasted". Of course, the 98% figure only holds in typical scenarios; there is no guarantee that no more than 10% of objects survive each collection. When Survivor space is insufficient, other memory (the old generation) must provide an allocation guarantee (Handle Promotion).
Each collection in the new generation finds that most objects die and only a few survive, so a copying algorithm is used: collection completes at the cost of copying just the few survivors. In the old generation, objects have a high survival rate and there is no extra space to guarantee allocation, so the "mark-sweep" or "mark-compact" algorithm must be used.

Stop The World phenomenon

The goal of GC collector and our GC tuning is to reduce the time and frequency of STW as much as possible

Interpretation of GC logs

Memory allocation and recycling strategy

Objects are allocated in Eden first; if Eden lacks sufficient space, a Minor GC is triggered.
Large objects enter the old generation directly. Large objects are Java objects that require large amounts of contiguous memory, such as very long strings and large arrays. They cause problems because: 1. even when memory remains, garbage collection may have to run early to obtain contiguous space for them; 2. they cause large amounts of memory copying.
The -XX:PretenureSizeThreshold parameter makes objects larger than the threshold allocate directly in the old generation; the default is 0, meaning objects are never allocated directly in the old generation.
Long-lived objects enter the old generation; the default promotion age is 15, adjustable with -XX:MaxTenuringThreshold.
Dynamic object age determination: to better adapt to the memory conditions of different programs, the virtual machine does not always require an object's age to reach MaxTenuringThreshold before promotion. If the total size of all objects of the same age in Survivor space exceeds half of Survivor space, objects of that age and older enter the old generation directly, without waiting for the age required by MaxTenuringThreshold.
Space allocation guarantee: when large numbers of objects survive a Minor GC (in the extreme case, every object in the new generation survives) and Survivor space cannot hold them, the old generation must guarantee the allocation, and the objects Survivor cannot accommodate go directly into the old generation. As long as the old generation's contiguous free space exceeds the total size of new-generation objects, or the average size of previous promotions, a Minor GC is performed; otherwise a Full GC is performed.

Differentiation and analysis of memory leak and memory overflow

Memory overflow: caused by a genuine shortage of memory space.
Memory leak: objects that should be released are not released; this is especially common when containers are used to hold elements.

The tools provided by JDK

  • jps

Lists the virtual machine processes running on the current machine.
-q: output only the local VM identifier (LVMID), omitting jar, class, and main-method arguments.
-m: output the arguments passed to the main method, i.e. the parameters entered on the command line when launching the program.
-l: output the fully qualified name of the application's main class, or the full path of the jar.
-v: list the JVM parameters; e.g. -Xms20m -Xmx50m are JVM parameters specified when the program was started.

  • jstat

A command-line tool for monitoring a virtual machine's runtime state. It can display runtime data such as class loading, memory, garbage collection, and JIT compilation for a local or remote virtual machine process, and is the tool of choice for locating VM performance problems when no GUI is available.
To query the garbage-collection status of process 2764 every 250 milliseconds, 20 times in total, the command is: jstat -gc 2764 250 20
Common options:
-class (class-loader statistics)
-compiler (JIT compilation statistics)
-gc (GC heap status)
-gccapacity (generation sizes)
-gccause (last GC statistics and cause)
-gcnew (new generation statistics)
-gcnewcapacity (new generation size)
-gcold (old generation statistics)
-gcoldcapacity (old generation size)
-gcpermcapacity (permanent generation size)
-gcutil (GC statistics summary)
-printcompilation (HotSpot method compilation statistics)

  • jinfo

Views and modifies virtual machine parameters.
jinfo -sysprops: view the parameters returned by System.getProperties()
jinfo -flag [parameter]: view the system default value of a parameter that was not explicitly specified
jinfo -flags (note the s): display the virtual machine's parameters
jinfo -flag +[parameter]: enable a parameter; limited to the manageable parameters listed by java -XX:+PrintFlagsFinal -version
jinfo -flag -[parameter]: disable a parameter

  • jmap

Used to generate heap dump snapshots (generally called heapdump or dump files). jmap does more than obtain dump files: it can also query the finalize queue and details of the Java heap and permanent generation, such as space usage and which collector is currently in use. Like jinfo, many of jmap's functions are limited on the Windows platform: only the -dump option (generate a dump file) and the -histo option (view per-class instance counts and space usage) are available on all operating systems; the remaining options work only on Linux/Solaris.
jmap -dump:live,format=b,file=heap.bin [pid]
Sun JDK provides the jhat (JVM Heap Analysis Tool) command to be used with jmap to analyze the heap dump snapshot generated by jmap.

  • jhat

Run jhat followed by the dump file name.
After the screen shows the "Server is ready." prompt, enter http://localhost:7000/ in a browser to view the details.

  • jstack

The jstack (Stack Trace for Java) command generates a thread snapshot of the virtual machine at the current moment. A thread snapshot is the collection of method stacks being executed by every thread in the virtual machine. The main purpose of generating one is to locate the cause of long thread pauses, such as deadlocks between threads, infinite loops, and long waits on external resources, which are common causes of long pauses.
In the code, the getAllStackTraces() method of the java.lang.Thread class can be used to obtain the StackTraceElement objects of all threads in the virtual machine. Using this method can complete most of the functions of jstack with a few simple lines of code. In actual projects, you may wish to call this method to create an administrator page, and you can use a browser to view the thread stack at any time.
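A sketch of that approach (the class name is illustrative), using only the standard Thread.getAllStackTraces() API to build a jstack-like dump of every live thread's stack:

```java
import java.util.Map;

public class MiniJstack {
    /** Builds a jstack-like dump: one quoted thread name per thread, then its frames. */
    public static String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            sb.append('"').append(e.getKey().getName()).append("\"\n");
            for (StackTraceElement frame : e.getValue()) {
                sb.append("\tat ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(dump());
    }
}
```

Exposing dump() through an administrator page, as suggested above, lets you inspect thread stacks from a browser at any time.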

To manage the remote process, you need to add the startup parameters of the remote program:
-Djava.rmi.server.hostname=...
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=8888
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false

Learn about MAT

  • Shallow and Deep Heaps

Shallow heap (Shallow Heap): the memory consumed by an object itself. For example, in a 32-bit system an object reference occupies 4 bytes, an int occupies 4 bytes, a long occupies 8 bytes, and each object header occupies 8 bytes.
Deep heap: the amount of memory actually freed when the object is reclaimed by GC, i.e. the set of all objects reachable only through this object, directly or indirectly. In plain terms, it is the set of objects held exclusively by this object. The deep heap equals the sum of the shallow-heap sizes of all objects in the object's retained set.
Example : Object A references C and D, and object B references C and E. Then the shallow heap size of object A is only A itself, excluding C and D, and the actual size of A is the sum of A, C, and D. The deep heap size of A is the sum of A and D. Since object C can also be accessed through object B, it is not within the deep heap range of object A.

garbage collector

  • use the algorithms

The generational collection (Generational Collection) algorithm, as described in the "Generational collection" section above: the heap is split into a new generation, collected with the copying algorithm, and an old generation, collected with mark-sweep or mark-compact.

  • Garbage Collector Overview

    Serial/Serial Old, ParNew, Parallel Scavenge (Parallel GC)/Parallel Old, Concurrent Mark Sweep (CMS), G1

  • Detailed working of the garbage collector

  • Detailed explanation of G1

  • future garbage collection

    ZGC uses technical means to limit STW to a single short pause, the initial mark, so it is not hard to see why its GC pause time does not grow as the heap grows: no matter how large the heap, the rest of the collection runs concurrently with the application.
    Key technologies:

    1. Colored Pointers
    2. Load Barrier


3. The execution subsystem of the JVM

Class file essence

  1. Bytecode (ByteCode), the program storage format used uniformly by virtual machines on all platforms, is the cornerstone of platform independence and the basis of language independence. The Java virtual machine is not bound to any language, including Java; it is associated only with the specific binary format of the "Class file". A Class file contains the Java virtual machine instruction set, a symbol table, and several other pieces of auxiliary information.
  2. Each Class file corresponds to the definition of exactly one class or interface; conversely, however, a Class file does not necessarily exist as a file on disk.
    Class files are binary streams based on 8-bit bytes.

Class file format

Each data item is strictly arranged in the Class file in a compact order without adding any separators in the middle, which makes almost all the content stored in the entire Class file the necessary data for the program to run, and there is no gap.
The Class file format uses a pseudo-structure similar to the C language structure to store data. There are only two data types in this pseudo-structure: unsigned numbers and tables.
Unsigned numbers are basic data types: u1, u2, u4, and u8 represent unsigned numbers of 1, 2, 4, and 8 bytes respectively. Unsigned numbers can describe numbers, index references, quantity values, or string values encoded in UTF-8.
A table is a composite data type composed of multiple unsigned numbers or other tables as data items, and all tables habitually end with "_info". A table is used to describe data with a composite structure with a hierarchical relationship, and the entire Class file is essentially a table.

  • Detailed format

    The structure of the Class file is not like XML or other description languages. Since it has no separators, its data items are strictly constrained in both order and quantity: which byte means what, how long each item is, and in what order items appear are all fixed and may not be changed. In order, a Class file contains:

    • Magic number and Class file version

    The first 4 bytes of every Class file are called the magic number (Magic Number); their only purpose is to determine whether the file is a Class file the virtual machine can accept. Using a magic number rather than a file extension for identification is mainly a security consideration, since extensions can be changed at will. (For Class files the magic number is 0xCAFEBABE.) A file format's author is free to choose any magic value, as long as it is not already widely used and does not cause confusion.
    The 4 bytes after the magic number store the version of the Class file: bytes 5 and 6 are the minor version (Minor Version) and bytes 7 and 8 are the major version (Major Version). Java's major version numbers start at 45 (JDK 1.0); after JDK 1.1, each major JDK release increments the major version by 1. A newer JDK can run Class files from earlier versions, but not those from later versions: even if the file format itself has not changed at all, the virtual machine must refuse to execute a Class file whose version number exceeds its own.
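The header layout described above is easy to verify in code. The sketch below parses a hand-built 8-byte header (rather than a real .class file on disk) with `DataInputStream`, whose `readInt`/`readUnsignedShort` map directly onto the u4 and u2 items:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Minimal sketch: reading the magic number and version words that open every Class file.
public class ClassHeader {
    public static void main(String[] args) throws IOException {
        byte[] header = {
            (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, // magic (u4)
            0x00, 0x00,                                         // minor_version (u2) = 0
            0x00, 0x34                                          // major_version (u2) = 52 (JDK 8)
        };
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(header));
        int magic = in.readInt();
        int minor = in.readUnsignedShort();
        int major = in.readUnsignedShort();
        if (magic != 0xCAFEBABE) throw new IllegalArgumentException("not a class file");
        System.out.println("major=" + major + " minor=" + minor); // major=52 minor=0
    }
}
```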

    • constant pool

    The number of constants in the constant pool is not fixed, so a u2 value is placed at the entrance of the constant pool to hold the constant pool capacity count (constant_pool_count). Unlike the usual convention in Java, this count starts from 1 rather than 0.
    The constant pool mainly stores two kinds of constants: literals (Literal) and symbolic references (Symbolic Reference).
    Literals are close to the constant concept at the Java language level, such as text strings and constant values declared final.
    Symbolic references belong to the realm of compiler principles and include three kinds of constants:
    fully qualified names (Fully Qualified Name) of classes and interfaces, field names and descriptors (Descriptor), and method names and descriptors.

    • access flags

    Access flags identify class- or interface-level access information, including: whether this Class is a class or an interface; whether it is defined as public; whether it is defined as abstract; and, if it is a class, whether it is declared final.

    • Collection of class index, parent class index and interface index

    These three items determine the inheritance relationships of the class. The class index determines the fully qualified name of this class, and the parent class index determines the fully qualified name of its parent class. Since Java does not allow multiple inheritance, there is only one parent class index. Except for java.lang.Object, all Java classes have a parent class, so except for java.lang.Object the parent class index of every class is non-zero. The interface index collection describes which interfaces this class implements; the implemented interfaces are listed in the collection left to right in the order they appear after the implements clause (or the extends clause, if the class itself is an interface).

    • field table collection

    Describes variables declared in an interface or class. Fields include class-level variables as well as instance-level variables.
    The name of the field and the data type of the field are not fixed, and can only be described by referring to constants in the constant pool.
    Fields inherited from superclasses or parent interfaces are not listed in the field table collection, but fields that do not exist in the original Java source may appear. For example, to preserve access to its outer class, an inner class automatically gains a field pointing to the outer class instance.

    • method table collection

    Describes the definition of the method, but the Java code in the method, after being compiled into bytecode instructions by the compiler, is stored in an attribute named "Code" in the method attribute table set in the attribute table set.
    As with the field table collection, if a subclass does not override (Override) a parent class method, the parent's method information does not appear in the subclass's method table collection. Likewise, methods automatically added by the compiler may appear, most typically the class constructor "<clinit>" method and the instance constructor "<init>" method.

    • attribute table collection

    Store Class files, field tables, and method tables with their own attribute table collections, which are used to describe information specific to certain scenarios. For example, the code of the method is stored in the Code attribute table.

bytecode instructions

  • know

A Java virtual machine instruction consists of a one-byte number representing a specific operation (the opcode, Opcode), followed by zero or more parameters the operation needs (the operands, Operands).
Since the length of the Java virtual machine opcode is limited to one byte (that is, 0 to 255), this means that the total number of opcodes in the instruction set cannot exceed 256.
Most instructions include information about the data type they operate on. For example:
The iload instruction is used to load int type data from the local variable table to the operand stack, while the fload instruction loads float type data.
Most of the instructions do not support the integer types byte, char, and short, and none of the instructions even support the boolean type. Most operations on boolean, byte, short, and char type data actually use the corresponding int type as the operation type.
Reading bytecode is a basic skill for understanding the Java virtual machine, please master it. Please be familiar with and master the common instructions.

  • load and store instructions

Used to transfer data back and forth between the local variable table in the stack frame and the operand stack, such instructions include the following.
Load a local variable onto the operation stack: iload, iload_<n>, lload, lload_<n>, fload, fload_<n>, dload, dload_<n>, aload, aload_<n>.
Store a value from the operand stack to the local variable table: istore, istore_<n>, lstore, lstore_<n>, fstore, fstore_<n>, dstore, dstore_<n>, astore, astore_<n>.
Load a constant onto the operand stack: bipush, sipush, ldc, ldc_w, ldc2_w, aconst_null, iconst_m1, iconst_<i>, lconst_<l>, fconst_<f>, dconst_<d>.
The command to expand the access index of the local variable table: wide.

  • operation or arithmetic instruction

Arithmetic instructions perform a specific operation on the two values on top of the operand stack and push the result back onto the top of the stack.
Addition instructions: iadd, ladd, fadd, dadd.
Subtraction instructions: isub, lsub, fsub, dsub.
Multiplication instructions: imul, lmul, fmul, dmul, etc.
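The load and arithmetic instructions above are easiest to see in a disassembly. Compiling the trivial method below and running `javap -c` on the class produces (roughly — exact output varies slightly by JDK) the listing shown in the comment:

```java
public class AddDemo {
    // javap -c AddDemo shows, for add (a static method, so args occupy slots 0 and 1):
    //   iload_0   // push the first int argument onto the operand stack
    //   iload_1   // push the second int argument
    //   iadd      // pop two ints, push their sum
    //   ireturn   // return the int on top of the stack
    static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3)); // 5
    }
}
```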

  • type conversion instruction

Two different numerical types can be converted to each other.
The Java virtual machine directly supports widened type conversion of the following numerical types (that is, safe conversion from a small-range type to a wide-range type):
int type to long, float or double type.
Long type to float, double type.
float type to double type.
When dealing with narrowing type conversions (Narrowing Numeric Conversions), it must be done explicitly using conversion instructions, including: i2b, i2c, i2s, l2i, f2i, f2l, d2i, d2l, and d2f.

  • Instructions for creating class instances

new

  • Instructions for creating arrays

newarray、anewarray、multianewarray

  • access field command

getfield、putfield、getstatic、putstatic

  • Array access related instructions

Instructions that load an array element into the operand stack: baload, caload, saload, iaload, laload, faload, daload, aaload.
Instructions that store a value from the operand stack into an array element: bastore, castore, sastore, iastore, lastore, fastore, dastore, aastore.
Command to get the length of the array: arraylength.

  • Instructions for checking the type of a class instance

instanceof、checkcast

  • Operand stack management instructions

Just like operating a stack in an ordinary data structure, the Java virtual machine provides some instructions for directly manipulating the operand stack, including: popping one or two elements from the top of the operand stack: pop, pop2.
Duplicate one or two values at the top of the stack and push the duplicates back onto the top of the stack: dup, dup2, dup_x1, dup2_x1, dup_x2, dup2_x2.
Swap the two values ​​​​at the top of the stack: swap

  • control transfer instruction

Control transfer instructions let the Java virtual machine conditionally or unconditionally continue execution from a specified instruction rather than from the instruction that follows the control transfer instruction; conceptually, a control transfer instruction conditionally or unconditionally modifies the value of the PC register. The control transfer instructions are as follows.
Conditional branches: ifeq, iflt, ifle, ifne, ifgt, ifge, ifnull, ifnonnull, if_icmpeq, if_icmpne, if_icmplt, if_icmpgt, if_icmple, if_icmpge, if_acmpeq, and if_acmpne.
Compound condition branch: tableswitch, lookupswitch.
Unconditional branch: goto, goto_w, jsr, jsr_w, ret.

  • method call instruction

The invokevirtual instruction is used to call the instance method of the object, and dispatch according to the actual type of the object (virtual method dispatch), which is also the most common method dispatch method in the Java language.
The invokeinterface command is used to call the interface method, it will search for an object that implements the interface method at runtime, and find a suitable method to call.
The invokespecial instruction is used to call some instance methods that require special handling, including instance initialization methods, private methods, and parent class methods.
The invokestatic instruction is used to call a class method (static method).
The invokedynamic instruction dynamically resolves, at runtime, the method referenced by the call site qualifier and executes it. The dispatch logic of the previous 4 invocation instructions is fixed inside the Java virtual machine, while the dispatch logic of invokedynamic is determined by a user-supplied bootstrap method.
Method call instructions are data type independent.

  • method return instruction

Method return instructions are distinguished by return type: ireturn (used when the return value is boolean, byte, char, short, or int), lreturn, freturn, dreturn, and areturn. In addition there is the return instruction, used by methods declared void, instance initialization methods, and the class and interface initialization methods.

  • exception handling instructions

The operation (throw statement) that explicitly throws an exception in a Java program is implemented by the athrow instruction

  • synchronous command

There are two instructions monitorenter and monitorexit to support the semantics of the synchronized keyword

class loading mechanism

  • Detailed loading process

    • overview

    From the time a class is loaded into virtual machine memory until it is unloaded, its life cycle comprises 7 phases: loading (Loading), verification (Verification), preparation (Preparation), resolution (Resolution), initialization (Initialization), use (Using), and unloading (Unloading). Verification, preparation, and resolution are collectively called linking (Linking).
    For the initialization phase, the virtual machine specification strictly stipulates exactly 5 situations in which a class must be initialized immediately (loading, verification, and preparation naturally must have started beforehand):

    1. When encountering the four bytecode instructions new, getstatic, putstatic, or invokestatic, if the class has not been initialized, its initialization must be triggered first. The most common Java code scenarios generating these 4 instructions are: instantiating an object with the new keyword, reading or setting a static field of a class (excluding static fields modified by final whose values were already placed into the constant pool at compile time), and calling a static method of a class.
    2. When using the method of the java.lang.reflect package to make a reflective call to a class, if the class has not been initialized, you need to trigger its initialization first.
    3. When initializing a class, if you find that its parent class has not been initialized, you need to trigger the initialization of its parent class first.
    4. When the virtual machine starts, the user needs to specify a main class to be executed (the class containing the main() method), and the virtual machine first initializes the main class.
    5. When using the dynamic language support of JDK 1.7, if the final analysis result of a java.lang.invoke.MethodHandle instance is the method handle of REF_getStatic, REF_putStatic, REF_invokeStatic, and the class corresponding to this method handle has not been initialized, it needs to be triggered first. its initialization.
    • Notice

    For static fields, only the class that directly defines this field will be initialized, so referencing the static field defined in the parent class through its subclass will only trigger the initialization of the parent class and not the initialization of the subclass.
    Take a constant HELLOWORLD as an example: through constant-propagation optimization during compilation, the constant's value "hello world" is stored directly in the constant pool of the NotInitialization class. From then on, references by NotInitialization to the constant ConstClass.HELLOWORLD are actually converted into references to NotInitialization's own constant pool.
    That is to say, in fact, there is no symbol reference entry of the ConstClass class in the Class file of NotInitialization, and there is no connection between these two classes after they are compiled into Class.
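The ConstClass point above can be reproduced directly. In this self-contained sketch (class names mirror the example but the tracking flag is my own addition), referencing the compile-time constant never initializes — or even loads — the declaring class:

```java
// Referencing a compile-time constant does not trigger initialization of the class that
// declares it, because javac copies the value into the referencing class's constant pool.
public class ConstDemo {
    static boolean constClassInited = false;

    static class ConstClass {
        static { ConstDemo.constClassInited = true; }        // runs only on initialization
        static final String HELLO = "hello world";           // compile-time constant
    }

    public static void main(String[] args) {
        String s = ConstClass.HELLO;          // inlined by javac; ConstClass stays untouched
        System.out.println(constClassInited); // false
        System.out.println(s);                // hello world
    }
}
```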

    • loading stage

    The virtual machine needs to complete the following 3 things:

    1. Get the binary byte stream defining this class by its fully qualified name.
    2. Convert the static storage structure represented by this byte stream into the runtime data structure of the method area.
    3. A java.lang.Class object representing this class is generated in the memory as an access entry for various data of this class in the method area.
    • verify

    It is the first step of the connection phase. The purpose of this phase is to ensure that the information contained in the byte stream of the Class file meets the requirements of the current virtual machine and will not endanger the safety of the virtual machine itself. But on the whole, the verification phase will roughly complete the following four stages of verification actions: file format verification, metadata verification, bytecode verification, and symbol reference verification.

    • Preparation Phase

    The preparation phase formally allocates memory for class variables and sets their initial values; this memory is allocated in the method area. Two easily confused points deserve emphasis. First, memory allocation at this point covers only class variables (variables modified by static), not instance variables; instance variables are allocated in the Java heap along with the object when it is instantiated. Second, the "initial value" here is normally the zero value of the data type. Suppose a class variable is defined as:
    public static int value=123;
    the initial value of value after the preparation phase is 0, not 123, because no Java method has been executed yet; the putstatic instruction that assigns 123 is stored, after compilation, in the class constructor <clinit>() method, so the assignment to 123 is only executed during the initialization phase. Each primitive data type has its own zero value (0, 0L, 0.0f, 0.0d, '\u0000', false; reference types default to null).
    Suppose the definition of the class variable value instead becomes: public static final int value=123;
    Then javac generates a ConstantValue attribute for value at compile time, and the virtual machine assigns value to 123 according to that ConstantValue setting already during the preparation phase.

    • parsing stage

    It is the process by which the virtual machine replaces the symbolic references in the constant pool with direct references

    • class initialization phase

    Initialization is the last step of the class loading process. In the preceding phases, apart from the loading phase where a user application can participate via a custom class loader, everything is driven and controlled by the virtual machine. In the initialization phase the Java program code defined in the class actually begins to execute. Whereas the preparation phase gave variables their system-required initial (zero) values once, the initialization phase initializes class variables and other resources according to the plan the programmer expressed in code. Put another way: the initialization phase is the process of executing the class constructor <clinit>() method. The <clinit>() method is generated by the compiler automatically collecting the assignments to all class variables and the statements in static{} blocks, combined in the order they appear in the source file.
    The <clinit>() method is not necessary for a class or interface. If there is no static statement block in a class and no assignment operation to variables, then the compiler does not need to generate the <clinit>() method for this class.
    The virtual machine ensures that a class's <clinit>() method is properly locked and synchronized in a multi-threaded environment. If multiple threads initialize a class simultaneously, only one thread executes <clinit>(); the others block and wait until the active thread finishes executing it. If a class's <clinit>() method contains a time-consuming operation, it may therefore cause multiple threads to block.
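The source-order collection rule for <clinit>() is easy to demonstrate: static assignments and static{} blocks execute in the order they appear, and later statements see the effects of earlier ones.

```java
// The compiler collects static assignments and static{} blocks into <clinit>() in
// source order.
public class ClinitOrder {
    static int a = 1;
    static { a = 2; }      // runs after "a = 1"
    static int b = a + 1;  // sees a == 2, so b == 3

    public static void main(String[] args) {
        System.out.println(a + "," + b); // 2,3
    }
}
```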

  • class loader

    • Custom class loading encrypts and decrypts classes

      Override the findClass method
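A hedged sketch of the encrypt/decrypt idea: assume class files on disk have been XOR-"encrypted" (a stand-in for real cryptography), and findClass reverses that before calling defineClass. The class name, directory layout, and key are all illustrative.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Custom loader that "decrypts" class bytes before defining the class.
public class XorClassLoader extends ClassLoader {
    private final Path dir;
    private final byte key;

    public XorClassLoader(Path dir, byte key) {
        this.dir = dir;
        this.key = key;
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        try {
            Path file = dir.resolve(name.replace('.', '/') + ".class");
            byte[] bytes = Files.readAllBytes(file);
            for (int i = 0; i < bytes.length; i++) {
                bytes[i] ^= key; // "decrypt" back into valid class-file bytes
            }
            return defineClass(name, bytes, 0, bytes.length);
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }

    public static void main(String[] args) {
        XorClassLoader loader = new XorClassLoader(Paths.get("encrypted-classes"), (byte) 0x5A);
        try {
            loader.loadClass("com.example.Missing"); // no such file
        } catch (ClassNotFoundException expected) {
            System.out.println("not found, as expected");
        }
    }
}
```

Note that only findClass is overridden; loadClass is left alone so the parent delegation described below still happens first.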

    • system class loader

      For any class, its uniqueness in the Java virtual machine needs to be established by the class loader that loads it and the class itself. Each class loader has an independent class namespace. This sentence can be expressed more generally: comparing whether two classes are "equal" is meaningful only if the two classes are loaded by the same class loader, otherwise, even if the two classes come from the same A Class file is loaded by the same virtual machine, as long as the class loaders that load them are different, the two classes must be unequal.
      The "equality" referred to here includes the return results of the equals() method, isAssignableFrom() method, and isInstance() method of the Class object representing the class, as well as the use of the instanceof keyword to determine the relationship of the object.
      When customizing a subclass of ClassLoader there are usually two options: override the loadClass method, or override the findClass method. The two are similar in essence — after all, loadClass eventually calls findClass — but logically it is best not to modify the internal logic of loadClass directly; the suggested approach is to put only the custom loading logic in an overridden findClass.
      The loadClass method is where the logic of the parent delegation model is implemented. Modifying this method without authorization will cause the model to be destroyed and cause problems. Therefore, it is best for us to make small-scale changes within the framework of the parental delegation model without destroying the original stable structure. At the same time, it also avoids having to write the repeated code entrusted by the parents in the process of rewriting the loadClass method. From the perspective of code reusability, it is always a better choice not to directly modify this method.

    • Parent Delegation Model

      From the perspective of the Java virtual machine, there are only two different class loaders: one is the bootstrap class loader (Bootstrap ClassLoader), which is implemented in C++ language and is part of the virtual machine itself; That is, all other class loaders are implemented by the Java language, independent of the virtual machine, and all inherit from the abstract class java.lang.ClassLoader.
      Bootstrap ClassLoader: this loader is responsible for loading, into virtual machine memory, the class libraries stored in the <JAVA_HOME>\lib directory or in a path specified by the -Xbootclasspath parameter, and recognized by the virtual machine by file name (e.g. rt.jar; libraries whose names do not match are not loaded even if placed in the lib directory). The bootstrap class loader cannot be referenced directly by Java programs; when writing a custom class loader, if a loading request needs to be delegated to the bootstrap loader, null is used in its place.
      Extension ClassLoader (Extension ClassLoader): This loader is implemented by sun.misc.Launcher$ExtClassLoader, which is responsible for loading <JAVA_HOME>\lib\ext directory, or the path specified by the java.ext.dirs system variable For all class libraries, developers can directly use the extension class loader.
      Application ClassLoader (Application ClassLoader): This class loader is implemented by sun.misc.Launcher $App-ClassLoader. Since this class loader is the return value of the getSystemClassLoader() method in ClassLoader, it is generally called the system class loader. It is responsible for loading the class library specified on the user's class path (ClassPath). Developers can use this class loader directly. If the application has not customized its own class loader, generally this is the default class in the program. Loader.
      Our applications are loaded by these three class loaders in cooperation with each other. If necessary, you can also add your own defined class loader.
      The parental delegation model requires that all class loaders except the top-level startup class loader should have their own parent class loader. The parent-child relationship between class loaders here is generally not implemented as an inheritance (Inheritance) relationship, but uses a composition (Composition) relationship to reuse the code of the parent loader.
      One obvious benefit of organizing class loaders with the parent delegation model is that a Java class acquires, along with its class loader, a hierarchy with priority. For example, the class java.lang.Object is stored in rt.jar; whichever class loader tries to load this class ultimately delegates to the bootstrap class loader at the top of the model, so Object is the same class in every class loader context of the program. Conversely, without the parent delegation model, with each class loader loading classes by itself, a user-written class named java.lang.Object placed on the classpath would produce multiple different Object classes in the system; the most basic behaviors of the Java type system could not be guaranteed, and applications would become chaotic.
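The delegation chain can be observed from a running program. The exact loader names printed differ between JDK 8 (extension loader) and JDK 9+ (platform loader), so only the shape of the chain is relied upon here:

```java
// Walking the class loader hierarchy from application code.
public class LoaderChain {
    public static void main(String[] args) {
        ClassLoader app = ClassLoader.getSystemClassLoader();
        System.out.println("app    = " + app);
        System.out.println("parent = " + app.getParent());
        // Core classes come from the bootstrap loader, which Java code sees as null:
        System.out.println(String.class.getClassLoader()); // null
    }
}
```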

  • Tomcat class loading mechanism

    Tomcat itself is also a java project, so it also needs to be loaded by the class loading mechanism of JDK, so there must be a bootstrap class loader, an extension class loader and an application (system) class loader.
    Common ClassLoader is the parent of Catalina ClassLoader and Shared ClassLoader, and Shared ClassLoader may in turn have multiple child WebApp ClassLoaders. Each WebApp ClassLoader corresponds to one web application. A web application may contain JSP pages, which are eventually converted into classes to be loaded, so a JSP class loader is also needed.
    Note that at the code level, Catalina ClassLoader, Shared ClassLoader, and Common ClassLoader are actually instances of URLClassLoader (or SecureClassLoader); they are divided into three loaders only logically, according to what they load and their parent-child ordering. WebApp ClassLoader and JasperLoader, by contrast, do have dedicated class loader classes.
    When tomcat starts, several class loaders are created:
    1 Bootstrap bootstrap class loader loads the classes required for JVM startup, as well as standard extension classes (located under jre/lib/ext)
    2 System system class loader loads the classes started by tomcat Classes, such as bootstrap.jar, are usually specified in catalina.bat or catalina.sh. Located under CATALINA_HOME/bin.
    3 Common The common class loader loads some common classes used by tomcat and applied, located under CATALINA_HOME/lib, such as servlet-api.jar
    4 webapp application class loader After each application is deployed, a unique class loader will be created. The class loader will load the class in the jar file under WEB-INF/lib and the class file under WEB-INF/classes.

Detailed stack frame

Detailed method call

The call target is determined when the program is written and the compiler compiles it; calls of such methods are called resolution (Resolution).
In the Java language, the methods that meet the requirement of "knowable at compile time and immutable at runtime" mainly include static methods and private methods. The former is directly associated with the type, and the latter cannot be accessed externally. The characteristics of them determine that it is impossible for them to rewrite other versions through inheritance or other means, so they are all suitable for parsing in the class loading phase.

Stack-based bytecode interpretation and execution engine

  • Stack-based instruction set vs. register-based instruction set
  • Analyze the execution of code in the virtual machine
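To make the stack-based execution model concrete, here is a toy interpreter for a tiny invented instruction set (the opcodes are my own, not JVM opcodes): every operation consumes and produces values on an operand stack, just as iadd and imul do.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// A toy stack-based interpreter: constants, add, multiply, return.
public class MiniStackVM {
    static final int PUSH = 0, ADD = 1, MUL = 2, RET = 3;

    static int run(int[] code) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0; // program counter, as in a real stack frame
        while (true) {
            switch (code[pc++]) {
                case PUSH: stack.push(code[pc++]); break;             // operand follows opcode
                case ADD:  stack.push(stack.pop() + stack.pop()); break;
                case MUL:  stack.push(stack.pop() * stack.pop()); break;
                case RET:  return stack.pop();                        // result is on top
                default:   throw new IllegalStateException("bad opcode");
            }
        }
    }

    public static void main(String[] args) {
        // (2 + 3) * 4, written the way a stack machine sees it:
        int[] code = {PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, RET};
        System.out.println(run(code)); // 20
    }
}
```

A register-based instruction set would instead encode the source and destination registers into each instruction, trading more compact instruction counts for longer individual instructions.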

4. Write efficient and elegant Java programs

What if there are too many constructor parameters?

Use the builder pattern when:
1. there are 5 or more member variables, or
2. there are currently few parameters, but their number is expected to grow.
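A minimal builder sketch (the class and field names are illustrative): required parameters go through the Builder's constructor, optional ones through chained setters, and the target class stays immutable.

```java
// Builder pattern for a class with one required and several optional fields.
public class NutritionFacts {
    private final int servingSize; // required
    private final int calories;    // optional
    private final int fat;         // optional

    public static class Builder {
        private final int servingSize;
        private int calories = 0;
        private int fat = 0;

        public Builder(int servingSize) { this.servingSize = servingSize; }
        public Builder calories(int v) { this.calories = v; return this; }
        public Builder fat(int v)      { this.fat = v; return this; }
        public NutritionFacts build()  { return new NutritionFacts(this); }
    }

    private NutritionFacts(Builder b) {
        this.servingSize = b.servingSize;
        this.calories = b.calories;
        this.fat = b.fat;
    }

    public int getCalories() { return calories; }

    public static void main(String[] args) {
        NutritionFacts cola = new NutritionFacts.Builder(240).calories(100).build();
        System.out.println(cola.getCalories()); // 100
    }
}
```

Adding a new optional parameter later only means adding one Builder method; existing call sites keep compiling.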

Classes that do not need to be instantiated should have private constructors

Don't create unnecessary objects

  • Avoid creating objects inadvertently — for example, autoboxing inside a loop — because autoboxing and unboxing create useless objects.

  • For member variables that can be reused across multiple instances of the class, try to use static.
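The classic accidental-boxing case: declaring an accumulator as Long instead of long makes every += allocate a new Long object inside the loop, while the result stays the same.

```java
// Boxed vs. primitive accumulator: same answer, very different allocation behavior.
public class BoxingLoop {
    public static void main(String[] args) {
        Long boxed = 0L;      // each iteration unboxes, adds, and boxes a fresh Long
        long primitive = 0L;  // no allocation at all
        for (int i = 0; i < 1_000; i++) {
            boxed += i;
            primitive += i;
        }
        System.out.println(boxed.longValue() == primitive); // true; only the cost differs
    }
}
```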

Avoid finalizers

The finalizer method, jdk cannot guarantee when it will be executed, nor can it guarantee that it will be executed. If there are resources that really need to be released, try/finally should be used.

Minimize class and member accessibility

When writing programs and designing architectures, one of the most important goals is decoupling between modules. Minimizing the accessibility of classes and members is undoubtedly one of the effective ways.

minimize variability

Try to make the class immutable. Immutable classes are easier to design, implement and use than mutable classes, and they are less error-prone and safer.
Commonly used methods:
do not provide any method that can modify the state of the object;
make all fields final.
Make all fields private.
Use a copy-on-write mechanism. Drawback: it causes the system to generate many objects and has a certain performance impact, so it must be weighed carefully in use.
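A small immutable class following the rules above (the class name is illustrative): no mutators, all fields private final, the class itself final, and "modification" done copy-on-write by returning a new instance.

```java
// Immutable value class; withX is the copy-on-write style of "modification".
public final class ImmutablePoint {
    private final int x;
    private final int y;

    public ImmutablePoint(int x, int y) { this.x = x; this.y = y; }
    public int getX() { return x; }
    public int getY() { return y; }

    // Returns a new instance instead of mutating this one.
    public ImmutablePoint withX(int newX) { return new ImmutablePoint(newX, y); }

    public static void main(String[] args) {
        ImmutablePoint p1 = new ImmutablePoint(1, 2);
        ImmutablePoint p2 = p1.withX(5);
        System.out.println(p1.getX() + " " + p2.getX()); // 1 5
    }
}
```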

Composite

Inheritance easily breaks encapsulation and makes the implementation of subclasses dependent on the parent class.
Composite is to add a private field in the class and refer to an instance of the class, so as to avoid relying on the specific implementation of the class.
Inheritance is only appropriate when the subclass is indeed a subtype of the parent class.

Interfaces are better than abstract classes

Variable parameters should be used with caution

Variable parameters allow passing 0 arguments.
If the method actually requires between 1 and many arguments, that constraint needs separate handling (for example, declaring one normal parameter before the varargs).
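The standard way to require at least one argument: take the first parameter normally and the rest as varargs, so calling with zero arguments fails at compile time instead of needing a runtime check.

```java
// "1 to many" varargs: the first int is mandatory, the rest are optional.
public class MinDemo {
    static int min(int first, int... rest) {
        int m = first;
        for (int v : rest) {
            if (v < m) m = v;
        }
        return m;
    }

    public static void main(String[] args) {
        System.out.println(min(3, 1, 4, 1, 5)); // 1
        // min();  // would not compile: at least one int is required
    }
}
```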

Return a zero-length array or collection, do not return null

Prefer standard exceptions

Reuse existing standard exceptions as much as possible: this reuses code, reduces the number of classes to load, and improves class-loading performance.
Commonly used exceptions:
IllegalArgumentException – the caller passed an inappropriate argument
IllegalStateException – the receiving object is in the wrong state
NullPointerException – null pointer exception
UnsupportedOperationException – unsupported operation

use enum instead of int constant

Minimize the scope of local variables

1. Declare at the place where it is used for the first time.
2. Local variables should be initialized where they are declared; if the initialization conditions are not yet met, postpone the declaration.
The benefits of minimizing scope: a smaller local variable table and better performance, while also avoiding the incorrect use that premature declaration of local variables can lead to.

Precise calculation, avoid using float and double
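Why float and double are unsafe for exact (e.g. monetary) arithmetic, and the BigDecimal alternative. Note the String constructor: `new BigDecimal(0.1)` would inherit the binary representation error.

```java
import java.math.BigDecimal;

// double accumulates binary rounding error; BigDecimal (from String) is exact.
public class ExactMath {
    public static void main(String[] args) {
        System.out.println(0.1 + 0.2); // 0.30000000000000004
        BigDecimal sum = new BigDecimal("0.1").add(new BigDecimal("0.2"));
        System.out.println(sum);       // 0.3
    }
}
```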

Beware the performance of string concatenation

When there are a large number of string splicing or large string splicing, try to use StringBuilder and StringBuffer
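Concatenating with + inside a loop builds a brand-new String on every pass (quadratic copying); StringBuilder appends into one growing buffer instead.

```java
// Loop concatenation done with StringBuilder instead of repeated String +.
public class ConcatDemo {
    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 5; i++) {
            sb.append(i).append(',');
        }
        sb.setLength(sb.length() - 1); // drop the trailing comma
        System.out.println(sb);        // 0,1,2,3,4
    }
}
```

StringBuffer has the same API with synchronized methods; prefer StringBuilder unless the buffer is genuinely shared between threads.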

5. In-depth understanding of performance optimization

Commonly used performance evaluation/test indicators

  • Response time

    • The time taken between submitting a request and returning a response to that request, generally focusing on the average response time.
      Typical response times for common operations:
    Operation                                                Response time
    Open a web site                                          a few seconds
    Query one database record (indexed)                      around ten milliseconds
    One seek on a mechanical disk                            4 milliseconds
    Sequentially read 1MB from a mechanical disk             2 milliseconds
    Sequentially read 1MB from an SSD                        0.3 milliseconds
    Read one item from a remote distributed cache (Redis)    0.5 milliseconds
    Read 1MB from memory                                     around ten microseconds
    Local method call in a Java program                      a few microseconds
    Transfer 2KB over the network                            1 microsecond
  • concurrent number

    The number of requests actually interacting with the server at the same moment.
    Relation to the number of online users: with 1,000 users online at once, the number of concurrent requests can be estimated at 5% to 15% of that, i.e. roughly 50 to 150.

  • throughput

    A measure of the amount of work (requests) completed per unit of time

  • mutual relationship

    • The relationship between system throughput, system concurrency and response time:

    Think of traffic on an expressway:
    throughput is the number of vehicles passing the toll gate per day (equivalently, the tolls the gate collects);
    concurrency is the number of vehicles currently on the expressway; and
    response time is the speed of the vehicles.
    With few vehicles, speeds are high but toll revenue is low. As vehicles increase, speed dips slightly while revenue rises quickly. As vehicles keep increasing, speeds keep dropping, congestion worsens, and revenue begins to fall. Past a certain limit, any random incident can paralyze the expressway completely, turning it into a parking lot (resources exhausted).

Common performance optimization methods

  • general principles

    • avoid premature optimization

    Do not spend large amounts of time on small performance gains; thinking about optimization too early is the root of many nightmares.
    Write clear, direct, readable, understandable code first, and leave real optimization for later, once performance analysis shows a given measure has a large payoff.
    Avoiding premature optimization does not mean deliberately writing code structures we already know perform badly.

    • Perform system performance tests

    All performance tuning should be grounded in performance testing. Intuition matters, but the data must speak: you may speculate, but you must verify by testing.

    • Find system bottlenecks, divide and conquer, and gradually optimize

    After performance testing, analyze each link in the request path, find where the bottleneck occurs, and locate the problem. Which factor dominates: memory, disk I/O, network, CPU, application code, an architectural shortcoming, or a genuine shortage of system resources?

  • Front-end optimization means

    Browser/App

    • reduce the number of requests;

    Combine CSS, JavaScript, and image files

    • Use client-side caching;

    Cache static resource files in the browser via the Cache-Control and Expires headers.
    When a cached file changes and must be refreshed, change its file name to force a re-download.

    • enable compression

    Compression reduces the amount of data transferred over the network, but adds CPU load on both browser and server, so it must be used in balance.

    • Resource file loading order

    Put CSS at the top of the page and JS at the bottom

    • Reduce Cookie Transmission

    Cookies travel with every request and response, so consider carefully which data really needs to be written into them

    • give the user a hint

    Sometimes a simple hint on the front end achieves a lot; after all, what users want is not to be ignored.

    CDN acceleration

    A CDN (content delivery network) is essentially still a cache: it stores data at the location closest to the user. If building one yourself is impractical, consider a commercial CDN service.

    reverse proxy cache

    Cache static resource files on a reverse proxy server, usually Nginx.

    WEB component separation

    Serve JS, CSS, and image files from different domain names. This raises the number of components the browser downloads in parallel, because browsers limit concurrent downloads per domain.

  • Application service performance optimization


Learn more about App Service Performance Optimization

  • cache

    • The fundamentals and nature of caching

      A cache stores data in a medium that is fast to access, reducing data-access time while avoiding repeated computation.

    • Guidelines for proper use of caches

      Avoid caching data that is modified frequently; caching pays off only when the read/write ratio is at least 2:1.
      Cache hot data.
      The application must tolerate data inconsistency for some period of time.
      Cache availability is generally addressed through hot standby or clustering.
      Warm the cache: a freshly started cache system holds no data, so consider preloading hot data into it.
      Mitigating cache penetration:
      1. Use a Bloom filter; or 2. cache the miss. For example, if requests keep asking for key=23 but no record with key=23 exists in the system, consider putting an entry (key=23, value=null) into the cache.
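A minimal sketch of the cache-the-miss idea. The names `LookThroughCache` and `backingStore` are invented for this example, and a real deployment would also give null entries a short expiry so a later insert of the key becomes visible:

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Illustrative wrapper: misses are stored as Optional.empty(), so repeated
// requests for a nonexistent key (e.g. key=23) stop reaching the backing store.
public class LookThroughCache<K, V> {
    private final Map<K, Optional<V>> cache = new ConcurrentHashMap<>();
    private final Function<K, V> backingStore; // returns null for missing keys

    public LookThroughCache(Function<K, V> backingStore) {
        this.backingStore = backingStore;
    }

    public V get(K key) {
        // computeIfAbsent caches hits and misses alike
        return cache.computeIfAbsent(key,
                k -> Optional.ofNullable(backingStore.apply(k))).orElse(null);
    }
}
```

After the first lookup of key=23 returns null, every later lookup is answered from the cache without touching the store.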

    • Distributed cache and consistent hashing

      • There are two ways to provide a caching service as a cluster:
      1. A fully replicated distributed cache whose updates must be synchronized: every server stores the same cached data. The problems are that total cache capacity is limited and that every update must be synchronized across all machines, which is very expensive.
      2. Each machine caches only part of the data, and an algorithm selects the cache server for each key. The common remainder (modulo) hash has the problem that when a server goes offline, a large amount of cached data must be rebuilt, which motivated the consistent hashing algorithm.
      • Consistent hashing
      1. Compute the hash of each server (node) and place it on a ring spanning 0 to 2^32.
      2. Hash the key of the data to be stored in the same way and map it onto the same ring.
      3. From the position the key maps to, search clockwise and store the data on the first server found. If no server is found before passing 2^32, wrap around to the first server on the ring.
        Consistent hashing relocates only a small portion of the ring when nodes are added or removed, giving good fault tolerance and scalability.
        When there are too few service nodes, uneven spacing on the ring causes data skew: a large amount of data concentrates on, say, Node A, with only a little on Node B. To solve this, consistent hashing introduces virtual nodes: each physical node is hashed several times, and each result places one virtual node on the ring. Concretely, a suffix is appended to the server IP or host name. Computing three virtual nodes per server, for instance, yields hashes for "Node A#1", "Node A#2", "Node A#3", "Node B#1", "Node B#2", "Node B#3", forming six virtual nodes. The lookup algorithm stays the same, with one extra mapping from virtual node to physical node: data located at "Node A#1", "Node A#2", or "Node A#3" all resides on Node A. This resolves the skew that occurs with few service nodes. In practice the number of virtual nodes per server is usually 32 or more, so even a few physical nodes achieve a fairly uniform data distribution.
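The ring described above can be sketched with a `TreeMap`. This is an illustrative implementation, not taken from any particular cache library; MD5 is used only as a convenient uniform hash:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.SortedMap;
import java.util.TreeMap;

// Minimal consistent-hash ring with virtual nodes
public class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    public ConsistentHashRing(int virtualNodes) { this.virtualNodes = virtualNodes; }

    public void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(node + "#" + i), node); // e.g. "NodeA#0", "NodeA#1", ...
        }
    }

    public void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(node + "#" + i));
        }
    }

    // Walk clockwise from the key's position; wrap to the first entry if needed
    public String nodeFor(String key) {
        if (ring.isEmpty()) throw new IllegalStateException("no nodes");
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    // First 4 MD5 bytes as an unsigned 32-bit value, i.e. a point in [0, 2^32)
    private static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            return ((long) (d[3] & 0xFF) << 24) | ((d[2] & 0xFF) << 16)
                 | ((d[1] & 0xFF) << 8) | (d[0] & 0xFF);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

Removing a node only remaps the keys that node owned; keys owned by other nodes keep their placement, which is exactly the property the modulo hash lacks.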
  • cluster

  • asynchronous

    • Synchronous and asynchronous, blocking and non-blocking

      Synchronous versus asynchronous concerns how the result is communicated back to the caller

      • Synchronous

      Synchronous means the caller actively waits for the result to be returned

      • asynchronous

      Asynchronous means the caller does not wait for the result; it is delivered later by other means, such as a status flag, a notification, or a callback function

      Blocking versus non-blocking concerns the caller's state while waiting for the result to return

      • Blocking

      Blocking means that until the result returns, the current thread is suspended and does nothing

      • non-blocking

      It means that before the result is returned, the thread can do some other things without being suspended.

      • synchronous blocking

      Synchronous blocking is the most common model in programming. Imagine going to a store to buy clothes and finding them sold out: you wait in the store doing nothing at all (not even looking at your phone) until the merchant restocks. Very inefficient. BIO in the JDK is synchronous blocking.

      • synchronous non-blocking

      Synchronous non-blocking can be abstracted as polling. You go to the store, find the clothes sold out, leave, and then drop by from time to time to ask the owner whether the new stock has arrived. NIO in the JDK is synchronous non-blocking.

      • asynchronous blocking

      Asynchronous blocking is rarely used in programming. It is like creating a thread pool, submitting a task, and then immediately calling future.get(): the thread is still effectively suspended. In the store analogy, you leave your phone number and ask the owner to call when the clothes arrive, but then you sit guarding the phone, doing nothing but waiting for it to ring. That is a bit silly, so this mode sees little use.

      • asynchronous non-blocking

      Asynchronous non-blocking: you leave your phone number and ask the owner to call when the clothes arrive, then go off and do whatever you like, without worrying about when that will be. Once the phone rings, you go pick up the clothes. AIO in the JDK is asynchronous non-blocking
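The contrast can be toyed with using `CompletableFuture` (the store analogy is kept in the names; `orderClothes` is an invented helper): calling `join()` right after submitting is the asynchronous-blocking pattern, while registering a callback with `thenApply` and carrying on is asynchronous non-blocking:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class AsyncDemo {
    // Simulated slow supplier restocking the clothes
    public static CompletableFuture<String> orderClothes() {
        return CompletableFuture.supplyAsync(() -> {
            try { TimeUnit.MILLISECONDS.sleep(50); } catch (InterruptedException e) { }
            return "clothes arrived";
        });
    }

    public static void main(String[] args) {
        // Asynchronous blocking: submit, then immediately sit and wait
        String waited = orderClothes().join();

        // Asynchronous non-blocking: leave a "phone number" (callback), keep shopping
        CompletableFuture<String> callback = orderClothes()
                .thenApply(msg -> "picked up after call: " + msg);
        System.out.println("doing other things meanwhile...");
        System.out.println(waited);
        System.out.println(callback.join());
    }
}
```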

    • common asynchronous means

      • Servlet asynchronous

      Available from Servlet 3.0; the supporting web containers are Tomcat 7+ and Jetty 8+.

      • Multithreading
      • message queue
      • cluster

      A cluster distributes user requests across multiple machines for processing, greatly improving overall performance


  • application related

    • code level

      An application's performance ultimately depends on how the code is written.

      • Choose the right data structure

      The choice between ArrayList and LinkedList has a large impact on program performance. Why? ArrayList is backed by an array, so growing it past capacity triggers repeated expansion and data copying, while LinkedList pays per-node allocation overhead and O(n) random access.
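A small illustration (the class and method names are invented): presizing an ArrayList avoids the grow-and-copy cycle entirely, while LinkedList's `get(i)` walks the list node by node:

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class ListChoiceDemo {
    // With a known element count, presizing skips every grow-and-copy step
    public static List<Integer> buildPresized(int n) {
        List<Integer> list = new ArrayList<>(n);
        for (int i = 0; i < n; i++) list.add(i);
        return list;
    }

    public static void main(String[] args) {
        int n = 100_000;
        List<Integer> array = buildPresized(n);
        List<Integer> linked = new LinkedList<>(array);

        // ArrayList: direct array index, O(1)
        int a = array.get(n / 2);
        // LinkedList: traverses ~n/2 nodes to reach the same element, O(n)
        int b = linked.get(n / 2);
        System.out.println(a == b); // true
    }
}
```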

      • Choose a better algorithm

      Take the maximum subarray sum problem:
      given an integer sequence a1, a2, ..., an (terms may be negative), find the maximum sum over all contiguous subarrays.
      If every integer is negative, the maximum subarray sum is taken to be 0.
      For example, with (a[1],a[2],a[3],a[4],a[5],a[6]) = (-2, 11, -4, 13, -5, -2),
      the maximum subarray sum is 20, achieved by a[2], a[3], a[4].
      Worst algorithm: exhaustive enumeration, O(n^3) time.
      Typical algorithm: divide and conquer, O(n log n) time.
      Best algorithm: dynamic programming, O(n) time.
      The larger n is, the larger the gap; with 10,000 elements, the difference between the worst and the best algorithm is far more than multithreading or clustering could make up.
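The O(n) dynamic-programming solution mentioned above (Kadane's algorithm, under this document's convention that an all-negative input yields 0) can be sketched as:

```java
public class MaxSubarray {
    // Kadane's algorithm: O(n) time, O(1) space
    public static int maxSubarraySum(int[] a) {
        int best = 0;        // best sum seen so far (empty subarray allowed, so >= 0)
        int endingHere = 0;  // best sum of a subarray ending at the current index
        for (int v : a) {
            endingHere = Math.max(0, endingHere + v);
            best = Math.max(best, endingHere);
        }
        return best;
    }

    public static void main(String[] args) {
        int[] a = {-2, 11, -4, 13, -5, -2};
        System.out.println(maxSubarraySum(a)); // 20  (11 + (-4) + 13)
    }
}
```

The key observation is that the best subarray ending at index i either extends the best one ending at i-1 or starts fresh, which is what the `Math.max(0, ...)` expresses.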

      • write less code

      Given two equally correct programs, the smaller one runs faster; this holds regardless of programming language.

    • concurrent programming

      1. Make full use of multiple CPU cores.
      2. Implement thread-safe classes to avoid thread-safety problems.
      3. Reduce lock contention in synchronized code.
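As one sketch of reducing contention (point 3): `LongAdder` stripes a counter across multiple cells, so parallel increments rarely collide on a single memory word the way a `synchronized` counter does (`countParallel` is an illustrative helper):

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.IntStream;

public class ContentionDemo {
    // LongAdder spreads updates over striped cells; sum() folds them at read time
    public static long countParallel(int n) {
        LongAdder hits = new LongAdder();
        IntStream.range(0, n).parallel().forEach(i -> hits.increment());
        return hits.sum();
    }

    public static void main(String[] args) {
        System.out.println(countParallel(10_000)); // 10000
    }
}
```

The trade-off is that `sum()` is not an atomic snapshot; LongAdder suits write-heavy counters read occasionally, such as hit statistics.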
    • Resource reuse

      The purpose is to reduce the creation and destruction of expensive system resources, such as database connections, network communication connections, thread resources, and so on.
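A minimal sketch of thread reuse with a fixed pool (`sumOfSquares` is an invented example task): four threads are created once and handle all one hundred tasks, instead of paying thread creation and destruction per task:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PoolDemo {
    // Four reusable threads process all n tasks
    public static int sumOfSquares(int n) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int i = 1; i <= n; i++) {
                final int k = i;
                tasks.add(() -> k * k);
            }
            int sum = 0;
            for (Future<Integer> f : pool.invokeAll(tasks)) sum += f.get();
            return sum;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(100)); // 338350
    }
}
```

Database connection pools and keep-alive network connections follow the same reuse principle.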

    • JVM

      • Optimizations related to the JIT compiler

        • The concept of hot compilation

        For most programs, only a small portion of the code executes frequently; this critical code forms the application's hot spots, and the more it executes, the hotter it becomes. Compiling these hot spots into machine-specific binary code effectively improves application performance.

        • Select compiler type

        -server: compiles later, but optimizes more aggressively, yielding higher peak performance
        -client: starts compiling very early, yielding faster startup

        • Code cache related

        Compiled code is stored in the code cache. Once the cache is full, the JVM can no longer compile further code.
        When the JVM reports "CodeCache is full", the code cache size needs to be increased;
        -XX:ReservedCodeCacheSize=N adjusts it.

        • compile threshold

        Whether code gets compiled depends on how often it executes and whether it reaches the compilation threshold.
        There are two counters: the method invocation counter and the back-edge counter for loops within a method.
        A method reaches the compilation threshold based on the sum of its two counters; the threshold is adjusted with -XX:CompileThreshold=N.
        The invocation counter records not the absolute number of calls but a relative execution frequency: the number of calls within a time window. If, when the window expires, the count is still not enough to submit the method to the just-in-time compiler, the counter is halved. This process is called counter decay, and the window is the method's counter half-life. Decay is performed as a side task while the virtual machine runs garbage collection. The flag -XX:-UseCounterDecay turns decay off so the counter records absolute call counts; then, provided the system runs long enough, most methods end up compiled to native code. The flag -XX:CounterHalfLifeTime sets the half-life in seconds.
        Unlike the invocation counter, the back-edge counter has no decay, so it records the absolute number of loop iterations in the method.

        • compile thread

        Compilation is performed by multiple compiler threads.

        • method inlining

        Inlining is enabled by default; -XX:-Inline turns it off, but do not: disabling it has a huge negative impact on performance.
        Whether a method is inlined depends on how hot it is and on its size.
        A very hot method is inlined if its bytecode is smaller than 325 bytes; this limit is adjusted with -XX:FreqInlineSize=N. What counts as "very hot" is decided internally and, unlike the hotspot compilation threshold, has no tuning parameter.
        A method smaller than 35 bytes of bytecode is always eligible for inlining; this limit is adjusted with -XX:MaxInlineSize=N.

        • escape analysis

        Escape analysis is the most aggressive optimization the JVM performs, and its related parameters are best left alone.

      • GC tuning

        • Purpose

        Each GC pause should be short enough;
        GC frequency should be low enough;
        the interval between Full GCs should be long enough, with reasonable duration; ideally, Full GC never happens at all.

        • Tuning principles and steps
        1. Most Java applications do not need GC tuning.
        2. Of those that seem to, most have a code problem rather than a parameter problem.
        3. In practice, analyzing GC behavior in order to optimize the code is far more common than tuning GC parameters.
        4. GC tuning is a last resort.
          The three most important choices in GC tuning:
          first: choose a suitable garbage collector;
          second: choose a suitable heap size;
          third: choose the proportion of the heap given to the young generation.

        ### Steps
        1. Monitor GC status

        Use the various JVM tools to view the current logs, review the current JVM parameter settings, and analyze the heap memory snapshots and GC logs. Based on the actual memory division of each region and the GC execution times, decide whether optimization is needed;

        2. Analyze the results and judge whether optimization is needed

        If the parameters are set reasonably, the system shows no timeout logs, GC frequency is low, and GC pauses are short, there is no need for GC optimization. If GC pauses exceed 1 to 3 seconds, or GC runs frequently, optimization is required.
        Note: if the following indicators are met, GC tuning is generally unnecessary:
        Minor GC pause time under 50 ms;
        Minor GC infrequent, roughly once every 10 seconds;
        Full GC pause time under 1 s;
        Full GC infrequent, no more than once every 10 minutes.

        3. Adjust GC type and memory allocation

        If memory allocation is too large or too small, or the chosen GC collector is slow, adjust those parameters first. Try the change on one or a few beta machines, compare the optimized machines against the unoptimized ones, and make a targeted final choice;

        4. Continuous analysis and adjustment

        Through continuous trial and error, analyze and find the most suitable parameters

        5. Comprehensive application parameters

        Once the most suitable parameters are found, apply them to all servers and follow up on the results.

      • JVM tuning in practice

        ### Recommended strategy

        • Young generation size selection
        1. Response-time-priority applications: set it as large as possible, up to just under the system's minimum response-time requirement (chosen according to the actual situation). This minimizes young-generation collection frequency and also slows the rate at which objects reach the old generation.
        2. Throughput-priority applications: set it as large as possible, possibly into the gigabytes. Since there is no response-time requirement, garbage collection can run in parallel; this generally suits machines with more than 8 CPUs.
        3. Avoid setting it too small. Too small a young generation causes: 1) more frequent young GCs; 2) objects surviving a young GC being promoted straight into the old generation, which triggers a Full GC if the old generation is already full.
        • Old generation size selection
        1. Response-time-priority applications: the old generation uses a concurrent collector, so its size must be set carefully, generally considering parameters such as the concurrent session rate and session duration. If the heap is too small, memory fragmentation and frequent collections cause application pauses and a fall back to the traditional mark-sweep approach; if it is too large, each collection takes a long time. The optimum is usually found from the following data:
        concurrent garbage collection statistics, the number of concurrent permanent-generation collections, traditional GC statistics, and the proportion of time spent collecting the young versus the old generation.
        2. Throughput-priority applications: these generally get a large young generation and a smaller old generation, so that most short-lived objects are reclaimed as early as possible, fewer objects reach middle age, and the old generation stores only long-lived objects.
    • Storage Performance Optimization

      1. Use SSDs where possible.
      2. Clean up data regularly, or store it separately according to its nature.

write at the end


That is all for the content; you are welcome to keep following, and I will keep working to share more useful material.
If anything here is wrong, corrections are welcome.
A mind-map version has also been prepared for everyone.


Origin blog.csdn.net/sulli_F/article/details/130545179