Interview Series, Part Two: Selected JVM Questions for Big Data Interviews, with Detailed Answers

The official account (Five Minutes to Learn Big Data) has launched a big data interview series called "Five-Minute Mini Interviews". The articles in this series study real interview questions from major tech companies and expand on the related knowledge points, to help everyone land a job at a big company!

The articles in the big data interview series come in two types: mixed (one article covers knowledge points from several frameworks, for integrated practice) and special topic (one article analyzes a single framework in depth, for focused practice).

This article is the second in the series (JVM special topic).

Question 1: JVM memory (Baidu)

Question: Do you understand the JVM memory model? Describe it briefly.

Answer:

There is a lot of material here and many readers may not remember all of it, so each item below has a short answer and a detailed answer.

JVM runtime memory is divided into five parts: program counter, Java virtual machine stack, native method stack, heap, and method area:

Note: JVM tuning mainly targets the heap and the method area.

  1. Program counter (thread private):

Short answer : Each thread has its own program counter, a pointer to the method bytecode in the method area (that is, the next instruction to be executed). The execution engine reads the next instruction from it. It occupies a very small memory space that can almost be ignored.

Detailed answer : The program counter occupies a small memory space and can be seen as a line-number indicator for the bytecode executed by the current thread. In the virtual machine's conceptual model, the bytecode interpreter works by changing the value of this counter to select the next bytecode instruction to execute. Basic functions such as branching, looping, jumping, exception handling, and thread resumption all rely on this counter.

JVM multi-threading is implemented by switching threads and allocating processor time to them in turn, so at any given moment a processor executes instructions from only one thread. To resume at the correct position after a thread switch, each thread needs its own program counter. The counters of different threads do not affect each other and are stored independently; this kind of memory area is called "thread-private" memory.

If the thread is executing a Java method, the counter records the address of the virtual machine bytecode instruction being executed.

If the thread is executing a native method, the counter value is empty (undefined).

This is the only memory area for which the Java Virtual Machine Specification does not define any OutOfMemoryError condition.

  2. Java virtual machine stack (thread private):

Short answer : The stack is in charge of running Java programs. It is created when the thread is created, its lifetime follows the thread's lifetime, and its memory is released when the thread ends. The stack has no garbage collection problem: as soon as the thread ends, the stack is gone. Its life cycle matches the thread's, and it is private to the thread. Variables of primitive types and object reference variables are allocated in a method's stack memory.

Detailed answer : The stack is thread private and has the same life cycle as the thread. The virtual machine stack describes the memory model of Java method execution: each time a method is executed, a stack frame is created to store the local variable table, operand stack, dynamic link, method exit, and other information. The span from a method's invocation to its completion corresponds to a stack frame being pushed onto and then popped off the virtual machine stack.

The local variable table stores various basic types of data known at compile time (boolean, byte, char, short, int, float, long, double), object references, and returnAddress types (pointing to the address of a bytecode instruction).

The 64-bit long and double types occupy two local variable slots; all other data types occupy one. The space required by the local variable table is determined at compile time. When a method is entered, the amount of local variable space it needs in its stack frame is fully known, and the size of the local variable table does not change while the method runs.

The Java virtual machine specification defines two exceptions for this area: if the stack depth requested by a thread is greater than the depth the virtual machine allows, a StackOverflowError is thrown; if the virtual machine stack can be expanded dynamically but enough memory cannot be obtained during expansion, an OutOfMemoryError is thrown.
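The first of these two exceptions is easy to provoke. The following is a minimal sketch, not a benchmark: the class and method names are made up for illustration, and the depth it reports depends on the frame size and on the thread stack size (the -Xss option in HotSpot).

```java
// Sketch: unbounded recursion keeps pushing stack frames until the
// thread exceeds the stack depth the virtual machine allows.
class StackDepthDemo {
    private static int depth = 0;

    // Each call pushes one more frame onto the current thread's stack.
    private static void recurse() {
        depth++;
        recurse();
    }

    // Returns how many frames were pushed before StackOverflowError.
    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The frames are popped as the error propagates up to here,
            // so the thread can keep running afterwards.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("Frames before StackOverflowError: " + measureDepth());
    }
}
```

The number printed differs between machines and JVM settings, which is why no expected output is shown.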

  3. Native method stack (thread private):

Short answer : The native method stack serves the native methods used by the virtual machine. Native methods let Java integrate with other programming languages; the original intent was to integrate C/C++ programs. When Java was born, C/C++ dominated, and to gain a foothold Java had to be able to call C/C++ programs, so a special area was set aside in memory to handle code marked as native.

Detailed answer : The native method stack and the virtual machine stack play very similar roles. The difference is that the virtual machine stack serves the Java methods (bytecode) that the virtual machine executes, while the native method stack serves the native methods that the virtual machine uses. The virtual machine specification does not mandate the language, usage, or data structure of methods in the native method stack, so each virtual machine is free to implement it as it sees fit. Some virtual machines even merge the native method stack and the virtual machine stack into one. Like the virtual machine stack, the native method stack can throw StackOverflowError and OutOfMemoryError.

  4. Java heap (thread shared):

Short answer : The heap is the largest area in the JVM. Application objects and data live here. It is shared by all threads and is the main area reclaimed by the GC. A JVM instance has exactly one heap, and the heap size can be adjusted.

Detailed answer : For most applications, the heap is the largest piece of JVM memory. The Java heap is shared by all threads and is created when the virtual machine starts. The sole purpose of this area is to store object instances, and almost all object instances are allocated here. The Java virtual machine specification describes it this way: all object instances and arrays must be allocated on the heap. However, with the development of JIT compilers and the maturing of escape analysis, optimizations such as stack allocation and scalar replacement have brought some subtle changes, and "all objects are allocated on the heap" has become less absolute.

The Java heap is the main area managed by the garbage collector, so it is often called the "GC heap". From the point of view of memory reclamation, since collectors now mostly use generational collection algorithms, the Java heap can be subdivided into the young generation and the old generation, and in more detail into Eden space, From Survivor space, To Survivor space, and so on. From the point of view of memory allocation, the thread-shared Java heap may be carved into multiple thread-private allocation buffers (TLAB). However it is divided, the division has nothing to do with what is stored: every area still stores object instances. The purpose of further division is to reclaim memory better or allocate memory faster. (If the heap has no memory left to complete an instance allocation and can no longer be expanded, an OutOfMemoryError is thrown.)
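To see how large the heap of a running JVM actually is, the standard java.lang.Runtime API can be queried. The sketch below is only an illustration; the class and method names are made up, and the numbers depend on the machine and on heap options such as -Xms and -Xmx in HotSpot.

```java
// Sketch: inspect the size of the heap shared by all threads.
class HeapInfoDemo {
    // Upper bound the heap may grow to (Long.MAX_VALUE if there is no limit).
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    // Heap memory currently reserved by the JVM.
    static long committedHeapBytes() {
        return Runtime.getRuntime().totalMemory();
    }

    // Approximate heap memory currently occupied by objects.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println("max heap       : " + maxHeapBytes() / mb + " MB");
        System.out.println("committed heap : " + committedHeapBytes() / mb + " MB");
        System.out.println("used heap      : " + usedHeapBytes() / mb + " MB");
    }
}
```

Running the same program with a different -Xmx value changes the first number, which is what "the heap size can be adjusted" means in practice.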

  5. Method area (thread shared):

Short answer : Like the heap, the method area is shared by all threads. It mainly stores data for classes the JVM has loaded: class information, constants, static variables, and code produced by the JIT compiler.

Detailed answer : The method area is shared by all threads. All fields and method bytecode, as well as special methods such as constructors and interface code, are defined here. Simply put, all defined method information is stored in this area, which is a shared region.

Static variables, constants, class information (constructors and interface definitions), and the runtime constant pool are stored in the method area. Instance variables, however, are stored in heap memory and have nothing to do with the method area.

In the HotSpot released with JDK 1.7, the string constant pool was moved out of the method area (into the Java heap).

  6. Runtime constant pool (thread shared, part of the method area):

Short answer : The runtime constant pool is part of the method area. It stores the literals and symbolic references generated at compile time. Its important feature is that it is dynamic: the Java language does not require constants to be produced only at compile time. New constants can also be produced at run time, and they are placed in the runtime constant pool.

Detailed answer : The runtime constant pool is part of the method area. Besides the description of the class version, fields, methods, and interfaces, a Class file also contains a constant pool, which stores the literals and symbolic references generated at compile time. After the class is loaded, this content is stored in the runtime constant pool in the method area.

The Java virtual machine has strict rules for the format of every part of a class file; what each byte stores must conform to the specification before the JVM will accept it. For the runtime constant pool, however, the Java virtual machine specification makes no detailed requirements.

An important feature of the runtime constant pool is that it is dynamic. The Java language does not require constants to be produced only at compile time; that is, it is not only the pre-set content of the class file's constant pool that can enter the runtime constant pool in the method area. New constants can also be put into the pool at run time. The most common use of this feature is the intern() method of the String class.

Since the runtime constant pool is part of the method area, it is naturally limited by the method area's memory. When the constant pool can no longer obtain memory, an OutOfMemoryError is thrown.
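The dynamic behaviour of String.intern() mentioned above can be observed with a small experiment. This is a sketch with made-up class and method names; it only compares references and says nothing about where a particular JDK version keeps the pooled strings.

```java
// Sketch: a string built at run time can be mapped to the pooled
// constant with intern().
class InternDemo {
    // Builds "big-data" at run time, producing a new object on the heap.
    private static String build() {
        return new StringBuilder("big-").append("data").toString();
    }

    // The literal comes from the class file's constant pool; the built
    // string is a different object with the same characters.
    static boolean sameObjectWithoutIntern() {
        String literal = "big-data";
        String built = build();
        return built == literal;
    }

    // intern() returns the pooled instance, which is the literal itself.
    static boolean sameObjectAfterIntern() {
        String literal = "big-data";
        String built = build();
        return built.intern() == literal;
    }

    public static void main(String[] args) {
        System.out.println(sameObjectWithoutIntern()); // prints false
        System.out.println(sameObjectAfterIntern());   // prints true
    }
}
```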

The biggest change from JDK 1.7 to JDK 1.8 is that metaspace replaces the permanent generation. Metaspace is similar in nature to the permanent generation: both are implementations of the method area defined in the JVM specification. The biggest difference between them is that metaspace does not live in the virtual machine's own memory but uses native memory.

Question 2: Class loading (Sina Weibo)

Question: What are the main steps the JVM goes through to load a class, and how does the loading work?

Answer:

Short answer : Class loading is the process in which the JVM loads the class information in a .class file into memory and parses it to generate the corresponding Class object. It has five steps: loading -> verification -> preparation -> resolution -> initialization. Loading : load the external .class file into the Java virtual machine; verification : ensure that the information in the loaded class file meets the requirements of the Java virtual machine; preparation : allocate memory for class variables and set their initial values; resolution : convert symbolic references in the constant pool to direct references; initialization : initialize class variables and run static code blocks.

Detailed answer : Fair warning, this one is long, so be prepared!

From the moment a Java file is written to the moment it is executed, it generally goes through two processes: compiling and running.

  • Compilation: The .java file we wrote is compiled into bytecode by the javac command, producing what we usually call a .class file.

  • Run: The .class file produced by compilation is handed to the Java Virtual Machine (JVM) for execution.

What we call the class loading process is the process in which the JVM loads the class information in the .class file into memory and parses it to generate the corresponding Class object.

  • Class loading process

A simple example: while executing a piece of code, the JVM encounters class A, but there is no information about class A in memory yet. The JVM then looks up class A's information in the corresponding class file and loads it into memory. This is what we call the class loading process.
As you can see, the JVM does not load every class into memory at startup. It loads a class only when that class is first needed at run time, and it loads it only once.
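The "only on first use, and only once" behaviour can be made visible with a static initializer. The sketch below uses made-up names; strictly speaking it observes initialization (the last step of the process), which is the part a program can detect directly.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: a class is initialized on first use, and only once.
class LazyLoadingDemo {
    static final List<String> EVENTS = new ArrayList<>();

    static class Helper {
        // Runs when Helper is first actively used, not at program start.
        static {
            EVENTS.add("Helper initialized");
        }

        static void use() {
            // Intentionally empty; calling it counts as a use of Helper.
        }
    }

    static List<String> run() {
        EVENTS.add("before first use");
        Helper.use(); // triggers initialization of Helper
        Helper.use(); // Helper is already initialized, nothing is logged
        EVENTS.add("after second use");
        return EVENTS;
    }

    public static void main(String[] args) {
        // prints [before first use, Helper initialized, after second use]
        System.out.println(run());
    }
}
```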

  • Class loading

The class loading process has three main parts: loading, linking, and initialization.

Linking can be further divided into three smaller parts: verification, preparation, and resolution.

  • Loading

Simply put, loading refers to loading class bytecode files from various sources into memory through a class loader.

There are two important points here:

Bytecode source: common sources include .class files compiled on a local path, .class files inside a jar package, class files fetched from a remote network, and bytecode generated on the fly by dynamic proxies.

Class loader: the usual ones are the bootstrap class loader, the extension class loader, the application class loader, and user-defined class loaders.

Note: Why have a custom class loader?
On the one hand, Java code is easy to decompile. If you need to protect your code, you can encrypt the compiled bytecode, then decrypt it in your own custom class loader before loading it.
On the other hand, you may want to load code from a non-standard source, such as the network, in which case you need to implement a class loader that loads from the specified source.
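The built-in loaders listed above form a parent chain that can be walked with the standard ClassLoader API. The following is a sketch with made-up names; the loader class names it prints differ between JDK versions (for example, newer JDKs have a platform class loader where older ones had the extension class loader), so no expected output is shown.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: walk from a class's loader up through its parents.
class LoaderChainDemo {
    static List<String> chainOf(Class<?> type) {
        List<String> chain = new ArrayList<>();
        ClassLoader loader = type.getClassLoader();
        while (loader != null) {
            chain.add(loader.getClass().getName());
            loader = loader.getParent();
        }
        // The bootstrap class loader is represented as null in the API.
        chain.add("bootstrap (reported as null)");
        return chain;
    }

    public static void main(String[] args) {
        // An application class: application loader, its parents, then bootstrap.
        System.out.println(chainOf(LoaderChainDemo.class));
        // A core class such as String is loaded by the bootstrap loader.
        System.out.println(chainOf(String.class));
    }
}
```

A user-defined class loader would typically extend ClassLoader and override findClass; it would then show up at the front of this chain for the classes it loads.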

  • Verification

The main purpose is to ensure that the loaded byte stream complies with the virtual machine specification and will not cause security problems.

File format verification: for example, whether the constant pool contains unsupported constants, and whether the file contains malformed or extra information.

Metadata verification: for example, whether the class inherits from a class marked final, whether its fields and methods conflict with those of the parent class, and whether there is an invalid overload.

Bytecode verification: ensuring that the program semantics are sound, for example that type conversions are valid.

Symbolic reference verification: for example, whether the class named by the fully qualified name in a symbolic reference can be found, and whether the access modifier (private, public, and so on) in the symbolic reference allows access from the current class.

  • Preparation

This phase mainly allocates memory for class variables (note: not instance variables) and assigns their initial values.

Pay special attention to the initial value here: it is not the value written in the code, but the default value the Java virtual machine assigns according to the variable's type.

For example, the 8 primitive types default to their zero value (0, false, and so on) and reference types default to null. A constant, however, gets the value set in the code: given static final int tmp = 456, the initial value of tmp at this stage is 456.
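The default value assigned in this phase can be observed by reading a static field before its own initializer has run. The sketch below uses made-up names and relies only on the rule that static initializers run from top to bottom.

```java
// Sketch: a class variable holds its default value until its own
// initializer runs; a compile-time constant already has its value.
class PreparationDemo {
    // Runs first during initialization, before `counter` is assigned.
    static final int[] SNAPSHOT = snapshot();

    static int counter = 123;   // assigned later, during initialization
    static final int TMP = 456; // compile-time constant

    private static int[] snapshot() {
        // `counter` still holds the default 0 here; TMP is already 456.
        return new int[] { counter, TMP };
    }

    public static void main(String[] args) {
        System.out.println(SNAPSHOT[0]); // prints 0
        System.out.println(SNAPSHOT[1]); // prints 456
        System.out.println(counter);     // prints 123
    }
}
```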

  • Resolution

Resolution is the process of replacing the symbolic references in the constant pool with direct references.

Two important points:

Symbolic reference: a string, but one that carries enough information to uniquely identify a method, a variable, or a class.

Direct reference: can be understood as a memory address or an offset. For example, for class methods and class variables, the direct reference is a pointer into the method area; for instance methods and instance variables, the direct reference is the offset from the start of the instance to the member's position.

For example, suppose the method hello() is called and its address is 1234567. Then hello is the symbolic reference and 1234567 is the direct reference.

In the resolution phase, the virtual machine replaces all symbolic references, such as class names, method names, and field names, with specific memory addresses or offsets, that is, with direct references.

  • Initialization

This phase mainly initializes class variables; it is the process of executing the class constructor, the <clinit>() method.
In other words, only variables and statements modified by static are initialized.
If the parent class has not been initialized when a class is initialized, the parent class is initialized first.
If the class contains several static variables and static code blocks, they are executed in order from top to bottom.
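The two ordering rules (parent first, then top to bottom) can be checked with a small experiment. The class names below are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: the parent is initialized before the child, and static
// fields and static blocks run in source order.
class InitOrderDemo {
    static final List<String> EVENTS = new ArrayList<>();

    static int record(String event) {
        EVENTS.add(event);
        return EVENTS.size();
    }

    static class Parent {
        static {
            record("Parent static block");
        }
    }

    static class Child extends Parent {
        static int a = record("Child field a");

        static {
            record("Child static block");
        }

        static int b = record("Child field b");
    }

    static List<String> run() {
        new Child(); // first use: initializes Parent, then Child
        new Child(); // already initialized: nothing more is recorded
        return EVENTS;
    }

    public static void main(String[] args) {
        // prints [Parent static block, Child field a, Child static block, Child field b]
        System.out.println(run());
    }
}
```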

  • Summary

The class loading process is only part of a class's life cycle. Before it comes compilation: only after the source code is compiled do we get a bytecode file that the virtual machine can load. After it comes the actual use of the class, and once the class is no longer used it is unloaded during garbage collection in the method area. If you want to understand the entire life cycle of a Java class, you can look up the relevant material online; I won't repeat it here.

Question 3: JVM memory (Yuncong Technology)

Question: Can memory leaks occur in Java? Describe briefly.

Answer:

In theory, Java has no memory leak problem because it has a garbage collection mechanism (GC); this is also an important reason why Java is widely used for server-side programming. In real development, however, there can be objects that are useless but still reachable. The GC cannot reclaim these objects, so memory leaks do occur.

One example is Hibernate's Session (first-level cache): the objects in it are in the persistent state, so the garbage collector will not reclaim them, yet some of them may be useless garbage.

The following example also shows a memory leak in Java:

package com.yuan_more;

import java.util.Arrays;
import java.util.EmptyStackException;

public class MyStack<T> {
    private T[] elements;
    private int size = 0;

    private static final int INIT_CAPACITY = 16;

    @SuppressWarnings("unchecked")
    public MyStack() {
        // Generic arrays cannot be created directly, so cast an Object[].
        elements = (T[]) new Object[INIT_CAPACITY];
    }

    public void push(T elem) {
        ensureCapacity();
        elements[size++] = elem;
    }

    public T pop() {
        if (size == 0) {
            throw new EmptyStackException();
        }
        // Leak: the popped slot still holds a reference to the object.
        return elements[--size];
    }

    // Grow the backing array when it is full.
    private void ensureCapacity() {
        if (elements.length == size) {
            elements = Arrays.copyOf(elements, 2 * size + 1);
        }
    }
}

The above code implements a stack (FILO) structure. At first glance, there seems to be no obvious problem. It can even pass the various unit tests you write.

However, the pop method has a memory leak. When we pop an object off the stack with pop, the object is not treated as garbage even if the program using the stack no longer references it, because the stack still holds an obsolete reference to the object internally.

In languages that support garbage collection, memory leaks are well hidden. Such leaks are really unintentional object retention.

If an object reference is retained unintentionally, the garbage collector will not process that object, nor any other objects it references. Even a few such objects can cause many objects to be excluded from garbage collection, with a significant impact on performance. In extreme cases this triggers disk paging (data exchanged between physical memory and virtual memory on disk) and can even cause an OutOfMemoryError.
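One common way to fix the leak in a stack like the one above is to clear the array slot when an element is popped. The sketch below is a variant written for illustration, not the original author's code: the class name LeakFreeStack and the helper slotIsCleared are made up, and the backing array is an Object[] to keep the example short.

```java
import java.util.Arrays;
import java.util.EmptyStackException;

// Sketch: same stack, but pop() drops the obsolete reference.
class LeakFreeStack<T> {
    private Object[] elements = new Object[16];
    private int size = 0;

    public void push(T elem) {
        if (elements.length == size) {
            elements = Arrays.copyOf(elements, 2 * size + 1);
        }
        elements[size++] = elem;
    }

    @SuppressWarnings("unchecked")
    public T pop() {
        if (size == 0) {
            throw new EmptyStackException();
        }
        T result = (T) elements[--size];
        elements[size] = null; // clear the slot so the GC can reclaim the object
        return result;
    }

    // Hypothetical helper, only here to make the effect observable.
    boolean slotIsCleared(int index) {
        return elements[index] == null;
    }

    public static void main(String[] args) {
        LeakFreeStack<String> stack = new LeakFreeStack<>();
        stack.push("a");
        stack.push("b");
        System.out.println(stack.pop());            // prints b
        System.out.println(stack.slotIsCleared(1)); // prints true
    }
}
```

Once the slot is set to null, the stack no longer keeps the popped object reachable, so it becomes eligible for collection as soon as the caller drops its own reference.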

Question 4: Garbage collection (Didi Travel)

Question: Do you know about GC? Why is there a GC?

Answer:

GC stands for garbage collection. Memory handling is where programmers are prone to make mistakes: forgetting to release memory, or releasing it incorrectly, can make a program or system unstable or even crash it.

The GC provided by Java automatically monitors whether objects have gone out of scope and reclaims their memory automatically. The Java language does not provide an explicit operation for releasing allocated memory. Java programmers do not need to manage memory by hand, because the garbage collector does it automatically.

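To request garbage collection, you can call either System.gc() or Runtime.getRuntime().gc(). Note that this is only a request; when the JVM actually performs garbage collection is unpredictable.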
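The "only a request" point can be illustrated with a weak reference. This is a sketch with made-up names: whether the unreachable object has been cleared after System.gc() returns is up to the JVM, so the first result may be true or false, while an object that is still strongly reachable is never reclaimed.

```java
import java.lang.ref.WeakReference;

// Sketch: System.gc() asks for a collection but guarantees nothing.
class GcRequestDemo {
    // May return true or false, depending on what the JVM decides to do.
    static boolean clearedAfterRequest() {
        Object payload = new Object();
        WeakReference<Object> ref = new WeakReference<>(payload);
        payload = null;           // drop the only strong reference
        System.gc();              // a request, not a command
        return ref.get() == null; // cleared only if a collection really ran
    }

    // Always true: a strongly reachable object is never collected.
    static boolean survivesWhileReachable() {
        Object strong = new Object();
        WeakReference<Object> ref = new WeakReference<>(strong);
        System.gc();
        return ref.get() == strong;
    }

    public static void main(String[] args) {
        System.out.println("unreachable object cleared: " + clearedAfterRequest());
        System.out.println("reachable object kept     : " + survivesWhileReachable());
    }
}
```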

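Garbage collection can effectively prevent memory leaks and make efficient use of the available memory. The garbage collector usually runs as a separate low-priority thread and, at unpredictable times, clears and reclaims objects in the heap that are dead or have not been used for a long time. Programmers cannot invoke the garbage collector in real time on a specific object or on all objects.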

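In Java's early days, garbage collection was one of its biggest selling points, because server-side programming needs to prevent memory leaks effectively. Times have changed, however, and Java's garbage collection mechanism is now something people criticize. Mobile device users often feel that iOS offers a better user experience than Android, and one underlying reason is the unpredictability of garbage collection on Android.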

Question 5: JVM memory (Alibaba)

Question: Why does the heap in the HotSpot virtual machine have a young generation and an old generation?

Answer:

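Because some objects live long and some live briefly. Long-lived objects should be placed in one area and short-lived objects in another, and the areas should use different garbage collection algorithms. The short-lived area is cleaned more often and the long-lived area less often, which improves efficiency.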

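The young generation and the old generation are defined in terms of the generational collection algorithm. The young generation is further divided into Eden and Survivor areas; together with the old generation, that makes three areas.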

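Objects are first allocated in Eden. There are special cases: a large object (a Java object that needs a large amount of contiguous memory) goes directly into the old generation. When Eden does not have enough space, the JVM triggers a Minor GC. Young-generation garbage collection uses the copying algorithm.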

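If an object survives a Minor GC and fits in a Survivor space, it is moved there and its age is set to 1. Each time the object survives another Minor GC in the Survivor space its age increases by 1, and when the age reaches a threshold (15 by default) the object is promoted to the old generation. The promotion age is configurable. When the old generation is full, a Full GC is performed; because this does not happen often, old-generation collection uses the mark-compact algorithm.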
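The split into young-generation and old-generation collection can be seen on a running JVM through the standard java.lang.management API. The sketch below uses made-up class and method names; the collector and memory pool names it prints depend on which garbage collector the JVM is using, so no expected output is shown.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: list the collectors of this JVM and the memory pools
// (Eden, Survivor, old generation, ...) each one manages.
class CollectorInfoDemo {
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName()
                    + " manages " + Arrays.toString(gc.getMemoryPoolNames())
                    + ", collections so far: " + gc.getCollectionCount());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describeCollectors()) {
            System.out.println(line);
        }
    }
}
```

On a typical generational collector one entry covers only the young-generation pools and its collection count grows much faster than the other entry's, which matches the "clean the short-lived area more often" idea described above.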



Origin blog.51cto.com/14932245/2642839