Problem Description
In the previous blog post, I submitted a generic MapReduce job through a Java program. During implementation I found that after each submission, the JVM could not reclaim the MapReduceClassLoader instance or the classes it had loaded along the way.
The following test program submits the same task repeatedly so the behavior can be reproduced:
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Scanner;

public class JobSubmitTest {

    public static void submit(String classPath, String mainClassName) {
        ClassLoader originCL = Thread.currentThread().getContextClassLoader();
        try {
            MapReduceClassLoader cl = new MapReduceClassLoader();
            cl.addClassPath(classPath);
            System.out.println("URLS:" + Arrays.toString(cl.getURLs()));
            Thread.currentThread().setContextClassLoader(cl);
            Class<?> mainClass = cl.loadClass(mainClassName);
            System.out.println(mainClass.getClassLoader());
            Method mainMethod = mainClass.getMethod("main", new Class[] { String[].class });
            mainMethod.invoke(null, new Object[] { new String[0] });
            Class<?> jobClass = cl.loadClass("org.apache.hadoop.mapreduce.Job");
            System.out.println(jobClass.getClassLoader());
            Field field = jobClass.getField(JobAdapter.JOB_FIELD_NAME);
            System.out.println(field.get(null));
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            Thread.currentThread().setContextClassLoader(originCL);
        }
    }

    public static void main(String[] args) {
        String classPath = args[0];
        String mainClassName = args[1];
        Scanner scanner = new Scanner(System.in);
        String cmd = null;
        int i = 0;
        while (true) {
            cmd = scanner.next();
            if ("exit".equalsIgnoreCase(cmd)) {
                break;
            }
            submit(classPath, mainClassName);
            i++;
            System.out.println("submit index = " + i);
        }
    }
}
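The MapReduceClassLoader class itself is not shown in this post. A minimal sketch of what it could look like, assuming it is a plain URLClassLoader subclass that only adds the addClassPath helper used above (the real implementation may differ, for example in its delegation order):

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical reconstruction: only addClassPath() and the inherited
// getURLs()/loadClass() are assumed from the test program above.
public class MapReduceClassLoader extends URLClassLoader {

    public MapReduceClassLoader() {
        // Start with an empty URL list and delegate to the loader that
        // loaded this class (normally the application class loader).
        super(new URL[0], MapReduceClassLoader.class.getClassLoader());
    }

    // Split a classpath string on the platform separator and register
    // every jar or directory with this loader.
    public void addClassPath(String classPath) throws MalformedURLException {
        for (String entry : classPath.split(File.pathSeparator)) {
            addURL(new File(entry).toURI().toURL());
        }
    }
}
```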
Run it with:
java -XX:PermSize=50M -XX:MaxPermSize=50M -Dhadoop.home.dir=$HADOOP_HOME -Djava.library.path=$HADOOP_HOME/lib/native \
    -classpath $CLASSPATH JobSubmitTest $MR_CLASSPATH $MR_MAIN_CLASS
After starting the program, type "1" followed by Enter three times; each input triggers one MapReduce submission, and each submission creates a fresh, independent class loader to load the Hadoop-related classes.
Watch how permanent-generation usage changes:
$ jstat -gcutil 21225 1000 1000
S0 S1 E O P YGC YGCT FGC FGCT GCT
0.00 0.00 6.05 0.00 6.63 0 0.000 0 0.000 0.000
0.00 0.00 26.15 0.00 8.07 0 0.000 0 0.000 0.000
0.00 0.00 76.55 0.00 16.33 0 0.000 0 0.000 0.000
0.00 78.52 19.82 0.10 28.19 3 0.023 0 0.000 0.023
97.58 0.00 30.21 0.11 36.39 4 0.033 0 0.000 0.033
97.58 0.00 34.18 0.11 36.46 4 0.033 0 0.000 0.033
0.00 99.95 96.01 5.21 52.10 6 0.050 0 0.000 0.050
95.45 0.00 25.96 5.22 57.08 6 0.065 0 0.000 0.065
95.45 0.00 69.92 5.22 65.57 6 0.065 0 0.000 0.065
0.00 99.93 37.95 10.91 77.75 7 0.098 0 0.000 0.098
0.00 99.93 37.95 10.91 77.75 7 0.098 0 0.000 0.098
0.00 99.93 37.95 10.91 77.75 7 0.098 0 0.000 0.098
0.00 99.93 37.95 10.91 77.75 7 0.098 0 0.000 0.098
The P column shows the usage ratio of the permanent generation.
Trigger a full GC with jcmd $PID GC.run to see whether the permanent generation shrinks:
$ jstat -gcutil 21225 1000 1000
S0 S1 E O P YGC YGCT FGC FGCT GCT
0.00 99.93 41.48 10.91 77.75 7 0.098 0 0.000 0.098
0.00 99.93 41.48 10.91 77.75 7 0.098 0 0.000 0.098
0.00 0.00 0.00 10.62 77.68 8 0.116 1 0.209 0.325
0.00 0.00 0.00 10.62 77.68 8 0.116 1 0.209 0.325
As the output shows, the permanent generation barely changed: it was not reclaimed.
$ jmap -permstat 21225
Attaching to process ID 21225, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 24.65-b04
finding class loader instances ..done.
computing per loader stat ..done.
please wait.. computing liveness......................................................done.
class_loader classes bytes parent_loader alive? type
<bootstrap> 1301 7691864 null live <internal>
0x0000000085247020 1 1888 null dead sun/reflect/DelegatingClassLoader@0x0000000081e4fc00
0x000000008527f2b0 1744 12519760 0x0000000085031cb0 live com/spiro/test/mr/MapReduceClassLoader@0x0000000082096e50
0x0000000085018b98 1757 12584416 0x0000000085031cb0 live com/spiro/test/mr/MapReduceClassLoader@0x0000000082096e50
0x0000000085128c80 0 0 0x0000000085031cb0 live java/util/ResourceBundle$RBClassLoader@0x00000000820f5030
0x0000000085021cc0 1 3032 null dead sun/reflect/DelegatingClassLoader@0x0000000081e4fc00
0x00000000852a6f50 1 3056 null dead sun/reflect/DelegatingClassLoader@0x0000000081e4fc00
0x0000000085031cb0 83 873544 0x0000000085031d00 live sun/misc/Launcher$AppClassLoader@0x0000000082013318
0x0000000085021c80 1 1888 null dead sun/reflect/DelegatingClassLoader@0x0000000081e4fc00
0x00000000852a6f90 1 3056 null dead sun/reflect/DelegatingClassLoader@0x0000000081e4fc00
0x0000000085021c40 1 3056 0x0000000085018b98 dead sun/reflect/DelegatingClassLoader@0x0000000081e4fc00
0x00000000852a6fd0 1 3056 0x00000000852a67e0 dead sun/reflect/DelegatingClassLoader@0x0000000081e4fc00
0x0000000085031d00 0 0 null live sun/misc/Launcher$ExtClassLoader@0x0000000081fb5c08
0x0000000085021c00 1 3032 null dead sun/reflect/DelegatingClassLoader@0x0000000081e4fc00
0x00000000852a6e48 1 3056 0x000000008527f2b0 dead sun/reflect/DelegatingClassLoader@0x0000000081e4fc00
0x00000000852a67e0 1744 12519760 0x0000000085031cb0 live com/spiro/test/mr/MapReduceClassLoader@0x0000000082096e50
total = 16 6638 46214464 N/A alive=7, dead=9 N/A
$ jcmd 21225 GC.class_histogram | grep MapReduceClassLoader
num #instances #bytes class name
264: 3 240 com.spiro.test.mr.MapReduceClassLoader
$ jcmd 21225 GC.class_histogram | grep org.apache.hadoop.mapreduce.Job
772: 2 48 org.apache.hadoop.mapreduce.Job$JobState
785: 1 48 org.apache.hadoop.mapreduce.Job
813: 1 48 org.apache.hadoop.mapreduce.Job
878: 2 48 org.apache.hadoop.mapreduce.Job$JobState
883: 1 48 org.apache.hadoop.mapreduce.Job
961: 2 48 org.apache.hadoop.mapreduce.Job$JobState
1357: 1 24 [Lorg.apache.hadoop.mapreduce.Job$JobState;
1511: 1 24 [Lorg.apache.hadoop.mapreduce.Job$JobState;
1601: 1 24 [Lorg.apache.hadoop.mapreduce.Job$JobState;
There are three class loaders of type MapReduceClassLoader, and together they account for most of the occupied space. There are also three org.apache.hadoop.mapreduce.Job classes: they share a name, but they are distinct classes, each loaded by a different class loader.
Analyzing the Cause
From the code's point of view, MapReduceClassLoader cl = new MapReduceClassLoader(); is a local variable. When the method returns, the local variable table in the stack frame goes away, so the MapReduceClassLoader object should become eligible for GC, and all the classes it loaded should be reclaimed along with it. Why is it not collected? By the reachability analysis (GC Roots tracing) used to decide whether an object is live, one of the following kinds of GC roots must still hold a reference to the MapReduceClassLoader object:
- References in the virtual machine stack (the local variable table of a stack frame);
- Static fields in the method area;
- Constants in the method area;
- JNI references in the native method stack.
The next step is to analyze a heap dump. Export it with: jmap -dump:live,format=b,file=heap.bin $PID
Open the dump with the jvisualvm tool:
In the Classes tab, find the MapReduceClassLoader class, right-click, and choose "Show in Instances View".
In the References pane, right-click "this" and choose "Show Nearest GC Root".
You can see a thread object named "Thread-2" whose contextClassLoader field references the MapReduceClassLoader object, which is why the loader cannot be reclaimed.
The Summary tab shows the thread information. The call stack of the thread named "Thread-2" sits in the org.apache.hadoop.net.unix.DomainSocketWatcher class. Source-code analysis shows this is a resident child thread started by the Hadoop framework while the MR task is being submitted; when a child thread is created, it takes the contextClassLoader of its parent thread as its own.
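This inheritance is standard java.lang.Thread behavior and can be checked in isolation: a thread constructed while a custom context class loader is installed copies that loader at construction time, and keeps it even after the parent restores its original loader. A minimal sketch:

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.atomic.AtomicReference;

public class ContextInheritanceDemo {

    // Install `custom` on the current thread, spawn a child, and report
    // which context class loader the child observed.
    public static ClassLoader childLoaderSeenBy(ClassLoader custom) throws InterruptedException {
        ClassLoader origin = Thread.currentThread().getContextClassLoader();
        AtomicReference<ClassLoader> seen = new AtomicReference<>();
        try {
            Thread.currentThread().setContextClassLoader(custom);
            // The child copies the parent's context class loader at
            // construction time, not when run() starts.
            Thread child = new Thread(() ->
                    seen.set(Thread.currentThread().getContextClassLoader()));
            child.start();
            child.join();
        } finally {
            // Restoring the parent's loader does NOT update the child:
            // a long-lived child (like the DomainSocketWatcher thread)
            // keeps pinning `custom` and blocks its collection.
            Thread.currentThread().setContextClassLoader(origin);
        }
        return seen.get();
    }

    public static void main(String[] args) throws InterruptedException {
        ClassLoader custom = new URLClassLoader(new URL[0]);
        System.out.println(childLoaderSeenBy(custom) == custom); // prints true
    }
}
```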
So far, the problem analysis is over.
Summary
The root cause is that Thread.currentThread().setContextClassLoader(cl); was called before submitting the MR task, and a resident child thread started by Hadoop during submission inherited the parent thread's contextClassLoader, namely the MapReduceClassLoader, as its own context class loader, keeping the loader and everything it loaded reachable.
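Given this, a partial mitigation is to make the submit wrapper release what it can: always restore the caller's loader, and close the custom loader when done (URLClassLoader has implemented Closeable since Java 7, which releases its jar file handles). Note this does not detach loaders already captured by resident framework threads; for that, the framework thread must be shut down or a single loader reused across submissions. A sketch under these assumptions, where Submission is a hypothetical stand-in for the reflective submission logic shown earlier:

```java
import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;

public class LeakAwareSubmit {

    // Hypothetical submission hook; the real logic would reflectively
    // invoke the MR main class, as in the test program above.
    interface Submission { void run(ClassLoader cl) throws Exception; }

    public static void submit(String[] classPath, Submission job) throws Exception {
        ClassLoader origin = Thread.currentThread().getContextClassLoader();
        URLClassLoader cl = new URLClassLoader(toUrls(classPath), origin);
        try {
            Thread.currentThread().setContextClassLoader(cl);
            job.run(cl);
        } finally {
            // Always restore the caller's loader...
            Thread.currentThread().setContextClassLoader(origin);
            // ...and release the loader's jar/URL resources. Threads that
            // already captured `cl` still pin it, but nothing new does.
            try { cl.close(); } catch (IOException ignored) { }
        }
    }

    private static URL[] toUrls(String[] entries) throws Exception {
        URL[] urls = new URL[entries.length];
        for (int i = 0; i < entries.length; i++) {
            urls[i] = new File(entries[i]).toURI().toURL();
        }
        return urls;
    }
}
```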