Classes cannot be unloaded after submitting generic MapReduce jobs from a Java program

Problem Description

In the last blog post, I submitted a generic MapReduce job from a Java program. During the implementation, I found that each time a MapReduce job was submitted, the JVM could not reclaim the MapReduceClassLoader object or the classes it had loaded.

The following test program submits the job repeatedly:

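import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Scanner;

// MapReduceClassLoader and JobAdapter are custom classes that are not shown here.
public class JobSubmitTest {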

    public static void submit(String classPath, String mainClassName) {
        ClassLoader originCL = Thread.currentThread().getContextClassLoader();
        try {
            MapReduceClassLoader cl = new MapReduceClassLoader();
            cl.addClassPath(classPath);

            System.out.println("URLS:" + Arrays.toString(cl.getURLs()));

            Thread.currentThread().setContextClassLoader(cl);

            // Load the job's main class through the new loader and call main(String[]).
            Class<?> mainClass = cl.loadClass(mainClassName);
            System.out.println(mainClass.getClassLoader());
            Method mainMethod = mainClass.getMethod("main", String[].class);
            mainMethod.invoke(null, new Object[] {new String[0]});

            Class jobClass = cl.loadClass("org.apache.hadoop.mapreduce.Job");
            System.out.println(jobClass.getClassLoader());
            Field field = jobClass.getField(JobAdapter.JOB_FIELD_NAME);
            System.out.println(field.get(null));
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            Thread.currentThread().setContextClassLoader(originCL);
        }
    }

    public static void main(String[] args) {

        String classPath = args[0];
        String mainClassName = args[1];

        Scanner scanner = new Scanner(System.in);
        String cmd = null;
        int i = 0;
        while (true) {
            cmd = scanner.next();
            if ("exit".equalsIgnoreCase(cmd)) {
                break;
            }
            submit(classPath, mainClassName);
            i++;
            System.out.println("submit index = " + i);
        }
    }
}

Run the command:

java -XX:PermSize=50M -XX:MaxPermSize=50M -Dhadoop.home.dir=$HADOOP_HOME -Djava.library.path=$HADOOP_HOME/lib/native \
    -classpath $CLASSPATH JobSubmitTest $MR_CLASSPATH $MR_MAIN_CLASS

After the program starts, type "1" and press Enter three times to submit the MapReduce job three times. Each submission creates an independent class loader to load the Hadoop-related classes.

Watch how the permanent generation usage changes:

$ jstat -gcutil 21225 1000 1000
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT   
  0.00   0.00   6.05   0.00   6.63      0    0.000     0    0.000    0.000
  0.00   0.00  26.15   0.00   8.07      0    0.000     0    0.000    0.000
  0.00   0.00  76.55   0.00  16.33      0    0.000     0    0.000    0.000
  0.00  78.52  19.82   0.10  28.19      3    0.023     0    0.000    0.023
 97.58   0.00  30.21   0.11  36.39      4    0.033     0    0.000    0.033
 97.58   0.00  34.18   0.11  36.46      4    0.033     0    0.000    0.033
  0.00  99.95  96.01   5.21  52.10      6    0.050     0    0.000    0.050
 95.45   0.00  25.96   5.22  57.08      6    0.065     0    0.000    0.065
 95.45   0.00  69.92   5.22  65.57      6    0.065     0    0.000    0.065
  0.00  99.93  37.95  10.91  77.75      7    0.098     0    0.000    0.098
  0.00  99.93  37.95  10.91  77.75      7    0.098     0    0.000    0.098
  0.00  99.93  37.95  10.91  77.75      7    0.098     0    0.000    0.098
  0.00  99.93  37.95  10.91  77.75      7    0.098     0    0.000    0.098

The P column shows the percentage of the permanent generation in use.

Trigger a GC with jcmd $PID GC.run to see whether the permanent generation shrinks:

$ jstat -gcutil 21225 1000 1000
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT   
  0.00  99.93  41.48  10.91  77.75      7    0.098     0    0.000    0.098
  0.00  99.93  41.48  10.91  77.75      7    0.098     0    0.000    0.098
  0.00   0.00   0.00  10.62  77.68      8    0.116     1    0.209    0.325
  0.00   0.00   0.00  10.62  77.68      8    0.116     1    0.209    0.325

Even after a full GC, the permanent generation usage has hardly changed (77.75% to 77.68%), so the loaded classes have not been reclaimed.

$ jmap -permstat 21225
Attaching to process ID 21225, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 24.65-b04
finding class loader instances ..done.
computing per loader stat ..done.
please wait.. computing liveness......................................................done.
class_loader	classes	bytes	parent_loader	alive?	type

<bootstrap>	1301	7691864	  null  	live	<internal>
0x0000000085247020	1	1888	  null  	dead	sun/reflect/DelegatingClassLoader@0x0000000081e4fc00
0x000000008527f2b0	1744	12519760	0x0000000085031cb0	live	com/spiro/test/mr/MapReduceClassLoader@0x0000000082096e50
0x0000000085018b98	1757	12584416	0x0000000085031cb0	live	com/spiro/test/mr/MapReduceClassLoader@0x0000000082096e50
0x0000000085128c80	0	0	0x0000000085031cb0	live	java/util/ResourceBundle$RBClassLoader@0x00000000820f5030
0x0000000085021cc0	1	3032	  null  	dead	sun/reflect/DelegatingClassLoader@0x0000000081e4fc00
0x00000000852a6f50	1	3056	  null  	dead	sun/reflect/DelegatingClassLoader@0x0000000081e4fc00
0x0000000085031cb0	83	873544	0x0000000085031d00	live	sun/misc/Launcher$AppClassLoader@0x0000000082013318
0x0000000085021c80	1	1888	  null  	dead	sun/reflect/DelegatingClassLoader@0x0000000081e4fc00
0x00000000852a6f90	1	3056	  null  	dead	sun/reflect/DelegatingClassLoader@0x0000000081e4fc00
0x0000000085021c40	1	3056	0x0000000085018b98	dead	sun/reflect/DelegatingClassLoader@0x0000000081e4fc00
0x00000000852a6fd0	1	3056	0x00000000852a67e0	dead	sun/reflect/DelegatingClassLoader@0x0000000081e4fc00
0x0000000085031d00	0	0	  null  	live	sun/misc/Launcher$ExtClassLoader@0x0000000081fb5c08
0x0000000085021c00	1	3032	  null  	dead	sun/reflect/DelegatingClassLoader@0x0000000081e4fc00
0x00000000852a6e48	1	3056	0x000000008527f2b0	dead	sun/reflect/DelegatingClassLoader@0x0000000081e4fc00
0x00000000852a67e0	1744	12519760	0x0000000085031cb0	live	com/spiro/test/mr/MapReduceClassLoader@0x0000000082096e50

total = 16	6638	46214464	    N/A    	alive=7, dead=9	    N/A  
$ jcmd 21225 GC.class_histogram | grep MapReduceClassLoader
 264:             3            240  com.spiro.test.mr.MapReduceClassLoader
num     #instances         #bytes  class name
$ jcmd 21225 GC.class_histogram | grep org.apache.hadoop.mapreduce.Job
 772:             2             48  org.apache.hadoop.mapreduce.Job$JobState
 785:             1             48  org.apache.hadoop.mapreduce.Job
 813:             1             48  org.apache.hadoop.mapreduce.Job
 878:             2             48  org.apache.hadoop.mapreduce.Job$JobState
 883:             1             48  org.apache.hadoop.mapreduce.Job
 961:             2             48  org.apache.hadoop.mapreduce.Job$JobState
1357:             1             24  [Lorg.apache.hadoop.mapreduce.Job$JobState;
1511:             1             24  [Lorg.apache.hadoop.mapreduce.Job$JobState;
1601:             1             24  [Lorg.apache.hadoop.mapreduce.Job$JobState;

The jmap output shows three live class loaders of type MapReduceClassLoader, which together account for most of the permanent generation (about 37.6 MB of the 46.2 MB total). The class histogram lists org.apache.hadoop.mapreduce.Job three times, each with one instance. Although the entries share the same name, they are three different classes, each loaded by a different class loader.
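The observation that same-named classes from different loaders are distinct can be reproduced in a few lines. The sketch below is only an illustration, not code from this post: SameNameDifferentClass, Payload, IsolatingLoader and loadIsolated are hypothetical names, and Payload merely stands in for a class such as org.apache.hadoop.mapreduce.Job. It assumes the compiled .class files can be read as resources from the class path.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class SameNameDifferentClass {

    /** Stand-in for a class such as org.apache.hadoop.mapreduce.Job. */
    static class Payload {}

    /** Hypothetical loader that defines classes itself instead of delegating to the application loader. */
    static class IsolatingLoader extends ClassLoader {
        IsolatingLoader() {
            super(null); // only the bootstrap loader is consulted before findClass
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            String path = name.replace('.', '/') + ".class";
            ClassLoader source = SameNameDifferentClass.class.getClassLoader();
            try (InputStream in = source.getResourceAsStream(path)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                for (int n; (n = in.read(buf)) != -1; ) {
                    out.write(buf, 0, n);
                }
                byte[] bytes = out.toByteArray();
                return defineClass(name, bytes, 0, bytes.length);
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** Loads the named class through a brand-new IsolatingLoader. */
    static Class<?> loadIsolated(String name) {
        try {
            return new IsolatingLoader().loadClass(name);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String name = Payload.class.getName();
        Class<?> a = loadIsolated(name);
        Class<?> b = loadIsolated(name);
        System.out.println("same name:  " + a.getName().equals(b.getName()));
        System.out.println("same class: " + (a == b));
        System.out.println(a.getClassLoader() + " vs " + b.getClassLoader());
    }
}
```

The two Class objects report the same name but are not identical, because each loader defines its own copy. Each copy takes up its own space in the permanent generation, which matches the three separate Job entries in the histogram.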

Cause Analysis

From the code's point of view, MapReduceClassLoader cl = new MapReduceClassLoader(); is declared inside the method body. When the method returns, the stack frame and its local variable table are gone, so the MapReduceClassLoader object should become eligible for GC, and all the classes it loaded should be unloaded as well. So why is it not reclaimed? According to the reachability analysis algorithm (GC Roots Tracing), which decides whether an object is alive, the MapReduceClassLoader object must still be reachable from at least one of the following kinds of GC roots:

  • Objects referenced from the virtual machine stack (the local variable tables in stack frames)
  • Objects referenced by static fields in the method area
  • Objects referenced by constants in the method area
  • Objects referenced by JNI in the native method stack

Next, analyze a heap dump of the Java process. Export the dump file with:

jmap -dump:live,format=b,file=heap.bin $PID

Open the dump file with the jvisualvm tool:

jvisualvm


Find the MapReduceClassLoader class in the Classes tab, right-click it and select "show in instances view".


In the References pane below, right-click "this" and select "show nearest gc root".

The result shows a thread object named "Thread-2" whose contextClassLoader field references the MapReduceClassLoader object. As a result, the MapReduceClassLoader object cannot be reclaimed.
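The same question can also be asked from inside a running JVM, without a heap dump. The following is a minimal sketch, assuming you still hold a reference to the loader you want to check. ContextClassLoaderHolders, threadsHolding and startResident are hypothetical names, and the "resident-demo" thread only simulates a long-lived thread such as "Thread-2".

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

class ContextClassLoaderHolders {

    /** Returns the names of live threads whose context class loader is exactly the given loader. */
    static List<String> threadsHolding(ClassLoader loader) {
        List<String> names = new ArrayList<String>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.getContextClassLoader() == loader) {
                names.add(t.getName());
            }
        }
        return names;
    }

    /** Starts a daemon thread that stays alive until the latch is counted down. */
    static Thread startResident(String name, ClassLoader ccl, final CountDownLatch stop) {
        Thread t = new Thread(new Runnable() {
            public void run() {
                try {
                    stop.await();
                } catch (InterruptedException ignored) {
                    // exit quietly; this thread exists only for the demo
                }
            }
        }, name);
        t.setDaemon(true);
        t.setContextClassLoader(ccl);
        t.start();
        return t;
    }

    public static void main(String[] args) {
        ClassLoader loader = new URLClassLoader(new URL[0]);
        CountDownLatch stop = new CountDownLatch(1);
        startResident("resident-demo", loader, stop);
        System.out.println("threads holding the loader: " + threadsHolding(loader));
        stop.countDown();
    }
}
```

This only inspects the contextClassLoader field of each thread. A heap dump remains the more complete tool, because it shows every reference path to the loader rather than this one.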


The thread information is shown in the Summary tab. The call stack of the thread named "Thread-2" is inside the org.apache.hadoop.net.unix.DomainSocketWatcher class. Source code analysis shows that this is a child thread started by the Hadoop framework while the MR job is being submitted. When a child thread is created, it takes its parent thread's contextClassLoader as its own contextClassLoader.

This completes the analysis of the problem.
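The inheritance described above can be sketched with plain JDK classes. This is an illustration under stated assumptions, not Hadoop's code: ContextClassLoaderInheritance and createChildWhile are hypothetical names, and an empty URLClassLoader stands in for MapReduceClassLoader. The set-then-restore pattern mirrors the submit() method in the test program.

```java
import java.net.URL;
import java.net.URLClassLoader;

class ContextClassLoaderInheritance {

    /**
     * Creates a child thread while the current thread's context class loader is
     * temporarily replaced, then restores the original loader.
     */
    static Thread createChildWhile(ClassLoader temporary) {
        Thread current = Thread.currentThread();
        ClassLoader origin = current.getContextClassLoader();
        current.setContextClassLoader(temporary);
        try {
            // The new thread copies its creator's context class loader at construction time.
            return new Thread(new Runnable() {
                public void run() {
                    // nothing to do; the thread is never started in this demo
                }
            }, "child");
        } finally {
            current.setContextClassLoader(origin);
        }
    }

    public static void main(String[] args) {
        ClassLoader temporary = new URLClassLoader(new URL[0]);
        Thread child = createChildWhile(temporary);
        System.out.println("child keeps the temporary loader:  "
                + (child.getContextClassLoader() == temporary));
        System.out.println("parent still uses it after finally: "
                + (Thread.currentThread().getContextClassLoader() == temporary));
    }
}
```

Restoring the loader in the finally block affects only the parent thread. The child keeps the reference it copied when it was constructed, and a child that keeps running holds that reference for as long as it lives.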

Summary

The cause of the problem is that Thread.currentThread().setContextClassLoader(cl); is called before the MR job is submitted, and a resident child thread started by Hadoop during submission takes its parent thread's contextClassLoader, that is, the MapReduceClassLoader, as its own context class loader.
