Running the first MapReduce program

Based on the CDH5 environment built in https://blog.csdn.net/songzehao/article/details/91044032

The CDH5 big data environment has been set up successfully and the bundled examples run fine, but we can't just keep running other people's MapReduce programs, so it's time to write and run an MR program of our own.

Before worrying about how to write the program itself, let pragmatism lead: learn it by using it. The first step is simply to get a jar packaged, deployed, and run successfully; how MR is actually implemented can be studied later. So, first decompile Hadoop's own example jar with JD-GUI (/opt/cloudera/parcels/CDH-5.7.2-1.cdh5.7.2.p0.18/jars/hadoop-examples.jar), find WordCount.class, and copy its Java source into a plain new Java project. The code shows plenty of errors, because the project lacks the jars that Hadoop MapReduce programming requires, namely:

  • hadoop-client-3.2.0.jar
  • hadoop-common-3.2.0.jar
  • hadoop-hdfs-3.2.0.jar
  • hadoop-mapreduce-client-core-3.2.0.jar
  • commons-cli-1.2.jar

The corresponding Maven dependencies are:

<!-- jars required for hadoop mapreduce programming -->
<dependency>
	<groupId>org.apache.hadoop</groupId>
	<artifactId>hadoop-common</artifactId>
	<version>3.2.0</version>
</dependency>
<dependency>
	<groupId>org.apache.hadoop</groupId>
	<artifactId>hadoop-hdfs</artifactId>
	<version>3.2.0</version>
</dependency>
<dependency>
	<groupId>org.apache.hadoop</groupId>
	<artifactId>hadoop-mapreduce-client-core</artifactId>
	<version>3.2.0</version>
</dependency>
<dependency>
	<groupId>org.apache.hadoop</groupId>
	<artifactId>hadoop-client</artifactId>
	<version>3.2.0</version>
</dependency>
<dependency>
	<groupId>commons-cli</groupId>
	<artifactId>commons-cli</artifactId>
	<version>1.2</version>
</dependency>

Once these jars are added, the code no longer reports errors. The next step is to build the jar package; there are several ways to do this, for example running mvn package on a Maven project.

The first and simplest way is to export a JAR file directly from Eclipse;

Another way is to compile with the javac command and package with the jar command, as follows:

E:\J2EE_workspace\Test\src>javac -cp .;D:\maven-repository\commons-cli\commons-cli\1.2\commons-cli-1.2.jar;D:\maven-repository\org\apache\hadoop\hadoop-client\3.2.0\hadoop-client-3.2.0.jar;D:\maven-repository\org\apache\hadoop\hadoop-common\3.2.0\hadoop-common-3.2.0.jar;D:\maven-repository\org\apache\hadoop\hadoop-hdfs-client\3.2.0\hadoop-hdfs-client-3.2.0.jar;D:\maven-repository\org\apache\hadoop\hadoop-mapreduce-client-core\3.2.0\hadoop-mapreduce-client-core-3.2.0.jar WordCount.java
E:\J2EE_workspace\Test\src>jar -cvf wc3.jar *.class
added manifest
adding: WordCount$IntSumReducer.class(in = 1739) (out = 739)(deflated 57%)
adding: WordCount$TokenizerMapper.class(in = 1736) (out = 754)(deflated 56%)
adding: WordCount.class(in = 3037) (out = 1619)(deflated 46%)

Upload the resulting wc3.jar from the local machine to any cluster node. Since the master node (10.1.4.18 / b3 / bi-zhaopeng03) already runs so many services that it risks OOM, I put it on a slave/agent node (10.1.4.19 / b4 / bi-zhaopeng04), in the directory /root/songzehao/data:

[root@bi-zhaopeng04 data]# pwd
/root/songzehao/data
[root@bi-zhaopeng04 data]# ll
total 16
-rwxrwxrwx 1 root root   90 Jun 11 15:46 test.sh
-rwxrwxrwx 1 hdfs hdfs 4261 Jun 11 18:26 wc3.jar
-rw-r--r-- 1 root root   61 Jun 11 10:29 words.txt

The next step is to create a text file, words.txt, as input for the wordcount program, and then upload it to HDFS as /tmp/songzehao/words_input.

words.txt reads as follows:

[root@bi-zhaopeng04 data]# cat /root/songzehao/data/words.txt 
szh dk tyn cj cj zp zp szh dk szh dk dk dk tyn

Upload it to the specified HDFS path /tmp/songzehao/words_input:

[root@bi-zhaopeng04 data]# pwd
/root/songzehao/data
[root@bi-zhaopeng04 data]# sudo -u hdfs hadoop fs -put words.txt /tmp/songzehao/words_input
put: `words.txt': No such file or directory

The command fails with put: `words.txt': No such file or directory. Note that sudo -u hdfs cannot be used here, otherwise you get this "no such file" error. Strange: the file clearly exists and the path is correct, so what causes it? The cause is the Linux user permission model. The hdfs user has no permission to read /root/songzehao/data/words.txt, or lacks permission on one of its parent directories. So whenever a Linux environment complains that a file does not exist or cannot be found, the investigation should also cover whether the current user has sufficient permissions. As a quick test, switch to the hdfs user and try to access /root; sure enough, hdfs has no permission to access /root:

[root@bi-zhaopeng04 data]# su hdfs
Last login: Wed Jun 12 09:55:44 CST 2019 on pts/1
[root@bi-zhaopeng04 data]$ ll /root
ls: cannot open directory /root: Permission denied

So instead, run hadoop fs -put as a Linux user that can actually read the file (here, root). The content on HDFS then matches words.txt:

[root@bi-zhaopeng04 data]# hadoop fs -put words.txt /tmp/songzehao/words_input
[root@bi-zhaopeng04 data]# hadoop fs -cat /tmp/songzehao/words_input
szh dk tyn cj cj zp zp szh dk szh dk dk dk tyn

At this point we theoretically have everything needed to run an MR job, so let's try it. Modelled on the test command used after building CDH5 in https://blog.csdn.net/songzehao/article/details/91044032 (sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 100), we try to execute the following command:

[root@bi-zhaopeng04 data]# sudo -u hdfs hadoop jar wc3.jar WordCount /tmp/songzehao/words_input /tmp/songzehao/words_output
or, using the absolute path:
[root@bi-zhaopeng04 data]# sudo -u hdfs hadoop jar /root/songzehao/data/wc3.jar WordCount  /tmp/songzehao/words_input /tmp/songzehao/words_output

On execution, it fails with an error saying the file /root/songzehao/data/wc3.jar does not exist:

[root@bi-zhaopeng04 data]# sudo -u hdfs hadoop jar wc3.jar WordCount /tmp/songzehao/words_input /tmp/songzehao/words_output                     
19/06/12 14:13:48 INFO client.RMProxy: Connecting to ResourceManager at bi-zhaopeng03/10.1.4.18:8032
19/06/12 14:13:48 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/hdfs/.staging/job_1559721784881_0043
19/06/12 14:13:48 WARN security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:SIMPLE) cause:java.io.FileNotFoundException: File /root/songzehao/data/wc3.jar does not exist
Exception in thread "main" java.io.FileNotFoundException: File /root/songzehao/data/wc3.jar does not exist
        at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:590)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:803)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:580)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:425)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:340)
        at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1949)
        at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1917)
        at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1882)
        at org.apache.hadoop.mapreduce.JobResourceUploader.copyJar(JobResourceUploader.java:210)
        at org.apache.hadoop.mapreduce.JobResourceUploader.uploadFiles(JobResourceUploader.java:166)
        at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:99)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:194)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1304)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1304)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1325)
        at WordCount.main(WordCount.java:49)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

Yes, just as with the sudo -u hdfs put problem, first ask whether the hdfs user can access the job jar /root/songzehao/data/wc3.jar. Obviously it cannot; yet the proper way to submit a hadoop jar job on this cluster is as the hdfs user. To resolve the contradiction, put wc3.jar in a directory the hdfs user can read, e.g. /home/hdfs/wc3.jar or a public directory such as /opt/**. This is also why, after building CDH5, the test command sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 100 succeeds: hadoop-mapreduce-examples.jar already sits under /opt, a public system directory the hdfs user can access. So place the jar somewhere suitable; I put it under /opt/szh:

[root@bi-zhaopeng04 data]# cd /opt/szh/
[root@bi-zhaopeng04 szh]# ll
total 8
-rw-r--r-- 1 root root 4910 Jun 12 11:42 wc3.jar

After that, you can run it without switching users, via sudo -u hdfs:

[root@bi-zhaopeng04 data]# sudo -u hdfs hadoop jar /opt/szh/wc3.jar WordCount /tmp/songzehao/words_input /tmp/songzehao/words_output

Or switch to the hdfs user and run it:

[root@bi-zhaopeng04 data]# su hdfs
[hdfs@bi-zhaopeng04 data]$ hadoop jar /opt/szh/wc3.jar WordCount /tmp/songzehao/words_input /tmp/songzehao/words_output

Try again; surely there is no problem this time, right? Well, now it reports Class WordCount$TokenizerMapper not found. Look at the full exception stack:

[root@bi-zhaopeng04 data]# sudo -u hdfs hadoop jar /opt/szh/wc3.jar WordCount /tmp/songzehao/words_input /tmp/songzehao/words_output
19/06/10 19:39:57 INFO client.RMProxy: Connecting to ResourceManager at bi-zhaopeng03/10.1.4.18:8032
19/06/10 19:39:57 WARN mapreduce.JobResourceUploader: No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
19/06/10 19:39:57 INFO input.FileInputFormat: Total input paths to process : 1
19/06/10 19:39:58 INFO mapreduce.JobSubmitter: number of splits:1
19/06/10 19:39:58 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1559721784881_0003
19/06/10 19:39:58 INFO mapred.YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
19/06/10 19:39:58 INFO impl.YarnClientImpl: Submitted application application_1559721784881_0003
19/06/10 19:39:58 INFO mapreduce.Job: The url to track the job: http://bi-zhaopeng03:8088/proxy/application_1559721784881_0003/
19/06/10 19:39:58 INFO mapreduce.Job: Running job: job_1559721784881_0003
19/06/10 19:40:02 INFO mapreduce.Job: Job job_1559721784881_0003 running in uber mode : false
19/06/10 19:40:02 INFO mapreduce.Job:  map 0% reduce 0%
19/06/10 19:40:05 INFO mapreduce.Job: Task Id : attempt_1559721784881_0003_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class WordCount$TokenizerMapper not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2199)
        at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:196)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:745)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.ClassNotFoundException: Class WordCount$TokenizerMapper not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2105)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2197)
        ... 8 more

The problem is fairly clear: no job jar has been set ("No job jar file set" in the log above), so we need to set mapred.jar on the conf, i.e. add one line to the Java code:

// wc3.jar is the name of the job jar that will eventually be executed
conf.set("mapred.jar", System.getProperty("user.dir") + File.separator + "wc3.jar");

Having come this far, here is the complete WordCount source, including a few standard-output statements I added myself:

import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {
	@SuppressWarnings("deprecation")
	public static void main(String[] args) throws Exception {
		System.out.println("Args: " + Arrays.toString(args));
		Configuration conf = new Configuration();
		conf.set("mapred.jar", System.getProperty("user.dir") + File.separator + "wc3.jar");
		// mapred.jar is deprecated; the newer mapreduce.job.jar can also be used
		// conf.set("mapreduce.job.jar", System.getProperty("user.dir") + File.separator + "wc3.jar");
		System.out.println("当前工作目录: " + System.getProperty("user.dir"));
		System.out.println("mapred.jar/mapreduce.job.jar: " + conf.get("mapred.jar"));
		File jarFile = new File(conf.get("mapred.jar"));
		System.out
				.println("Jar to run: " + jarFile.getAbsolutePath() + "," + jarFile.isFile() + "," + jarFile.exists());
		String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
		System.out.println("OtherArgs: " + Arrays.toString(otherArgs));
		if (otherArgs.length < 2) {
			System.err.println("Usage: wordcount <in> [<in>...] <out>");
			System.exit(2);
		}
		Job job = new Job(conf, "szh's word count");
		job.setJarByClass(WordCount.class);
		job.setMapperClass(TokenizerMapper.class);
		job.setCombinerClass(IntSumReducer.class);
		job.setReducerClass(IntSumReducer.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(IntWritable.class);
		for (int i = 0; i < otherArgs.length - 1; i++) {
			FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
			System.out.println("Input path: " + otherArgs[i]);
		}
		FileOutputFormat.setOutputPath(job, new Path(otherArgs[(otherArgs.length - 1)]));
		System.out.println("Output path: " + otherArgs[(otherArgs.length - 1)]);

		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}

	public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
		private IntWritable result = new IntWritable();

		public void reduce(Text key, Iterable<IntWritable> values,
				Reducer<Text, IntWritable, Text, IntWritable>.Context context)
				throws IOException, InterruptedException {
			int sum = 0;
			for (IntWritable val : values) {
				sum += val.get();
			}
			this.result.set(sum);
			context.write(key, this.result);
		}
	}

	public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
		private static final IntWritable one = new IntWritable(1);
		private Text word = new Text();

		public void map(Object key, Text value, Mapper<Object, Text, Text, IntWritable>.Context context)
				throws IOException, InterruptedException {
			StringTokenizer itr = new StringTokenizer(value.toString());
			while (itr.hasMoreTokens()) {
				this.word.set(itr.nextToken());
				context.write(this.word, one);
			}
		}
	}
}
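Not part of the original post, but for completeness: before going anywhere near the cluster, the same mapper and reducer can be exercised in Hadoop's local mode, which runs everything in a single JVM against the local filesystem, so no HDFS, no YARN and no job jar are needed. A minimal sketch, assuming the WordCount class above is on the classpath and using placeholder local paths:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountLocal {
	public static void main(String[] args) throws Exception {
		Configuration conf = new Configuration();
		// Run the MapReduce framework in-process against the local filesystem,
		// so no cluster, HDFS or job jar is needed for a quick sanity check.
		conf.set("mapreduce.framework.name", "local");
		conf.set("fs.defaultFS", "file:///");

		Job job = Job.getInstance(conf, "word count (local test)");
		job.setMapperClass(WordCount.TokenizerMapper.class);
		job.setCombinerClass(WordCount.IntSumReducer.class);
		job.setReducerClass(WordCount.IntSumReducer.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(IntWritable.class);

		// Placeholder local paths; the output directory must not exist yet.
		FileInputFormat.addInputPath(job, new Path("file:///root/songzehao/data/words.txt"));
		FileOutputFormat.setOutputPath(job, new Path("file:///root/songzehao/data/words_output_local"));

		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}
}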

After all that back and forth, re-export the jar and run it again; this time it finally succeeds, as follows:

[root@bi-zhaopeng04 ~]# sudo -u hdfs hadoop jar /opt/szh/wc3.jar WordCount /tmp/songzehao/words_input /tmp/songzehao/words_output
Args: [/tmp/songzehao/words_input, /tmp/songzehao/words_output]
Current working directory: /tmp/hsperfdata_hdfs
mapred.jar/mapreduce.job.jar: /tmp/hsperfdata_hdfs/wc3.jar
Jar to run: /tmp/hsperfdata_hdfs/wc3.jar,false,false
OtherArgs: [/tmp/songzehao/words_input, /tmp/songzehao/words_output]
Input path: /tmp/songzehao/words_input
Output path: /tmp/songzehao/words_output
19/06/12 14:57:38 INFO client.RMProxy: Connecting to ResourceManager at bi-zhaopeng03/10.1.4.18:8032
19/06/12 14:57:38 INFO input.FileInputFormat: Total input paths to process : 1
19/06/12 14:57:38 INFO mapreduce.JobSubmitter: number of splits:1
19/06/12 14:57:38 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
19/06/12 14:57:38 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1559721784881_0045
19/06/12 14:57:38 INFO impl.YarnClientImpl: Submitted application application_1559721784881_0045
19/06/12 14:57:38 INFO mapreduce.Job: The url to track the job: http://bi-zhaopeng03:8088/proxy/application_1559721784881_0045/
19/06/12 14:57:38 INFO mapreduce.Job: Running job: job_1559721784881_0045
19/06/12 14:57:43 INFO mapreduce.Job: Job job_1559721784881_0045 running in uber mode : false
19/06/12 14:57:43 INFO mapreduce.Job:  map 0% reduce 0%
19/06/12 14:57:47 INFO mapreduce.Job:  map 100% reduce 0%
19/06/12 14:57:51 INFO mapreduce.Job:  map 100% reduce 13%
19/06/12 14:57:55 INFO mapreduce.Job:  map 100% reduce 25%
19/06/12 14:57:59 INFO mapreduce.Job:  map 100% reduce 38%
19/06/12 14:58:03 INFO mapreduce.Job:  map 100% reduce 50%
19/06/12 14:58:07 INFO mapreduce.Job:  map 100% reduce 63%
19/06/12 14:58:11 INFO mapreduce.Job:  map 100% reduce 75%
19/06/12 14:58:15 INFO mapreduce.Job:  map 100% reduce 88%
19/06/12 14:58:19 INFO mapreduce.Job:  map 100% reduce 100%
19/06/12 14:58:19 INFO mapreduce.Job: Job job_1559721784881_0045 completed successfully
19/06/12 14:58:19 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=367
                FILE: Number of bytes written=2018974
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=163
                HDFS: Number of bytes written=27
                HDFS: Number of read operations=51
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=32
        Job Counters 
                Launched map tasks=1
                Launched reduce tasks=16
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=1703
                Total time spent by all reduces in occupied slots (ms)=28315
                Total time spent by all map tasks (ms)=1703
                Total time spent by all reduce tasks (ms)=28315
                Total vcore-seconds taken by all map tasks=1703
                Total vcore-seconds taken by all reduce tasks=28315
                Total megabyte-seconds taken by all map tasks=1743872
                Total megabyte-seconds taken by all reduce tasks=28994560
        Map-Reduce Framework
                Map input records=1
                Map output records=14
                Map output bytes=103
                Map output materialized bytes=303
                Input split bytes=116
                Combine input records=14
                Combine output records=5
                Reduce input groups=5
                Reduce shuffle bytes=303
                Reduce input records=5
                Reduce output records=5
                Spilled Records=10
                Shuffled Maps =16
                Failed Shuffles=0
                Merged Map outputs=16
                GC time elapsed (ms)=895
                CPU time spent (ms)=12940
                Physical memory (bytes) snapshot=3937447936
                Virtual memory (bytes) snapshot=48442486784
                Total committed heap usage (bytes)=3520069632
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters 
                Bytes Read=47
        File Output Format Counters 
                Bytes Written=27

Now check on HDFS whether output files were generated in the specified directory /tmp/songzehao/words_output. The HDFS directory tree can be browsed through the web interface at masterIp:50070/explorer.html#/.

Look at the contents of the output files to confirm the word counts are correct. One way is to click, in the web UI, the part-r-xxxxx files from reduce tasks whose size is non-zero, then Download and view them; another is to view them with the hadoop command, for example:

[root@bi-zhaopeng04 ~]# hadoop fs -cat /tmp/songzehao/words_output/part-r-00000 /tmp/songzehao/words_output/part-r-00007 /tmp/songzehao/words_output/part-r-00008
szh     3
zp      2
cj      2
dk      5
tyn     2
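Purely as an illustration (my addition, not in the original post), the same check can also be done from Java through the HDFS FileSystem API; the NameNode URI below is an assumption and would normally come from the cluster's *-site.xml files on the classpath:

import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadWordCountOutput {
	public static void main(String[] args) throws Exception {
		Configuration conf = new Configuration();
		// Assumed NameNode address; adjust or drop it if the client config already sets fs.defaultFS.
		conf.set("fs.defaultFS", "hdfs://bi-zhaopeng03:8020");

		try (FileSystem fs = FileSystem.get(conf)) {
			// Walk the reduce output files and print their contents.
			for (FileStatus status : fs.listStatus(new Path("/tmp/songzehao/words_output"))) {
				if (!status.getPath().getName().startsWith("part-r-")) {
					continue; // skip _SUCCESS and other markers
				}
				try (BufferedReader reader =
						new BufferedReader(new InputStreamReader(fs.open(status.getPath())))) {
					String line;
					while ((line = reader.readLine()) != null) {
						System.out.println(status.getPath().getName() + "\t" + line);
					}
				}
			}
		}
	}
}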

At this point we have successfully run our first MR job. But must the main class name be specified on every hadoop jar invocation? That is a small nuisance; can't the jar's META-INF/MANIFEST.MF carry the main class information? A system as well designed as Hadoop surely caters for such a small need, and if it does, hadoop jar would no longer need the main class specified manually each time. In fact, Hadoop did think of this and implemented it. What happens if we deliberately omit the main class name? As follows:

[root@bi-zhaopeng04 ~]# sudo -u hdfs hadoop jar /opt/szh/wc3.jar /tmp/songzehao/words_input /tmp/songzehao/words_output          
Exception in thread "main" java.lang.ClassNotFoundException: /tmp/songzehao/words_input
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:214)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

As expected, it complains that a class cannot be found, because the first argument is now taken as the class name. The exception stack gives us the lead to look at: RunJar.java. Evidently, Hadoop enters this class whenever it runs a job jar. Take a look at its main method and run method:

/** Run a Hadoop job jar.  If the main class is not in the jar's manifest,
   * then it must be provided on the command line. */
  public static void main(String[] args) throws Throwable {
    new RunJar().run(args);
  }

  public void run(String[] args) throws Throwable {
    String usage = "RunJar jarFile [mainClass] args...";

    if (args.length < 1) {
      System.err.println(usage);
      System.exit(-1);
    }

    int firstArg = 0;
    String fileName = args[firstArg++];
    File file = new File(fileName);
    if (!file.exists() || !file.isFile()) {
      System.err.println("JAR does not exist or is not a normal file: " +
          file.getCanonicalPath());
      System.exit(-1);
    }
    String mainClassName = null;

    JarFile jarFile;
    try {
      jarFile = new JarFile(fileName);
    } catch (IOException io) {
      throw new IOException("Error opening job jar: " + fileName)
        .initCause(io);
    }

    Manifest manifest = jarFile.getManifest();
    if (manifest != null) {
      mainClassName = manifest.getMainAttributes().getValue("Main-Class");
    }
    jarFile.close();

    if (mainClassName == null) {
      if (args.length < 2) {
        System.err.println(usage);
        System.exit(-1);
      }
      mainClassName = args[firstArg++];
    }
    mainClassName = mainClassName.replaceAll("/", ".");

    File tmpDir = new File(System.getProperty("java.io.tmpdir"));
    ensureDirectory(tmpDir);

    final File workDir;
    try {
      workDir = File.createTempFile("hadoop-unjar", "", tmpDir);
    } catch (IOException ioe) {
      // If user has insufficient perms to write to tmpDir, default
      // "Permission denied" message doesn't specify a filename.
      System.err.println("Error creating temp dir in java.io.tmpdir "
                         + tmpDir + " due to " + ioe.getMessage());
      System.exit(-1);
      return;
    }

    if (!workDir.delete()) {
      System.err.println("Delete failed for " + workDir);
      System.exit(-1);
    }
    ensureDirectory(workDir);

    ShutdownHookManager.get().addShutdownHook(
        new Runnable() {
          @Override
          public void run() {
            FileUtil.fullyDelete(workDir);
          }
        }, SHUTDOWN_HOOK_PRIORITY);

    if (!skipUnjar()) {
      unJar(file, workDir);
    }

    ClassLoader loader = createClassLoader(file, workDir);

    Thread.currentThread().setContextClassLoader(loader);
    Class<?> mainClass = Class.forName(mainClassName, true, loader);
    Method main = mainClass.getMethod("main", String[].class);
    List<String> newArgsSubList = Arrays.asList(args)
        .subList(firstArg, args.length);
    String[] newArgs = newArgsSubList
        .toArray(new String[newArgsSubList.size()]);
    try {
      main.invoke(null, new Object[] {newArgs});
    } catch (InvocationTargetException e) {
      throw e.getTargetException();
    }
  }

As the comment on main (and the implementation of run) tells us: "Run a Hadoop job jar. If the main class is not in the jar's manifest, then it must be provided on the command line." So we just need to specify the Main-Class when exporting the jar; pay attention to this step in Eclipse's export wizard.

After exporting the new jar, its META-INF/MANIFEST.MF reads:

Manifest-Version: 1.0
Main-Class: WordCount
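As a small cross-check (my addition), the Main-Class attribute of the exported jar can also be read programmatically, mirroring what RunJar.run does above; the jar path is the example location from earlier:

import java.util.jar.JarFile;
import java.util.jar.Manifest;

public class CheckMainClass {
	public static void main(String[] args) throws Exception {
		// Read META-INF/MANIFEST.MF the same way RunJar.run does and print Main-Class.
		try (JarFile jarFile = new JarFile("/opt/szh/wc3.jar")) {
			Manifest manifest = jarFile.getManifest();
			String mainClass = (manifest == null)
					? null
					: manifest.getMainAttributes().getValue("Main-Class");
			System.out.println("Main-Class: " + mainClass); // expect WordCount
		}
	}
}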

From then on, there is no need to specify the main class name manually on the hadoop jar command line.


The end, confetti!
