Submitting MapReduce code to a Hadoop cluster

On a Hadoop cluster, writing the MapReduce code is not the end of the job: the code has to be built into a jar package, and the jar then has to be submitted to the cluster. Hadoop provides a command-line entry point for submitting the jar.
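The general form of that entry point is the hadoop jar command (the jar name and class below are placeholders; the concrete invocation used for this example appears near the end of the post):

hadoop jar <your-application.jar> <main-class> [arguments...]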

WordCount is the entry-level Hadoop MapReduce program: once you can write WordCount, you understand roughly 80% of MapReduce.

MapReduce is divided into a map phase and a reduce phase, and users customize each phase according to their own business logic.

Take WordCount as an example: to count how many times each word appears in a text, you need to read the text and tally the words.
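For instance, given the input line `hello world hello`, the map phase emits one (word, 1) pair per word, the framework groups the pairs by key during the shuffle, and the reduce phase sums each group:

map:     (hello, 1), (world, 1), (hello, 1)
shuffle: hello -> [1, 1], world -> [1]
reduce:  (hello, 2), (world, 1)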

map process

package com.hadoop.mapreduce;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;



/**
 * Created by Frankie on 2018/1/14.
 *
 * KEYIN:    by default, the start offset of the line of text the MR framework reads in; it is a long,
 *           but Hadoop uses its own serializable type LongWritable instead of the plain Long
 * VALUEIN:  by default, the content of the line of text the framework reads in; String, so Text
 * KEYOUT:   the key of the data output by the user-defined logic, here a word; String, so Text
 * VALUEOUT: the value of the data output after the user-defined logic finishes, here the count of a word; Integer, so IntWritable
 */

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    /*
     * The business logic of the map stage is written in the custom map() method.
     * The map task calls our custom map() method once for each line of input data.
     */
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        String[] words = line.split(" ");
        for (String word : words) {
            // Use the word as the key and the number 1 as the value, so that the data can later be
            // distributed by word and the same word always ends up in the same reduce task.
            // The map task collects the pair and writes it out.
            context.write(new Text(word), new IntWritable(1));
        }
    }
}

reduce process

package com.hadoop.mapreduce;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;
import java.util.Iterator;

/**
 * Created by Frankie on 2018/1/14.
 *
 * KEYIN, VALUEIN correspond to the mapper's output KEYOUT, VALUEOUT types
 * KEYOUT, VALUEOUT are the output types of the custom reduce logic:
 * KEYOUT is a word,
 * VALUEOUT is its total count
 */
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    /*
     * The incoming key and values are the group of kv pairs that share the same word as key.
     */
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int count = 0;

        // Iterator<IntWritable> iterator = values.iterator();
        // while (iterator.hasNext()) {
        //     count += iterator.next().get();
        // }

        for (IntWritable value : values) {
            count += value.get();
        }
        context.write(key, new IntWritable(count));
    }
}

The MapReduce process raises some questions, for example:

How are map tasks assigned?

How are reduce tasks assigned?

How do the map tasks and the reduce tasks hand their data off to each other?

If a map task fails, how is that handled?

It would be very cumbersome if every map task had to take care of partitioning its data itself (see the sketch after this list).

To solve these problems, a dedicated master is needed to manage the MapReduce job.
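On the partitioning point: the framework already takes care of this for the mapper. Map output is routed to reduce tasks by a partitioner, and the default hash partitioning behaves roughly like the sketch below (using the standard org.apache.hadoop.mapreduce.Partitioner API; this class is only an illustration and is not registered by the job in this post):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Illustration of hash partitioning for this job's map output types:
// the same word always hashes to the same reduce task.
public class WordHashPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}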

The WordCount driver class is where the job is configured and where, acting as the client, the code is finally submitted.

package com.hadoop.mapreduce;

import org.apache.hadoop.conf.Configuration;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * Created by Frankie on 2018/1/14.
 *
 * Acts as the client of the YARN cluster:
 * it packages the relevant runtime parameters of the MR job, specifies the jar, and finally submits everything to YARN.
 */

public class WordCount {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "wordcount");

        // Specify the local path of the jar that contains this program
        job.setJarByClass(WordCount.class);

        // Specify the mapper and reducer classes used by this job
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);

        // Specify the kv types of the mapper output
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // Specify the kv types of the final output
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Specify the directory of the job's raw input files
        // e.g. /data/adult.data
        FileInputFormat.setInputPaths(job, new Path(args[1]));

        // Specify the directory of the job's output
        FileOutputFormat.setOutputPath(job, new Path(args[2]));

        // // Submit the parameters configured in the job, and the jar containing the job's java classes, to YARN
        // job.submit();

        // Submit the job configuration and wait until the job finishes
        boolean res = job.waitForCompletion(true);
        System.exit(res ? 0 : 1);
    }
}
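One optional follow-up on the driver configuration (not part of the original code, just a common WordCount optimization): because summing counts is associative and commutative, the same WordCountReducer can also be registered as a combiner, so partial sums are computed on the map side and less data is shuffled:

// Optional: run WordCountReducer as a map-side combiner as well
job.setCombinerClass(WordCountReducer.class);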

After the code is finished, package it into a jar. Here we choose a packaging method that does not bundle third-party dependencies.
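The post does not name the build tool, so as one possible sketch, assuming a Maven project: a plain, non-shaded jar is enough, because the cluster itself provides the Hadoop libraries at runtime.

mvn clean package
# copy the resulting jar from target/ to the server, renamed here as HadoopMapReduce.jar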

After packaging, upload the generated jar to the server and run:

leiline@master:~/Documents/hadoop/myJars$ hadoop jar HadoopMapReduce.jar com.hadoop.mapreduce.WordCount /data/adult /data/out
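A note on the arguments: the driver reads the input and output paths from args[1] and args[2]. That matches this command line presumably because the jar's manifest declares a Main-Class, in which case hadoop jar passes the class-name argument through to main() as args[0]. If the manifest had no Main-Class entry, the two paths would arrive as args[0] and args[1] instead.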

Note that the out directory is created automatically by the job; the user does not need to create it manually. Finally, after the job has completed, you can see the results on HDFS:

Found 2 items
-rw-r--r-- 3 leiline supergroup 0 2018-01-14 19:01 /data/out/_SUCCESS
-rw-r--r-- 3 leiline supergroup 216737 2018-01-14 19:01 /data/out/part-r-00000
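To look at the actual word counts (each line of part-r-00000 is a word and its count, tab-separated), something like the following works:

hdfs dfs -cat /data/out/part-r-00000 | head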
