Hadoop Getting Started Learning Series [Five]: MapReduce 2.0 Programming in Practice

Reprinted: https://blog.csdn.net/shengmingqijiquan/article/details/52916664

I. MapReduce 2.0 programming model
For the MR programming model, please refer to the previous article in this series: Hadoop Getting Started Learning Series [Four]: MapReduce 2.0 application scenarios, principles, basic architecture, and programming model.

II. MapReduce 2.0 programming interfaces
Three ways to program:
Java (the most basic way)
Composition of the Java programming interface:
old API Java package: org.apache.hadoop.mapred
new API Java package: org.apache.hadoop.mapreduce

The new API has better extensibility;
the two programming interfaces differ only in the form they expose to the user; the internal execution engine is the same;
code written against the old API is fully compatible with Hadoop 2.0, whereas code written against the new API is not.

Starting with Hadoop 1.0.0, every release contains both the old and the new API.

Hadoop Streaming (supports multiple languages)
Consistent with the Linux pipe mechanism
Inter-process communication happens via standard input and output
Any language that has standard input and output can be used
A few examples:
cat 1.txt | grep "dong" | sort
cat 1.txt | python grep.py | java sort.jar

Hadoop Pipes (supports C/C++)
Note: the Java programming interface is the foundation of all the programming modes; the different programming interfaces only differ in the form exposed to the user, while the internal execution engine is the same; programming efficiency differs between the different approaches.

III. MapReduce 2.0 programming steps, demonstrated with the Java version of WordCount as an example
1.WordCount problem
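Given a collection of text files as input, count how many times each word appears and output (word, count) pairs; this is the classic introductory MapReduce problem.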


2.Map stage
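In the map stage, each input line is split into words, and a (word, 1) pair is emitted for every word encountered.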


3.Reduce stage
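In the reduce stage, the framework groups all pairs with the same word together, and the reducer sums the 1s to produce the final (word, total count) pair.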


4. Mapper Design and Implementation
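The original figure with the mapper code is not reproduced in this reprint; the following is a minimal sketch of a WordCount mapper written against the new API (org.apache.hadoop.mapreduce), with illustrative class and variable names:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Input: (byte offset, line of text); output: (word, 1)
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the line into words and emit (word, 1) for each one
        StringTokenizer tokenizer = new StringTokenizer(value.toString());
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, ONE);
        }
    }
}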


5. Reducer Design and Implementation
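Likewise, a minimal sketch of the corresponding WordCount reducer (new API), assuming the mapper above emits (Text word, IntWritable 1) pairs:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Input: (word, [1, 1, ...]); output: (word, total count)
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum all the counts for this word
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}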


6. main Function Design and Implementation
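The figure is omitted in this reprint; a minimal driver sketch, assuming the illustrative WordCountMapper and WordCountReducer classes above, could look like this:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(WordCountMapper.class);
        // Using the reducer as a combiner is optional local aggregation;
        // it works here because summing counts is associative and commutative
        job.setCombinerClass(WordCountReducer.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}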


7. run
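Assuming the classes above are packaged into a jar such as wordcount.jar, the job is submitted with the hadoop jar command, passing the HDFS input and output paths as arguments, for example: hadoop jar wordcount.jar WordCount /user/hadoop/input /user/hadoop/output. The output directory must not already exist.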
8. Remaining details
[1] Parsing the input data format
The default is TextInputFormat:
Each Map Task processes one split;
a split's size may equal multiple blocks;
if the last line of a split is truncated, the first part of the next block is read as well;
the input is converted into key/value pairs, where the key is the byte offset and the value is the content of one line.
For example, for a file whose lines are "hello world" and "hello hadoop", TextInputFormat produces the pairs (0, "hello world") and (12, "hello hadoop"), 12 being the byte offset at which the second line starts.


[2] Data flow
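At a high level, the data flows as follows: the InputFormat splits the input and turns it into key/value pairs, the map tasks process them, the map output is partitioned, sorted, and shuffled to the reduce tasks, and the OutputFormat writes the reduce output to HDFS.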


[3] Input data parsing (InputFormat)
public interface InputFormat<K, V> {
    InputSplit[] getSplits(JobConf job, int numSplits) throws IOException;
    RecordReader<K, V> getRecordReader(InputSplit split,
                                       JobConf job,
                                       Reporter reporter) throws IOException;
}
The default is TextInputFormat, which is used for text files;
the user can specify a different InputFormat implementation via the mapred.input.format.class parameter.
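As an illustration (not from the original article), the input format can also be set programmatically; the snippet below assumes a jobConf of type org.apache.hadoop.mapred.JobConf and a job of type org.apache.hadoop.mapreduce.Job:

// Old API: set the InputFormat on the JobConf
// (equivalent to setting the mapred.input.format.class parameter)
jobConf.setInputFormat(org.apache.hadoop.mapred.TextInputFormat.class);

// New API: set the InputFormat class on the Job object
job.setInputFormatClass(org.apache.hadoop.mapreduce.lib.input.TextInputFormat.class);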
[4] Mapper: map processing logic
public interface Mapper<K1, V1, K2, V2> extends JobConfigurable, Closeable {
    void map(K1 key, V1 value, OutputCollector<K2, V2> output, Reporter reporter)
            throws IOException;
}
The new API is located in org.apache.hadoop.mapreduce.Mapper.
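For comparison, the new API replaces the interface with a base class; simplified, its shape is roughly:

public class Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
    protected void map(KEYIN key, VALUEIN value, Context context)
            throws IOException, InterruptedException {
        // the default implementation forwards (key, value) unchanged
    }
    // setup(), cleanup() and run() can also be overridden
}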
[5] Partitioner: partitioning the map output
org.apache.hadoop.mapred (old API):
public interface Partitioner<K2, V2> extends JobConfigurable {
    int getPartition(K2 key, V2 value, int numPartitions);
}
org.apache.hadoop.mapreduce (new API):
public abstract class Partitioner<KEY, VALUE> {
    public abstract int getPartition(KEY key, VALUE value, int numPartitions);
}
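As an illustration (not part of the original article), a custom new-API partitioner that spreads (word, count) pairs across reducers by the hash of the word, essentially what the default HashPartitioner does, might look like this:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class WordPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        // mask the sign bit so the result is always non-negative
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}

It would be registered on the job with job.setPartitionerClass(WordPartitioner.class).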


[6] Reducer: reduce processing logic
org.apache.hadoop.mapred (old API):
public interface Reducer<K2, V2, K3, V3> extends JobConfigurable, Closeable {
    void reduce(K2 key, Iterator<V2> values,
                OutputCollector<K3, V3> output, Reporter reporter)
            throws IOException;
}
The new API is located in org.apache.hadoop.mapreduce.Reducer.
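Simplified, the new API Reducer looks roughly like this; note that the values arrive as an Iterable instead of an Iterator, and OutputCollector/Reporter are replaced by a single Context:

public class Reducer<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
    protected void reduce(KEYIN key, Iterable<VALUEIN> values, Context context)
            throws IOException, InterruptedException {
        // the default implementation forwards every (key, value) pair unchanged
    }
}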
[7] Summary


IV. Summary
This part uses the classic WordCount example to explain the MR programming process and way of thinking in detail, which is very valuable for learning. Be sure to type out the code yourself several times and master how it executes; otherwise it will be hard to study the Hadoop source code later.
----------------
Disclaimer: This article is an original article by the CSDN blogger "Data Circle", released under the CC 4.0 BY-SA license. Please include the original source link and this statement when reprinting.
Original link: https://blog.csdn.net/shengmingqijiquan/article/details/52916664
