MapReduce principle analysis

Licensed under Creative Commons BY-SA: attribution is required, and derivative works must be distributed under the same license.

Understanding HDFS

HDFS provides the underlying distributed storage: a huge task is spread across many nodes, each node handling a small part, which makes processing far more manageable.
On top of the HDFS foundation, various computation frameworks grew up to make processing convenient, eventually developing into today's big-data ecosystem.

MapReduce

MapReduce is one of Hadoop's core components. To be distributed, Hadoop needs two parts: the distributed file system HDFS and the distributed computing framework MapReduce.

Principle, explained:
Here is a small story (a well-known analogy) that makes MapReduce easy to understand.

I asked my wife: "Do you really want to understand what MapReduce is?" She firmly replied, "Yes."
I: How do you prepare onion chili sauce? (The recipe below is not accurate; please don't try it at home.)
Wife: I take an onion, chop it, stir in salt and water, and finally put the mixture through the grinder. That gives me onion chili sauce. But what does that have to do with MapReduce?
I: Wait a moment. Let me put together the complete story, and you will certainly understand MapReduce within 15 minutes.
Wife: Okay.
I: Now suppose you want to make a bottle of mixed chili sauce from mint, onion, tomato, chili and garlic. How would you do it?
Wife: I would take a pinch of mint leaves, an onion, a tomato, a chili and a clove of garlic, chop them, add the right amount of salt and water, and then grind the mixture in the grinder. That gives me a bottle of mixed chili sauce.
I: Right. Now let's apply the MapReduce concepts to the recipe. Map and Reduce are actually two separate operations; I will explain each of them in detail below.

Map (mapping): chopping the onion, the tomato, the chili and the garlic is a Map operation applied to each of those objects. Give an onion to Map, and Map hands you back chopped onion. In the same way, you hand the chili, the garlic and the tomato to Map one by one, and you get back the corresponding pieces. So whenever you chop a vegetable such as an onion, you are executing a Map operation. Map is applied to every vegetable and produces one or more pieces from each; in our example, the output is vegetable pieces. An onion may turn out to be rotten during the Map operation; in that case you simply throw the rotten onion away. So if a rotten onion shows up, the Map operation filters it out and never produces any pieces from it.

Reduce (reducing): in this phase you put all the chopped vegetables into the grinder and get a bottle of chili sauce. To make one bottle, all of the ingredients must be ground together; the grinder therefore gathers together the chopped pieces that the Map operations produced.

Wife: So that is what MapReduce is?
I: You can say yes, and you can also say no. This is only part of MapReduce; the real power of MapReduce lies in distributed computing.
Wife: Distributed computing? What is that? Please explain it to me.
I: Suppose you entered a chili-sauce recipe contest and won the award for the best chili sauce. After you win, your recipe becomes popular, so you decide to sell your own brand of homemade chili sauce. Suppose you need to produce 10,000 bottles of chili sauce every day. How would you do it?
Wife: I would find a large number of suppliers to provide the raw materials.
I: Right. But could you then complete the production by yourself? Could you chop all the raw material alone? Could a single grinder meet the demand? And by now we also need to supply different kinds of chili sauce: onion chili sauce, green-pepper chili sauce, tomato chili sauce and more.
Wife: Of course not. I would hire more workers to chop the vegetables, and I would need more grinders, so that I could produce the chili sauce quickly.
I: Right, so now you have to divide up the work: you need several people chopping vegetables together. Each person handles one full bag of vegetables, and each person effectively performs one simple Map operation. Everyone keeps taking vegetables out of their bag and processes them one at a time, chopping until the bag is empty. When all the workers have finished chopping, the table holds onion pieces, tomato pieces, garlic pieces and so on.
Wife: But how do I then make the different kinds of sauce?
I: Now you will see the phase of MapReduce that we have not mentioned yet: stirring (the shuffle). MapReduce stirs all the chopped output together, grouping the pieces by the key that was generated for them in the Map operation. The stirring is done automatically; you can think of the key as the name of the raw ingredient, for example "onion".
All the pieces with the key "onion" are stirred together and sent to the onion grinder to be ground, and out comes onion chili sauce. Likewise, all the tomato pieces are sent to the grinder marked "tomato", which produces tomato chili sauce.
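The stirring step can be sketched with plain Java collections. This is not Hadoop code, only an illustration (the ingredient names and the pair layout are invented for the example): each chopped piece is a (key, value) pair, and the shuffle groups all pairs with the same key so that one "grinder" (reducer) receives them together.

```java
import java.util.*;

public class ShuffleSketch {
    // Shuffle: group every (key, value) pair by key, so that all of one
    // key's values arrive at the same "grinder" (reducer).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : mapOutput) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Map output from the "chopping" workers: one pair per vegetable piece.
        List<Map.Entry<String, Integer>> pieces = List.of(
                Map.entry("onion", 1), Map.entry("tomato", 1),
                Map.entry("onion", 1), Map.entry("garlic", 1));
        System.out.println(shuffle(pieces)); // {garlic=[1], onion=[1, 1], tomato=[1]}
    }
}
```

In real Hadoop the grouping and the transfer between machines happen automatically between the map and reduce phases; this sketch only shows the grouping idea.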

Interpreting the figure
Step 1: the data stays where it is; the program (the computation) is distributed out to the data.
Step 2: dirty data is cleaned out in advance.
Step 3: a small local pre-aggregation is performed ahead of time.
Step 4: the small pre-aggregated results are transferred over the network and merged with the data from the other nodes (the shuffle). This step affects efficiency the most and wastes the most time.
Step 5: the large merge happens in Reduce, i.e. the final merge.
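Steps 3 and 4 are worth a small sketch in plain Java (an illustration only; the words and counts are invented). The point of the local pre-aggregation is that it shrinks what has to cross the network during the shuffle:

```java
import java.util.*;

public class CombinerSketch {
    // Step 3: pre-aggregate one map task's output locally, before the shuffle.
    static Map<String, Integer> combine(List<String> words) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String w : words) counts.merge(w, 1, Integer::sum);
        return counts;
    }

    public static void main(String[] args) {
        // Raw map output: 6 pairs would otherwise cross the network in step 4.
        List<String> mapOutput = List.of("a", "b", "a", "a", "b", "c");
        Map<String, Integer> combined = combine(mapOutput);
        // After pre-aggregation only 3 pairs need to be shuffled.
        System.out.println(combined); // {a=3, b=2, c=1}
    }
}
```

In Hadoop this role is played by a combiner; because the shuffle is the most expensive step, sending fewer, pre-summed pairs is a big win.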
In brief:
1. slice the data
2. count it
2.1 reading: from disk
2.2 computing: in memory
2.3 writing the results: to disk
Map
corresponds to chopping: it processes the data into ordered pieces ready for transmission
Shuffle
corresponds to the pipeline: it transmits the data
Reduce
corresponds to the factory: it merges the batches of data into ordered, or directly finished, output
Walking through the code
1. Create a new project: a MapReduce project.
2. Delete the default lib and swap in the required jar packages.
3. There is a permissions problem: import the NativeIO class into src to resolve the permission conflict. (important!)
4. Start coding.

WordCount
4.1 Create the Map class and the Reduce class (each corresponds directly to one phase).
4.2 Create the driver class: the Job.
(1) Map

  • When the map side of a MapReduce program reads data, it reads one line at a time by default: the key is the byte offset of the line (a long) and the value is the line itself (a string).
  • Raw Java types cannot be used for data transmission in MapReduce; the wrapper (Writable) types such as LongWritable must be used instead.
  • The Context object is the connection point between the map side and the reduce side of the program; output data is written to disk through the context channel.

public class Mapper extends org.apache.hadoop.mapreduce.Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable ikey, Text value, Context context) throws IOException, InterruptedException {
        String[] st = value.toString().split(" "); // split the line into words
        for (int i = 0; i < st.length; i++) {
            word.set(st[i]);          // set the current word before emitting it
            context.write(word, one); // emit the (word, 1) pair
        }
    }
}
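Stripped of the Hadoop types, the map method is meant to emit one (word, 1) pair for each word in the line. A plain-Java sketch of that logic (illustration only, not Hadoop code):

```java
import java.util.*;

public class MapLogic {
    // Emit a (word, 1) pair for every word in the line,
    // mirroring word.set(...); context.write(word, one).
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String w : line.split(" ")) out.add(Map.entry(w, 1));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(map("hello world hello")); // [hello=1, world=1, hello=1]
    }
}
```

Note that the mapper does not count anything itself; duplicates like "hello" are simply emitted twice, and the framework groups them later.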
(2)Reduce
public class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // sum up every 1 emitted for this key
        int sum = 0;
        for (IntWritable i : values) {
            sum += i.get();
        }
        context.write(key, new IntWritable(sum)); // emit (word, total count)
    }
}
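The reduce logic itself is just a sum over the values collected for one key. A plain-Java sketch (illustration only):

```java
import java.util.*;

public class ReduceLogic {
    // Sum all the 1s collected for one key, mirroring the reduce loop above.
    static int reduce(Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return sum;
    }

    public static void main(String[] args) {
        // The shuffle delivered three (hello, 1) pairs to this reducer.
        System.out.println(reduce(List.of(1, 1, 1))); // 3
    }
}
```

The reducer is called once per distinct key, with an iterable over all the values the shuffle gathered for that key.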
(3)Job

public class Job1 {
    public static void main(String[] args) throws Exception {
        System.setProperty("HADOOP_USER_NAME", "root");
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://192.168.110.131:9000");
        Job job = Job.getInstance(conf);
        job.setJobName("wc2Job");
        job.setJarByClass(Job1.class);
        job.setMapperClass(Mapper.class);
        job.setReducerClass(Reduce.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // input path of the file to be processed
        FileInputFormat.setInputPaths(job, new Path("hdfs://192.168.110.131:9000/user/root/abc.txt"));
        FileOutputFormat.setOutputPath(job, new Path("hdfs://192.168.110.131:9000/user/root1"));
        // System.exit(job.waitForCompletion(true) ? 0 : 1);
        boolean zt = job.waitForCompletion(true);
        if (zt) {
            System.out.println("job finished successfully");
        }
    }
}

The Job class, in short:
1. set up the configuration;
2. register the Mapper and Reducer classes with the job;
3. set the input and output paths;
4. boolean zt = job.waitForCompletion(true); tells you whether the job succeeded.


Origin blog.csdn.net/power_k/article/details/92006410