Comic: What is MapReduce?

----- the next day-----

————————————

What is MapReduce?

MapReduce is a programming model that originates from one of the three well-known papers published by Google (MapReduce, BigTable, GFS). It is mainly used for parallel computation over massive data sets.

MapReduce can be divided into two parts: Map and Reduce.
1. Map: the mapping process, in which a set of data is transformed into new data according to a given Map function.
2. Reduce: the reduction process, which summarizes several sets of mapping results and outputs the final result.
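
As a minimal sketch of these two parts, assuming a toy in-memory list rather than a real cluster, the following Python uses the built-in `map` and `functools.reduce`. The sample numbers and the function names `square` and `add` are illustrative assumptions, not part of any MapReduce framework.

```python
from functools import reduce

def square(x):
    """Map function: turns one input value into one new value."""
    return x * x

def add(total, value):
    """Reduce function: folds the mapped values into a single summary."""
    return total + value

data = [1, 2, 3, 4]                   # hypothetical input data
mapped = list(map(square, data))      # Map: each value is mapped independently
result = reduce(add, mapped, 0)       # Reduce: the mapped results are summarized
print(mapped, result)
```

Because each call to the Map function depends only on its own input, the mapping work can be spread across many machines. Only the Reduce step needs to see the results together.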

Let us look at a practical example: how can we efficiently count the number of people with each surname in the whole country?

We can use the idea of MapReduce: map the population of each province in parallel to produce several partial results, then sort and summarize these partial results:
(Figure: counting surnames with Map, Shuffle, and Reduce)

What does this picture mean? Let's explain the steps separately:

1. Map:
Taking each province as a unit, multiple threads read the population data of different provinces in parallel, and each record generates a key-value pair. The figure shows only simplified data.

2. Shuffle:
The concept of Shuffle was not mentioned above. The word literally means shuffling, as with a deck of cards. The shuffle process sorts, groups, and copies the data produced by Map.

3. Reduce:
Processes the groups produced by the previous step, then summarizes and outputs the results.
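
The three steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the `PROVINCES` data is invented sample data, the function names are hypothetical, and the provinces are processed one after another here instead of by parallel threads.

```python
from collections import defaultdict

# Hypothetical sample data: a few surnames per province.
PROVINCES = {
    "Province A": ["Zhang", "Wang", "Li", "Wang"],
    "Province B": ["Li", "Zhang", "Wang"],
}

def map_phase(surnames):
    """Map: emit a (surname, 1) key-value pair for every record."""
    return [(surname, 1) for surname in surnames]

def shuffle_phase(mapped_lists):
    """Shuffle: group the values by key and sort the keys."""
    groups = defaultdict(list)
    for pairs in mapped_lists:
        for key, value in pairs:
            groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_phase(groups):
    """Reduce: summarize each group into a single count."""
    return {key: sum(values) for key, values in groups.items()}

mapped = [map_phase(surnames) for surnames in PROVINCES.values()]
counts = reduce_phase(shuffle_phase(mapped))
print(counts)
```

Each call to `map_phase` touches only one province, which is why the Map step can run in parallel. The Reduce step only ever sees values that already share a key.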

Note that the Shuffle described here is only an abstract concept. In actual execution, Shuffle is split into two parts: one part is completed in the Map task, and the other part is completed in the Reduce task.
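
A rough sketch of that split, assuming two reduce tasks and a toy partitioning rule (the sum of character codes modulo the number of reducers). The names `map_side_shuffle` and `reduce_side_shuffle` are hypothetical and do not correspond to real Hadoop classes; the sketch only shows the general idea that the map side partitions and sorts, while the reduce side merges and groups.

```python
import heapq
from itertools import groupby
from operator import itemgetter

NUM_REDUCERS = 2  # assumption for this sketch

def partition(key):
    """Pick a reduce task for a key with a deterministic toy rule."""
    return sum(ord(c) for c in key) % NUM_REDUCERS

def map_side_shuffle(pairs):
    """Map-task half: split the output by partition, then sort each partition by key."""
    parts = [[] for _ in range(NUM_REDUCERS)]
    for key, value in pairs:
        parts[partition(key)].append((key, value))
    return [sorted(part, key=itemgetter(0)) for part in parts]

def reduce_side_shuffle(sorted_runs):
    """Reduce-task half: merge the sorted runs from every map task, then group by key."""
    merged = heapq.merge(*sorted_runs, key=itemgetter(0))
    return [(key, [v for _, v in grp]) for key, grp in groupby(merged, key=itemgetter(0))]

# Output of two hypothetical map tasks.
map_outputs = [
    map_side_shuffle([("Wang", 1), ("Li", 1), ("Wang", 1)]),
    map_side_shuffle([("Li", 1), ("Zhang", 1)]),
]

# Each reduce task collects its own partition from every map task.
grouped = {
    r: reduce_side_shuffle([out[r] for out in map_outputs])
    for r in range(NUM_REDUCERS)
}
print(grouped)
```

Because every map task applies the same partitioning rule, all values for one key end up at the same reduce task.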


How does Hadoop implement MapReduce?

Hadoop is a distributed system framework developed by the Apache Foundation. It contains multiple components, the core of which are HDFS and MapReduce.

For reasons of space, this article does not give a complete introduction to Hadoop. It only briefly describes how MapReduce is implemented in the Hadoop framework.

The following figure shows the whole process of the Hadoop framework executing a MapReduce Job:

(Figure: the Hadoop framework executing a MapReduce Job)

Several entities need to be explained here:

HDFS:
Hadoop's distributed file system, which provides the data source and job information storage for MapReduce.

Client Node:
The process that runs the MapReduce program and submits a MapReduce Job.

JobTracker Node:
Splits the complete Job into several Tasks and is responsible for scheduling and coordinating all Tasks. It plays the role of Master.

TaskTracker Node:
Responsible for executing the Tasks assigned by the JobTracker. It plays the role of Worker. Tasks are divided into MapTask and ReduceTask.
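
The Master and Worker roles can be imitated in a few lines of Python. This is a toy sketch, not the Hadoop API: `run_job` stands in for the JobTracker, a thread pool stands in for the TaskTracker nodes, and all names and the sample records are assumptions made for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def map_task(split):
    """MapTask: count the surnames in one input split."""
    return Counter(split)

def reduce_task(partials):
    """ReduceTask: merge the partial counts from all MapTasks."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

def run_job(records, num_splits=2, num_workers=2):
    """Toy 'JobTracker': split the Job into Tasks and hand them to workers."""
    splits = [records[i::num_splits] for i in range(num_splits)]
    # The thread pool plays the part of the 'TaskTracker' workers.
    with ThreadPoolExecutor(max_workers=num_workers) as workers:
        partials = list(workers.map(map_task, splits))
    return reduce_task(partials)

result = run_job(["Wang", "Li", "Wang", "Zhang", "Li", "Wang"])
print(result)
```

In a real cluster the splits live in HDFS and the workers are separate machines, but the division of labor is the same: one coordinator splits and schedules, many workers execute.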

Finally, I wish everyone who aspires to become a big data engineer, and all of Xiaohui's readers, success in achieving their dreams in the new year!

—————END—————

Origin blog.51cto.com/14982143/2550783