What on earth is Hadoop?

  So, with a full sense of mission, it's time to stand up and explain!

  1 Born against the backdrop of big data

  Of course, to explain what Hadoop is, we have to start with big data. More than 20 years ago, in the 1990s, data started to be generated in massive quantities. (It's not that there wasn't data before, but with the technology of the time, the data of daily life was fleeting and went unrecorded.) How massive is "massive"? Today's data volume is hundreds of thousands of times what it was back then!

  The rapid growth of data is bound to bring problems. Let's start with a third-grade word problem. Listen carefully:

  The data volume of the 1990s is equivalent to 10 parts. A child can move 1 part per minute, so it takes him 10 minutes to move them all. After the 1990s, the data volume grows to the equivalent of 10,000 parts. The child has grown up and can now move 4 parts per minute. How long does it take him to move all the parts now?

  The answer is 2500 minutes!

  In other words, the development of data-reading technology cannot keep up with the growth of data volume!
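The gap in the word problem above can be checked with a few lines of arithmetic (a toy calculation using the numbers from the story, nothing more):

```python
# Toy numbers taken from the word problem above.
parts_1990s = 10        # data volume "then"
parts_now = 10_000      # data volume "now": 1000x growth
speed_1990s = 1         # parts moved per minute "then"
speed_now = 4           # parts moved per minute "now": only 4x growth

time_then = parts_1990s / speed_1990s   # 10 minutes
time_now = parts_now / speed_now        # 2500 minutes

print(time_then, time_now)  # 10.0 2500.0
# Data grew 1000x, but moving speed grew only 4x,
# so the total time grew 250x. Speed alone cannot win.
```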

  So, being clever, we turn to the distributed approach: the core idea behind all of Hadoop.

  2 Using a distributed approach to overcome the limits of a single worker

  What is "distributed"? The reasoning is very simple: we don't need to train a strongman who can move 100 parts per minute; that's not realistic. If one person moves parts too slowly, we can hire 10 people. Still not enough? Hire 100, or 1,000. That is what "distributed" means.

  But as the number of parts keeps growing, a new problem appears: how do we manage so many parts?

  3 Hadoop core design: HDFS and MapReduce

  First, we have to organize all these parts. In the big-data era, we face data measured in TB, PB, or even EB. We therefore need a file management system that can not only store such a huge volume of data but also read and write it quickly and efficiently: HDFS. HDFS, the Hadoop Distributed File System, splits a huge file across multiple storage devices and uses a scheduler to manage the pieces. So how does HDFS work? Let's listen to a story first. The owner of a parts factory (the client) has a huge number of parts to store, but a single warehouse simply cannot hold them all. So the boss builds a warehouse cluster (HDFS), stores his parts in batches across different warehouses (hosts), and then sets up a management system covering all the warehouses.
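The warehouse story can be sketched in code. Below is a minimal, hypothetical simulation in plain Python (not the real HDFS API): a file is cut into fixed-size blocks, and each block is assigned, with replicas, to different "warehouse" hosts so that losing one warehouse does not lose the data.

```python
# Minimal sketch of HDFS-style block placement (a toy model, not the real API).
BLOCK_SIZE = 4          # bytes per block (real HDFS defaults to 128 MB)
REPLICATION = 2         # copies of each block (real HDFS defaults to 3)
HOSTS = ["host-A", "host-B", "host-C"]

def place_blocks(data: bytes):
    """Split data into blocks and assign each block to REPLICATION hosts."""
    placement = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        # Round-robin placement: each replica goes to a different host.
        replicas = [HOSTS[(i // BLOCK_SIZE + r) % len(HOSTS)]
                    for r in range(REPLICATION)]
        placement.append((block, replicas))
    return placement

layout = place_blocks(b"lots of parts!")
for block, replicas in layout:
    print(block, "->", replicas)
```

Real HDFS also has a NameNode that remembers which block lives on which host, much like the boss's management system covering all the warehouses.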

  

(Flow chart: the client stores file blocks across warehouse hosts under HDFS management)

 

  After the files are stored in HDFS, we need to consider how to use the data. People often uncover the hidden value in data through the relationships between records, and disorganized data greatly hinders that mining. So we need a programming model to sort and organize the data. This is the other core of Hadoop: MapReduce. Let's look at another story:

(Illustration: a story explaining how MapReduce divides work among workers and combines their results)

 
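The idea behind MapReduce can be sketched without Hadoop at all. The toy code below (a plain-Python simulation, not Hadoop's actual Java API) runs the classic word-count example through the three phases: map each record into key/value pairs, shuffle the pairs into groups by key, then reduce each group into a result.

```python
from collections import defaultdict

# Toy word count in the MapReduce style (a sketch, not Hadoop's real API).

def map_phase(line):
    """Map: turn one input line into (word, 1) pairs."""
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all values by key across every mapper's output."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine one key's list of values into a single result."""
    return key, sum(values)

lines = ["big data big plans", "big warehouse"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)  # {'big': 3, 'data': 1, 'plans': 1, 'warehouse': 1}
```

In real Hadoop the map and reduce steps run in parallel on many hosts, and the shuffle moves data between them over the network; the programmer only writes the two small functions.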

  In general, HDFS is Hadoop's storage foundation: it sits at the data layer and provides a way to store massive data (distributed storage). MapReduce is an engine, or a programming model, that can be understood as sitting one layer above the data: by writing MapReduce programs we can compute over the massive data in HDFS (distributed statistics and aggregation). It is a bit like finding the result we want by reading through all the files (HDFS) and tallying them up (MapReduce). So Hadoop is a tool that helps us both store a large amount of data and process it. (It seems there are still a lot of new terms...)

  In fact, HDFS and MapReduce are only the most basic parts of Hadoop (the rest will be covered in later articles). In the ten-plus years since Hadoop was born in 2006, it has gone through several major updates and grown a wide range of extensions. Companies building products on Hadoop can be found all over the world, and applications of Hadoop technology are countless. So I want to tell you: don't imagine Hadoop as something out of reach. Hadoop has already become a part of our lives.
