Hadoop Illuminated, Chapter 3: Why do I Need Hadoop?

2019-11-25 22:13:53

 

Labs on GitHub: https://github.com/elephantscale/HI-labs/tree/master/hadoop-admin

On GitHub: https://github.com/markkerzner

Author's GitHub: https://github.com/sujee

A good English sentence: "It is a minuscule token of thanks from both of us to the Hadoop community" — that is, a small token of gratitude from the two authors to the Hadoop community.


 

Why do I Need Hadoop?

3.1. Hadoop provides storage for Big Data at reasonable cost

One Cloudera study showed that companies typically spend between $25,000 and $50,000 per terabyte per year. With Hadoop, this cost drops to a few thousand dollars per terabyte per year, and as hardware keeps getting cheaper, the cost continues to fall.
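The figures above make for a simple back-of-the-envelope comparison. The capacity and the exact per-terabyte prices below are illustrative assumptions picked from the ranges cited, not data from the study:

```python
# Hypothetical comparison: annual storage cost for 100 TB,
# using illustrative figures from the per-terabyte ranges cited above.
capacity_tb = 100

traditional_per_tb = 30_000   # assumed, within the $25k-$50k range
hadoop_per_tb = 3_000         # assumed, "a few thousand dollars"

traditional_annual = capacity_tb * traditional_per_tb
hadoop_annual = capacity_tb * hadoop_per_tb

print(f"Traditional storage: ${traditional_annual:,}/year")
print(f"Hadoop storage:      ${hadoop_annual:,}/year")
print(f"Roughly {traditional_annual // hadoop_annual}x cheaper")
```

Under these assumptions Hadoop comes out roughly an order of magnitude cheaper, which is the point the chapter is making.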

3.2. Hadoop allows the capture of new or more data

Sometimes organizations do not capture a type of data because the cost of storing it is too high. Since Hadoop provides storage at a reasonable cost, this kind of data can now be captured and stored.

One example is website click logs. Because the volume of these logs can be very high, only a few organizations used to capture them. Now, with Hadoop, it is feasible to capture and store the logs.

3.3. With Hadoop, you can store data longer

To manage the volume of stored data, companies routinely purge older data. For example, only the logs of the last three months might be kept, with older logs deleted. With Hadoop, historical data can be stored for longer, which enables new analyses on older historical data.

Take, for example, click logs captured from a website. A few years ago, these logs were stored only for a short period of time, long enough to compute statistics such as popular pages. Now, with Hadoop, these click logs can be kept for much longer.

3.4. Hadoop provides scalable analytics

Storing all this data is pointless if we cannot analyze it. Hadoop provides not only distributed storage but also distributed processing, which means it can crunch a large volume of data in parallel. Hadoop's computing framework is called MapReduce, and MapReduce has been proven to scale to petabytes.
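To make the MapReduce model concrete, here is a minimal in-process sketch of its three phases — map, shuffle, and reduce — applied to word counting. This is an illustration of the model, not Hadoop's actual Java API; on a real cluster the framework runs the map and reduce phases in parallel across many machines:

```python
# A minimal sketch of the MapReduce model (not Hadoop's Java API):
# map emits (key, value) pairs, the framework groups them by key
# ("shuffle"), and reduce aggregates each group.
from collections import defaultdict

def map_phase(line):
    # Emit (word, 1) for every word in one line of input.
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    # Group all values by key, as Hadoop does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Sum the counts for one word.
    return key, sum(values)

lines = ["hadoop stores big data", "hadoop processes big data"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)  # e.g. {'hadoop': 2, 'stores': 1, ...}
```

Because each mapper sees only its own slice of the input and each reducer sees only one key's values, the same program scales from one machine to thousands.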

3.5. Hadoop provides rich analytics

Native MapReduce supports Java as its primary programming language; other languages such as Ruby, Python and R can be used as well.

Of course, writing custom MapReduce code is not the only way to analyze data in Hadoop; higher-level abstractions over MapReduce are available. For example, a tool called Pig takes an English-like data-flow language and translates it into MapReduce. Another tool, Hive, accepts SQL queries and runs them using MapReduce.

Business Intelligence (BI) tools can provide an even higher level of analysis. Some BI tools can work with Hadoop and analyze the data stored in it. For a list of BI tools that support Hadoop, see Chapter 13, Business Intelligence Tools for Hadoop and Big Data [52].
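The appeal of tools like Hive is that a one-line declarative query such as `SELECT page, COUNT(*) FROM clicks GROUP BY page` is compiled into MapReduce jobs for you. A rough in-process analogy in Python, using made-up click-log records (the records and field names are purely illustrative):

```python
from collections import Counter

# Hypothetical click-log records of the kind discussed earlier.
clicks = [
    {"page": "/home", "user": "u1"},
    {"page": "/pricing", "user": "u2"},
    {"page": "/home", "user": "u3"},
]

# Hand-written aggregation: the kind of logic you would otherwise
# spell out as custom MapReduce code.
page_counts = Counter(c["page"] for c in clicks)

# A tool like Hive lets you express the same thing declaratively:
#   SELECT page, COUNT(*) FROM clicks GROUP BY page;
print(page_counts.most_common(1))  # the most popular page
```

The trade-off is typical: custom MapReduce code gives full control, while Pig and Hive trade some of that control for far less code per query.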


Origin www.cnblogs.com/JasonPeng1/p/11931444.html