2019-11-25 22:13:53
Labs on GitHub: https://github.com/elephantscale/HI-labs/tree/master/hadoop-admin
On GitHub: https://github.com/markkerzner
Author's GitHub: https://github.com/sujee
Nice English sentence: "It is a minuscule token of thanks from both of us to the Hadoop community" — that is, a tiny symbol of the authors' gratitude to the Hadoop community.
Why do I Need Hadoop?
3.1. Hadoop provides storage for Big Data at reasonable cost
One Cloudera study showed that companies typically spend between $25,000 and $50,000 per terabyte per year. With Hadoop, the cost drops to a few thousand dollars per terabyte per year. As hardware keeps getting cheaper, this cost continues to decline.
3.2. Hadoop lets you capture new or more data
Sometimes organizations do not capture certain data because the cost of storing it is too high. Since Hadoop provides storage at a reasonable cost, such data can now be captured and stored. One example is website clickstream logs. Because the volume of these logs can be very high, only a few organizations used to capture them. Now, with Hadoop, it is practical to capture and store the logs.
3.3. With Hadoop, you can store data longer
To keep storage volumes manageable, companies regularly delete older data. For example, only the logs from the last three months might be kept, with older logs deleted. With Hadoop, historical data can be stored for much longer, which enables new analyses on older data. Take the clickstream logs captured from a website: a few years ago these logs were kept only briefly, just long enough to compute statistics such as the most popular pages. Now, with Hadoop, these click logs can be stored far longer.
3.4. Hadoop provides scalable analytics
Storing all this data is pointless if we cannot analyze it. Hadoop provides not only distributed storage but also distributed processing, which means we can process large volumes of data in parallel. Hadoop's computing framework is called MapReduce, and it has been shown to scale to petabytes of data.
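The MapReduce model can be sketched in plain Python: a map function turns each input record into key/value pairs, a shuffle groups values by key, and a reduce function aggregates each group. This is only a minimal in-memory sketch of the programming model — not Hadoop's actual Java API — and the function names are illustrative.

```python
from collections import defaultdict

def map_phase(records, mapper):
    """Apply the mapper to every input record, collecting (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs

def shuffle(pairs):
    """Group all values by key -- what Hadoop does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    """Apply the reducer to each key and its list of values."""
    return {key: reducer(key, values) for key, values in groups.items()}

# Classic word count: the mapper emits (word, 1), the reducer sums the counts.
lines = ["big data big storage", "big analytics"]
mapper = lambda line: [(word, 1) for word in line.split()]
reducer = lambda word, counts: sum(counts)

result = reduce_phase(shuffle(map_phase(lines, mapper)), reducer)
print(result)  # {'big': 3, 'data': 1, 'storage': 1, 'analytics': 1}
```

On a real cluster the map and reduce phases run in parallel across many machines, which is what lets the same model scale to petabytes.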
3.5. Hadoop provides rich analytics
Native MapReduce supports Java as its primary programming language. Other languages, such as Ruby, Python, and R, can also be used.
Of course, writing custom MapReduce code is not the only way to analyze data in Hadoop. Higher-level abstractions over MapReduce are available. For example, a tool called Pig takes a data-flow language that reads somewhat like English and translates it into MapReduce. Another tool, Hive, accepts SQL queries and runs them as MapReduce jobs.
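To see how a SQL query can become a MapReduce job, consider a GROUP BY count over clickstream records: a query such as `SELECT url, COUNT(*) ... GROUP BY url` corresponds to a mapper that emits (url, 1) and a reducer that sums. The sketch below is a conceptual in-memory illustration with made-up data, not actual Hive output.

```python
from collections import defaultdict

# Hypothetical clickstream records, one dict per page view.
clicks = [
    {"url": "/home", "user": "a"},
    {"url": "/pricing", "user": "b"},
    {"url": "/home", "user": "c"},
]

# Conceptually: SELECT url, COUNT(*) FROM clicks GROUP BY url
# Map: emit (url, 1) for every record.
pairs = [(record["url"], 1) for record in clicks]

# Shuffle + reduce: group by url and sum the ones.
counts = defaultdict(int)
for url, one in pairs:
    counts[url] += one

print(dict(counts))  # {'/home': 2, '/pricing': 1}
```

Hive generates real MapReduce jobs doing essentially this grouping, but distributed across the cluster.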
Business Intelligence (BI) tools can provide an even higher level of analysis. Some BI tools can work with Hadoop and analyze data stored in it. For a list of BI tools that support Hadoop, see Chapter 13, Business Intelligence Tools for Hadoop and Big Data.