12 Facts About the Big Data Framework Hadoop

Today, Apache Hadoop is known to practically everyone. Back when Doug Cutting, then a Yahoo search engineer, developed this open source software library for building distributed computing environments and named it after his son's toy elephant, who would have thought that it would one day claim the top spot among "big data" technologies?

Although Hadoop has heated up along with big data, I believe many users still do not really understand it. At last week's TDWI Solutions Summit, Philip Russom, industry analyst and director of TDWI Research, gave a keynote titled "12 Facts About Hadoop". What follows is a summary of its key points, which should help if you want to learn more about Hadoop.

1, Hadoop consists of multiple products

When people talk about Hadoop, they often think of it as a single product, but in fact it is made up of several different products.

Russom said: "Hadoop is a family of open source products, all of which are Apache Software Foundation projects."

People tend to lump MapReduce together with Hadoop, but in fact MapReduce, like HDFS, is just one of Hadoop's foundational components.

2, Apache Hadoop is open source, but proprietary vendors offer Hadoop products

Because Hadoop is open source and free to download, vendors such as IBM, Cloudera, and EMC Greenplum can ship their own Hadoop distributions.

These distributions usually add extra features, such as advanced management tools and related support services. Some may scoff: if the open source version is free, why pay for these services? Russom explained that such distributions are a better fit for some IT departments, especially organizations whose enterprise IT systems are already fairly mature.

3, Hadoop is an ecosystem, not a single product

Hadoop is developed and promoted jointly by the open source community and a range of vendors. In particular, vendors' products are becoming ever more closely tied to Hadoop.

Russom said: "Reporting platforms and data integration platforms are no exception: they now offer a variety of interfaces to the newer Hadoop platform."

4, HDFS is a file system, not a database management system

What Russom finds most intolerable is that people often confuse the two. A key characteristic of a database management system is that it manages data sets, and HDFS lacks this capability.

In a database management system, queries can use indexes to access data randomly, and the data being managed is usually structured. Hadoop does not handle data this way.
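As a toy illustration of the difference (an analogy in plain Python, not HDFS or any real DBMS interface), the sketch below contrasts finding a record by scanning a file's lines in order with finding it through an index built in advance. The record layout and the function names `scan_lookup` and `build_index` are hypothetical.

```python
# Hypothetical records stored as plain comma-separated lines, the way a file
# holds them on a file system: no index, no schema enforcement.
RAW = "1001,alice,NY\n1002,bob,SF\n1003,carol,LA\n"


def scan_lookup(raw_text: str, key: str):
    """File-style access: read lines in order until the key is found."""
    for line in raw_text.splitlines():
        fields = line.split(",")
        if fields[0] == key:
            return fields
    return None  # reached the end of the file without a match


def build_index(raw_text: str) -> dict:
    """DBMS-style access (greatly simplified): build a key -> record index once."""
    return {line.split(",")[0]: line.split(",") for line in raw_text.splitlines()}


index = build_index(RAW)
print(scan_lookup(RAW, "1003"))  # ['1003', 'carol', 'LA'] after reading all three lines
print(index.get("1003"))         # ['1003', 'carol', 'LA'] via a single hash lookup
```

A real DBMS index is a persistent structure, often a B-tree, maintained as data changes; the dictionary here only stands in for that idea.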

5, Hive is SQL-like, but it is not standard SQL

Most traditional business data access tools are SQL-based, which is something of a headache, because Hadoop uses a language that resembles SQL but is not SQL: Apache Hive and its HiveQL.

Russom said: "I often hear people say, 'Hive is easy to learn, so just learn Hive.' But that does not solve the underlying problem that tools need to be compatible with SQL."

Russom thinks compatibility is only a short-term problem, but for now it is holding back Hadoop's adoption.

6, HDFS and MapReduce are interrelated, but not interdependent

MapReduce was developed by Google and appeared before HDFS existed. In addition, vendors such as MapR have been promoting MapReduce implementations that do not depend on HDFS.

Nevertheless, Russom believes the two complement each other well. Much of HDFS's value shows up in the tools that can be layered on top of it as a distributed file system.

7, MapReduce provides control for analytics, not the analytics itself

MapReduce is a general-purpose execution engine that can help drive big data analytics. It takes hand-written code, automatically runs it in parallel over the data, and gathers the results into a single collection. We need to be clear, however, that MapReduce itself does not perform any analysis.
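The split between engine and analysis can be illustrated with a toy, single-machine Python sketch of the MapReduce pattern. It is not Hadoop's Java API; `run_job`, `map_phase`, `shuffle`, and `reduce_phase` are hypothetical names. `run_job` plays the engine that runs code in parallel and gathers the results, while the actual analysis, a word count here, lives entirely in the user-written map and reduce functions.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk: str) -> list[tuple[str, int]]:
    """User-written map function: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped: list[list[tuple[str, int]]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """User-written reduce function: combine all values for one key."""
    return key, sum(values)


def run_job(splits: list[str]) -> dict[str, int]:
    # The "engine" (hypothetical): run the map function over every split in
    # parallel (threads stand in for cluster nodes), then shuffle and reduce.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_phase, splits))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


splits = ["Hadoop stores data", "MapReduce processes data", "Hive queries data"]
print(run_job(splits)["data"])  # 3
```

Swapping in different `map_phase` and `reduce_phase` functions changes the analysis without touching `run_job`, which is the sense in which the engine supplies control rather than analytics.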

Russom said: "MapReduce can be seen as an upgraded take on MPP architecture: whatever code you write, it can parallelize it, which is very powerful."

8, Hadoop's significance lies not only in data volume, but also in data diversity

Some people classify Hadoop as a technology for processing massive volumes of data, but Hadoop's real value lies in its ability to process diverse data.

Russom said: "Hadoop can process a range of data that most data warehouses cannot, such as completely unstructured and semi-structured data."

9, Hadoop complements the data warehouse; it is not a substitute for it

Hadoop's ability to manage diverse data types has spread talk everywhere that "the data warehouse will die", but Russom has consistently rejected this.

He asked: "In IT, how often do people actually replace a technology? Almost never."

Data warehouses still perform outstandingly in their own domain, and Hadoop can serve as a complement to data warehouse technology. As data warehouse schemas and related systems increasingly move toward distributed architectures, Hadoop will have a role to play.

10, Hadoop is not just for Web analytics

Hadoop is very widely used by Internet companies, but Russom believes part of the reason for its growing popularity is that it can handle more types of analysis.

Russom cited examples from railways, robotics, and retail. A railway company, for instance, can use sensors to track its rail cars and detect abnormally high temperatures in order to prevent accidents.
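A minimal Python sketch of that kind of check, assuming hypothetical readings and a hypothetical 90 °C alert threshold: computing each car's peak temperature is a group-by-key aggregation, the sort of step a Hadoop job would typically run as a reduce over data collected from many sensors. The car IDs, values, and `overheating_cars` function are illustrative only.

```python
from collections import defaultdict

# Hypothetical sensor readings: (rail_car_id, temperature in degrees Celsius).
READINGS = [
    ("car-17", 41.0), ("car-17", 43.5), ("car-22", 40.2),
    ("car-22", 96.8), ("car-31", 39.9), ("car-22", 101.3),
]

THRESHOLD_C = 90.0  # hypothetical alert threshold, not a real railway standard


def overheating_cars(readings, threshold):
    """Return {car_id: peak_temperature} for cars whose peak reading exceeds threshold."""
    peak = defaultdict(lambda: float("-inf"))
    for car_id, temp in readings:
        # Group by car and keep the maximum, like a reduce keyed on car_id.
        peak[car_id] = max(peak[car_id], temp)
    return {car: t for car, t in peak.items() if t > threshold}


print(overheating_cars(READINGS, THRESHOLD_C))  # {'car-22': 101.3}
```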

Although Russom is very optimistic about Hadoop's prospects, he also believes that widespread adoption will still take several years.

11, Big data does not require Hadoop

Although big data and Hadoop now seem inextricably linked, Russom does not think Hadoop is the "only" option for big data. He mentioned products from a number of other vendors, such as Teradata, Sybase IQ (acquired by SAP), and Vertica (acquired by HP).

Moreover, some companies were already working with big data before Hadoop was born. The telecommunications industry, for example, has been handling call detail records for many years.

12, Hadoop is not a "free lunch"

Although Hadoop is open source technology, installing and deploying it still costs money. Russom said that because Hadoop lacks management tools and support services, enterprises can easily incur additional costs while using it. In addition, because Hadoop does not optimize programs automatically, companies have to hire specialists to hand-code programs for its runtime environment, and these specialists are expensive.

Origin blog.csdn.net/sdddddddddddg/article/details/91348178