Big Data Frameworks: The Hadoop Ecosystem

1.4 Hadoop distributions

Although Hadoop is an open source Apache (and now GitHub) project, a large number of companies have emerged in the Hadoop industry with the goal of making Hadoop easier to use. Most of these companies start by packaging an Apache Hadoop distribution, improving it, ensuring that all of the components work together, and providing technical support. They are now also developing their own tools to simplify the use of Hadoop and extend its functionality. Some of these tools are proprietary and differ from vendor to vendor; some have become the basis for new projects in the Apache Hadoop family, and some are open source projects available on GitHub under the Apache 2 license. Although all of these companies base their distributions on Apache Hadoop, each has a slightly different vision of what Hadoop should be: which direction it should take and how to get there.

The biggest difference among these companies is how they use the Apache source code. With the exception of MapR, all of them consider Hadoop to be defined by the code produced by the Apache Hadoop projects. MapR, in contrast, treats the Apache code as a reference implementation and provides its own implementation based on the APIs that Apache defines. This approach has allowed MapR to deliver significant innovations, especially around HDFS and HBase, making these two fundamental Hadoop storage mechanisms more reliable and higher performing. MapR has also introduced high-speed Network File System (NFS) access to HDFS, which greatly simplifies the integration of Hadoop with a number of enterprise applications.
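To give a rough sense of what NFS access to the cluster means in practice, here is a minimal sketch in which cluster data is read with plain Java file I/O rather than the HDFS client. The mount point /mapr/my.cluster and the file path are illustrative assumptions (they depend on how the NFS gateway is mounted), not details from the original text.

```java
// Sketch: when the cluster file system is exported over NFS, ordinary Java
// file I/O can read data that Hadoop jobs produce -- no HDFS client needed.
// The mount point and path below are hypothetical and depend on the setup.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class ReadFromNfsMount {
    public static void main(String[] args) throws IOException {
        // A path under the NFS mount of the cluster (hypothetical location).
        Path clusterFile = Paths.get("/mapr/my.cluster/user/demo/part-00000");

        // Standard library calls work directly on cluster data.
        List<String> lines = Files.readAllLines(clusterFile);
        lines.stream().limit(10).forEach(System.out::println);
    }
}
```

The point of the example is that any existing application able to read a local file can consume cluster data without being rewritten against the HDFS API.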

Two Hadoop distributions that deserve particular attention are those offered by Amazon and Microsoft. Both provide a preinstalled version of Hadoop running on the corresponding cloud platform (AWS or Azure) as a PaaS offering. Both also provide extensions that allow developers to use not only Hadoop's native HDFS, but also a mapping of HDFS onto their own data storage mechanisms (Amazon's S3, and Windows Azure storage in the case of Azure). Amazon additionally offers the ability to save and restore HBase content to and from S3.
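To illustrate the mapping of HDFS onto cloud storage, the sketch below uses the standard Hadoop FileSystem API with an s3a:// URI instead of hdfs://. The bucket name and key are placeholders of my own; the s3a connector (hadoop-aws) and AWS credentials (via the fs.s3a.* settings or the default credential chain) are assumed to be configured.

```java
// Sketch: the Hadoop FileSystem API is scheme-based, so the same code can
// target HDFS or Amazon S3. Here the s3a connector is used in place of hdfs://.
// Bucket and key names are placeholders for illustration only.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;

public class ReadFromS3 {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Point the FileSystem at an S3 bucket instead of an HDFS NameNode.
        FileSystem fs = FileSystem.get(URI.create("s3a://example-bucket/"), conf);

        try (FSDataInputStream in = fs.open(new Path("s3a://example-bucket/data/input.txt"));
             BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```

Because only the URI scheme changes, existing MapReduce jobs and tools can read from and write to cloud object storage with little or no code modification.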

Table 1-1 summarizes the main characteristics of the major Hadoop distributions.

Table 1-1: Hadoop distributions from different vendors

Of course, the sheer number of distributions may leave you wondering, "Which distribution should I use?" When a company or department decides to adopt a specific distribution, the following points should be considered:

Technical details - including the Hadoop version, the components included, and proprietary functional features.

Ease of deployment - the availability of toolkits for deployment, upgrades, patching, and so on.

Ease of maintenance - including cluster management, support for multiple data centers, disaster recovery support, and so on.

Cost - including the cost of implementing a particular distribution, its billing model, and licensing.

Enterprise integration support - the integration of Hadoop applications with the rest of the enterprise.

Which distribution to choose depends on the problems you intend to solve with Hadoop. The discussion in this book is distribution-agnostic, because I see value in what each distribution provides.

