What are the different versions of Hadoop?

1. What is Hadoop?

When I first heard the word Hadoop, I suspect I reacted the same way many people do: with a big question mark. What is this? What is Hadoop? According to Baidu Encyclopedia, Hadoop is a distributed system infrastructure developed by the Apache Foundation. In other words, Hadoop is a software framework for the distributed processing of large amounts of data.

Hadoop was born mainly because of the huge amounts of data that computers have to process in the era of big data. Data on that scale has to be split up and distributed to N computers for processing. Once the work is spread across different computers, the distributed processing has to be managed to ensure that the final result is correct. Hadoop is such a solution.

Here is a simple, everyday example: if you have a basket of fruit and want to know how many apples and pears are in it, you can simply count them one by one. If you have a whole container of fruit, you need many people counting at the same time, which is equivalent to multi-processing or multi-threading. If you have many containers of fruit, you need distributed computing, and that is where Hadoop comes in.
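The fruit example can be sketched in a few lines of Python. This is only an illustration of the split-then-merge idea and not Hadoop's actual API: the function names and the sample data are made up for this post, and threads on one machine stand in for the separate computers a real cluster would use.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def count_container(container):
    """Map step: one worker counts the fruit in a single container."""
    return Counter(container)


def merge_counts(left, right):
    """Reduce step: combine two partial tallies into one."""
    return left + right


def count_all(containers, workers=3):
    """Count every container in parallel, then merge the partial results."""
    # Assumption: threads play the role of cluster nodes in this toy version.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_container, containers))
    return reduce(merge_counts, partial_counts, Counter())


# Hypothetical data: each inner list stands for one container of fruit.
containers = [
    ["apple", "pear", "apple"],
    ["pear", "pear", "apple"],
    ["apple", "apple", "pear", "pear"],
]

totals = count_all(containers)
print(totals["apple"], totals["pear"])  # prints "5 5"
```

Each worker only ever sees its own container, and the final answer comes from merging the partial tallies. Keeping those two steps separate is what lets the counting be spread over many machines.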

2. Versions of Hadoop

With the rise of big data in recent years, various versions of Hadoop have spread rapidly and come into use in China. The major Hadoop versions at present are as follows:

1. Apache Hadoop 2.0, which has the following modules:

(1) Hadoop Common, the common utilities that support the other Hadoop modules;

(2) Hadoop Distributed File System (HDFS), a distributed file system that provides high-throughput access to data;

(3) Hadoop YARN, a framework for job scheduling and cluster resource management;

(4) Hadoop MapReduce, a YARN-based system for parallel processing of big data.

2. Cloudera Hadoop: Cloudera has a clearer version hierarchy and provides Hadoop installation packages for various operating systems. These can be installed directly with the apt-get or yum command, which saves a lot of trouble.

3. Hortonworks: Hortonworks' flagship product is the Hortonworks Data Platform (HDP), which is a 100% open source product. In addition to the common projects, HDP includes Ambari, an open source installation and management system. HCatalog, a metadata management system, has since been merged into Apache Hive, the open source project that originated at Facebook. Hortonworks' Stinger initiative greatly optimized the Hive project. Hortonworks also provides a very nice, easy-to-use sandbox for getting started. In addition, Hortonworks developed many enhancements and committed them to the core trunk, enabling Apache Hadoop to run natively on Microsoft Windows platforms, including Windows Server and Windows Azure.

3. What are the domestic Hadoop distributions?

Domestic vendors such as Huawei and Dakuai Search have launched their own Hadoop distributions. Huawei has a natural advantage in hardware. Huawei's FusionInsight Hadoop is based on Apache Hadoop and adds HA (high availability) for the NameNode, JobTracker, and HiveServer: after a process failure, the system fails over automatically without manual intervention. This is only a small patch to Hadoop, though, and far less thorough than the solution MapR provides.

DKHadoop (DKH), launched by Dakuai Search, is the only one of the known domestic distributions developed purely on the original Hadoop ecosystem. It integrates all components of the Hadoop ecosystem, deeply optimized and recompiled into a complete, higher-performance general-purpose big data computing platform in which the components work together smoothly. As a result, compared with the open source big data platform, DKH is said to improve computing performance by up to 5 times.

 
