Hadoop CDH and Apache pseudo-distributed deployment

1. Hadoop distributions

Many Hadoop distributions exist today, including those from Intel, Huawei, Cloudera (CDH), and Hortonworks. All of them derive from Apache Hadoop. So many exist because Apache Hadoop's open source license allows anyone to modify the code and release or sell the result as an open source or commercial product.

Currently there are three free Hadoop versions, all from vendors outside China:

  • Apache (the original version; all other distributions are built on it and improve it);
  • Cloudera (Cloudera's Distribution Including Apache Hadoop, abbreviated "CDH");
  • Hortonworks (Hortonworks Data Platform, abbreviated "HDP").

Most users in China choose CDH. Cloudera's CDH differs from Apache Hadoop as follows:

  1. CDH divides its Hadoop versions very clearly. So far there are five CDH versions; the first three are no longer updated, and the latest two are CDH4 and CDH5. CDH4 is based on Hadoop 2.0, and CDH5 is based on Hadoop 2.2/2.3/2.5/2.6. By comparison, Apache's version numbering is much more confusing. CDH releases also offer considerably better compatibility, security, and stability than Apache Hadoop.
  2. CDH3, the third version of CDH, is an improved release based on Apache Hadoop 0.20.2.

Origin blog.csdn.net/qq_35029061/article/details/132252414