Hadoop Environment Setup Tutorial (1): Operating Environments and Cluster Planning


Preface

Hadoop can run on Windows or Linux, but it runs far less efficiently on Windows.

The following describes how to set up Hadoop in a Linux environment.

1. Three operating environments of Hadoop

Hadoop can run in three modes: standalone (local) mode, pseudo-distributed mode, and fully distributed mode.

Standalone Mode

By default, Hadoop runs in this mode; it is intended for development and debugging. [Not recommended]
  1. No configuration files need to be modified.
  2. Uses the local file system instead of a distributed file system.
  3. Hadoop does not start the NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager, or other daemons; map() and reduce() tasks execute as different parts of the same process.
  4. Used to debug the logic of a MapReduce program and verify its correctness.
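The standalone workflow above can be exercised with the example jar that ships with Hadoop (a sketch based on the stock single-node walkthrough; the exact jar name varies by version, hence the wildcard, and paths assume you are in the Hadoop installation directory):

```shell
# Standalone mode: everything runs in one local process against the local
# file system, so no daemons need to be started first.
mkdir input
cp etc/hadoop/*.xml input          # use the bundled config files as sample input
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
    grep input output 'dfs[a-z.]+' # run the bundled "grep" example job locally
cat output/*                       # inspect the job's result files
```

If this prints matches from the XML files, the local-mode MapReduce pipeline is working.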

Pseudo-Distributed Mode

Hadoop's daemons run on a single local machine, simulating a small-scale cluster. [Suitable for computers with modest specs]
  1. Simulates a small-scale cluster on one host. In this mode Hadoop uses a distributed file system; it is generally used for program debugging and testing. Pseudo-distributed mode can be regarded as a special case of fully distributed mode.
  2. Adds code-debugging capability on top of standalone mode, letting you inspect memory usage, HDFS input and output, and daemon interaction. Hadoop starts the NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager; these daemons all run on the same machine as independent Java processes.
  3. Requires modifying the configuration files core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml.
  4. Requires formatting the file system.
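For reference, the minimal configuration edits for pseudo-distributed operation look like the following (values taken from the standard Hadoop single-node setup; `localhost:9000` is the conventional choice, and replication is set to 1 because there is only one DataNode):

```xml
<!-- etc/hadoop/core-site.xml: point the default file system at a local HDFS -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- etc/hadoop/hdfs-site.xml: one DataNode can hold only one replica -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```

After editing, the file system is formatted once with `bin/hdfs namenode -format` and HDFS is started with `sbin/start-dfs.sh`.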

Fully Distributed Mode (Cluster Mode) <the environment built in this tutorial>

Hadoop runs on a cluster built from multiple hosts; this is a true production environment. [A computer with 8 GB of RAM or more is sufficient]
  1. In this mode, the JDK, Hadoop, Zookeeper, and other software are installed on all hosts, which form an interconnected network.
  2. Set up passwordless SSH login between hosts: the master node's public key is added to the trust list (authorized_keys) of each slave node.
  3. Requires modifying the configuration files core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, and hadoop-env.sh, then formatting the file system.
  4. Processes running after Hadoop starts: NameNode, DataNode, ResourceManager, NodeManager, SecondaryNameNode.
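Step 2 above (passwordless SSH) is typically done from the master node. A sketch, using standard OpenSSH tools; the hostnames are placeholders you would replace with your own:

```shell
# On the master node: generate an RSA key pair with an empty passphrase
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

# Append the master's public key to authorized_keys on every node,
# including the master itself (hostnames are placeholders)
ssh-copy-id hadoop101
ssh-copy-id hadoop102
ssh-copy-id hadoop103

# Verify: this should print the remote hostname without a password prompt
ssh hadoop101 hostname
```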

2. Cluster planning

[Figure: cluster planning table]
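The original figure is not reproduced here. Purely as an illustration (hostnames and IP addresses are placeholders, not from the original), a common three-node plan assigns the daemons like this, keeping the NameNode, ResourceManager, and SecondaryNameNode on different hosts:

```
# /etc/hosts-style node list, with the daemons each node runs
192.168.10.101  hadoop101   # NameNode, DataNode, NodeManager
192.168.10.102  hadoop102   # ResourceManager, DataNode, NodeManager
192.168.10.103  hadoop103   # SecondaryNameNode, DataNode, NodeManager
```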

3. Required basic software

  1. VMware installation
    Not linked here for copyright reasons; you can download it yourself. VMware Workstation 16 works fine.
  2. Linux system installation
    This tutorial uses CentOS 7.
    Installation guide: https://blog.csdn.net/weixin_45556441/article/details/114382989
    Resource download: https://download.csdn.net/download/weixin_45556441/15676799
  3. Remote connection tool installation
    Install the portable ("green") version of Xshell directly.
    Download address: https://download.csdn.net/download/weixin_45556441/15676579

See you in the next installment:

Hadoop Environment Setup Tutorial (2): Fully Distributed Cluster Setup


Source: blog.csdn.net/weixin_45556441/article/details/114678837