Wu Yuxiong - Natural Born HADOOP Experiment Study Notes: pseudo-distributed single-node installation

Purpose

Learn how to install and configure Java

Learn how to configure password-free login for your own node

Learn HDFS configuration and related commands

Learn YARN configuration

Principle

1. Hadoop installation: For a beginner, installing Hadoop is difficult, and installing an entire cluster in one step is especially difficult, so a quick way to learn is to install as you study: first set up a single-node pseudo-distributed installation, then build a fully distributed cluster, and finally build a highly available distributed cluster. If you are interested, you can also use CDH to research how to build very large clusters.
  Before installing, you first need to understand one concept: Hadoop has three parts, HDFS, MapReduce, and YARN. MapReduce is just a collection of Java jar packages and needs no installation, while HDFS and YARN do need to be installed. A pseudo-distributed installation on a single node actually runs the HDFS namenode, secondary namenode, and datanode together with the YARN nodemanager and resourcemanager on that one node.
  The official Hadoop website describes how to install in pseudo-distributed mode, and we will walk through it step by step following the official site. The installation involves three steps:
  

First, install Java;

Configure password-free login for the current user;

Install Hadoop and edit its configuration files.

2. Hadoop configuration file descriptions:
  1. dfs.hosts: the list of machines allowed to join the cluster as datanodes
  2. mapred.hosts: the list of machines allowed to join the cluster as tasktrackers
  3. dfs.hosts.exclude and mapred.hosts.exclude: the corresponding lists of machines to be removed from the cluster
  4. masters: the list of machines that run the secondary namenode
  5. slaves: the list of machines that run a datanode and tasktracker
  6. hadoop-env.sh: the environment variables used by the scripts that run Hadoop
  7. core-site.xml: Hadoop core configuration items, such as I/O settings common to HDFS and MapReduce
  8. hdfs-site.xml: HDFS daemon configuration items, covering the namenode, secondary namenode, and datanodes
  9. mapred-site.xml: MapReduce daemon configuration items, covering the jobtracker and tasktrackers
  10. hadoop-metrics.properties: properties controlling how metrics are published in Hadoop
  11. log4j.properties: properties for the system log files, the namenode audit log, and the tasktracker child-process task logs

3. Format the HDFS filesystem
  Before it can be used, a newly installed HDFS must be formatted. The formatting process creates an empty filesystem by creating the storage directories and the initial versions of the namenode's persistent data structures. Since the namenode manages all of the filesystem's metadata and datanodes can join and leave the cluster dynamically, the initial formatting process does not involve the datanodes. For the same reason, there is no need to specify the size of the filesystem when it is created; that is determined by the number of datanodes in the cluster, which can be increased as needed long after the filesystem has been formatted.
  Formatting HDFS is a quick operation. Run the following command as the hdfs user:

hdfs namenode -format

4. Start and stop the daemons
  Hadoop comes with scripts and commands that can start and stop the daemons across the entire cluster. To use these scripts (found in the sbin directory), you need to tell Hadoop which machines are in the cluster. The slaves file exists for this purpose: it contains a list of machine host names or IP addresses, one per line, and it lists the machines on which a datanode and a node manager may run. The file resides in the Hadoop configuration directory, although it can be placed elsewhere (and given a different name) by changing the HADOOP_SLAVES setting in hadoop-env.sh. Also, there is no need to distribute the file to the worker nodes, since it is used only by the control scripts run on the namenode and resource manager.
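  In this single-node lab the slaves file only needs to list the node itself. A minimal sketch of its contents, assuming the standalone host name that is mapped into /etc/hosts later in this lab:

standalone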
  1) Run the following command as the hdfs user to start the HDFS daemons:

start-dfs.sh

  The machines on which the namenode and secondary namenode run are determined by interrogating the Hadoop configuration for their host names. For example, the script finds the namenode's host name by executing the following command:

hdfs getconf -namenodes

  By default, this command finds the namenode's host name from fs.defaultFS. In slightly more detail, the start-dfs.sh script does the following:

1. Starts a namenode on each machine returned by executing hdfs getconf -namenodes.

2. Starts a datanode on each machine listed in the slaves file.

3. Starts a secondary namenode on each machine returned by executing hdfs getconf -secondarynamenodes.

  2) The YARN daemons are started in the same way, by running the following command as the yarn user on the machine hosting the resource manager:

start-yarn.sh

  In this case, the resource manager always runs on the machine where the start-yarn.sh script is run. Specifically, the script does the following:

1. Starts a resource manager on the local machine.

2. Starts a node manager on each machine listed in the slaves file.

  Likewise, stop-dfs.sh and stop-yarn.sh scripts are provided to stop the daemons started by the corresponding start scripts.
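  As a small sketch of how this cluster can be shut down later (the relative sbin/ paths assume the Hadoop installation directory used in Step 3):

sbin/stop-yarn.sh   # stops the resource manager and node manager
sbin/stop-dfs.sh    # stops the namenode, secondary namenode and datanode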

Lab environment

1. Operating systems
  Operator machine 1: Linux_Centos
  Operator machine 2: Windows_7
  Operator machine 1 default user name: root, password: 123456
  Operator machine 2 default user name: hongya, password: 123456

Step 1: Connect using Xshell

  1.1 On operator machine 2, open Xshell and create a new session.

Name: Standalone

Host: 90.10.10.42

 

1.2 Click "User Authentication", enter your user name and password, and click OK.

Username: root

Password: 123456

 

1.3 The session has now been created; select it and click "Connect". You can see the connection succeeds.

 

Step 2: Add the host mapping and configure password-free login

  2.1 Add the host mapping.

vi /etc/hosts
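  The mapping for this lab would look like the line below (a sketch assuming the lab IP 90.10.10.42 and the host name standalone; use the IP shown in your own environment):

90.10.10.42   standalone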

 

 

2.2 Configure password-free login.

ssh-keygen

  After running this, you will be prompted to create the files .ssh/id_rsa and .ssh/id_rsa.pub; the first is the private key and the second is the public key. The process also asks for a passphrase; to allow SSH access without a password, just press Enter at each prompt.

  Write the public key into standalone's authorized keys file:

ssh-copy-id standalone
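  To verify that password-free login works, you can try logging in to the node (a quick check that is not part of the original steps):

ssh standalone   # should log in without prompting for a password
exit             # return to the original shell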

 

 

Step 3: Extract and install Hadoop, configure hadoop-env.sh

  3.1 First extract the installation package into the /home/hadoop/tmp/ directory.

tar -zxvf /opt/pkg/hadoop-2.6.0-cdh5.5.2.tar.gz -C /home/hadoop/tmp/

 

 

3.2 Enter the configuration file directory, which is etc/hadoop under the Hadoop installation directory.

cd /home/hadoop/tmp/hadoop-2.6.0-cdh5.5.2/etc/hadoop

ls

 

 

3.3 Modify the hadoop-env.sh configuration.

vim hadoop-env.sh

  Change the JDK path to:

export JAVA_HOME=/home/hadoop/tmp/soft/jdk1.8.0_121

 

Save and exit.
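  As a quick sanity check (assuming the JDK path above exists on the lab machine), confirm that the Java binary JAVA_HOME points to is actually there:

/home/hadoop/tmp/soft/jdk1.8.0_121/bin/java -version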

Step 4: Configure and start HDFS

  4.1 In the same directory, modify the core-site.xml configuration.

vim core-site.xml

  Add the following content:

<property>
  <!-- Cluster address; "standalone" maps to this machine's local IP -->
  <name>fs.defaultFS</name>
  <value>hdfs://standalone:9000</value>
</property>
<property>
  <!-- Hadoop data directory; change it to a directory under the user directory that does not yet exist -->
  <name>hadoop.tmp.dir</name>
  <value>/home/data/hadoop</value>
</property>

 

 

Save and exit.
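  To confirm the setting has been picked up (a small sketch using the getconf tool; run it from the Hadoop installation directory you change into in the next step):

bin/hdfs getconf -confKey fs.defaultFS   # should print hdfs://standalone:9000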

  4.2 Go back to the Hadoop installation directory. If this is the first start, you need to format the namenode first (note: the namenode can only be formatted once; the command is run via the bin directory).

cd ..

  Go back up to the Hadoop directory

cd ..

  Format the namenode

bin/hdfs namenode -format

 

 

4.3 Start HDFS with the script in the sbin directory.

sbin/start-dfs.sh

  Check the running processes

jps

 

4.4 After startup finishes you can configure YARN, but before doing that, check in a browser whether HDFS started successfully by entering the namenode's IP plus port 50070. In this lab, operator machine 1's IP is 90.10.10.42 (use the IP assigned in your own lab).

Enter the URL: 90.10.10.42:50070

 

 

If this page appears, HDFS has been installed successfully.
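  Beyond the web page, a simple command-line smoke test also confirms HDFS is working (a sketch run from the Hadoop installation directory; the directory name /test is only an example):

bin/hdfs dfs -mkdir /test        # create a directory in HDFS
bin/hdfs dfs -ls /               # the new directory should be listed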

Step 5: Configure YARN and verify it

  5.1 Go to the configuration directory etc/hadoop/ and look at the mapred configuration files; notice that there is no mapred-site.xml.

  Enter the configuration directory

cd etc/hadoop

  List the files

ls

  Show the current path

pwd

 

 

5.2 Copy the template configuration file, name it mapred-site.xml, and then edit it.

cp mapred-site.xml.template mapred-site.xml

vim mapred-site.xml

  Add the following content:

<property>

        <name>mapreduce.framework.name</name>

        <value>yarn</value>

</property>

 

 

 5.3 Modify the configuration in yarn-site.xml.

vim yarn-site.xml

  Add the following content:

<!-- Specify the shuffle service for MapReduce -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<!-- Specify the resourcemanager's address -->
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>standalone</value>
</property>

 

 

5.4 Start YARN.

  Go back to the Hadoop installation directory

cd ../..

  Run the YARN start command from the sbin directory

sbin/start-yarn.sh
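  You can check the daemons again with jps; in addition to the HDFS processes, a ResourceManager and a NodeManager process should now appear on this single node:

jps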

 

 

5.5 After startup finishes, check in a browser whether it started successfully by entering the node's IP and port 8088. In this lab, operator machine 1's IP is 90.10.10.42.

Enter the URL: 90.10.10.42:8088

 

 

If this page appears, YARN has started successfully.
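  As a final end-to-end check (a sketch: the examples jar path below is typical for this Hadoop tarball, but verify the exact file name in your installation), you can submit the bundled pi example job to YARN from the Hadoop installation directory:

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 2 10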

 


Origin www.cnblogs.com/tszr/p/12164372.html