Installing a Hadoop Cluster on CentOS 7

Environment: two CentOS 7 virtual machines (192.168.31.224 and 192.168.31.225), Hadoop 2.8.0, JDK 1.8

Set up passwordless SSH login between the two virtual machines

1. Set the hostnames: 192.168.31.224 becomes hserver1, 192.168.31.225 becomes hserver2.

On host 192.168.31.224, execute:
>hostname hserver1

On host 192.168.31.225, execute:
>hostname hserver2
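The hostname command above only renames the host for the current session; on CentOS 7, hostnamectl makes the change persistent, as in this minimal sketch:

# Run on 192.168.31.224 (persists across reboots on CentOS 7 / systemd)
hostnamectl set-hostname hserver1
# Run on 192.168.31.225
hostnamectl set-hostname hserver2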

2. Modify /etc/hosts on both hosts (224 and 225) and append the following at the end of the file (see the sketch below):

192.168.31.224 hserver1
192.168.31.225 hserver2
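A minimal sketch of the append, run as root on each host:

cat >> /etc/hosts <<'EOF'
192.168.31.224 hserver1
192.168.31.225 hserver2
EOF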

3. Test: run ping hserver1 and ping hserver2 on each host; both names should resolve and get replies.

4. Generate an SSH key pair on each machine:

ssh-keygen -t rsa -P ''

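If you want the key generation to be fully non-interactive, the key path can be given explicitly; this is a minor variation assuming the default root key location:

ssh-keygen -t rsa -P '' -f /root/.ssh/id_rsa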

Verify that two files (id_rsa and id_rsa.pub) now exist under /root/.ssh:

ls /root/.ssh

5. Create an authorized_keys file in the /root/.ssh directory of each host. Its content is the merger of id_rsa.pub from the .ssh directory of host 224 and id_rsa.pub from the .ssh directory of host 225. If there are more hosts, you can build the file on one host and distribute it to the /root/.ssh directory of the others, as in the sketch below.
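A minimal sketch of the merge, run from hserver1 while root SSH still accepts passwords (ssh-copy-id is an equivalent shortcut):

# Collect both public keys into one authorized_keys file
cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
ssh root@hserver2 'cat /root/.ssh/id_rsa.pub' >> /root/.ssh/authorized_keys
# Distribute the merged file to the other host
scp /root/.ssh/authorized_keys root@hserver2:/root/.ssh/
# sshd rejects keys with loose permissions
chmod 600 /root/.ssh/authorized_keys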
6. Test the passwordless configuration: execute ssh hserver2 on the hserver1 host and ssh hserver1 on hserver2. If you log in without a password prompt, the configuration succeeded. Use exit to log out of the remote SSH session.

Problems:
1. When testing the SSH connection after the setup, this error was reported: sign_and_send_pubkey: signing failed: agent refused operation
Workaround: start the ssh-agent and register the key:

eval "$(ssh-agent -s)"
ssh-add

Install and configure Hadoop

1. Download Hadoop 2.8.0.
2. Place the tar package in the /opt directory and extract it (see the sketch after this list).
3. Create the working directories:

mkdir  /root/hadoop  
mkdir  /root/hadoop/tmp  
mkdir  /root/hadoop/var  
mkdir  /root/hadoop/dfs  
mkdir  /root/hadoop/dfs/name  
mkdir  /root/hadoop/dfs/data 
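A minimal sketch of steps 1 and 2; the Apache archive URL is one possible download source:

cd /opt
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.8.0/hadoop-2.8.0.tar.gz
tar -xzf hadoop-2.8.0.tar.gz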

4. Modify the configuration files under /opt/hadoop-2.8.0/etc/hadoop on the hserver2 host, then replace the corresponding files on the hserver1 host with the modified copies (a copy sketch follows section f).

a. Modify core-site.xml. fs.default.name (the older name of fs.defaultFS) tells clients where to find the file system's NameNode.

Add inside <configuration></configuration>:
----------
 <property>
        <name>hadoop.tmp.dir</name>
        <value>/root/hadoop/tmp</value>
        <description>Abase for other temporary directories.</description>
   </property>
   <property>
        <name>fs.default.name</name>
        <value>hdfs://hserver2:9000</value>
   </property>

b. Modify hdfs-site.xml. The dfs.replication parameter sets the number of HDFS block replicas.

Add inside <configuration></configuration>:
----------
<property>
   <name>dfs.name.dir</name>
   <value>/root/hadoop/dfs/name</value>
   <description>Path on the local filesystem where the NameNode stores the namespace and transaction logs persistently.</description>
</property>
<property>
   <name>dfs.data.dir</name>
   <value>/root/hadoop/dfs/data</value>
   <description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
</property>
<property>
   <name>dfs.replication</name>
   <value>2</value>
</property>
<property>
      <name>dfs.permissions</name>
      <value>false</value>
      <description>Disable HDFS permission checks.</description>
</property>

c. Create mapred-site.xml from its template and modify it. The mapred.job.tracker parameter locates the master node on which the JobTracker runs.

cp ./mapred-site.xml.template ./mapred-site.xml

Add inside <configuration></configuration>:
----------
<property>
    <name>mapred.job.tracker</name>
    <value>hserver2:49001</value>
</property>
<property>
      <name>mapred.local.dir</name>
       <value>/root/hadoop/var</value>
</property>
<property>
       <name>mapreduce.framework.name</name>
       <value>yarn</value>
</property>

d. Modify slaves: delete localhost and add the following lines:

hserver1
hserver2

e. Modify yarn-site.xml

Add inside <configuration></configuration>:
----------
<property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hserver2</value>
   </property>
   <property>
        <description>The address of the applications manager interface in the RM.</description>
        <name>yarn.resourcemanager.address</name>
        <value>${yarn.resourcemanager.hostname}:8032</value>
   </property>
   <property>
       <description>The address of the scheduler interface.</description>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>${yarn.resourcemanager.hostname}:8030</value>
   </property>
   <property>
        <description>The http address of the RM web application.</description>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>${yarn.resourcemanager.hostname}:8088</value>
   </property>
   <property>
        <description>The https address of the RM web application.</description>
        <name>yarn.resourcemanager.webapp.https.address</name>
        <value>${yarn.resourcemanager.hostname}:8090</value>
   </property>
   <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>${yarn.resourcemanager.hostname}:8031</value>
   </property>
   <property>
        <description>The address of the RM admin interface.</description>
        <name>yarn.resourcemanager.admin.address</name>
        <value>${yarn.resourcemanager.hostname}:8033</value>
   </property>
   <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
   </property>
   <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>2048</value>
        <description>Maximum memory that can be allocated to a single container, in MB; the default is 8192 MB.</description>
  </property>
  <property>
        <name>yarn.nodemanager.vmem-pmem-ratio</name>
        <value>2.1</value>
  </property>
 <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>2048</value>
</property>
<property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
</property>

f. Modify /opt/hadoop-2.8.0/etc/hadoop/hadoop-env.sh. (Although the system already sets the JAVA_HOME environment variable, if the path is not hard-coded here, startup fails with: Error: JAVA_HOME is not set and could not be found.)
Change export JAVA_HOME=${JAVA_HOME} to export JAVA_HOME=/opt/jdk1.8.0_161
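With everything edited on hserver2, a minimal sketch for pushing the whole configuration directory to hserver1, relying on the passwordless SSH set up earlier:

scp /opt/hadoop-2.8.0/etc/hadoop/* root@hserver1:/opt/hadoop-2.8.0/etc/hadoop/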

Start Hadoop

1. Enter the /opt/hadoop-2.8.0/bin directory on the hserver2 host and format HDFS:

./hadoop namenode -format

If no errors are reported, the initialization succeeded.
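Side note: in Hadoop 2.x, hadoop namenode is deprecated in favor of the hdfs front end; the equivalent invocation from the same bin directory is:

./hdfs namenode -format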
2. Check that a current directory has been generated under /root/hadoop/dfs/name/, with several files created inside it.
3. Start Hadoop on the NameNode host, 192.168.31.225 (hserver2). Enter /opt/hadoop-2.8.0/sbin and execute:

./start-all.sh 

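A quick way to confirm the daemons came up is jps (it ships with the JDK); the expected process list depends on the node:

# On hserver2 (the master, which is also listed in slaves):
jps
#   expect NameNode, SecondaryNameNode, ResourceManager,
#   DataNode and NodeManager
# On hserver1: expect DataNode and NodeManager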
4. Test: visit http://192.168.31.225:50070/ (the HDFS NameNode web UI).
Then visit http://192.168.31.225:8088 (the YARN ResourceManager web UI).
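From a terminal, a minimal check that both UIs respond (each line should print 200):

curl -s -o /dev/null -w '%{http_code}\n' http://192.168.31.225:50070/
curl -s -o /dev/null -w '%{http_code}\n' http://192.168.31.225:8088/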
5. Stop the cluster:

./stop-all.sh
