Environment: Two CentOS 7 virtual machines (192.168.31.224, 192.168.31.225), Hadoop 2.8.0, JDK 1.8
Set up passwordless SSH login between the two virtual machines
1. Set the hostnames: 192.168.31.224 becomes hserver1 and 192.168.31.225 becomes hserver2.
On host 192.168.31.224, run:
>hostname hserver1
On host 192.168.31.225, run:
>hostname hserver2
(Note: on CentOS 7 the hostname command only changes the name until the next reboot; run hostnamectl set-hostname hserver1 to make it permanent.)
2. Modify the /etc/hosts file on both hosts (224 and 225), appending the following lines at the end:
192.168.31.224 hserver1
192.168.31.225 hserver2
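A minimal sketch of step 2, to be run as root on both hosts. It is demonstrated here against a scratch copy of the file so the commands are safe to replay; in production, set HOSTS_FILE=/etc/hosts instead.

```shell
# Append the cluster hostname mappings to the hosts file.
# Scratch copy for demonstration; in production: HOSTS_FILE=/etc/hosts
HOSTS_FILE=$(mktemp)
cat >> "$HOSTS_FILE" <<'EOF'
192.168.31.224 hserver1
192.168.31.225 hserver2
EOF
# Show what was appended
grep hserver "$HOSTS_FILE"
```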
3. Test: run ping hserver1 and ping hserver2 on each host; both hostnames should resolve and respond.
4. Generate an SSH key pair on both machines:
ssh-keygen -t rsa -P ''
Verify that two files (id_rsa and id_rsa.pub) now exist under /root/.ssh:
ls /root/.ssh
5. On each host, create an authorized_keys file in /root/.ssh whose content is the concatenation of id_rsa.pub from the .ssh directory of the 224 host and id_rsa.pub from the .ssh directory of the 225 host. If there are more hosts, you can build the file on one host and distribute it to the /root/.ssh directory of the others.
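A sketch of the key merge in step 5, assuming both public keys have already been copied to the current host (for example via scp) as id_rsa_224.pub and id_rsa_225.pub. It is demonstrated against a scratch directory with placeholder key strings so it is safe to replay; in production, set SSH_DIR=/root/.ssh and use the real .pub files.

```shell
# Scratch directory for demonstration; in production: SSH_DIR=/root/.ssh
SSH_DIR=$(mktemp -d)
# Placeholder public keys standing in for the real id_rsa.pub files
echo 'ssh-rsa AAAA...224 root@hserver1' > "$SSH_DIR/id_rsa_224.pub"
echo 'ssh-rsa AAAA...225 root@hserver2' > "$SSH_DIR/id_rsa_225.pub"
# Merge both public keys into authorized_keys
cat "$SSH_DIR"/id_rsa_224.pub "$SSH_DIR"/id_rsa_225.pub > "$SSH_DIR/authorized_keys"
# sshd refuses group/world-writable key files, so tighten permissions
chmod 600 "$SSH_DIR/authorized_keys"
```

On a cluster with many hosts, ssh-copy-id root@hserverN from each machine achieves the same result without manual merging.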
6. Test whether the passwordless SSH configuration succeeded: run ssh hserver2 on the hserver1 host and ssh hserver1 on hserver2. If you are logged in without being prompted for a password, the configuration works. Use exit to close the remote SSH session.
Questions:
1. After completing the setup, an SSH connection test fails with the error: sign_and_send_pubkey: signing failed: agent refused operation
Workaround: run
eval "$(ssh-agent -s)"
ssh-add
Install and configure Hadoop
1. Download Hadoop 2.8.0.
2. Put the tar package in the /opt directory and extract it there.
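A sketch of step 2. In production the command is simply tar -zxf /opt/hadoop-2.8.0.tar.gz -C /opt on the downloaded release archive; the demonstration below builds a scratch stand-in archive first so the commands are safe to replay anywhere.

```shell
# Scratch directory for demonstration; in production: /opt with the real tarball
WORK=$(mktemp -d)
mkdir -p "$WORK/hadoop-2.8.0/bin"                   # stand-in for the release tree
tar -zcf "$WORK/hadoop-2.8.0.tar.gz" -C "$WORK" hadoop-2.8.0
rm -r "$WORK/hadoop-2.8.0"                          # simulate a fresh host
# The actual unpack step from the tutorial:
tar -zxf "$WORK/hadoop-2.8.0.tar.gz" -C "$WORK"
test -d "$WORK/hadoop-2.8.0/bin" && echo unpacked
```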
3. Create a directory
mkdir /root/hadoop
mkdir /root/hadoop/tmp
mkdir /root/hadoop/var
mkdir /root/hadoop/dfs
mkdir /root/hadoop/dfs/name
mkdir /root/hadoop/dfs/data
4. Modify the configuration files under /opt/hadoop-2.8.0/etc/hadoop on the hserver2 host, then copy the modified files to the same path on the hserver1 host.
a. Modify core-site.xml. The fs.default.name property (a deprecated alias of fs.defaultFS) tells clients where the filesystem's NameNode is.
Add the following inside <configuration></configuration>:
----------
<property>
    <name>hadoop.tmp.dir</name>
    <value>/root/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
</property>
<property>
    <name>fs.default.name</name>
    <value>hdfs://hserver2:9000</value>
</property>
b. Modify hdfs-site.xml. The dfs.replication property sets the number of block replicas HDFS keeps.
Add the following inside <configuration></configuration>:
----------
<property>
    <name>dfs.name.dir</name>
    <value>/root/hadoop/dfs/name</value>
    <description>Path on the local filesystem where the NameNode stores the namespace and transaction logs persistently.</description>
</property>
<property>
    <name>dfs.data.dir</name>
    <value>/root/hadoop/dfs/data</value>
    <description>Comma-separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
</property>
<property>
    <name>dfs.replication</name>
    <value>2</value>
</property>
<property>
    <name>dfs.permissions</name>
    <value>false</value>
    <description>Disable HDFS permission checking.</description>
</property>
c. Create mapred-site.xml from the template and modify it. The mapred.job.tracker property locates the master node where the JobTracker runs (it applies to the classic MR1 runtime; with mapreduce.framework.name set to yarn, jobs are scheduled by YARN instead).
cp ./mapred-site.xml.template ./mapred-site.xml
Add the following inside <configuration></configuration>:
----------
<property>
    <name>mapred.job.tracker</name>
    <value>hserver2:49001</value>
</property>
<property>
    <name>mapred.local.dir</name>
    <value>/root/hadoop/var</value>
</property>
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
d. Modify the slaves file: delete localhost and add the following lines:
hserver1
hserver2
e. Modify yarn-site.xml.
Add the following inside <configuration></configuration>:
----------
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hserver2</value>
</property>
<property>
    <description>The address of the applications manager interface in the RM.</description>
    <name>yarn.resourcemanager.address</name>
    <value>${yarn.resourcemanager.hostname}:8032</value>
</property>
<property>
    <description>The address of the scheduler interface.</description>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>${yarn.resourcemanager.hostname}:8030</value>
</property>
<property>
    <description>The http address of the RM web application.</description>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>${yarn.resourcemanager.hostname}:8088</value>
</property>
<property>
    <description>The https address of the RM web application.</description>
    <name>yarn.resourcemanager.webapp.https.address</name>
    <value>${yarn.resourcemanager.hostname}:8090</value>
</property>
<property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>${yarn.resourcemanager.hostname}:8031</value>
</property>
<property>
    <description>The address of the RM admin interface.</description>
    <name>yarn.resourcemanager.admin.address</name>
    <value>${yarn.resourcemanager.hostname}:8033</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>2048</value>
    <description>Maximum memory, in MB, that can be allocated to a single container (default 8192).</description>
</property>
<property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
</property>
<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
</property>
<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>
f. Modify /opt/hadoop-2.8.0/etc/hadoop/hadoop-env.sh. (Even if the system already sets the JAVA_HOME environment variable, Hadoop fails at startup with "Error: JAVA_HOME is not set and could not be found." unless it is set explicitly here.)
Change export JAVA_HOME=${JAVA_HOME} to export JAVA_HOME=/opt/jdk1.8.0_161
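The edit in step f can be scripted with sed, which is convenient when configuring several nodes. The sketch below runs against a scratch copy containing only the relevant line so it is safe to replay; in production, point ENV_FILE at /opt/hadoop-2.8.0/etc/hadoop/hadoop-env.sh.

```shell
# Scratch copy for demonstration; in production:
# ENV_FILE=/opt/hadoop-2.8.0/etc/hadoop/hadoop-env.sh
ENV_FILE=$(mktemp)
echo 'export JAVA_HOME=${JAVA_HOME}' > "$ENV_FILE"   # the line as shipped
# Replace the JAVA_HOME line with an explicit JDK path
sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/opt/jdk1.8.0_161|' "$ENV_FILE"
grep JAVA_HOME "$ENV_FILE"   # export JAVA_HOME=/opt/jdk1.8.0_161
```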
Start Hadoop
1. On the hserver2 host, enter the /opt/hadoop-2.8.0/bin directory and format HDFS:
./hadoop namenode -format
If no error is reported, the initialization succeeded. (hadoop namenode is deprecated in Hadoop 2.x; ./hdfs namenode -format is the current form.)
2. Verify that a current directory has been created under /root/hadoop/dfs/name/ and that it contains the namespace files.
3. Start Hadoop on the NameNode host; 192.168.31.225 (hserver2) is the NameNode. Enter /opt/hadoop-2.8.0/sbin and run:
./start-all.sh
4. Test: visit http://192.168.31.225:50070/ (HDFS NameNode web UI)
and http://192.168.31.225:8088 (YARN ResourceManager web UI)
5. Stop the cluster:
./stop-all.sh