The first step, configure SSH password-free login
ssh-keygen -t rsa        # generate the private and public key files (press Enter 3 times)
ssh-copy-id localhost    # send the public key to localhost
ssh localhost            # log in with the public key instead of a password
# If the server's IP is 10.10.10.10 and my local computer's IP is 111.111.111.111:
ssh-copy-id 111.111.111.111    # the server (10.10.10.10) sends its public key to my machine (111.111.111.111)
ssh 10.10.10.10                # I log into the server using the key instead of a password

The second step, Hadoop installation (a three-node distributed cluster)
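In a cluster, the same key exchange has to be repeated for every node. A minimal sketch, assuming the node1–node3 hostnames configured in the hosts file below and the user sa; DRY_RUN=echo makes it a safe preview that only prints the commands, clear it to really run them:

```shell
# Distribute the local public key to every node, then verify the login.
# DRY_RUN=echo previews the commands; set DRY_RUN= to run them for real.
DRY_RUN=echo
planned=""
for host in node1 node2 node3; do
  planned="$planned$($DRY_RUN ssh-copy-id "sa@$host")
"
  planned="$planned$($DRY_RUN ssh "sa@$host" hostname)
"
done
printf '%s' "$planned"
```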
Below we use 3 servers to build a distributed cluster, configured as follows:
sudo vim /etc/hosts    # edit the hosts file
Add the following to the hosts file:
192.168.40.128 node1    # 192.168.40.128 is the IP of the corresponding server
192.168.40.129 node2
192.168.40.130 node3

Steps to install hadoop-2.6.0.tar.gz:
①、Upload hadoop-2.6.0.tar.gz to the home directory of user sa.
②、tar -xvf hadoop-2.6.0.tar.gz    # decompress, then rename the directory: mv hadoop-2.6.0 hadoop
The third step, configure hadoop environment variables
vi /etc/profile    # edit the environment variable file
Add the following at the bottom of the profile:
# set hadoop environment
export HADOOP_HOME=/home/sa/hadoop
export PATH=$HADOOP_HOME/bin:$PATH
source /etc/profile    # make the changes take effect in the current shell
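The export lines can be tried against a scratch file first, which catches typos such as a misspelled $HADOOP_HOME before they land in /etc/profile. A sketch (the scratch file name is an arbitrary example):

```shell
# Write the two export lines to a scratch profile, source it,
# and check that the variables come out as intended.
PROFILE=./profile.test          # scratch file; the real target is /etc/profile
cat > "$PROFILE" <<'EOF'
# set hadoop environment
export HADOOP_HOME=/home/sa/hadoop
export PATH=$HADOOP_HOME/bin:$PATH
EOF
. "$PROFILE"                    # same effect as: source /etc/profile
echo "$HADOOP_HOME"             # prints /home/sa/hadoop
rm "$PROFILE"
```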
The fourth step, enter the hadoop/etc/hadoop directory and modify the following 7 configuration files:
①、hadoop-env.sh
export JAVA_HOME=/home/sa/jdk7
②、yarn-env.sh
export JAVA_HOME=/home/sa/jdk7

③、slaves
Add one entry for every node in the cluster:

node1
node2
node3
④、core-site.xml (Hadoop global configuration)
<configuration>
    <!-- Address the namenode listens on; clients use it to access HDFS -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://node1:9000</value>
    </property>
    <!-- Storage directory for temporary files generated by Hadoop -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/home/sa/hadoop/tmp</value>
    </property>
    <!-- Interval, in seconds, between checkpoints of the namenode edit log -->
    <property>
        <name>fs.checkpoint.period</name>
        <value>3600</value>
    </property>
</configuration>

⑤、hdfs-site.xml (HDFS configuration)
<configuration>
    <!-- HTTP address of the secondary namenode -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>node1:50090</value>
    </property>
    <!-- Where the namenode stores its metadata -->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/home/sa/hadoop/dfs/name</value>
    </property>
    <!-- Where each datanode stores its data blocks -->
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/home/sa/hadoop/dfs/data</value>
    </property>
    <!-- Number of copies HDFS keeps of each block -->
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <!-- Whether to enable the WebHDFS web interface -->
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>

⑥、mapred-site.xml (MapReduce configuration)
<configuration>
    <!-- Tell Hadoop to run MapReduce on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <!-- Address of the JobHistory server -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>node1:10020</value>
    </property>
    <!-- Web UI for browsing job history -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>node1:19888</value>
    </property>
</configuration>

⑦、yarn-site.xml (YARN framework configuration)
<configuration>
    <!-- Auxiliary shuffle service MapReduce needs on every nodemanager -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <!-- ResourceManager addresses: client, scheduler, nodemanager, admin, web UI -->
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>node1:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>node1:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>node1:8035</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>node1:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>node1:8088</value>
    </property>
</configuration>

The fifth step, format the NameNode on the main server (Master) to verify that the Hadoop configuration is correct
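The 7 files above are edited on node1, but every node must carry the same configuration before anything is formatted or started. A sketch that pushes the config directory to the other nodes, assuming the directory layout used in this guide; DRY_RUN=echo only previews the commands, clear it to really copy:

```shell
# Copy node1's Hadoop config directory to the remaining nodes.
# DRY_RUN=echo previews the commands; set DRY_RUN= to run them for real.
DRY_RUN=echo
CONF_DIR=/home/sa/hadoop/etc/hadoop
planned=""
for host in node2 node3; do
  planned="$planned$($DRY_RUN scp -r "$CONF_DIR" "sa@$host:$CONF_DIR")
"
done
printf '%s' "$planned"
```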
cd ~/hadoop
bin/hdfs namenode -format

The sixth step, start Hadoop
cd ~/hadoop    (Note: the commands below must be executed in the hadoop installation directory)
①、first, start HDFS
sbin/start-dfs.sh

②、then, start YARN
sbin/start-yarn.sh

You can also start (or stop) HDFS and YARN at the same time with the following commands:
sbin/start-all.sh    # start everything
sbin/stop-all.sh     # stop everything

③、Check the cluster status to see whether Hadoop started successfully
bin/hdfs dfsadmin -report
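If the report lists live datanodes, a quick round-trip through HDFS confirms the cluster works end to end. A sketch, run from the hadoop installation directory; the file names are arbitrary examples, and DRY_RUN=echo only previews the commands, clear it to run them on the cluster:

```shell
# Write a file into HDFS and read it back as a smoke test.
# DRY_RUN=echo previews the commands; set DRY_RUN= to run them for real.
DRY_RUN=echo
planned=""
for cmd in "-mkdir -p /user/sa" "-put /etc/hosts /user/sa/hosts.txt" "-cat /user/sa/hosts.txt"; do
  # $cmd is deliberately unquoted so its words split into separate arguments
  planned="$planned$($DRY_RUN bin/hdfs dfs $cmd)
"
done
printf '%s' "$planned"
```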