Hadoop Installation (Lesson 1)

Step 1. Configure SSH passwordless login

ssh-keygen -t rsa       generate the private/public key pair (press Enter three times)
ssh-copy-id localhost   copy the public key to localhost
ssh localhost           log in to localhost; no password should be required now
/*
Example: the server's IP is 10.10.10.10
and the IP of my local computer is 111.111.111.111
*/
ssh-copy-id 10.10.10.10 /* my machine (111.111.111.111) copies its public key to the server (10.10.10.10) */
ssh 10.10.10.10         /* I can now log into the server without entering a password */
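To confirm that the passwordless setup works, run a command over ssh; it should complete without prompting for a password (a minimal check):

ssh localhost hostname   prints the hostname; no password prompt should appear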
Step 2. Install Hadoop (three-node distributed cluster):
/* Below we use 3 servers to build a distributed Hadoop cluster, configured as follows: */
sudo vim /etc/hosts   edit the hosts file

Add the following to the hosts file:

192.168.40.128 node1    (192.168.40.128 is the IP of the corresponding server)
192.168.40.129 node2
192.168.40.130 node3
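To check that the hostnames resolve correctly, a quick sanity test on each node:

ping -c 3 node1
ping -c 3 node2
ping -c 3 node3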
Steps to install hadoop-2.6.0.tar.gz:
    ①, upload hadoop-2.6.0.tar.gz to the home directory of the sa user (/home/sa)

    ②, tar -xvf hadoop-2.6.0.tar.gz to decompress it, then rename the extracted directory to hadoop

Step 3. Configure the Hadoop environment variables

vi /etc/profile   edit the environment variable file

Add the following at the bottom of the profile:

#set hadoop environment
export HADOOP_HOME=/home/sa/hadoop
export PATH=$HADOOP_HOME/bin:$PATH
source /etc/profile   to make the changes take effect
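To confirm the variables are picked up (assuming Hadoop was unpacked to /home/sa/hadoop as above):

echo $HADOOP_HOME   should print /home/sa/hadoop
hadoop version      should print Hadoop 2.6.0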

Step 4. Enter the hadoop/etc/hadoop directory and modify the following 7 configuration files:

①、hadoop-env.sh

export JAVA_HOME=/home/sa/jdk7

②、yarn-env.sh

export JAVA_HOME=/home/sa/jdk7
③、slaves

List one hostname per line, one entry for every worker node in the cluster:
node1
node2
node3
④, core-site.xml (Hadoop global configuration)

<configuration>
    <!--Address of the NameNode, i.e. the entry point clients use to access HDFS-->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://node1:9000</value>
    </property>
    <!--Storage directory for temporary files generated while Hadoop is running-->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/home/sa/hadoop/tmp</value>
    </property>
    <!--Interval, in seconds, between two consecutive checkpoints of the HDFS metadata-->
    <property>
        <name>fs.checkpoint.period</name>
        <value>3600</value>
    </property>
</configuration>
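Make sure the hadoop.tmp.dir directory configured above exists (a small precaution; the path matches the value set above):

mkdir -p /home/sa/hadoop/tmp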
⑤, hdfs-site.xml (HDFS configuration)

<configuration>
    <!--HTTP address of the SecondaryNameNode-->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>node1:50090</value>
    </property>
    <!--Local directory where the NameNode stores its metadata-->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/home/sa/hadoop/dfs/name</value>
    </property>
    <!--Local directory where each DataNode stores its data blocks-->
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/home/sa/hadoop/dfs/data</value>
    </property>
    <!--Number of replicas HDFS keeps for each block-->
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <!--Whether to enable WebHDFS (REST access to HDFS over HTTP)-->
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>
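Hadoop creates the configured name and data directories on first use, but if you prefer to create them up front on each node (same paths as configured above):

mkdir -p /home/sa/hadoop/dfs/name /home/sa/hadoop/dfs/data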
⑥, mapred-site.xml (MapReduce configuration)

<configuration>
    <!--Tell Hadoop to run MapReduce on YARN-->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <!--Address of the MapReduce JobHistory server-->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>node1:10020</value>
    </property>
    <!--Web address for browsing MapReduce job history-->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>node1:19888</value>
    </property>
</configuration>
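Note: the Hadoop 2.6.0 tarball usually ships only mapred-site.xml.template in etc/hadoop, so create the file first:

cp mapred-site.xml.template mapred-site.xml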
⑦, yarn-site.xml (YARN framework configuration)
<configuration>
    <!--Auxiliary service the NodeManager runs so MapReduce can shuffle data-->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!--Implementation class for the shuffle service declared above-->
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <!--Address clients use to submit applications to the ResourceManager-->
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>node1:8032</value>
    </property>
    <!--Address ApplicationMasters use to talk to the scheduler-->
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>node1:8030</value>
    </property>
    <!--Address NodeManagers use to report to the ResourceManager-->
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>node1:8035</value>
    </property>
    <!--Address for ResourceManager administrative commands-->
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>node1:8033</value>
    </property>
    <!--ResourceManager web UI address-->
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>node1:8088</value>
    </property>
</configuration>
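node2 and node3 need the same installation and configuration. A minimal way to copy everything over (assuming the same sa user and /home/sa paths on all nodes, and that /etc/hosts and /etc/profile have been edited on each node as in the earlier steps):

scp -r ~/hadoop sa@node2:~/
scp -r ~/hadoop sa@node3:~/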
Step 5. Format the NameNode on the master server (node1) (this also verifies that the Hadoop configuration is correct)
cd ~/hadoop
bin/hdfs namenode -format
Step 6. Start Hadoop
cd ~/hadoop (Note: the commands below must be executed in the hadoop installation directory)
①, first, start HDFS
sbin/start-dfs.sh
②, then, start YARN
sbin/start-yarn.sh
You can also start (or stop) HDFS and YARN at the same time with the following commands:
sbin/start-all.sh start all
sbin/stop-all.sh stop all
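The JobHistory server configured in mapred-site.xml is not started by these scripts; it has its own daemon script in the Hadoop 2.x sbin directory:

sbin/mr-jobhistory-daemon.sh start historyserver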
③, check the cluster status to see whether Hadoop started successfully
bin/hdfs dfsadmin -report
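Two more quick checks (50070 is the Hadoop 2.x default NameNode web UI port; 8088 is the ResourceManager web address configured in yarn-site.xml):

jps   on node1 it should list NameNode, SecondaryNameNode and ResourceManager, plus DataNode and NodeManager (node1 is also listed in slaves); on node2/node3, DataNode and NodeManager
You can also open http://node1:50070 (HDFS) and http://node1:8088 (YARN) in a browser.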












