Hadoop common commands

Network Configuration

  • hostname view the hostname

  • vim /etc/sysconfig/network set hostname

  • ifconfig view the current IP configuration

  • vim /etc/sysconfig/network-scripts/ifcfg-eth0 configure the network interface (a complete example file appears at the end of this section)

    • DEVICE="eth0" interface name (device, network card)

    • BOOTPROTO=static IP configuration method (static: fixed IP, dhcp: assigned by DHCP, none: no protocol, set manually)

    • ONBOOT=yes whether the interface is brought up when the system starts

    • IPADDR=192.168.1.2 IP address

    • GATEWAY=192.168.1.0 gateway

    • DNS1=8.8.8.8 DNS Server

  • service network restart restart the network card service

  • service network start start network card service

  • service network stop stop the network card service

  • ifconfig eth0 up|down enable or disable the specified network card

  • ifconfig check whether the configured IP settings have taken effect

  • vim /etc/hosts set the hostname-to-IP mapping

    • 192.168.1.2 master

    • 192.168.1.3 slave1

    • 192.168.1.4 slave2

  • ping master verify that the hostname mapping works

  • service iptables stop turn off the firewall

  • chkconfig iptables off disable the firewall service at startup
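
Putting the interface settings above into a single file, a minimal /etc/sysconfig/network-scripts/ifcfg-eth0 for the master might look like the sketch below; it reuses the addresses from this guide, while NETMASK is an added assumption, so adjust it and the gateway to your own subnet:

DEVICE="eth0"
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.1.2
NETMASK=255.255.255.0    # assumed /24 subnet, not listed above
GATEWAY=192.168.1.0      # as listed above; normally this is your router's address
DNS1=8.8.8.8

After editing the file, run service network restart and confirm the result with ifconfig.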

Configure SSH

  • rpm -qa | grep openssh to see if the ssh service is installed

  • rpm -qa | grep rsync to see if the rsync service is installed

  • yum install openssh-server install the SSH service

  • yum install rsync rsync is a remote data synchronization tool

  • service sshd restart restart the sshd service

  • ssh-keygen -t rsa -P '' generate a passwordless key pair (stored in /home/hadoop/.ssh)

  • cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys append id_rsa.pub to the authorized keys file

  • chmod 600 ~/.ssh/authorized_keys restrict the file to owner read/write permissions

  • vim /etc/ssh/sshd_config Modify the configuration file of the sshd service

RSAAuthentication yes # Enable RSA authentication

PubkeyAuthentication yes # Enable public key and private key pairing authentication method

AuthorizedKeysFile .ssh/authorized_keys # Public key file path (same as the file generated above)

  • service sshd restart restart the sshd service so the changes take effect

  • ssh master verify passwordless ssh login (the first connection will ask you to confirm the host key); a consolidated sketch of these steps follows
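
The steps above can be run end-to-end as the hadoop user on the master; a minimal sketch, assuming the key is stored under ~/.ssh as noted above:

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa          # generate a passwordless key pair non-interactively
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys   # authorize the local public key
chmod 700 ~/.ssh                                  # sshd refuses keys if permissions are too open
chmod 600 ~/.ssh/authorized_keys
ssh master hostname                               # should print the hostname without a password prompt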

Point-to-multipoint SSH passwordless login

  • ssh-keygen

  • ssh-copy-id storm@slave1 the format is "ssh-copy-id username@hostname"

  • ssh-copy-id storm@slave2 copies the local machine's public key into the remote machine's authorized_keys file (a loop for many slaves is sketched below)
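
For many slaves, the same ssh-copy-id call can be wrapped in a loop; a sketch assuming slaves named slave1, slave2, ... and the storm account used above:

for i in $(seq 1 2);            # adjust the range to the number of slaves
do ssh-copy-id storm@slave$i;   # appends the local public key to the remote authorized_keys
done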

Install JDK

  1. Log in as the root user

  2. mkdir /usr/java creates the /usr/java directory

  3. cp /root/Downloads/jdk-6u31-linux-i586.bin /usr/java

  4. chmod +x jdk-6u31-linux-i586.bin grant execute permission

  5. ./jdk-6u31-linux-i586.bin run the self-extracting bin file to unpack the JDK

  6. rm -rf jdk-6u31-linux-i586.bin delete the JDK installer

  7. vim /etc/profile

Add the following at the end:

# set java environment

export JAVA_HOME=/usr/java/jdk1.6.0_31

export JRE_HOME=$JAVA_HOME/jre

export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JRE_HOME/lib

export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin

source /etc/profile make the new environment variables take effect

java -version Verify that jdk is installed successfully

Install the JDK on the remaining machines:

  • scp -r /usr/java/jdk1.6.0_31 hadoop@slave$i:/usr/java copy the JDK directory to each slave (-r copies directories recursively)

Install using shell script:

for i in $(seq 1 100);

do echo slave$i;

scp -r /usr/java/jdk1.6.0_31 hadoop@slave$i:/usr/java;

done

The /etc/profile environment-variable configuration must also be set up on, or copied to, every machine in the cluster.
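
A sketch of pushing the JDK and /etc/profile to every slave and verifying the installation in one pass (assuming root SSH access to the slaves, since /usr/java and /etc are root-owned, and hostnames slave1..slave100):

for i in $(seq 1 100);
do echo slave$i;
scp -r /usr/java/jdk1.6.0_31 root@slave$i:/usr/java;     # copy the unpacked JDK
scp /etc/profile root@slave$i:/etc/profile;              # copy the environment variables
ssh root@slave$i "source /etc/profile; java -version";   # confirm the JDK works on the slave
done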

Hadoop cluster installation

  1. Login as root user

  2. cp /root/Downloads/hadoop-1.0.0.tar.gz /usr

  3. cd /usr

  4. tar -zxvf hadoop-1.0.0.tar.gz extract the tar.gz installation package

  5. mv hadoop-1.0.0 hadoop rename the extracted folder

  6. chown -R hadoop:hadoop hadoop reassign the owner of the hadoop folder; -R is recursive, assigning the folder to the hadoop user in the hadoop group

  7. rm -rf hadoop-1.0.0.tar.gz delete the installation package

Configure Hadoop environment variables

  1. vim /etc/profile

    1. export HADOOP_HOME=/usr/hadoop

    2. export PATH=$PATH:$HADOOP_HOME/bin

  2. source /etc/profile to make the configuration take effect
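
A quick check that the variables took effect (hadoop version simply prints the installed release; JAVA_HOME is already exported in /etc/profile from the JDK step):

source /etc/profile
hadoop version      # should report Hadoop 1.0.0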

Configure Hadoop

  1. Configure hadoop-env.sh (the file is located in /usr/hadoop/conf)

    1. vim /usr/hadoop/conf/hadoop-env.sh

    2. export JAVA_HOME=/usr/java/jdk1.6.0_31

  2. Configure the core-site.xml file

    1. mkdir /usr/hadoop/tmp create a tmp folder to hold Hadoop's temporary data

    2. vim /usr/hadoop/conf/core-site.xml

<configuration>

   <property>

       <name>hadoop.tmp.dir</name>

       <value>/usr/hadoop/tmp</value>

        ( Note: create the tmp folder under /usr/hadoop first. The default temporary directory is /tmp/hadoop-hadoop, which is wiped on every reboot, so you would have to re-run the format or the NameNode will report an error.)

       <description>A base for other temporary directories.</description>

   </property>

<!--file system properties, configure the access address of the NameNode -->

   <property>

       <name>fs.default.name</name>

        <value>hdfs://192.168.1.2:9000</value>

   </property>

</configuration>

  3. Configure hdfs-site.xml (the default replication factor is 3)

<configuration>

   <property>

       <name>dfs.replication</name>

        <value>1</value>

        ( Note: replication is the number of data copies; the default is 3, and an error is reported if there are fewer than 3 slaves)

   </property>

</configuration>

  4. Configure mapred-site.xml

Modify Hadoop's MapReduce configuration file to set the JobTracker address and port.

<configuration>

   <property>

       <name>mapred.job.tracker</name>

        <value>http://192.168.1.2:9001</value>

   </property>

</configuration>

  5. Configure masters

Modify the /usr/hadoop/conf/masters file to specify the hostname of the master machine

    vim /usr/hadoop/conf/masters

        192.168.1.2 (or master)

  6. Configure slaves

vim /usr/hadoop/conf/slaves

       slave1

       slave2

Note: for a single-machine (pseudo-distributed) setup, conf/slaves must not be empty; if there are no other machines, list the machine itself.

In a cluster environment, the slaves file does not need to be configured on the slave machines.

  7. Repeat this configuration on the other machines in the cluster

It is recommended to copy the files to the corresponding directory on the other machines with scp as the hadoop user; step 6 (the slaves file) is only needed on the master machine.

Using shell script:

for i in $(seq 1 100);

do echo slave$i;

scp -r /usr/hadoop hadoop@slave$i:/usr;

scp /etc/profile hadoop@slave$i:/etc;

done

After copying the files, you may find that the hadoop directory is owned by root.

chown -R hadoop:hadoop /usr/hadoop reassign it to the hadoop user
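
If that happens on the slaves as well, ownership can be fixed everywhere with a short loop; a sketch assuming root SSH access to the slaves:

for i in $(seq 1 100);
do ssh root@slave$i "chown -R hadoop:hadoop /usr/hadoop";   # hand the directory back to the hadoop user
done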

Hadoop startup related commands:

  • hadoop namenode -format format the NameNode on the master machine

You only need to execute it once. To format again, first delete the files under the path configured as hadoop.tmp.dir in core-site.xml.
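
A minimal sketch of a clean re-format on the master, assuming hadoop.tmp.dir is /usr/hadoop/tmp as configured in core-site.xml above:

stop-all.sh                   # make sure no daemons are still running
rm -rf /usr/hadoop/tmp/*      # clear the old HDFS metadata and data
hadoop namenode -format       # re-create the filesystem metadata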

  • service iptables stop Turn off the firewall of all machines in the cluster

for i in $(seq 1 100);

do ssh node$i "hostname;

service iptables stop;

chkconfig iptables off;

service iptables status";

done

  • start-all.sh starts all Hadoop services (both the HDFS and MapReduce daemons)

The startup log shows that the NameNode starts first, then datanode1, datanode2, ..., then the SecondaryNameNode; after that the JobTracker starts, followed by tasktracker1, tasktracker2, ...

After Hadoop starts successfully, a dfs folder is created under tmp on the master, and dfs and mapred folders are created under tmp on the slaves.

  • jps view process

    • The result on master is

      • JobTracker

      • NameNode

      • Jps

      • SecondaryNameNode

    • The result on the slave is

      • TaskTracker

      • DataNode

      • Jps

  • hadoop dfsadmin -report view the status of the Hadoop cluster

  • hadoop dfsadmin -safemode leave leave HDFS safe mode
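
To confirm the daemons on every node without logging in one by one, jps can be run over SSH; a sketch assuming passwordless SSH and the JDK path used earlier:

for host in master slave1 slave2;
do echo "== $host ==";
ssh $host /usr/java/jdk1.6.0_31/bin/jps;   # master: NameNode, SecondaryNameNode, JobTracker; slaves: DataNode, TaskTracker
done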

http://192.168.1.2:50030 visit the MapReduce (JobTracker) web UI

http://192.168.1.2:50070 visit the HDFS (NameNode) web UI

Last-resort fix when the cluster repeatedly fails to start:

  1. Delete the /usr/hadoop/tmp folder on all machines in the cluster

  2. Delete the pid files on all machines in the cluster (saved under /tmp by default) and make sure they belong to the hadoop user

  3. Re-run stop-all.sh to shut down whatever services can still be stopped

  4. Run ps -ef | grep java | grep hadoop to check whether any Hadoop-related processes are still running; if so, kill them with kill -9 <pid>

  5. Re-format the NameNode on the master host

  6. Execute start-all.sh to start hadoop

  7. If no error is reported, run hadoop dfsadmin -report to check the cluster status; if only one node shows up, HDFS may still be in safe mode

  8. Execute hadoop dfsadmin -safemode leave to turn off safe mode on the host

  9. Execute hadoop dfsadmin -report again
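
Steps 1, 2 and 4 of this procedure can be scripted across the cluster; a rough sketch, assuming root SSH access and the default /tmp/hadoop-*.pid pid files:

for host in master slave1 slave2;
do echo "cleaning $host";
ssh root@$host "rm -rf /usr/hadoop/tmp /tmp/hadoop-*.pid; ps -ef | grep java | grep hadoop | grep -v grep | awk '{print \$2}' | xargs -r kill -9";
done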

Solve the "no datanode to stop" problem

Reason:

Each namenode format generates a new namenodeId, but /tmp/dfs/data still contains the ID from the previous format. Formatting clears the NameNode's data without clearing the DataNodes' data, so startup fails. Clear all the tmp directories before every format.

  • the first method:

Delete the tmp folder on the master: rm -rf /usr/hadoop/tmp
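
Since the stale data usually lives on the DataNodes as well, the same cleanup can be applied to every slave before re-formatting; a sketch assuming hadoop.tmp.dir is /usr/hadoop/tmp and slaves named slave1, slave2:

for i in $(seq 1 2);
do ssh hadoop@slave$i "rm -rf /usr/hadoop/tmp";   # clear the DataNode's old data so it accepts the new namenodeId
done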

