Network Configuration
hostname view the current hostname
vim /etc/sysconfig/network set hostname
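The contents of /etc/sysconfig/network for this setup might look like the following (assuming the hostname master; set each machine's own name):
NETWORKING=yes
HOSTNAME=master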
ifconfig view the current IP configuration
vim /etc/sysconfig/network-scripts/ifcfg-eth0 set network
DEVICE="eth0" interface name (device, network card)
BOOTPROTO=static IP configuration method (static: fixed IP, dhcp: dynamically assigned, none: no protocol, configure manually)
ONBOOT=yes whether the interface is brought up when the system starts
IPADDR=192.168.1.2 IP address
GATEWAY=192.168.1.0 gateway
DNS1=8.8.8.8 DNS Server
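Putting the settings together, a complete static ifcfg-eth0 might look like this (NETMASK is an addition here, assuming a /24 network; the GATEWAY below assumes the router sits at 192.168.1.1, so adjust both to your network):
DEVICE=eth0
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.1.2
NETMASK=255.255.255.0
GATEWAY=192.168.1.1
DNS1=8.8.8.8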
service network restart restart the network card service
service network start start network card service
service network stop stop the network card service
ifconfig eth0 up|down enable and disable the specified network card
ifconfig Check whether the configured ip information takes effect
vim /etc/hosts set hostname-to-IP mappings
192.168.1.2 master
192.168.1.3 slave1
192.168.1.4 slave2
ping master
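To check every mapping at once, a quick loop over the hostnames defined above:
for h in master slave1 slave2; do ping -c 1 $h; done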
service iptables stop turn off the firewall
chkconfig iptables off turn off the self-starting firewall service
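To confirm the firewall is stopped now and stays disabled after a reboot:
service iptables status
chkconfig --list iptables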
Configure SSH
rpm -qa | grep openssh to see if the ssh service is installed
rpm -qa | grep rsync to see if the rsync service is installed
yum install openssh-server openssh-clients install the ssh service and client (if missing)
yum install rsync rsync is a remote data synchronization tool
service sshd restart start the sshd service
ssh-keygen -t rsa -P '' generate a passwordless key pair (stored under /home/hadoop/.ssh)
cat ~/.ssh/id_rsa.pub >>~/.ssh/authorized_keys Append id_rsa.pub to the authorized key
chmod 600 ~/.ssh/authorized_keys restrict the file to owner read/write
vim /etc/ssh/sshd_config Modify the configuration file of the sshd service
RSAAuthentication yes # Enable RSA authentication
PubkeyAuthentication yes # Enable public key and private key pairing authentication method
AuthorizedKeysFile .ssh/authorized_keys # Public key file path (same as the file generated above)
service sshd restart restarts the sshd service, the modification takes effect
ssh master verify ssh login (the first time you will be asked to confirm the host fingerprint)
Point-to-multipoint SSH passwordless login
ssh-keygen
ssh-copy-id storm@slave1 format is "ssh-copy-id username@hostname"
ssh-copy-id storm@slave2 copies the local machine's public key into the remote machine's authorized_keys file
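A sketch of pushing the key to every slave in one loop (assuming slaves named slave1..slave2 and the storm user as above; each copy prompts for that user's password once):
for i in $(seq 1 2); do ssh-copy-id storm@slave$i; done
ssh storm@slave1 hostname verify that login now works without a password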
Install JDK
root user login
mkdir /usr/java creates the /usr/java directory
cp /root/Downloads/jdk-6u31-linux-i586.bin /usr/java
cd /usr/java
chmod +x jdk-6u31-linux-i586.bin give the installer execute permission
./jdk-6u31-linux-i586.bin run the self-extracting bin file
rm -rf jdk-6u31-linux-i586.bin delete the JDK installer after extraction
vim /etc/profile
Add the following at the end:
# set java environment
export JAVA_HOME=/usr/java/jdk1.6.0_31
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JRE_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
source /etc/profile makes the profile's configuration file take effect
java -version Verify that jdk is installed successfully
Install on the remaining machines:
scp -r /usr/java/jdk1.6.0_31/ hadoop@slave1:/usr/java copy the JDK directory to each slave in turn (or use the loop below)
Install using shell script:
for i in $(seq 1 100);
do echo slave$i;
scp -r /usr/java/jdk1.6.0_31/ hadoop@slave$i:/usr/java;
done
The environment variable settings in /etc/profile must likewise be configured on (or copied to) every machine in the cluster.
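A sketch of sending /etc/profile to every node in one pass (assuming root SSH access to the slaves and hostnames slave1..slaveN; each machine still needs source /etc/profile or a fresh login afterwards):
for i in $(seq 1 2); do scp /etc/profile root@slave$i:/etc/profile; done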
Hadoop cluster installation
Login as root user
cp /root/Downloads/hadoop-1.0.0.tar.gz /usr
cd /usr
tar -zxvf hadoop-1.0.0.tar.gz extract the tar.gz installation package
mv hadoop-1.0.0 hadoop rename the extracted folder to hadoop
chown -R hadoop:hadoop hadoop reassign the folder's owner; -R is recursive; this gives the hadoop folder to the hadoop user in the hadoop group
rm -rf hadoop-1.0.0.tar.gz delete the installation file
Configure Hadoop environment variables
vim /etc/profile
export HADOOP_HOME=/usr/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
source /etc/profile to make the configuration take effect
configure hadoop
Configure hadoop-env.sh (file located in /usr/hadoop/conf)
vim /usr/hadoop/conf/hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.6.0_31
Configure the core-site.xml file
mkdir /usr/hadoop/tmp create a tmp folder to hold hadoop's temporary data
vim /usr/hadoop/conf/core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/hadoop/tmp</value>
(Note: create the tmp folder under /usr/hadoop first. The default temporary directory is /tmp/hadoop-hadoop, which is wiped on every reboot, forcing you to re-run the namenode format; otherwise HDFS reports errors.)
<description>A base for other temporary directories.</description>
</property>
<!--file system properties, configure the access address of the NameNode -->
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.1.2:9000</value>
</property>
</configuration>
Configure hdfs-site.xml; the default replication factor is 3
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
(Note: replication is the number of data copies; the default is 3, and an error is reported if there are fewer slaves than the replication factor)
</property>
</configuration>
Configure mapred-site.xml
Modify Hadoop's MapReduce configuration file to set the JobTracker address and port
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>192.168.1.2:9001</value>
</property>
</configuration>
Configure masters
Modify the /usr/Hadoop/conf/masters file to specify the hostname of the master machine
vim /usr/Hadoop/conf/masters
192.168.1.2 (or master)
Configure slaves
vim /usr/Hadoop/conf/slaves
slave1
slave2
Note: for a single-machine start, conf/slaves must not be empty; point it at the machine itself if there are no other machines.
In a cluster environment, the slaves file does not need to be configured on the slave machines (only the master reads it).
Repeat this configuration on other machines in the cluster
It is recommended to copy the hadoop directory to the corresponding directory on the other machines via scp as the hadoop user; step 6 is only needed on the master machine.
Using shell script:
for i in $(seq 1 100);
do echo slave$i;
scp -r /usr/hadoop hadoop@slave$i:/usr;
scp /etc/profile hadoop@slave$i:/etc;
done
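Since rsync was installed earlier, the same copy can also be done with rsync, which only transfers files that changed (a sketch, assuming the hadoop user can write to /usr on the slaves; otherwise run it as root and chown afterwards as below):
for i in $(seq 1 2); do rsync -az /usr/hadoop/ hadoop@slave$i:/usr/hadoop/; done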
After copying the files, you may find that the hadoop directory on the other machines is owned by root
chown -R hadoop:hadoop /usr/hadoop assign ownership back to the hadoop user
Hadoop startup related commands:
hadoop namenode -format format the namenode on the master machine
This only needs to be executed once. To format again, first delete the files under the path configured by hadoop.tmp.dir in core-site.xml.
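If a re-format really is needed, a sketch of the cleanup to run first (the path comes from hadoop.tmp.dir above; clear it on the master and on every slave, then format on the master only):
rm -rf /usr/hadoop/tmp/*
hadoop namenode -format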
service iptables stop Turn off the firewall of all machines in the cluster
for i in $(seq 1 100);
do ssh slave$i "hostname;
service iptables stop;
chkconfig iptables off;
service iptables status";
done
start-all.sh starts all services of hadoop, including (related services of hdfs and mapreduce)
As the startup log shows, the namenode starts first, then datanode1, datanode2, ..., followed by the secondarynamenode; after that the jobtracker starts, followed by tasktracker1, tasktracker2, ...
After hadoop starts successfully, a dfs folder is generated under the tmp folder on the master, and dfs plus mapred folders are generated under the tmp folder on the slaves.
jps view process
The result on the master is
JobTracker
NameNode
jps
SecondaryNameNode
The result on the slave is
TaskTracker
DataNode
jps
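To check the processes on every node from the master in one loop (a sketch, relying on the passwordless ssh configured earlier and hostnames slave1..slaveN):
for i in $(seq 1 2); do ssh slave$i "hostname; jps"; done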
hadoop dfsadmin -report view the status of the hadoop cluster
hadoop dfsadmin -safemode leave turn off hdfs safe mode
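To check whether hdfs is actually in safe mode before forcing it off:
hadoop dfsadmin -safemode get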
http://192.168.1.2:50030 visit the MapReduce (JobTracker) web page
http://192.168.1.2:50070 visit the HDFS (NameNode) web page
Last-resort fix when the cluster refuses to start:
Delete the /usr/hadoop/tmp directory on all machines in the cluster
Delete the pid files on all machines in the cluster; by default they are saved under /tmp, so make sure the hadoop user has the needed permissions there
Run stop-all.sh again to shut down whatever services can still be stopped
Run the ps -ef|grep java|grep hadoop command to check whether there are any hadoop-related processes running. If so, kill it with the kill -9 process number command.
Re-format the namenode on the master host
Execute start-all.sh to start hadoop
If no error is reported, run hadoop dfsadmin -report to check the running state of hadoop; if only one node shows up, HDFS may still be in safe mode
Execute hadoop dfsadmin -safemode leave to turn off safe mode on the host
Execute hadoop dfsadmin -report again
Solve the "no datanode to stop" problem
reason:
Each time the namenode is formatted it creates a new namespaceID, while /tmp/dfs/data (the datanode's data directory) still holds the ID from the previous format. Formatting clears the namenode's data but not the data on the datanodes, so startup fails. Clear all the directories under tmp before each format.
Method 1:
Delete the tmp folder on the master: rm -rf /usr/hadoop/tmp
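A sketch of doing the same cleanup across the whole cluster from the master (assuming hostnames slave1..slaveN, the tmp path above, and pid files in the default /tmp location named like hadoop-hadoop-*.pid):
for i in $(seq 1 2); do ssh slave$i "rm -rf /usr/hadoop/tmp; rm -f /tmp/hadoop-*.pid"; done
rm -rf /usr/hadoop/tmp; rm -f /tmp/hadoop-*.pid repeat the cleanup on the master itself, then re-format the namenode and run start-all.sh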