Installing Hadoop is actually quite simple; the key is to avoid a few common pitfalls. Hadoop is just a Java program and is easy to run: with the preparation below done up front, the installation itself succeeds in one pass.
Preparation 1
Install a Linux virtual machine. Don't forget the network settings during installation, otherwise you won't be able to reach the VM over the LAN.
Preparation 2
On the Linux machine, first create a dedicated user to run Hadoop and grant it permissions:
[root@ ~]# useradd -m hadoop -s /bin/bash
[root@ ~]# passwd hadoop
Changing password for user hadoop.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
Grant permissions to hadoop user
Modify the /etc/sudoers file (preferably with visudo), find the following line, and remove the leading comment marker (#):
## Allows people in group wheel to run all commands
%wheel ALL=(ALL) ALL
Then add the hadoop user to the wheel group, so that the sudoers rule above applies to it:
[root@ ~]# usermod -aG wheel hadoop
Once that is done, you can log in as the hadoop user and use sudo to run commands with root privileges.
Preparation 3: passwordless SSH
[root@ ~]# su hadoop
[hadoop@ root]$
[hadoop@ root]$ ssh-keygen -t rsa -P ''
#The key is stored in the /home/hadoop/.ssh/ directory by default
[hadoop@ root]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[hadoop@ root]$ chmod 0600 !$
chmod 0600 ~/.ssh/authorized_keys    # what !$ (the last argument of the previous command) expanded to
Try it:
[hadoop@ root]$ ssh localhost
Last login: Sat Mar 25 21:04:52 2017
[hadoop@ ~]$
You can now log in without a password.
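The key-setup steps above can be sketched end-to-end in a scratch directory. This is only an illustration: /tmp/sshdemo is a made-up path so the sketch can run safely anywhere; on the real machine the files live in ~/.ssh.

```shell
# Illustrative sketch of the key setup, using a scratch directory
# (/tmp/sshdemo) instead of ~/.ssh so it can be run without side effects.
rm -rf /tmp/sshdemo && mkdir -p /tmp/sshdemo
ssh-keygen -q -t rsa -N '' -f /tmp/sshdemo/id_rsa      # passphrase-less key pair
cat /tmp/sshdemo/id_rsa.pub >> /tmp/sshdemo/authorized_keys
chmod 0600 /tmp/sshdemo/authorized_keys                # sshd rejects overly open permissions
ls /tmp/sshdemo
```

The 0600 permission matters: sshd silently ignores an authorized_keys file that is group- or world-readable in its default StrictModes configuration.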
Preparation 4: install the JDK
Install the JDK and set JAVA_HOME, PATH, and CLASSPATH in ~/.bash_profile:
[hadoop@ ~]$ cat .bash_profile
export JAVA_HOME=/usr/java/jdk1.7.0_79
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
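The effect of these exports can be checked directly in a shell. The JDK path below is just the one from the profile above; it does not need to exist for the demonstration to work.

```shell
# Reproduce the profile's PATH logic and confirm the JDK bin
# directory ends up first on the search path.
JAVA_HOME=/usr/java/jdk1.7.0_79
PATH=$JAVA_HOME/bin:$PATH
echo "$PATH" | cut -d: -f1
```

After editing ~/.bash_profile, apply it to the current session with `source ~/.bash_profile` and verify with `java -version`.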
Download Hadoop
I downloaded version 2.7.3:
wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
Extract it:
tar -zxvf hadoop-2.7.3.tar.gz
Hadoop configuration
First go to /home/hadoop/hadoop-2.7.3/etc/hadoop
core-site.xml: core configuration, including HDFS and MapReduce I/O settings and the NameNode URL (protocol, host name, port). Once the DataNodes have registered with the NameNode, clients interact with HDFS through this URL.
vi hadoop-2.7.3/etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
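As a sanity check, the value can be read back out of the file with standard tools. The /tmp path below is illustrative only; on the real install the file is etc/hadoop/core-site.xml.

```shell
# Write the same fragment to a scratch file and extract fs.defaultFS with sed.
cat > /tmp/core-site.xml <<'EOF'
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
EOF
sed -n 's:.*<value>\(.*\)</value>.*:\1:p' /tmp/core-site.xml
```

On a running installation, `bin/hdfs getconf -confKey fs.defaultFS` reports the value Hadoop actually resolved.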
hdfs-site.xml: configuration for the HDFS daemons: the NameNode, the secondary NameNode, and the DataNodes. On a single-node setup the replication factor must be 1:
vi hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
mapred-site.xml: MapReduce daemon configuration (in Hadoop 1 this meant the JobTracker and TaskTrackers; in 2.x it selects the execution framework)
vi mapred-site.xml (this file can be copied from the bundled template)
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
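In the 2.7.3 tarball only a template of this file ships, so copy it into place before editing. The paths below assume the install directory used earlier in this guide.

```shell
# Copy the shipped template into place before editing mapred-site.xml.
cd /home/hadoop/hadoop-2.7.3/etc/hadoop
cp mapred-site.xml.template mapred-site.xml
```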
yarn-site.xml: YARN global resource management configuration (a part I don't fully understand yet); for background see:
http://www.cnblogs.com/gw811/p/4077318.html
vi yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
Configure variables related to the hadoop runtime environment
hadoop@hive:~$ vi hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_79
One important step remains: format the NameNode before starting. If you changed the hostname, /etc/hosts must also contain an entry resolving it locally, otherwise formatting fails with an "unknown host" error for the namenode.
/home/hadoop/hadoop-2.7.3/bin/hdfs namenode -format
Then start everything. Go to /home/hadoop/hadoop-2.7.3/sbin/ and run:
./start-all.sh
(In Hadoop 2.x start-all.sh is deprecated; running start-dfs.sh and then start-yarn.sh is equivalent.)
If anything goes wrong, check the logs under the log directory:
/home/hadoop/hadoop-2.7.3/logs
Verify with jps that all the daemons are running:
[root@ sbin]# jps
14505 SecondaryNameNode
14305 NameNode
12108 -- process information unavailable
14644 ResourceManager
14390 DataNode
14736 NodeManager
14769 Jps
[root@ sbin]#
The Hadoop pseudo-distributed installation is now complete.
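As a final smoke test, a sketch of how the new cluster can be exercised (paths assume the install directory used above): create a home directory in HDFS, upload a file, and list it.

```shell
# Smoke-test the pseudo-cluster: make an HDFS home dir, upload a config
# file, and list it back. Requires the daemons started above to be running.
cd /home/hadoop/hadoop-2.7.3
./bin/hdfs dfs -mkdir -p /user/hadoop
./bin/hdfs dfs -put etc/hadoop/core-site.xml /user/hadoop/
./bin/hdfs dfs -ls /user/hadoop
```

The NameNode web UI should also be reachable at http://localhost:50070 and the ResourceManager UI at http://localhost:8088 (the Hadoop 2.x defaults).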