1. Download the installation package
Download the Hadoop installation package.
Official website address: https://hadoop.apache.org/releases.html
Version: It is recommended to use hadoop-2.7.3.tar.gz
System Environment: CentOS 7
Note: A JDK is required, version 1.8 or higher.
2. Extract the installation package
- This guide installs to the path /usr/soft, so first move the downloaded package there:
cd /usr/soft
tar -zxvf hadoop-2.7.3.tar.gz
3. Environment variable configuration
vi /etc/profile
Append the following lines at the end of the file:
export HADOOP_HOME=/usr/soft/hadoop-2.7.3
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_INSTALL=$HADOOP_HOME
After modifying the file, reload it:
source /etc/profile
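To confirm the variables took effect, the hadoop command should now resolve on the PATH (a quick sanity check):
hadoop version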
4. Pseudo distributed configuration
File directory: /usr/soft/hadoop-2.7.3/etc/hadoop/
Files to modify: core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml
a) core-site.xml
First create a folder named tmp inside the Hadoop directory:
cd /usr/soft/hadoop-2.7.3
mkdir tmp
Add the following properties to the configuration file:
1) fs.defaultFS = hdfs://192.168.0.103:9000 — the default file system (by default this is the local file:/// scheme); if you run HBase against this cluster, configure it with the same port.
2)hadoop.tmp.dir=/usr/soft/hadoop-2.7.3/tmp
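As XML, the two properties above go into core-site.xml like this (a minimal sketch; 192.168.0.103 is the example IP from above, so substitute your own):
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://192.168.0.103:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/soft/hadoop-2.7.3/tmp</value>
</property>
</configuration>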
b) hdfs-site.xml
dfs.replication = 1 — the number of block replicas; a fully distributed cluster should keep at least three, but pseudo-distributed mode writes only one copy because every process runs on the same host.
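The corresponding property block, mirroring the value above:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>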
c) mapred-site.xml
The directory does not contain a file with the exact name mapred-site.xml; instead it ships a template named mapred-site.xml.template.
Copy the template and rename the copy to mapred-site.xml:
cd /usr/soft/hadoop-2.7.3/etc/hadoop/
cp mapred-site.xml.template mapred-site.xml
Modify the configuration file: mapreduce.framework.name = yarn, which makes YARN the framework that runs MapReduce jobs.
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
d) yarn-site.xml
yarn.resourcemanager.hostname = localhost // hostname of the ResourceManager
yarn.nodemanager.aux-services = mapreduce_shuffle // auxiliary service the NodeManagers run for the MapReduce shuffle
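As XML, a minimal sketch of the two properties above:
<property>
<name>yarn.resourcemanager.hostname</name>
<value>localhost</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>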
e) hadoop-env.sh (optional)
It is best to replace the relative JAVA_HOME reference with the absolute path of the configured JDK.
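For example (the JDK directory below is only an assumption; point JAVA_HOME at wherever your JDK 1.8 actually lives):
export JAVA_HOME=/usr/soft/jdk1.8.0_131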
The configuration file changes are done!
5. Configure SSH (Secure Shell)
The startup scripts launch the daemons on remote servers over SSH, and logging in with a password on every connection is tedious, so configure passwordless login: generate a key pair on the NameNode and distribute the public key to the DataNodes.
a) Generate a key pair
ssh-keygen -t rsa
b) Copy the public key into the authorized_keys file
In pseudo-distributed mode, copy it to your own host:
cd ~/.ssh/
cat id_rsa.pub >> authorized_keys
In fully distributed mode, copy it to each DataNode (the other machines); the command below runs on the DataNode and fetches the key from the NameNode host:
scp root@<hostname>:~/.ssh/id_rsa.pub ~/.ssh/id_rsa.pub
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
c) Set the permissions of authorized_keys to 600
chmod 600 ~/.ssh/authorized_keys
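To verify, an SSH login should now succeed without a password prompt (log in once, then exit):
ssh localhost
exit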
Note: Hadoop must be able to resolve the hostnames used in this setup.
Edit the virtual machine's /etc/hosts file, delete the 127.0.0.1 entries,
and add the following lines:
<your-machine-IP> master
<your-machine-IP> slave
<your-machine-IP> localhost
6. Format NameNode
hdfs namenode -format
If the command is not found, re-check the environment variable configuration from step 3.
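On success, the log output should include a line similar to the following (exact wording may vary by version; the path follows from the hadoop.tmp.dir set in step 4):
INFO common.Storage: Storage directory /usr/soft/hadoop-2.7.3/tmp/dfs/name has been successfully formatted.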
7. Start Hadoop
The start scripts are all stored in the sbin folder:
cd /usr/soft/hadoop-2.7.3/sbin/
start-all.sh
or
start-dfs.sh
start-yarn.sh
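To confirm the daemons came up, list the Java processes with jps (it ships with the JDK); in pseudo-distributed mode you should see NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager:
jps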
8. Check the startup state
Open the address below in a browser; if the page loads, the startup succeeded:
<your-machine-IP>:50070
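The YARN ResourceManager web UI should likewise be reachable (8088 is its default port in Hadoop 2.x):
<your-machine-IP>:8088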