Build a pseudo-distributed Hadoop environment on CentOS 7

1. Download the installation package

Download the Hadoop installation package.

Official website address: https://hadoop.apache.org/releases.html

Version: It is recommended to use hadoop-2.7.3.tar.gz

System Environment: CentOS 7

Note: Hadoop needs JDK support, version 1.8 or higher.

2. Extract the installation package

  • This guide uses /usr/soft as the installation path, so first move the installation package into that directory, then extract it:
cd /usr/soft
tar -zxvf hadoop-2.7.3.tar.gz

3. Environment variable configuration

vi /etc/profile

Append the following lines to the end of the file:

export HADOOP_HOME=/usr/soft/hadoop-2.7.3
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_INSTALL=$HADOOP_HOME

After modifying the configuration, reload the file so the changes take effect:

source /etc/profile
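A quick way to confirm the variables took effect is to ask for the version; it should print Hadoop 2.7.3:

hadoop version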

4. Pseudo distributed configuration

Configuration file directory: /usr/soft/hadoop-2.7.3/etc/hadoop/

Files that need to be modified: core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml

a) core-site.xml

First create a folder named tmp inside the Hadoop directory:

cd /usr/soft/hadoop-2.7.3
mkdir tmp

Add the following properties to the configuration file:

1) fs.defaultFS = hdfs://192.168.0.103:9000, the default file system (the local default is file:///); the port here is set to the same port HBase uses

2) hadoop.tmp.dir = /usr/soft/hadoop-2.7.3/tmp, the base directory for temporary files (the tmp folder created above)
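Put together, the <configuration> section of core-site.xml looks like this (192.168.0.103 is the example address from above; substitute your own IP):

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.0.103:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/soft/hadoop-2.7.3/tmp</value>
  </property>
</configuration>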

b) hdfs-site.xml

dfs.replication = 1, the number of replicas (a fully distributed cluster keeps at least three copies; pseudo-distributed mode writes only one, since all daemons run on the same host)
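As XML, the <configuration> section of hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>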

c) mapred-site.xml

The directory does not contain a file named mapred-site.xml; it ships only a template named mapred-site.xml.template.

Copy the template and rename the copy to mapred-site.xml:

cd /usr/soft/hadoop-2.7.3/etc/hadoop/
cp mapred-site.xml.template mapred-site.xml

Modify the configuration file: mapreduce.framework.name = yarn, which sets YARN as the framework that runs MapReduce jobs:

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

d) yarn-site.xml

yarn.resourcemanager.hostname = localhost (the hostname of the YARN ResourceManager)

yarn.nodemanager.aux-services = mapreduce_shuffle (the auxiliary service NodeManagers run so that MapReduce jobs can shuffle data)
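As XML, the two properties go inside <configuration> in yarn-site.xml:

<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>localhost</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>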

e) hadoop-env.sh (optional)

It is best to change the configured JDK path from a relative reference to an absolute path.
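For example, in hadoop-env.sh (the JDK path below is a placeholder; use the directory where your JDK is actually installed):

# before: export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/usr/soft/jdk1.8.0_144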

That completes the configuration file changes.

5. Configure SSH (Secure Shell)

The startup scripts start the daemons on remote servers, which requires logging in to those servers over SSH. Entering a password on every login is tedious, so configure passwordless login: generate a key pair on the NameNode and distribute the public key to the DataNodes.

a) Generate a key pair

ssh-keygen -t rsa

b) Copy the public key into the authorized keys file

In pseudo-distributed mode, append the key to your own account:

cd ~/.ssh/
cat id_rsa.pub >> authorized_keys

In fully distributed mode, copy it to each DataNode (the other machines). Run this on the DataNode, where master is the NameNode's hostname:

scp root@master:~/.ssh/id_rsa.pub ~/.ssh/id_rsa.pub
cat  ~/.ssh/id_rsa.pub  >> ~/.ssh/authorized_keys
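Alternatively, if the ssh-copy-id helper is available, it copies and appends the key in one step; run it on the NameNode once per DataNode (slave below is a placeholder for the DataNode's hostname):

ssh-copy-id root@slave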

c) Set the permissions on authorized_keys to 600

chmod 600 ~/.ssh/authorized_keys
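To confirm passwordless login works, log in to the local host; no password prompt should appear:

ssh localhost
exit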

Note: the following step is needed so that the Hadoop site can be reached from the host machine by hostname.

Modify the virtual machine's /etc/hosts file and delete the existing 127.0.0.1 entries.

Then add the following entries, substituting the machine's own IP (192.168.0.103 in this guide's examples):

192.168.0.103 master
192.168.0.103 slave
192.168.0.103 localhost

6. Format the NameNode

hdfs namenode -format

If the shell reports that the command was not found, re-check the environment variable configuration in step 3.
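If the format succeeds, the output should include a line similar to the following (the path is derived from hadoop.tmp.dir):

INFO common.Storage: Storage directory /usr/soft/hadoop-2.7.3/tmp/dfs/name has been successfully formatted.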

7. Start Hadoop

The start scripts are stored in the sbin folder:

cd /usr/soft/hadoop-2.7.3/sbin/

start-all.sh
or
start-dfs.sh 
start-yarn.sh
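Once the scripts finish, the jps tool (shipped with the JDK) should list five Hadoop processes: NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager:

jps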

8. Check the startup status

Open the following address in a browser; if the page appears, the setup succeeded:

your-machine's-IP:50070 (for example, http://192.168.0.103:50070)
