Big data technology is developing rapidly, its technical requirements keep rising, and demand for qualified practitioners keeps growing. Since Hadoop is the foundation of the big data stack, building a Hadoop platform has become a basic skill for practitioners. The following walks through building a Hadoop cluster with two virtual machines.
(A) Using VMware, create a CentOS 7 virtual machine and install the operating system; this machine will serve as the master node of the big data cluster.
(1) Modify the network interface configuration file. The interface name (here ifcfg-ens33) differs between Linux versions, so locate the actual file and adjust accordingly.
#vi /etc/sysconfig/network-scripts/ifcfg-ens33
BOOTPROTO=static # change this line
ONBOOT=yes # change this line
IPADDR=192.168.126.128
NETMASK=255.255.255.0
GATEWAY=192.168.126.2
DNS1=114.114.114.114
DNS2=8.8.8.8
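Before restarting the network, the edited stanza can be sanity-checked. The following is a sketch against a scratch copy in /tmp; the real file lives under /etc/sysconfig/network-scripts/.

```shell
# Write the expected stanza to a scratch file (illustration only; the
# real file is /etc/sysconfig/network-scripts/ifcfg-ens33).
cat > /tmp/ifcfg-ens33.demo <<'EOF'
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.126.128
NETMASK=255.255.255.0
GATEWAY=192.168.126.2
DNS1=114.114.114.114
DNS2=8.8.8.8
EOF

# A static setup needs these four keys; grep -c counts how many are present.
grep -c -E '^(BOOTPROTO=static|ONBOOT=yes|IPADDR=|GATEWAY=)' /tmp/ifcfg-ens33.demo   # prints 4
```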
(2) Restart the network
#systemctl restart network
(3) View the IP address
#ip a
Connect to the machine with a terminal tool such as SecureCRT.
(4) Install and configure the JDK
4.1 Using a file transfer tool such as SecureFX, upload the JDK archive jdk-8u231-linux-x64.tar.gz to the /opt/ directory.
4.2 Create the directory /usr/jdk64
#mkdir /usr/jdk64
4.3 Extract jdk-8u231-linux-x64.tar.gz to /usr/jdk64
#tar -zxvf /opt/jdk-8u231-linux-x64.tar.gz -C /usr/jdk64
4.4 Configure environment variables; append the following at the end of the file
#vi /etc/profile
export JAVA_HOME=/usr/jdk64/jdk1.8.0_231
export PATH=$JAVA_HOME/bin:$PATH
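Prepending $JAVA_HOME/bin means this JDK is found before any system Java. The mechanics can be seen in a throwaway shell with the same paths:

```shell
# Simulate the two profile lines (illustration only; the real change
# goes in /etc/profile).
JAVA_HOME=/usr/jdk64/jdk1.8.0_231
PATH=$JAVA_HOME/bin:$PATH

# PATH is searched left to right, so the new JDK's bin directory wins.
echo "${PATH%%:*}"   # prints /usr/jdk64/jdk1.8.0_231/bin
```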
4.5 Reload /etc/profile
#source /etc/profile
4.6 Verify the Java version
#java -version
(5) Install and configure Hadoop (version 2.8.5)
5.1 Upload hadoop-2.8.5.tar.gz to the /opt/ directory
5.2 Extract hadoop-2.8.5.tar.gz to /usr/local/
#tar -zxvf /opt/hadoop-2.8.5.tar.gz -C /usr/local/
5.3 Modify the Hadoop configuration files (under /usr/local/hadoop-2.8.5/etc/hadoop/).
1) core-site.xml: add the following inside <configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:8020</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/var/log/hadoop/tmp</value>
</property>
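fs.defaultFS is the address every HDFS client connects to, so a typo here breaks the whole cluster. A quick sed can confirm the value round-trips from the file; this sketch works on a scratch copy in /tmp rather than the real core-site.xml:

```shell
# Scratch copy of the two properties (illustration; the real file is
# /usr/local/hadoop-2.8.5/etc/hadoop/core-site.xml).
cat > /tmp/core-site.demo.xml <<'EOF'
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:8020</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/var/log/hadoop/tmp</value>
</property>
</configuration>
EOF

# Print the value element that follows the fs.defaultFS name element.
sed -n '/fs.defaultFS/{n;s/.*<value>\(.*\)<\/value>.*/\1/p}' /tmp/core-site.demo.xml   # prints hdfs://master:8020
```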
2) hadoop-env.sh: uncomment the JAVA_HOME line (remove the leading #) and change the value after = to:
export JAVA_HOME=/usr/jdk64/jdk1.8.0_231
3) yarn-env.sh: uncomment the JAVA_HOME line (remove the leading #) and change the value after = to:
export JAVA_HOME=/usr/jdk64/jdk1.8.0_231
4) mapred-site.xml: add the following inside <configuration> (in Hadoop 2.8.5 this file must first be created by copying mapred-site.xml.template)
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<description>Set resource scheduling</description>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
</property>
5) yarn-site.xml: add the following inside <configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
6) slaves: delete localhost and add the following
master
slave1
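The slaves file is just a newline-separated host list; the sbin startup scripts iterate over it and launch a worker daemon on each host via ssh. The loop can be sketched like this, using a scratch copy and echo in place of ssh:

```shell
# Scratch copy of the worker list (the real file is
# /usr/local/hadoop-2.8.5/etc/hadoop/slaves).
cat > /tmp/slaves.demo <<'EOF'
master
slave1
EOF

# start-dfs.sh effectively loops over every listed host and starts a
# DataNode there over ssh; this sketch just echoes instead.
while read -r host; do
  echo "would start DataNode on: $host"
done < /tmp/slaves.demo
```

Because master appears in the list, it will run worker daemons (DataNode, NodeManager) in addition to the master daemons.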
7) hdfs-site.xml: add the following inside <configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///data/hadoop/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///data/hadoop/hdfs/data</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>master:50090</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
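Note that dfs.replication=3 exceeds the two DataNodes in this cluster, so HDFS will report blocks as under-replicated; with two workers, a value of 2 is the practical ceiling. It can also help to pre-create the NameNode and DataNode directories so any permission problem surfaces before the first format. A sketch, using a stand-in prefix so it runs without root:

```shell
# Stand-in prefix for illustration; on the real nodes the prefix is
# /data/hadoop, matching the two properties above.
BASE=/tmp/hadoop-demo

# Same layout as dfs.namenode.name.dir and dfs.datanode.data.dir.
mkdir -p "$BASE/hdfs/name" "$BASE/hdfs/data"

ls "$BASE/hdfs"   # lists: data  name
```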
(6) Shut down the virtual machine in VMware, then clone it to create a second virtual machine.
(7) Start the newly cloned VM, modify its NIC configuration file, and restart the network (work in the VMware console).
#vi /etc/sysconfig/network-scripts/ifcfg-ens33
Change the last character of the UUID so it differs from the original machine's, and set the IP address:
IPADDR=192.168.126.129
#systemctl restart network
(8) Start the original virtual machine and reconnect to both machines with the terminal tool.
On the master (128) node:
(9) Modify the hostname
# hostnamectl set-hostname master
# bash
(10) Modify the configuration file /etc/hosts, adding the following two lines
# vi /etc/hosts
192.168.126.128 master master.centos.com
192.168.126.129 slave1 slave1.centos.com
(11) Copy /etc/hosts to the slave1 node with the network copy command scp. When prompted, type yes and enter the root password (000000 in this example).
#scp /etc/hosts root@slave1:/etc/
(12) Configure passwordless SSH login; press Enter three times to accept the defaults
#ssh-keygen -t rsa
(13) Copy the public key to each host. When prompted, type yes and enter the password (000000).
#ssh-copy-id -i /root/.ssh/id_rsa.pub master
#ssh-copy-id -i /root/.ssh/id_rsa.pub slave1
(14) Install the ntpd service
#yum install ntp -y
(15) Configure NTP: comment out the original server lines and add the two new lines below
#vi /etc/ntp.conf
#server 0.centos.pool.ntp.org iburst
#server 1.centos.pool.ntp.org iburst
#server 2.centos.pool.ntp.org iburst
#server 3.centos.pool.ntp.org iburst
server 127.127.1.0
fudge 127.127.1.0 stratum 10
(16) Start the ntpd service, enable it at boot, and turn off the firewall
#systemctl start ntpd
#systemctl enable ntpd
#systemctl stop firewalld
On the slave1 (129) node:
(17) Modify the hostname
#hostnamectl set-hostname slave1
#bash
(18) Check whether the contents of /etc/hosts have been updated
#cat /etc/hosts
192.168.126.128 master master.centos.com
192.168.126.129 slave1 slave1.centos.com
(19) Install ntpdate
#yum install ntpdate -y
(20) Synchronize time with the master node; if an error occurs, turn off the firewall on master
#ntpdate master
(21) Enable ntpdate at boot.
#systemctl enable ntpdate
On the master (128) node:
(22) Configure environment variables in /etc/profile: add JAVA_HOME and HADOOP_HOME as below, and modify PATH
export JAVA_HOME=/usr/jdk64/jdk1.8.0_231
export HADOOP_HOME=/usr/local/hadoop-2.8.5
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH
(23) Reload /etc/profile
#source /etc/profile
(24) Format the HDFS file system
#hdfs namenode -format
(25) Start the cluster
#sh /usr/local/hadoop-2.8.5/sbin/start-all.sh
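Before checking the web UIs, jps on each node should list the expected daemons. This sketch simply encodes the Hadoop 2.x checklist, with master doubling as a worker because it also appears in the slaves file:

```shell
# Daemons expected on each node (master runs worker daemons too, since
# it is listed in the slaves file).
master_daemons="NameNode SecondaryNameNode ResourceManager DataNode NodeManager"
slave1_daemons="DataNode NodeManager"

# On the live cluster, compare these names against the output of `jps`
# on each node; here the checklist is just printed.
for d in $master_daemons; do echo "master: $d"; done
for d in $slave1_daemons; do echo "slave1: $d"; done
```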
(26) Check the startup results
Open a browser and visit the web UIs; in Hadoop 2.x the NameNode UI listens on port 50070 and the ResourceManager UI on port 8088 by default (e.g. http://192.168.126.128:50070).
If the pages cannot be opened, check whether the firewall is running:
#systemctl status firewalld
#systemctl stop firewalld