CentOS 6.8 Big Data Cluster Construction
1.1 Tool versions
Virtual machine software: VMware 12
Client software: SecureCRT 8.5
Server OS: CentOS 6.8 (English, Basic Server edition)
Software versions:
hadoop-2.7.2
zookeeper-3.4.10
1.2 Template machine configuration
1.2.1 Turn off SELinux
vim /etc/selinux/config
Change the following line:
SELINUX=disabled
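The same change can be applied without opening an editor; a minimal sed sketch, run here against a demo copy (on the real machine the target is /etc/selinux/config, and `setenforce 0` additionally disables SELinux for the current session without a reboot):

```shell
# Demo stand-in for /etc/selinux/config.
printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' > selinux-config.demo
# Replace whatever mode is currently set with "disabled".
sed -i 's/^SELINUX=.*/SELINUX=disabled/' selinux-config.demo
grep '^SELINUX=' selinux-config.demo
```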
1.2.2 Modify the MAC address of the network configuration
vim /etc/sysconfig/network-scripts/ifcfg-eth0
Delete the following two lines:
HWADDR=00:0C:29:13:5D:74
UUID=ae0965e7-22b9-45aa-8ec9-3f0a20a85d11
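Deleting the two lines can also be scripted with sed; a sketch against a demo copy (the real file is /etc/sysconfig/network-scripts/ifcfg-eth0):

```shell
# Demo ifcfg file containing the two lines to strip.
cat > ifcfg-eth0.demo <<'EOF'
DEVICE=eth0
HWADDR=00:0C:29:13:5D:74
UUID=ae0965e7-22b9-45aa-8ec9-3f0a20a85d11
ONBOOT=yes
EOF
# Delete any HWADDR= and UUID= lines in place.
sed -i '/^HWADDR=/d; /^UUID=/d' ifcfg-eth0.demo
cat ifcfg-eth0.demo
```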
1.2.3 Turn off the firewall
service iptables stop
1.2.4 Cancel the firewall startup
chkconfig iptables off
1.2.5 Start the crond service
service crond restart
1.2.6 Set the startup
chkconfig crond on
1.2.7 Uninstall jdk
rpm -qa | grep jdk
rpm -e --nodeps java-1.6.0-openjdk-1.6.0.38-1.13.10.4.el6.x86_64 java-1.7.0-openjdk-1.7.0.99-2.6.5.1.el6.x86_64
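When the package names vary between machines, the `rpm -qa` output can be piped straight into the removal step. A dry-run sketch (echo prints the commands instead of executing them; on the real machine replace the printf with `rpm -qa`):

```shell
# Sample package list standing in for `rpm -qa` output.
printf '%s\n' \
  'java-1.6.0-openjdk-1.6.0.38-1.13.10.4.el6.x86_64' \
  'java-1.7.0-openjdk-1.7.0.99-2.6.5.1.el6.x86_64' \
  'lrzsz-0.12.20-27.1.el6.x86_64' |
grep -i jdk |
while read -r pkg; do
  # Drop the leading `echo` to actually uninstall.
  echo rpm -e --nodeps "$pkg"
done
```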
1.2.8 Install lrzsz
yum -y install lrzsz
1.2.9 install jdk
mkdir /opt/module
mkdir /opt/software
# Upload the jdk-8u144 tarball to /opt/software (e.g. with rz from lrzsz)
cd /opt/software
tar -zxvf jdk-8u144-linux-x64.tar.gz -C /opt/module/
mv /opt/module/jdk1.8.0_144 /opt/module/jdk-1.8
# Configure environment variables
vim /etc/profile
# java-1.8
export JAVA_HOME=/opt/module/jdk-1.8
export PATH=$PATH:$JAVA_HOME/bin
source /etc/profile
java -version
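The profile edit can also be scripted with a here-document; a sketch against a demo file (on the real machine the target is /etc/profile, followed by `source /etc/profile`):

```shell
PROFILE=./profile.demo   # use /etc/profile on the real machine
: > "$PROFILE"
# Quoted 'EOF' prevents $PATH from being expanded at write time.
cat >> "$PROFILE" <<'EOF'
# java-1.8
export JAVA_HOME=/opt/module/jdk-1.8
export PATH=$PATH:$JAVA_HOME/bin
EOF
grep -c JAVA_HOME "$PROFILE"
```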
1.3 Cluster Construction
Clone three hosts from the template using linked clones, then start the virtual machines. On all three, modify the IP address, configure the network card, and set up the hostname mapping.
Modify the hostname
vim /etc/sysconfig/network
Modify the IP address
vim /etc/sysconfig/network-scripts/ifcfg-eth0
Delete the persistent network-card rules; they are regenerated on the next boot
rm -rf /etc/udev/rules.d/70-persistent-net.rules
Configure the IP-to-hostname mapping
vim /etc/hosts
192.168.1.6 node-1
192.168.1.7 node-2
192.168.1.8 node-3
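Appending the mappings can be made idempotent, so re-running the setup does not duplicate lines; a sketch on a demo file (the real target is /etc/hosts on each machine):

```shell
HOSTS=./hosts.demo   # use /etc/hosts on the real machines
: > "$HOSTS"
for entry in '192.168.1.6 node-1' '192.168.1.7 node-2' '192.168.1.8 node-3'; do
  # -qxF: quiet, whole-line, fixed-string match; append only if missing.
  grep -qxF "$entry" "$HOSTS" || echo "$entry" >> "$HOSTS"
done
wc -l < "$HOSTS"
```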
Note: the changes take effect only after a restart; run reboot on all three machines.
1.3.1 install ssh
# Run on all three machines:
ssh-keygen
ssh-copy-id node-1
# Run on node-1:
cd /root/.ssh/
scp authorized_keys node-2:$PWD
scp authorized_keys node-3:$PWD
After that, connect to each node once from all three machines (the first connection prompts for confirmation; later logins are password-free)
ssh node-1
ssh node-2
ssh node-3
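ssh-keygen can be run without interactive prompts by passing the passphrase and key path on the command line; a sketch using a demo path (on the real machines the default ~/.ssh/id_rsa is fine):

```shell
# -N '' : empty passphrase; -f : key file; -q : quiet.
rm -f ./demo_key ./demo_key.pub
ssh-keygen -t rsa -N '' -q -f ./demo_key
ls demo_key demo_key.pub
```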
1.3.2 Clock Synchronization
node-1 syncs with Alibaba Cloud's NTP service; node-2 and node-3 sync with node-1.
Set up node-1 as the clock server:
# Check whether the ntp service is installed
rpm -qa|grep ntp
# Edit the ntp configuration file
vim /etc/ntp.conf
# Allow every machine on the cluster's 192.168.1.0/24 network to query and sync time from this server
restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap
# The cluster is on a LAN; do not use external time servers
#server 0.centos.pool.ntp.org iburst
#server 1.centos.pool.ntp.org iburst
#server 2.centos.pool.ntp.org iburst
#server 3.centos.pool.ntp.org iburst
# Even if this node loses network connectivity, it can still serve time to the other nodes in the cluster
server 127.127.1.0
fudge 127.127.1.0 stratum 10
# Edit the /etc/sysconfig/ntpd file
vim /etc/sysconfig/ntpd
# Keep the BIOS hardware clock in sync with the system time
SYNC_HWCLOCK=yes
# Restart the ntpd service
service ntpd restart
# Enable start on boot
chkconfig ntpd on
# Add a cron job that syncs once per minute
crontab -e
*/1 * * * * /usr/sbin/ntpdate -u ntp.aliyun.com
# Alibaba Cloud is not the only option; any one of the following can serve as node-1's upstream server
ntp.aliyun.com
time.apple.com
ntp.ntsc.ac.cn
# Stop the ntpd service on the other machines (node-2 and node-3)
service ntpd stop
# Disable start on boot
chkconfig ntpd off
# Sync with node-1 once per minute
crontab -e
*/1 * * * * /usr/sbin/ntpdate node-1
1.3.3 Configure yum source
Configure node-1 to prefer the local repository, with the network as a fallback, since local access is faster than the network.
Configure local yum source
mkdir -p /mnt/centos
mkdir -p /mnt/local_yum
mount -o loop /dev/cdrom /mnt/centos
cp -r /mnt/centos/* /mnt/local_yum/
# Note: leave the /mnt/centos directory before running the next step
umount /mnt/centos/
cd /etc/yum.repos.d/
rename .repo .repo.back *.repo
cp CentOS-Media.repo.back local_yum.repo
vim local_yum.repo
[local_yum]
name=this is my local yum
baseurl=file:///mnt/local_yum
gpgcheck=0
enabled=1
priority=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6
yum clean all
yum repolist all
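The repo file above can equally be written with a here-document instead of vim; a sketch writing a demo copy (the real file is /etc/yum.repos.d/local_yum.repo):

```shell
# Write the local repo definition in one step.
cat > local_yum.repo.demo <<'EOF'
[local_yum]
name=this is my local yum
baseurl=file:///mnt/local_yum
gpgcheck=0
enabled=1
priority=1
EOF
# Five key=value settings expected.
grep -c '=' local_yum.repo.demo
```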
Network yum source (either mirror works):
wget http://mirrors.163.com/.help/CentOS6-Base-163.repo
wget http://mirrors.aliyun.com/repo/Centos-6.repo
# Install and enable the yum priority plugin
yum -y install yum-plugin-priorities.noarch
cat /etc/yum/pluginconf.d/priorities.conf
enabled = 1 # the plugin is enabled
# Test the yum configuration by installing a package (httpd is needed later anyway)
yum -y install httpd
# Distribute the repo files to node-2 and node-3
scp local_yum.repo node-2:$PWD
scp local_yum.repo node-3:$PWD
scp Centos-6.repo node-2:$PWD
scp Centos-6.repo node-3:$PWD
Publish the local repository on node-1 through the Apache HTTP server, so the other nodes can fetch packages over http
service httpd start
chkconfig httpd on
cd /var/www/html/
ln -s /mnt/local_yum/ centos
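The ln -s step only creates a symlink inside Apache's document root; the pattern can be checked locally with a sketch (demo directories stand in for /var/www/html and /mnt/local_yum):

```shell
mkdir -p www_root.demo repo_src.demo
# -sfn: symbolic link, force-replace any existing one.
ln -sfn "$(pwd)/repo_src.demo" www_root.demo/centos
[ -L www_root.demo/centos ] && echo linked
```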
Configure node-2 and node-3 to use node-1 as their repository:
vim local_yum.repo
[local_yum]
name=this is my local yum
baseurl=http://node-1/centos/
gpgcheck=0
enabled=1
priority=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6
1.4 Hadoop installation
1.4.1. Cluster planning
node-1: NameNode, DataNode, NodeManager, JobHistoryServer
node-2: ResourceManager, DataNode, NodeManager
node-3: SecondaryNameNode, DataNode, NodeManager
1.4.2. Configuration file modification
tar -zxvf hadoop-2.7.2.tar.gz -C /opt/module/
cd /opt/module/hadoop-2.7.2/etc/hadoop
# Set JAVA_HOME in the Hadoop environment scripts
vim hadoop-env.sh
export JAVA_HOME=/opt/module/jdk-1.8
vim yarn-env.sh
export JAVA_HOME=/opt/module/jdk-1.8
vim mapred-env.sh
export JAVA_HOME=/opt/module/jdk-1.8
vim core-site.xml
<!-- Address of the HDFS NameNode -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://node-1:9000</value>
</property>
<!-- Base directory for files generated at Hadoop runtime -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/module/hadoop-2.7.2/data/tmp</value>
</property>
vim hdfs-site.xml
<!-- Replication factor -->
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<!-- Address of the SecondaryNameNode, which assists the NameNode -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>node-3:50090</value>
</property>
cp mapred-site.xml.template mapred-site.xml
vim mapred-site.xml
<!-- Run MapReduce on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
vim yarn-site.xml
<!-- Auxiliary service that lets reducers fetch map output -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- Address of the YARN ResourceManager -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>node-2</value>
</property>
# Configure the worker nodes
vim slaves
node-1
node-2
node-3
1.4.3. Environment variable modification
vim /etc/profile
# hadoop-2.7.2
export HADOOP_HOME=/opt/module/hadoop-2.7.2
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
source /etc/profile
1.4.4. Distribution installation package and environment variable configuration file
scp -r /opt/module/hadoop-2.7.2 node-2:$PWD
scp -r /opt/module/hadoop-2.7.2 node-3:$PWD
scp /etc/profile node-2:/etc/
scp /etc/profile node-3:/etc/
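The four scp commands can be collapsed into a loop; a dry-run sketch (echo prints the commands, drop it to actually copy; -r is required because hadoop-2.7.2 is a directory):

```shell
for h in node-2 node-3; do
  echo scp -r /opt/module/hadoop-2.7.2 "$h:/opt/module/"
  echo scp /etc/profile "$h:/etc/"
done
```

Remember to run `source /etc/profile` (or log in again) on node-2 and node-3 afterwards so the new variables take effect.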
1.4.5. Formatting a cluster
Format the NameNode on node-1, and only once; never format it again afterwards, otherwise the cluster IDs will no longer match and the DataNodes will fail to start.
hadoop namenode -format
# Note: run start-all.sh on node-2, since the ResourceManager runs there
node-2: start-all.sh
# Start the JobHistory server on node-1
node-1: mr-jobhistory-daemon.sh start historyserver
1.4.6. Cluster access
NameNode:
- http://node-1:50070
YARN ResourceManager:
- http://node-2:8088
SecondaryNameNode:
- http://node-3:50090
YARN JobHistory server:
- http://node-1:19888