CentOS6.8---big data cluster construction

1.1 Tool versions

Virtual machine software: VMware12

Client software: SecureCRT8.5

Server OS: CentOS 6.8 (English, Basic Server edition)

Software versions:
hadoop-2.7.2
zookeeper-3.4.10

1.2 Template machine configuration

1.2.1 Turn off SELinux

vim /etc/selinux/config
Change:
SELINUX=disabled

1.2.2 Modify the MAC address of the network configuration

vim /etc/sysconfig/network-scripts/ifcfg-eth0
Delete the following two lines:
HWADDR=00:0C:29:13:5D:74
UUID=ae0965e7-22b9-45aa-8ec9-3f0a20a85d11

1.2.3 Turn off the firewall

service iptables stop

1.2.4 Disable the firewall at boot

chkconfig iptables off

1.2.5 Start the crond service

service crond restart

1.2.6 Enable crond at boot

chkconfig crond on

1.2.7 Uninstall the preinstalled OpenJDK

rpm -qa | grep jdk
rpm -e --nodeps java-1.6.0-openjdk-1.6.0.38-1.13.10.4.el6.x86_64 java-1.7.0-openjdk-1.7.0.99-2.6.5.1.el6.x86_64

1.2.8 Install lrzsz

yum -y install lrzsz

1.2.9 Install the JDK

mkdir /opt/module
mkdir /opt/software
# upload the jdk-8u144 tarball to /opt/software (e.g. with rz from lrzsz)
cd /opt/software
tar -zxvf jdk-8u144-linux-x64.tar.gz -C ../module/
mv /opt/module/jdk1.8.0_144 /opt/module/jdk-1.8
# configure environment variables
vim /etc/profile
# java-1.8
export JAVA_HOME=/opt/module/jdk-1.8
export PATH=$PATH:$JAVA_HOME/bin
source /etc/profile
java -version

1.3 Cluster Construction

Clone three hosts (choose linked clone), then start the virtual machines. On all three, modify the IP address, configure the network card, and set up the host mappings.

Modify the hostname

 vim /etc/sysconfig/network 
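
On CentOS 6 this file typically contains just two lines; the hostname below is shown for node-1 as an illustration:

```shell
NETWORKING=yes
HOSTNAME=node-1
```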

Modify IP address

vim /etc/sysconfig/network-scripts/ifcfg-eth0 
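
After removing the HWADDR and UUID lines, a static configuration for node-1 might look like the following sketch (the addresses, gateway, and DNS values are illustrative; adjust them to your own network):

```shell
DEVICE=eth0
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.1.6
NETMASK=255.255.255.0
GATEWAY=192.168.1.1
DNS1=192.168.1.1
```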

Delete the persistent NIC rules so they are regenerated on the next boot

 rm -rf /etc/udev/rules.d/70-persistent-net.rules

Configure ip-host mapping

vim /etc/hosts 

192.168.1.6 node-1 
192.168.1.7 node-2 
192.168.1.8 node-3 
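
The three mapping lines follow a simple pattern. As a quick sketch, they can be generated from the base network used above (purely illustrative; typing the three lines by hand is just as easy):

```shell
# Generate the /etc/hosts lines for node-1..node-3 from the 192.168.1.x base above.
BASE=192.168.1
for i in 1 2 3; do
    echo "$BASE.$((i + 5)) node-$i"
done
```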

Note: the changes take effect only after a restart; run reboot on all three machines.

1.3.1 Configure passwordless SSH

# run on all three machines:

ssh-keygen 
ssh-copy-id node-1 

# run on node-1:

cd /root/.ssh/ 
scp authorized_keys node-2:$PWD
scp authorized_keys node-3:$PWD

Afterwards, connect once from each of the three machines to every host (the first connection requires confirmation; later ones will not):

ssh node-1 
ssh node-2 
ssh node-3

1.3.2 Clock Synchronization

node-1 syncs with an Alibaba Cloud NTP server

node-2 syncs with node-1

node-3 syncs with node-1

Set up the clock server (on node-1)

# check whether the ntp service is installed
rpm -qa|grep ntp

# edit the ntp configuration file
vim /etc/ntp.conf

# allow all machines on the cluster's 192.168.1.0 network to query and sync time from this server
restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap

# the cluster sits on a LAN, so do not use external time servers
#server 0.centos.pool.ntp.org iburst
#server 1.centos.pool.ntp.org iburst
#server 2.centos.pool.ntp.org iburst
#server 3.centos.pool.ntp.org iburst

# even if this node loses its network connection, it can still serve time to the other cluster nodes
server 127.127.1.0
fudge 127.127.1.0 stratum 10

# edit the /etc/sysconfig/ntpd file
vim /etc/sysconfig/ntpd

# keep the hardware (BIOS) clock in sync with the system time
SYNC_HWCLOCK=yes

# restart the ntpd service
service ntpd restart

# enable ntpd at boot
chkconfig ntpd on

# add a cron job that syncs once per minute
crontab -e
*/1 * * * * /usr/sbin/ntpdate -u ntp.aliyun.com

# Alibaba Cloud is not the only option; any of the following can serve as node-1's upstream
ntp.aliyun.com
time.apple.com
ntp.ntsc.ac.cn

# stop the ntp service on the other machines (node-2, node-3)
service ntpd stop
# disable it at boot
chkconfig ntpd off

# sync with node-1 once per minute
crontab -e
*/1 * * * * /usr/sbin/ntpdate node-1

1.3.3 Configure yum source

On node-1, configure yum to prefer the local source over the network source, since local access is faster.

Configure local yum source

mkdir -p /mnt/centos
mkdir -p /mnt/local_yum
mount -o loop /dev/cdrom /mnt/centos
cp -r /mnt/centos/* /mnt/local_yum/
# note: leave the /mnt/centos directory before running umount
umount /mnt/centos/
cd /etc/yum.repos.d/
rename .repo .repo.back *.repo
cp CentOS-Media.repo.back local_yum.repo
vim local_yum.repo
[local_yum]
name=this is my local yum
baseurl=file:///mnt/local_yum
gpgcheck=0
enabled=1
priority=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6
yum clean all
yum repolist all

Network yum source

wget http://mirrors.163.com/.help/CentOS6-Base-163.repo
wget http://mirrors.aliyun.com/repo/Centos-6.repo

# install and enable the yum priorities plugin
yum -y install yum-plugin-priorities.noarch

cat /etc/yum/pluginconf.d/priorities.conf
enabled = 1 # the plugin is enabled

# test the repositories by installing httpd
yum -y install httpd
# distribute the repo files to node-2 and node-3

scp local_yum.repo node-2:$PWD
scp local_yum.repo node-3:$PWD

scp Centos-6.repo node-2:$PWD
scp Centos-6.repo node-3:$PWD

Publish the local repository on node-1 through the Apache server (httpd), so the other nodes can fetch the packages over HTTP:

service httpd start
chkconfig httpd on
cd /var/www/html/
ln -s /mnt/local_yum/ centos
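
The symlink simply exposes /mnt/local_yum under httpd's DocumentRoot, making the repo reachable as http://node-1/centos/. A tiny sandbox sketch of the same mechanism, using /tmp paths so it can run anywhere (note that httpd must also allow FollowSymLinks for this to work over HTTP):

```shell
# Create a stand-in repo directory and expose it through a symlink.
mkdir -p /tmp/local_yum_demo
echo "repodata placeholder" > /tmp/local_yum_demo/marker
ln -sfn /tmp/local_yum_demo /tmp/centos_demo
# Reading through the link shows the same files are visible at both paths.
cat /tmp/centos_demo/marker
```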

Configure node-2 and node-3 to pull from node-1:

vim local_yum.repo
[local_yum]
name=this is my local yum
baseurl=http://node-1/centos/
gpgcheck=0
enabled=1
priority=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6

1.4 Hadoop installation

1.4.1. Cluster planning

Cluster plan:

node-1: NameNode, DataNode, NodeManager, JobHistoryServer
node-2: ResourceManager, DataNode, NodeManager
node-3: SecondaryNameNode, DataNode, NodeManager

1.4.2. Configuration file modification

tar -zxvf hadoop-2.7.2.tar.gz -C /opt/module/
# set JAVA_HOME for the Hadoop environment scripts
vim hadoop-env.sh
export JAVA_HOME=/opt/module/jdk-1.8
vim yarn-env.sh
export JAVA_HOME=/opt/module/jdk-1.8
vim mapred-env.sh
export JAVA_HOME=/opt/module/jdk-1.8
vim core-site.xml
<!-- NameNode address for HDFS -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://node-1:9000</value>
</property>
<!-- storage directory for files Hadoop generates at runtime -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/module/hadoop-2.7.2/data/tmp</value>
</property>
vim hdfs-site.xml
<!-- replication factor -->
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<!-- SecondaryNameNode address; it assists the NameNode -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>node-3:50090</value>
</property>
cp mapred-site.xml.template mapred-site.xml
vim mapred-site.xml
<!-- run MapReduce on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
vim yarn-site.xml
<!-- how reducers fetch data -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- ResourceManager address for YARN -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>node-2</value>
</property>
# configure the worker (slave) nodes
vim slaves
node-1
node-2
node-3

1.4.3. Environment variable modification

vim /etc/profile
# hadoop-2.7.2
export HADOOP_HOME=/opt/module/hadoop-2.7.2
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
source /etc/profile

1.4.4. Distribute the installation package and environment variable file

scp -r /opt/module/hadoop-2.7.2 node-2:/opt/module/
scp -r /opt/module/hadoop-2.7.2 node-3:/opt/module/

scp /etc/profile node-2:/etc/
scp /etc/profile node-3:/etc/

1.4.5. Format the cluster

Format the NameNode on node-1, and only once. Do not format it again later, or the NameNode and DataNodes will end up with mismatched cluster IDs and fail to start.

hadoop namenode -format
# note: run start-all.sh on node-2 (the ResourceManager node)
# on node-1, start the history server:
mr-jobhistory-daemon.sh start historyserver

1.4.6. Cluster access

NameNode :

  • node-1:50070

Yarn :

  • node-2:8088

SecondaryNameNode :

  • node-3:50090

Yarn-historyServer:

  • node-1:19888


Origin blog.csdn.net/qq_40745994/article/details/107172050