First, prepare the virtual machine environment
1. Clone a virtual machine
2. Change the static IP of the cloned virtual machine
vim /etc/sysconfig/network-scripts/ifcfg-eth33
3. Change the hostname
View the hostname: hostname
Change the hostname: vi /etc/sysconfig/network (on CentOS 7 the hostname is normally set with hostnamectl set-hostname hadoop101 instead)
NETWORKING=yes
NETWORKING_IPV6=no
HOSTNAME=hadoop101
Edit the host mapping file (hosts file): vim /etc/hosts
Add:
8.8.8.101 hadoop101
8.8.8.102 hadoop102
8.8.8.103 hadoop103
After editing, reboot the machine.
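Since the same three mappings have to be added on every node, the edit can be scripted. The following is a minimal sketch, not a required step: add_cluster_hosts is a hypothetical helper name, and it takes the hosts file as an argument so it can be tried on a scratch file before being pointed at /etc/hosts as root.

```shell
#!/usr/bin/env bash
# add_cluster_hosts (hypothetical helper): append the cluster mappings
# from this guide to a hosts file, skipping hostnames already present.
add_cluster_hosts() {
    hosts_file=$1
    for entry in "8.8.8.101 hadoop101" "8.8.8.102 hadoop102" "8.8.8.103 hadoop103"; do
        # Strip everything up to the first space to get the hostname.
        name=${entry#* }
        # Look for the hostname as a whole word on a non-comment line.
        if ! grep -Eq "^[^#]*[[:space:]]${name}([[:space:]]|\$)" "$hosts_file" 2>/dev/null; then
            printf '%s\n' "$entry" >> "$hosts_file"
        fi
    done
}

# Example (as root): add_cluster_hosts /etc/hosts
```

Running it twice leaves the file unchanged the second time, which makes it safe to include in a setup script that is re-run.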
4. Disable the firewall (CentOS 7)
Check the firewall status: firewall-cmd --state
Stop the firewall: systemctl stop firewalld.service
Prevent the firewall from starting at boot: systemctl disable firewalld.service
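Stopping the service only affects the current boot, while disabling it removes it from the boot sequence, so both commands are needed. A small sketch that groups them is shown here; disable_firewall is a hypothetical function name, and it assumes it is run as root on a systemd-based system such as CentOS 7.

```shell
#!/usr/bin/env bash
# disable_firewall (hypothetical helper): turn firewalld off now and at boot.
disable_firewall() {
    # Stop the running service (current boot only).
    systemctl stop firewalld.service
    # Remove it from the boot sequence so it stays off after a reboot.
    systemctl disable firewalld.service
    # Print the resulting state; is-active returns non-zero when the
    # service is inactive, so the status is ignored here.
    systemctl is-active firewalld.service || true
}

# Example (as root): disable_firewall
```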
5. Create user banana
Add a user: useradd banana
Set the user's password: passwd banana
Check whether the user exists: id banana
List the users that have been created: cat /etc/passwd
Delete a user but keep the user's home directory: userdel banana
Delete a user together with the user's home directory: userdel -r banana
Show the current user name: whoami
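6. Give the banana user root privileges through sudo
Modify the configuration file: vi /etc/sudoers (visudo is the safer way to edit this file, because it checks the syntax before saving)
Find the following line (around line 91) and add a line for banana below the root entry: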
## Allow root to run any commands anywhere
root ALL=(ALL) ALL
banana ALL=(ALL) ALL
Alternatively, configure sudo so that no password is required:
## Allow root to run any commands anywhere
root ALL=(ALL) ALL
banana ALL=(ALL) NOPASSWD:ALL
After this change, log in with the banana account and prefix a command with sudo to run it with root privileges (for example, sudo useradd zhangsan).
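As an alternative to editing /etc/sudoers directly, the same rule can live in a drop-in file, since the stock CentOS 7 sudoers file includes /etc/sudoers.d. The sketch below is one possible way to write such a file; write_sudoers_dropin and the target path are assumptions for illustration, not part of the original steps.

```shell
#!/usr/bin/env bash
# write_sudoers_dropin (hypothetical helper): write a one-line sudo rule
# for a user into a separate file instead of editing /etc/sudoers.
write_sudoers_dropin() {
    user=$1
    target=$2    # e.g. /etc/sudoers.d/banana when run as root
    printf '%s ALL=(ALL) ALL\n' "$user" > "$target"
    # sudoers files are expected to be read-only for owner and group.
    chmod 440 "$target"
}

# Example (as root): write_sudoers_dropin banana /etc/sudoers.d/banana
```

It is worth validating the result with visudo -cf on the new file before logging out of the root session, because a syntax error in a sudoers file can block sudo entirely.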
Second, write the cluster distribution script xsync
1. scp (secure copy) [push, pull, third-party]
Definition: scp copies data between servers.
Syntax: scp -r $pdir/$fname $user@hadoop$host:$pdir/$fname
e.g. scp -r banana@hadoop101:/soft/software root@hadoop103:/soft/software
(run on hadoop102, this copies the directory from hadoop101 to hadoop103)
Note: after copying the configuration file, do not forget to source it: source /etc/profile
Note: after copying the installation directory, do not forget to change the owner and group of all the copied files on hadoop102 and hadoop103: sudo chown banana:banana -R /soft/software
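2. rsync remote synchronization tool
Copying files with rsync is faster than with scp: rsync transfers only the files that differ, whereas scp copies every file.
Syntax: rsync -rvl $pdir/$fname $user@hadoop$host:$pdir/$fname
3. xsync cluster distribution script
Definition: copy a file in a loop to the same directory on every node
Script implementation
1. Create a bin directory under /home/banana, and create the file xsync inside it
2. File contents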
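#!/bin/bash
# 1. Get the number of arguments; exit if there are none
pcount=$#
if ((pcount == 0)); then
    echo "no args"
    exit 1
fi
# 2. Get the file name
p1=$1
fname=$(basename "$p1")
echo "fname=$fname"
# 3. Resolve the parent directory to an absolute path
pdir=$(cd -P "$(dirname "$p1")" && pwd)
echo "pdir=$pdir"
# 4. Get the current user name
user=$(whoami)
# 5. Loop over the other nodes (hadoop102 and hadoop103 when run from hadoop101)
for ((host = 102; host <= 103; host++)); do
    echo "------------------- hadoop$host --------------"
    rsync -rvl "$pdir/$fname" "$user@hadoop$host:$pdir"
done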
3. Make the xsync script executable: chmod 777 xsync
4. Invoke the script: xsync /home/banana/bin
Note: if xsync still cannot be used globally after being placed in /home/banana/bin, move it to /usr/local/bin.
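Because a distribution script overwrites files on remote machines, it can help to preview what it would do first. The following is a sketch of a variant written as a shell function: xsync_hosts, XSYNC_HOSTS, and XSYNC_DRY_RUN are hypothetical names introduced here, and the default host list assumes the script is run from hadoop101.

```shell
#!/usr/bin/env bash
# xsync_hosts (hypothetical variant of xsync): the target hosts come from
# XSYNC_HOSTS, and XSYNC_DRY_RUN=1 prints the rsync commands instead of
# running them.
xsync_hosts() {
    if [ "$#" -eq 0 ]; then
        echo "usage: xsync_hosts <file-or-directory>" >&2
        return 1
    fi
    fname=$(basename "$1")
    # Resolve the parent directory to an absolute physical path.
    pdir=$(cd -P "$(dirname "$1")" && pwd) || return 1
    user=$(whoami)
    for host in ${XSYNC_HOSTS:-hadoop102 hadoop103}; do
        echo "------------------- $host --------------"
        if [ "${XSYNC_DRY_RUN:-0}" = "1" ]; then
            echo "rsync -rvl $pdir/$fname $user@$host:$pdir"
        else
            rsync -rvl "$pdir/$fname" "$user@$host:$pdir"
        fi
    done
}

# Example: XSYNC_DRY_RUN=1 xsync_hosts /home/banana/bin
```

Keeping the host list in a variable also avoids editing the loop bounds whenever a node is added.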
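Third, fully distributed mode
1) Prepare three client machines (firewall disabled, static IP, hostname set)
2) Install the JDK and configure the environment variables
Extract: tar -zxvf jdk-8u161-linux-x64.tar.gz -C /soft/software
(/soft/software is a directory you create yourself)
Configure the environment variables: vim /etc/profile
Append to the end of the file: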
export JAVA_HOME=/soft/software/jdk1.8.0_161 # JDK installation path
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/dt.jar
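Apply the configuration: source /etc/profile
3) Install Hadoop and configure the environment variables
Extract: tar -zxvf hadoop-2.9.2.tar.gz -C /soft/software
Configure the environment variables: vim /etc/profile
Append to the end of the file: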
export HADOOP_HOME=/soft/software/hadoop-2.9.2 # your Hadoop installation path
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Apply the configuration: source /etc/profile
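The JDK and Hadoop variables can also be kept in a separate file rather than appended to /etc/profile, since /etc/profile on CentOS sources every *.sh file in /etc/profile.d. This is a sketch under that assumption; write_hadoop_profile and the file name hadoop.sh are hypothetical, and the paths are the ones used in the steps above.

```shell
#!/usr/bin/env bash
# write_hadoop_profile (hypothetical helper): write the JDK and Hadoop
# environment variables into a standalone profile snippet.
write_hadoop_profile() {
    target=$1    # e.g. /etc/profile.d/hadoop.sh when run as root
    # The quoted 'EOF' keeps $PATH and friends unexpanded in the file.
    cat > "$target" <<'EOF'
export JAVA_HOME=/soft/software/jdk1.8.0_161
export HADOOP_HOME=/soft/software/hadoop-2.9.2
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
EOF
}

# Example (as root): write_hadoop_profile /etc/profile.d/hadoop.sh
```

A single small file like this is also easier to push to the other nodes than a modified /etc/profile.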
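4) Configure the cluster
- Core configuration file (the files below are in $HADOOP_HOME/etc/hadoop; each <property> block goes inside the file's <configuration> element)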
vi core-site.xml
<!-- Address of the NameNode in HDFS -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop101:9000</value>
</property>
<!-- Directory where Hadoop stores the files it generates at runtime -->
<property>
<name>hadoop.tmp.dir</name>
<value>/soft/software/hadoop-2.9.2/data/tmp</value>
</property>
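The two properties above are fragments; in the actual file they sit inside a single <configuration> root element. The sketch below writes a complete core-site.xml from the shell to show the full layout. write_core_site is a hypothetical helper, and the Hadoop installation path is passed in as an argument rather than assumed.

```shell
#!/usr/bin/env bash
# write_core_site (hypothetical helper): generate a complete core-site.xml
# containing the two properties from this guide.
write_core_site() {
    hadoop_home=$1   # e.g. "$HADOOP_HOME"
    target=$2        # e.g. "$HADOOP_HOME/etc/hadoop/core-site.xml"
    cat > "$target" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop101:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>${hadoop_home}/data/tmp</value>
  </property>
</configuration>
EOF
}

# Example: write_core_site "$HADOOP_HOME" "$HADOOP_HOME/etc/hadoop/core-site.xml"
```

The same wrapper applies to hdfs-site.xml, yarn-site.xml, and mapred-site.xml.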
- HDFS configuration files
Configure hadoop-env.sh: vi hadoop-env.sh
export JAVA_HOME=/soft/software/jdk1.8.0_161
Configure hdfs-site.xml: vi hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<!-- Host of the Hadoop secondary NameNode -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop103:50090</value>
</property>
- YARN configuration files
vi yarn-env.sh
export JAVA_HOME=/soft/software/jdk1.8.0_161
vi yarn-site.xml
<!-- How reducers fetch data -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- Address of the YARN ResourceManager -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop102</value>
</property>
- MapReduce configuration files
Configure mapred-env.sh: vi mapred-env.sh
export JAVA_HOME=/soft/software/jdk1.8.0_161
Configure mapred-site.xml: cp mapred-site.xml.template mapred-site.xml
vi mapred-site.xml
Add the following configuration to this file:
<!-- Run MapReduce on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
- Distribute the configured Hadoop directory
xsync /soft/software/hadoop-2.9.2/
- Check the distributed files on the other nodes
cat /soft/software/hadoop-2.9.2/etc/hadoop/core-site.xml
5) Start the daemons one node at a time
- If this is the first time the cluster is started, format the NameNode
hdfs namenode -format
- Start the NameNode on hadoop101
hadoop-daemon.sh start namenode
- Start a DataNode on each of hadoop101, hadoop102, and hadoop103
hadoop-daemon.sh start datanode
Check the running processes: jps
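Reading jps output by eye on three machines is easy to get wrong, so the check can be scripted. The sketch below reads jps output on standard input and reports any expected daemon that is missing; check_daemons is a hypothetical helper name, and it assumes the usual jps line format of a process ID followed by the main class name.

```shell
#!/usr/bin/env bash
# check_daemons (hypothetical helper): read `jps` output on stdin and
# report which of the daemons named as arguments are not running.
check_daemons() {
    jps_output=$(cat)
    missing=0
    for daemon in "$@"; do
        # Compare the second field of each line exactly with the name.
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "missing: $daemon"
            missing=1
        fi
    done
    return "$missing"
}

# Example on hadoop101: jps | check_daemons NameNode DataNode
```

The function returns a non-zero status when something is missing, so it can be used in a larger startup script.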
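6) Configure SSH and start the whole cluster
- Passwordless login between the nodes
Generate the public and private keys (press Enter three times; this creates two files, id_rsa (private key) and id_rsa.pub (public key)):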
ssh-keygen -t rsa
Copy the public key to every machine that should accept passwordless login:
ssh-copy-id hadoop101
ssh-copy-id hadoop102
ssh-copy-id hadoop103
(Repeat these steps on all three nodes)
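Since the key setup is repeated on every node, the steps can be wrapped in one function. This is a sketch only: setup_passwordless_ssh is a hypothetical name, it creates a key without a passphrase (matching the three Enter presses above), and ssh-copy-id still prompts for each target's password the first time.

```shell
#!/usr/bin/env bash
# setup_passwordless_ssh (hypothetical helper): create an RSA key pair if
# none exists, then copy the public key to each host given as an argument.
setup_passwordless_ssh() {
    key="$HOME/.ssh/id_rsa"
    # -N "" sets an empty passphrase and -f fixes the output path, so
    # ssh-keygen does not stop to ask questions.
    if [ ! -f "$key" ]; then
        ssh-keygen -t rsa -N "" -f "$key"
    fi
    for host in "$@"; do
        ssh-copy-id "$host"
    done
}

# Example, run on each node: setup_passwordless_ssh hadoop101 hadoop102 hadoop103
```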
========================================
Configure slaves
vim /soft/software/hadoop-2.9.2/etc/hadoop/slaves
Add:
hadoop101
hadoop102
hadoop103
Distribute the file
xsync /soft/software/hadoop-2.9.2/etc/hadoop/slaves
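- Start the cluster
- If this is the first time the cluster is started, format the NameNode (before formatting, be sure to stop all NameNode and DataNode processes from the previous run, then delete the data and log directories)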
hdfs namenode -format
- Start HDFS; run on the NameNode node (hadoop101)
sbin/start-dfs.sh
- Start YARN; run on the ResourceManager node (hadoop102)
sbin/start-yarn.sh
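- View the cluster information in a web browser
7) Test the cluster
- Upload a file to the cluster
- Check where the file is stored (the HDFS storage path, and the file contents HDFS keeps on disk)
- Concatenate the files
- Download the file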
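The upload and download steps listed above can be run as a short smoke test with the hdfs dfs client. The following is a sketch, assuming HDFS is already running and hdfs is on the PATH; hdfs_smoke_test, the default remote directory, and the .downloaded suffix are hypothetical choices made for illustration.

```shell
#!/usr/bin/env bash
# hdfs_smoke_test (hypothetical helper): upload a local file, list the
# target directory, and download the file again under a new name.
hdfs_smoke_test() {
    local_file=$1
    remote_dir=${2:-/user/$(whoami)/input}
    # Create the target directory, including any missing parents.
    hdfs dfs -mkdir -p "$remote_dir" &&
    # Upload; -f overwrites an existing copy so the test can be re-run.
    hdfs dfs -put -f "$local_file" "$remote_dir" &&
    # List the directory to confirm the upload.
    hdfs dfs -ls "$remote_dir" &&
    # Download next to the original for comparison.
    hdfs dfs -get "$remote_dir/$(basename "$local_file")" "$local_file.downloaded"
}

# Example: hdfs_smoke_test /home/banana/wc.txt
```

Comparing the downloaded copy with the original (for example with cmp) confirms that the round trip through HDFS preserved the contents.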