A collection of commonly used scripts for big data development
Big data environment related scripts
Bash run mode description
Bash runs in one of two modes: a login shell (an interactive login with a user name and password) and a non-login shell (for example, a shell spawned over SSH to run a command).
Startup files loaded by each mode:
1. A login shell loads, in order: /etc/profile, ~/.bash_profile, ~/.bashrc
2. A non-login shell loads only: ~/.bashrc
Note: ~/.bashrc in turn sources /etc/bashrc, and /etc/bashrc sources /etc/profile.d/*.sh
SSH login note:
When SSHing to another node to run a command, the shell is a non-login shell. /etc/profile is not loaded by default, so environment variables defined there are not set and some commands will not be found.
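The difference can be observed locally without a cluster; a minimal sketch using plain bash:

```shell
# A shell started with -l is a login shell (it sources /etc/profile);
# a plain `bash -c` shell is not. `shopt -q login_shell` reports which
# mode the current shell is running in.
bash --noprofile --norc -c 'shopt -q login_shell && echo login || echo non-login'
# prints: non-login
bash -lc 'shopt -q login_shell && echo login || echo non-login'
# prints: login
```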
Create shell script directory
Create a /root/shell directory to store shell scripts.
Add the directory to PATH in the /etc/bashrc file so that newly created shell scripts can be run from anywhere:
# My Shell
export PATH=$PATH:/root/shell
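As a quick sanity check (using a throwaway temporary directory in place of /root/shell), a script placed in a directory on PATH becomes runnable by name:

```shell
# Create a temporary script directory and put it on PATH
dir=$(mktemp -d)
printf '#!/bin/bash\necho hello\n' > "$dir/hello.sh"
chmod +x "$dir/hello.sh"
export PATH="$PATH:$dir"
hello.sh   # prints: hello
```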
Configure hosts
Configure the hosts file on each node to map each node's IP address to its name.
vim /etc/hosts
172.29.234.1 node01
172.29.234.2 node02
172.29.234.3 node03
172.29.234.4 node04
172.29.234.5 node05
SSH automatic configuration script
Execute the script to automatically configure password-free login for each node.
vim ssh_config.sh
#!/bin/bash
function sshPasswordLogin() {
    # Check whether expect is installed; install it with yum if not
    expectIsExists=$(rpm -qa | grep expect)
    if [ -z "$expectIsExists" ]; then
        yum -y install expect
    fi
    # Generate a key pair if one does not exist yet
    if [ ! -f /root/.ssh/id_rsa.pub ]; then
        ssh-keygen -t rsa -P "" -f /root/.ssh/id_rsa
    fi
    # Server list
    # servers=("IP1 user1 password1" "IP2 user2 password2" "IP3 user3 password3")
    servers=("node01 root 123456" "node02 root 123456" "node03 root 123456" "node04 root 123456" "node05 root 123456")
    for server in "${servers[@]}"; do
        hostname=$(echo "$server" | cut -d " " -f1)
        username=$(echo "$server" | cut -d " " -f2)
        password=$(echo "$server" | cut -d " " -f3)
        echo "Configuring password-free login on $hostname..."
        expect <<EOF
spawn ssh-copy-id "$username@$hostname"
expect {
    "yes/no" {
        send "yes\n"
        exp_continue
    }
    "password" {
        send "$password\n"
        exp_continue
    }
    eof
}
EOF
    done
}
sshPasswordLogin
Change execution permissions
chmod +x ssh_config.sh
Execute the ssh_config.sh script on each node; the nodes will then be able to SSH into each other without passwords.
[root@node01 ~]# ./ssh_config.sh
[root@node02 ~]# ./ssh_config.sh
[root@node03 ~]# ./ssh_config.sh
[root@node04 ~]# ./ssh_config.sh
[root@node05 ~]# ./ssh_config.sh
File synchronization and copy tool rsync
rsync is a powerful file synchronization and replication tool that can perform file transfer and backup between local or remote servers.
Install by running the following command
# CentOS/RHEL
yum install rsync
# Ubuntu/Debian
apt-get install rsync
Basic usage
1. Local file copy:
Copy a single file to the destination directory
rsync /path/to/source/file /path/to/destination/
2. Copy local directory:
Use the -a parameter to recursively copy the directory, and the -v parameter to display the detailed copy process.
rsync -av /path/to/source/directory/ /path/to/destination/directory/
3. Local file synchronization:
Use the --delete parameter to keep the source and target directories in sync and delete files in the target directory that do not exist in the source directory.
rsync -av --delete /path/to/source/directory/ /path/to/destination/directory/
4. Remote file copy:
Copy local files to remote server via SSH connection. The -z parameter indicates using compression to speed up transfers
rsync -avz -e "ssh" /path/to/local/file user@remote:/path/to/destination/
5. Remote directory copy:
rsync -avz -e "ssh" /path/to/local/directory/ user@remote:/path/to/destination/directory/
File synchronization steps
Transfer the specified files to each host node. Before transferring, check that each file exists, create the corresponding directory on the remote host, then transfer with rsync.
This script loops over the files and directories given on the command line and copies each one from the current node to the same path on the other nodes.
vim sync.sh
#!/bin/bash
# Check that at least one command-line argument was supplied
if [ $# -lt 1 ]; then
    echo "Not Enough Arguments!"
    exit 1
fi
# Iterate over all machines in the cluster
for host in node01 node02 node03 node04 node05; do
    echo ==================== $host ====================
    # Send each given file or directory in turn
    for file in "$@"; do
        # Check that the file exists
        if [ -e "$file" ]; then
            # Resolve the parent directory (following symlinks)
            pdir=$(
                cd -P "$(dirname "$file")"
                pwd
            )
            # Get the file name
            fname=$(basename "$file")
            # Create the directory on the remote host
            ssh $host "mkdir -p $pdir"
            # Transfer the file to the matching directory on the remote host
            rsync -av "$pdir/$fname" $host:"$pdir"
        else
            echo "$file does not exist!"
        fi
    done
done
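The parent-directory and file-name logic used in the script can be checked locally:

```shell
# Split a path the same way sync.sh does: resolve the parent
# directory with cd -P + pwd, and take the file name with basename
file=/etc/hosts
pdir=$(cd -P "$(dirname "$file")" && pwd)
fname=$(basename "$file")
echo "$pdir"          # prints: /etc
echo "$fname"         # prints: hosts
echo "$pdir/$fname"   # prints: /etc/hosts
```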
Change execution permissions
chmod +x sync.sh
Use the file synchronization script to distribute the hosts configuration to every node
[root@node01 ~]# sync.sh /etc/hosts
Command execution script
Loop through a list of server names and run the specified command on each server.
vim call.sh
#!/bin/bash
for i in node01 node02 node03 node04 node05
do
    echo --------- $i ----------
    ssh $i "$*"
done
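The script forwards its arguments with "$*", which joins them into a single string so the whole command line reaches the remote shell intact. A quick illustration of how "$*" differs from "$@":

```shell
# "$*" joins all positional parameters into one word (space-separated),
# while "$@" keeps them as separate words
show_star() { printf '<%s>\n' "$*"; }
show_at()   { printf '<%s>\n' "$@"; }
show_star jps -l   # prints: <jps -l>
show_at jps -l     # prints: <jps> then <-l> on separate lines
```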
Change execution permissions
chmod +x call.sh
Usage example: execute the specified command (here jps) on each node
[root@node01 ~]# call.sh jps
Node loop simplification
Define a /root/hosts file listing the node names:
node01
node02
node03
node04
node05
Take the simplified command execution script as an example:
#!/bin/bash
for host in $(cat /root/hosts); do
    # tput sets the terminal text color; 2 = green
    tput setaf 2
    echo ======== $host ========
    # Reset the terminal text color to the default (7)
    tput setaf 7
    ssh $host "$@"
done
Big data component related scripts
Hadoop cluster script
vim hadoop.sh
#!/bin/bash
# Hadoop installation directory
HADOOP_HOME="/usr/local/program/hadoop"
# Node the NameNode is assigned to
NAMENODE="node01"
if [ $# -lt 1 ]; then
    echo "Please pass a command argument: start or stop"
    exit 1
fi
case $1 in
"start")
    echo "=================== Starting the Hadoop cluster ==================="
    echo "--------------- Starting HDFS ---------------"
    ssh $NAMENODE "$HADOOP_HOME/sbin/start-dfs.sh"
    echo "--------------- Starting YARN ---------------"
    ssh $NAMENODE "$HADOOP_HOME/sbin/start-yarn.sh"
    ;;
"stop")
    echo "=================== Stopping the Hadoop cluster ==================="
    echo "--------------- Stopping YARN ---------------"
    ssh $NAMENODE "$HADOOP_HOME/sbin/stop-yarn.sh"
    echo "--------------- Stopping HDFS ---------------"
    ssh $NAMENODE "$HADOOP_HOME/sbin/stop-dfs.sh"
    ;;
*)
    echo "Invalid argument: $1"
    echo "Please enter: start or stop"
    exit 1
    ;;
esac
Modify script execution permissions
chmod +x hadoop.sh
Start and stop Hadoop
hadoop.sh start
hadoop.sh stop
Zookeeper cluster script
vim zk.sh
#!/bin/bash
case $1 in
"start")
    for i in node01 node02 node03; do
        echo "----------------------zookeeper $i start----------------------"
        ssh $i "/usr/local/program/zookeeper/bin/zkServer.sh start"
    done
    ;;
"stop")
    for i in node01 node02 node03; do
        echo "----------------------zookeeper $i stop----------------------"
        ssh $i "/usr/local/program/zookeeper/bin/zkServer.sh stop"
    done
    ;;
"status")
    for i in node01 node02 node03; do
        echo "----------------------zookeeper $i status----------------------"
        ssh $i "/usr/local/program/zookeeper/bin/zkServer.sh status"
    done
    ;;
*)
    echo "Invalid command"
    ;;
esac
Modify script execution permissions
chmod +x zk.sh
Start and stop Zookeeper
zk.sh start
zk.sh stop
Kafka cluster script
vim kafka.sh
#!/bin/bash
if [ $# -eq 0 ]; then
    echo "Please pass a command argument: start or stop"
    exit 1
fi
KAFKA_HOME="/usr/local/program/kafka"
case $1 in
"start")
    for node in "node01" "node02" "node03"; do
        echo "----------------------kafka $node start----------------------"
        ssh $node "$KAFKA_HOME/bin/kafka-server-start.sh -daemon $KAFKA_HOME/config/server.properties"
        # $? holds the exit status of the previous command;
        # a non-zero status means the start or stop failed
        if [ $? -ne 0 ]; then
            echo "Failed to start $node"
        fi
    done
    ;;
"stop")
    for node in "node01" "node02" "node03"; do
        echo "----------------------kafka $node stop----------------------"
        ssh $node "$KAFKA_HOME/bin/kafka-server-stop.sh"
        if [ $? -ne 0 ]; then
            echo "Failed to stop $node"
        fi
    done
    ;;
*)
    echo "Invalid argument: $1"
    echo "Please enter: start or stop"
    exit 1
    ;;
esac
Modify script execution permissions
chmod +x kafka.sh
Start and stop Kafka
kafka.sh start
kafka.sh stop
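The failure check in the script works because $? expands to the exit status of the most recently executed command. A local sketch with stand-in functions in place of the remote ssh calls:

```shell
# Stand-ins for the ssh calls: one succeeds, one fails
# (ssh itself returns 255 when the connection fails)
remote_ok()   { return 0; }
remote_fail() { return 255; }
remote_ok; echo $?       # prints: 0
remote_fail || echo $?   # prints: 255
```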
Flume cluster script
Create the cluster start and stop script
vim flume.sh
#!/bin/bash
# Node that runs the Flume agent
REMOTE_HOST="node01"
# Path to the flume-ng executable
FLUME_EXECUTABLE="/usr/local/program/flume/bin/flume-ng"
# Flume configuration directory
FLUME_CONF_DIR="/usr/local/program/flume/conf/"
# Flume job configuration file
FLUME_CONF_FILE="/usr/local/program/flume/job/file_to_kafka.conf"
# Name used to find the running process
PROCESS_NAME="file_to_kafka"
case $1 in
"start")
    echo "---------------Starting Flume collection--------------"
    ssh "$REMOTE_HOST" "nohup $FLUME_EXECUTABLE agent -n a1 -c \"$FLUME_CONF_DIR\" -f \"$FLUME_CONF_FILE\" >/dev/null 2>&1 &"
    ;;
"stop")
    echo "---------------Stopping Flume collection--------------"
    ssh "$REMOTE_HOST" "ps -ef | grep $PROCESS_NAME | grep -v grep | awk '{print \$2}' | xargs -n1 kill -9"
    ;;
*)
    echo "Invalid argument: $1"
    echo "Please enter: start or stop"
    exit 1
    ;;
esac
Modify script execution permissions
chmod +x flume.sh
Start and stop Flume
flume.sh start
flume.sh stop
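The stop branch kills the agent with a ps | grep | grep -v grep | awk | kill pipeline. `pkill -f`, from procps, is a common shorthand for the same idea: it matches against the full command line and skips the grep -v grep dance. A local sketch with a stand-in long-running process:

```shell
# Start a stand-in long-running process, then stop it by matching
# its full command line, the way pkill -f does
sleep 300 &
pid=$!
pkill -f 'sleep 300'
wait "$pid" 2>/dev/null || true   # reap the killed child
kill -0 "$pid" 2>/dev/null || echo "process stopped"   # prints: process stopped
```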