4. User behavior data collection module
4.3 Log Collection Flume
4.3.4 Flume test for log collection
4.3.4.1 Start Zookeeper and Kafka Cluster
4.3.4.2 Start hadoop102 log collection Flume
[summer@hadoop102 flume-1.9.0]$ bin/flume-ng agent -n a1 -c conf/ -f job/file_to_kafka.conf -Dflume.root.logger=info,console
This starts an agent: -n (name) names the agent a1, -c points Flume at its conf directory, -f specifies file_to_kafka.conf under our job directory, and -Dflume.root.logger=info,console sets the root logger to info level with output to the console only.
4.3.4.3 Start a Kafka Console-Consumer
[summer@hadoop102 kafka-3.0.0]$ bin/kafka-console-consumer.sh --bootstrap-server hadoop102:9092 --topic topic_log
This consumes the topic_log topic.
4.3.4.4 Generating Simulation Data
[summer@hadoop102 ~]$ lg.sh
========== hadoop102 ==========
========== hadoop103 ==========
[summer@hadoop102 ~]$ cd /opt/module/applog/log/
[summer@hadoop102 log]$ ll
total 3352
-rw-rw-r--. 1 summer summer  808665 Oct 23 09:27 app.2022-10-23.log
-rw-rw-r--. 1 summer summer 2388678 Oct 25 17:45 app.2022-10-25.log
[summer@hadoop102 log]$ echo '{id:1}' >> app.2022-10-25.log
[summer@hadoop102 log]$ echo '{id:' >> app.2022-10-25.log
4.3.4.5 Observe whether Kafka consumers can consume data
The consumer prints the generated log data, so the test passes.
[summer@hadoop102 log]$ echo '{id:1}' >> app.2022-10-25.log
[summer@hadoop102 log]$ echo '{id:' >> app.2022-10-25.log
These two lines append one well-formed JSON record and one malformed one, which lets us verify that the interceptor we wrote works: the valid record should reach the Kafka consumer, while the malformed one should be filtered out.
4.3.5 Flume startup and shutdown script for log collection
4.3.5.1 Distributing log collection Flume configuration files and interceptors
If the above test passes, distribute the Flume configuration file and the interceptor jar package from the hadoop102 node to the other log server; just use the distribution script.
4.3.5.2 For convenience, here is a script to start and stop the log collection Flume process
nohup (short for "no hang up") runs a command immune to hangups: the process keeps running in the background even after you exit the terminal.
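As a minimal illustration of the nohup pattern (using sleep as a stand-in for the Flume agent; the duration is arbitrary):

```shell
# Run a long-lived command detached from the terminal; discard its
# output so no nohup.out file is created. sleep stands in for Flume.
nohup sleep 60 >/dev/null 2>&1 &
pid=$!                      # PID of the detached job
kill -0 "$pid" && echo "still running: $pid"
kill "$pid"                 # clean up the demo process
```

The `>/dev/null 2>&1` redirection is what the start branch of the script below uses, so Flume's console output is silenced entirely.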
When you search for the process with ps -ef | grep Application:
[summer@hadoop102 log]$ ps -ef | grep Application
the output contains an extra entry for the grep process itself, so add grep -v grep to filter it out:
[summer@hadoop102 log]$ ps -ef | grep Application | grep -v grep
This finds the process we want. Next, extract the process ID with awk (syntax: awk '{[pattern] action}' filenames; note the awk program must be wrapped in single quotes) to cut out the second column:
[summer@hadoop102 log]$ ps -ef | grep Application | grep -v grep |awk '{print $2}'
Having extracted the PID, the next step is to kill the process. The obvious attempt does not work:
[summer@hadoop102 log]$ kill -9 ps -ef | grep Application | grep -v grep |awk '{print $2}'
kill takes PIDs as command-line arguments and cannot read them from a pipe; here the shell actually runs kill -9 ps -ef and pipes its output onward, so kill fails. Instead, pipe the PID list into xargs -n1, which converts each line of input into an argument for kill.
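The effect of xargs -n1 can be sketched safely with echo standing in for kill (the PIDs here are made up):

```shell
# xargs -n1 turns each whitespace-separated token from stdin into
# a separate invocation of the given command; echo stands in for
# actually sending the signal.
echo "101 202 303" | xargs -n1 echo kill -9
# prints:
#   kill -9 101
#   kill -9 202
#   kill -9 303
```

With the real PID pipeline in front, each matching process receives its own kill -9.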
That kills the process (PID 67857 here). But there is still a problem: if multiple Flume jobs are running, grep Application matches all of them, and they would all be killed together, which we don't want. We can improve this by grepping on the configuration file name instead, since a given Flume configuration is started at most once on a machine and never repeatedly.
[summer@hadoop102 log]$ ps -ef | grep file_to_kafka | grep -v grep | awk '{print $2}'
So this command queries exactly the process ID we want.
But when you write this pipeline in the script, you must escape the $ in awk '{print $2}' as \$2. The pipeline is embedded in a double-quoted ssh command string, and the script itself uses positional parameters (case $1), so the outer shell would otherwise expand $2 before awk ever sees it.
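The quoting issue can be demonstrated outside the script; here scriptarg2 simulates the script's second positional argument, and the sample line imitates one row of ps output (all names are made up for illustration):

```shell
#!/bin/bash
# Why \$2 must be escaped when an awk program is embedded in a
# double-quoted string inside a script that has its own arguments.
set -- start scriptarg2          # simulate the script's $1 and $2
line="summer 67857 flume"

# Escaped: \$ stores a literal $, so awk receives $2 and prints
# the second field of the input line (the PID).
good="echo $line | awk '{print \$2}'"
eval "$good"                     # prints 67857

# Unescaped: the outer shell substitutes $2 -> scriptarg2 first,
# so awk would print an empty (uninitialized) variable instead.
bad="echo $line | awk '{print $2}'"
echo "$bad"                      # shows the mangled awk program
```

The same expansion happens inside the double-quoted ssh "..." string in f1.sh, which is why its awk uses \$2.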
Create the script f1.sh in the /home/summer/bin directory of the hadoop102 node.
The complete script code is as follows:
#!/bin/bash
case $1 in
"start"){
for i in hadoop102 hadoop103
do
echo " -------- Starting log-collection Flume on $i --------"
ssh $i "nohup /opt/module/flume-1.9.0/bin/flume-ng agent -n a1 -c /opt/module/flume-1.9.0/conf/ -f /opt/module/flume-1.9.0/job/file_to_kafka.conf >/dev/null 2>&1 &"
done
};;
"stop"){
for i in hadoop102 hadoop103
do
echo " -------- Stopping log-collection Flume on $i --------"
ssh $i "ps -ef | grep file_to_kafka | grep -v grep |awk '{print \$2}' | xargs -n1 kill -9 "
done
};;
esac
[summer@hadoop102 applog]$ cd /home/summer/bin/
[summer@hadoop102 bin]$ vim f1.sh
4.3.5.3 Add script execution permission
[summer@hadoop102 bin]$ chmod 777 f1.sh
4.3.5.4 f1 start
[summer@hadoop102 bin]$ f1.sh start
4.3.5.5 f1 stop
[summer@hadoop102 bin]$ f1.sh stop