Big data project: e-commerce data warehouse — log collection Flume test, and a start/stop script for the log collection Flume

4. User behavior data collection module

4.3 Log Collection Flume

4.3.4 Flume test for log collection

4.3.4.1 Start Zookeeper and Kafka Cluster

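Both clusters must be up before testing. A rough sketch, assuming the cluster wrapper scripts written in earlier sections are named zk.sh and kf.sh (hypothetical names, adjust to your own setup):

[summer@hadoop102 ~]$ zk.sh start    # start the Zookeeper cluster first
[summer@hadoop102 ~]$ kf.sh start    # then start the Kafka cluster, which depends on Zookeeper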

4.3.4.2 Start hadoop102 log collection Flume

[summer@hadoop102 flume-1.9.0]$ bin/flume-ng agent -n a1 -c conf/ -f job/file_to_kafka.conf -Dflume.root.logger=info,console

This starts a Flume agent: -n (name) sets the agent name to a1, -c points Flume at its conf/ directory, -f specifies the file_to_kafka.conf configuration under our job directory, and -Dflume.root.logger=info,console sets the logger to the info level and prints the log output to the console.

4.3.4.3 Start a Kafka Console-Consumer

[summer@hadoop102 kafka-3.0.0]$ bin/kafka-console-consumer.sh --bootstrap-server hadoop102:9092 --topic topic_log

This consumes the topic_log topic.

4.3.4.4 Generating Simulation Data

[summer@hadoop102 ~]$ lg.sh 
========== hadoop102 ==========
========== hadoop103 ==========
[summer@hadoop102 ~]$ cd /opt/module/applog/log/
[summer@hadoop102 log]$ ll
total 3352
-rw-rw-r--. 1 summer summer  808665 Oct 23 09:27 app.2022-10-23.log
-rw-rw-r--. 1 summer summer 2388678 Oct 25 17:45 app.2022-10-25.log
[summer@hadoop102 log]$ echo '{id:1}' >> app.2022-10-25.log 
[summer@hadoop102 log]$ echo '{id:' >> app.2022-10-25.log 


4.3.4.5 Observe whether Kafka consumers can consume data

By appending these two lines:

[summer@hadoop102 log]$ echo '{id:1}' >> app.2022-10-25.log 
[summer@hadoop102 log]$ echo '{id:' >> app.2022-10-25.log 

we can check whether the interceptor we wrote works: the first line is well-formed JSON, while the second is deliberately malformed, so the interceptor should filter it out before it reaches Kafka.


4.3.5 Flume startup and shutdown script for log collection

4.3.5.1 Distributing log collection Flume configuration files and interceptors

If the above test passes, the Flume configuration file and the interceptor jar package on the hadoop102 node need to be copied to the other log server. Use the distribution script directly.
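A minimal sketch of the distribution step, assuming the cluster-wide sync script from earlier sections is named xsync and that the interceptor jar was placed in Flume's lib/ directory (both are assumptions; adjust paths to your setup):

[summer@hadoop102 ~]$ xsync /opt/module/flume-1.9.0/job/file_to_kafka.conf    # distribute the agent configuration to the other log server
[summer@hadoop102 ~]$ xsync /opt/module/flume-1.9.0/lib/                      # assumed location of the interceptor jar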

4.3.5.2 For convenience, write a script to start and stop the log collection Flume processes

nohup is short for "no hang up": it runs a command in the background immune to hangups, so exiting the terminal does not terminate the program.
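The start branch of the script below uses nohup together with output redirection; broken down piece by piece:

# nohup ...           keep the agent running after the terminal exits
# >/dev/null 2>&1     discard stdout, and send stderr to the same place
# &                   run the whole command in the background
nohup /opt/module/flume-1.9.0/bin/flume-ng agent -n a1 -c /opt/module/flume-1.9.0/conf/ -f /opt/module/flume-1.9.0/job/file_to_kafka.conf >/dev/null 2>&1 &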

To find the running Flume process (the Flume agent runs under the main class org.apache.flume.node.Application), use ps -ef | grep Application:

[summer@hadoop102 log]$ ps -ef | grep Application

There will be an extra grep process in the output (the grep Application command itself), so use grep -v grep to filter out the grep process:

[summer@hadoop102 log]$ ps -ef | grep Application | grep -v grep

This finds the process we want; next, extract the process ID. Here we use awk (syntax: awk '{[pattern] action}' filenames; note that the awk program must be enclosed in single quotes) to cut out the second field, which is the PID:

[summer@hadoop102 log]$ ps -ef | grep Application | grep -v grep | awk '{print $2}'

This extracts the process ID we want; next, kill the process:

[summer@hadoop102 log]$ kill -9 ps -ef | grep Application | grep -v grep | awk '{print $2}'

Writing kill this way does not work: kill treats everything after -9 as literal string arguments (ps, -ef, ...) rather than as the output of the pipeline. Instead, pipe the process ID into xargs, which turns the pipeline's output into arguments for kill.
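A sketch of the corrected command, with the same pipeline as above feeding the PID into kill through xargs:

[summer@hadoop102 log]$ ps -ef | grep Application | grep -v grep | awk '{print $2}' | xargs -n1 kill -9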
This kills the process (process number 67857 in the example). But there is still a problem: if multiple Flume jobs are running there will be multiple Application processes, and they would all be killed at the same time, which is not what we want. The improvement is to grep by the configuration file name instead, since a given Flume configuration will only be started once on a machine and never repeatedly.

[summer@hadoop102 log]$ ps -ef | grep file_to_kafka | grep -v grep | awk '{print $2}'

With this command we can query exactly the process ID we want, and this achieves the result we are after.

But when this pipeline is written inside the script, the $2 in awk '{print $2}' must be escaped as \$2. The command string passed to ssh is enclosed in double quotes, so the outer shell would otherwise expand $2 as the script's own positional parameter (just as $1 is used in the case statement below) before awk ever sees it; escaping it as \$2 lets the literal $2 reach awk.
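A quick illustration of the difference (hadoop103 here is just an example host):

ssh hadoop103 "ps -ef | grep file_to_kafka | grep -v grep | awk '{print \$2}'"
# \$2 survives the double quotes, so awk prints the PID column
ssh hadoop103 "ps -ef | grep file_to_kafka | grep -v grep | awk '{print $2}'"
# $2 is expanded by the local shell first (here to nothing), so awk prints whole lines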

Create the script f1.sh in the /home/summer/bin directory of the hadoop102 node.
The complete script code is as follows:

#!/bin/bash

# f1.sh: start/stop the log collection Flume agents on the log servers
case $1 in
"start"){
        for i in hadoop102 hadoop103
        do
                echo " --------starting $i log collection flume-------"
                ssh $i "nohup /opt/module/flume-1.9.0/bin/flume-ng agent -n a1 -c /opt/module/flume-1.9.0/conf/ -f /opt/module/flume-1.9.0/job/file_to_kafka.conf >/dev/null 2>&1 &"
        done
};;
"stop"){
        for i in hadoop102 hadoop103
        do
                echo " --------stopping $i log collection flume-------"
                # find the PID by configuration file name and kill it; \$2 is escaped so awk (not the local shell) sees it
                ssh $i "ps -ef | grep file_to_kafka | grep -v grep | awk '{print \$2}' | xargs -n1 kill -9"
        done
};;
esac


[summer@hadoop102 applog]$ cd /home/summer/bin/
[summer@hadoop102 bin]$ vim f1.sh


4.3.5.3 Add script execution permission

[summer@hadoop102 bin]$ chmod 777 f1.sh


4.3.5.4 f1 start

[summer@hadoop102 bin]$ f1.sh start


4.3.5.5 f1 stop

[summer@hadoop102 bin]$ f1.sh stop



Origin blog.csdn.net/Redamancy06/article/details/127522551