37. Installation and testing of Flume

In the previous article, we briefly introduced the Flume framework. This article covers installing and testing Flume. These recent posts are fairly simple; follow the column "Break the Cocoon and Become a Butterfly-Big Data" for more related content~


Table of Contents

One, the installation of Flume

1.1 Download Flume

1.2 Upload and unzip

1.3 Modify the configuration file

Two, Flume's test

2.1 Environmental preparation

2.2 Create a configuration file

2.3 Open ports, production data

2.3.1 Open the listening port

2.3.2 Start the port to send data


 

One, the installation of Flume

1.1 Download Flume

First, download the Flume installation package from the official website. This article uses version 1.7.0.

1.2 Upload and unzip

Upload the downloaded tar package to the specified directory and decompress it:

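# 1. Extract the archive
tar -zxvf ./apache-flume-1.7.0-bin.tar.gz -C ../modules/

# 2. Switch to the directory that holds the extracted files
cd ../modules/

# 3. Rename the directory to something shorter (optional)
mv apache-flume-1.7.0-bin flume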
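
As a quick sanity check after extracting, it can help to confirm that the launcher script is where you expect it and that it reports a version. The sketch below assumes the directory layout used above (`../modules/flume`). The function name `check_flume_install` is made up for illustration; `flume-ng version` is the launcher's own subcommand.

```shell
# Hypothetical helper: verify that a Flume directory looks usable.
# $1 is the Flume home directory (assumed here to be ../modules/flume).
check_flume_install() {
  flume_home="${1:-../modules/flume}"
  if [ ! -x "$flume_home/bin/flume-ng" ]; then
    echo "flume-ng not found or not executable under $flume_home/bin" >&2
    return 1
  fi
  # Print the version reported by the launcher script.
  "$flume_home/bin/flume-ng" version
}
```

For example, run `check_flume_install ../modules/flume` from the directory that holds the tar package.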

1.3 Modify the configuration file

1. First, switch to Flume's conf directory and copy the flume-env.sh.template file to a new file named flume-env.sh.

2. Add JAVA_HOME to flume-env.sh.
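
The two steps above can be sketched as a small shell function. This is only an illustration: the function name `setup_flume_env` is hypothetical, and both the conf directory and the JDK path are values you supply yourself.

```shell
# Hypothetical helper: create flume-env.sh from its template and set JAVA_HOME.
# $1 = Flume conf directory, $2 = JDK installation directory (both supplied by you).
setup_flume_env() {
  conf_dir="$1"
  java_home="$2"
  # Step 1: copy the template to the file name Flume looks for.
  cp "$conf_dir/flume-env.sh.template" "$conf_dir/flume-env.sh" || return 1
  # Step 2: append the export so it takes effect when flume-ng loads this file.
  echo "export JAVA_HOME=$java_home" >> "$conf_dir/flume-env.sh"
}
```

A call might look like `setup_flume_env ../modules/flume/conf /path/to/your/jdk`, where the second argument is a placeholder for your own JDK directory.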

Two, Flume's test

With Flume installed, let's take a brief look at an official example: monitoring port data. Flume listens on port 44444 of the local machine, netcat sends data to that port, and Flume prints the received data to the console.

2.1 Environmental preparation

1. First, use the netstat command to check whether port 44444 is already in use. netstat is a useful tool for monitoring TCP/IP networks; it can display routing tables, active network connections, and the status of each network interface. Its main options are as follows:

1. -t or --tcp: show TCP connections;
2. -u or --udp: show UDP connections;
3. -n or --numeric: show numeric IP addresses instead of resolving host names;
4. -l or --listening: show only listening sockets;
5. -p or --programs: show the PID and name of the program using each socket.

If the port is not occupied, filtering the netstat output for 44444 (for example with grep) returns no matching lines.
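
One way to run this check is to combine the options listed above and filter the result for the port. The sketch below is an assumption about how you might script it, not a command taken from the original post. `port_is_listed` is a hypothetical helper that reads netstat output on stdin and checks the local-address column.

```shell
# Hypothetical helper: read netstat output on stdin and succeed (exit 0)
# if any socket's local address (4th column) ends in the given port.
port_is_listed() {
  awk -v port="$1" '$4 ~ (":" port "$") { found = 1 } END { exit !found }'
}

# Combine the options listed above and look for port 44444.
if netstat -tunlp 2>/dev/null | port_is_listed 44444; then
  echo "port 44444 is already in use"
else
  echo "port 44444 appears to be free"
fi
```

Matching on the end of the local-address column avoids false hits on longer port numbers or on PIDs that happen to contain the same digits.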

2. Install the netcat tool

yum install -y nc

2.2 Create a configuration file

Create a flume-netcat-logger.conf file in the conf directory of flume and add the following configuration:

# Declare the source, channel, and sink. a1 is the agent name, r1 is a1's source, k1 is a1's sink, and c1 is a1's channel (buffer).
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Configure the source
# The source type is netcat (listens on a port)
a1.sources.r1.type = netcat
# Host name or IP address to listen on
a1.sources.r1.bind = localhost
# Port number to listen on
a1.sources.r1.port = 44444

# Set the sink type to logger
a1.sinks.k1.type = logger

# Use an in-memory channel
a1.channels.c1.type = memory
# Total capacity of the channel: 1000 events
a1.channels.c1.capacity = 1000
# Maximum number of events per transaction: 100
a1.channels.c1.transactionCapacity = 100

# Bind the source and the sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
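
One constraint worth keeping in mind when tuning these values: for the memory channel, transactionCapacity must not be larger than capacity. The following is a minimal sketch of a sanity check on the file, assuming the agent and channel names used in this article (a1, c1). The helpers `get_prop` and `check_channel_capacity` are hypothetical and not part of Flume.

```shell
# Hypothetical helper: print the value of one key from a properties file.
# $1 = property key, $2 = file path. Comment lines never match a key.
get_prop() {
  awk -F'=' -v key="$1" '
    { k = $1; gsub(/[ \t]/, "", k) }
    k == key { v = $2; gsub(/[ \t]/, "", v); print v }
  ' "$2"
}

# Hypothetical check: succeed only if both values are set and
# transactionCapacity <= capacity for agent a1, channel c1.
check_channel_capacity() {
  capacity=$(get_prop a1.channels.c1.capacity "$1")
  tx=$(get_prop a1.channels.c1.transactionCapacity "$1")
  [ -n "$capacity" ] && [ -n "$tx" ] && [ "$tx" -le "$capacity" ]
}
```

For example, `check_channel_capacity conf/flume-netcat-logger.conf && echo "channel sizes look consistent"`.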

2.3 Open ports, production data

2.3.1 Open the listening port

bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/flume-netcat-logger.conf -Dflume.root.logger=INFO,console

The related parameter descriptions are as follows:

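1. --conf conf/: the configuration files are stored in the conf/ directory; --conf can also be written as -c.
2. --name a1: the agent to start is named a1, matching the agent name used in the configuration file; --name can also be written as -n.
3. --conf-file conf/flume-netcat-logger.conf: the configuration file Flume reads for this run is flume-netcat-logger.conf in the conf directory.
4. -Dflume.root.logger=INFO,console: -D overrides the flume.root.logger property at startup, setting the log level to INFO and sending the log output to the console. Log levels include debug, info, warn, and error.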
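
The same start command can be written with the short option forms (-c, -n, and -f for --conf-file). The wrapper below is only a sketch: the function name `start_netcat_agent` and the `FLUME_HOME` variable are assumptions introduced here so that the command can be run from any directory.

```shell
# Hypothetical wrapper: start the agent from this article using short options.
# FLUME_HOME is an assumed variable pointing at the Flume directory;
# it falls back to the current directory when unset.
start_netcat_agent() {
  flume_home="${FLUME_HOME:-.}"
  "$flume_home/bin/flume-ng" agent \
    -c "$flume_home/conf" \
    -n a1 \
    -f "$flume_home/conf/flume-netcat-logger.conf" \
    -Dflume.root.logger=INFO,console
}
```

Run from the Flume directory, `start_netcat_agent` behaves like the long-form command shown above.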

2.3.2 Start the port to send data

nc localhost 44444
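
After running nc as above, type a line and press Enter to send it. If you would rather send a single line without an interactive session, a sketch like the following may help. The function name `send_line` is hypothetical, and the host and port defaults simply mirror the configuration in this article. Some nc variants keep the connection open after the input ends, in which case you may need to press Ctrl+C.

```shell
# Hypothetical helper: send one line of text to the netcat source.
# $1 = text to send, $2 = host (default localhost), $3 = port (default 44444).
send_line() {
  printf '%s\n' "$1" | nc "${2:-localhost}" "${3:-44444}"
}
```

For example, `send_line "hello flume"` sends one event to the agent started in the previous step.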

In the terminal where the Flume agent is running, you can see that the data has been received.

Well, this article is very simple: we installed Flume and tested a basic use case along the way. In the next article, we will work through a few more complicated examples. If you ran into any problems during this process, feel free to leave a comment~


Origin blog.csdn.net/gdkyxy2013/article/details/111761541