1. Flume receiving telnet data
Step 1: Develop the configuration file
vim /export/servers/apache-flume-1.6.0-cdh5.14.0-bin/conf/netcat-logger.conf
# Name the components of this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe and configure the source component: r1
a1.sources.r1.type = netcat
a1.sources.r1.bind = 192.168.52.120
a1.sources.r1.port = 44444
# Describe and configure the sink component: k1
a1.sinks.k1.type = logger
# Describe and configure the channel component; an in-memory channel is used here
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Describe and configure how source, channel, and sink are connected
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Channel parameters explained:
capacity: the maximum number of events the channel can store
transactionCapacity: the maximum number of events the channel takes from a source or gives to a sink in a single transaction
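Note that transactionCapacity must not be larger than capacity. As an illustrative (not prescriptive) sketch, a memory channel sized for heavier bursts could look like this:
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000            # up to 10000 events buffered in memory
a1.channels.c1.transactionCapacity = 1000  # at most 1000 events per put/take transaction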
Step 2: Start the agent with the configuration file
Specify the collection configuration file and start the Flume agent on the corresponding node.
First, verify that the environment works with the simplest possible example.
Start the agent to collect data:
bin/flume-ng agent -c conf -f conf/netcat-logger.conf -n a1 -Dflume.root.logger=INFO,console
-c conf specifies the directory containing Flume's own configuration files
-f conf/netcat-logger.conf specifies the collection configuration we just wrote
-n a1 specifies the name of our agent
Step 3: Install telnet to prepare for the test
Install a telnet client on the node02 machine so it can send simulated data.
yum -y install telnet
telnet node03 44444 # use telnet to send simulated data
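A test session could look roughly like the following (the exact output depends on the environment and Flume version; shown only as an illustration):
telnet node03 44444
Trying 192.168.52.120...
Connected to node03.
Escape character is '^]'.
hello flume
OK
On the agent console, the logger sink should then print the received event as an INFO log line containing the body "hello flume".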
2. Collecting a directory to HDFS
Schematic:
Collection requirement: new files are continuously generated under a specific directory on a server; whenever a new file appears, it needs to be collected into HDFS.
Based on this requirement, first define the following three key elements:
- Data source component (source): monitor a file directory, using spooldir
spooldir characteristics:
1. It monitors a directory; as soon as a new file appears there, the file's contents are collected.
2. Once a file has been collected, the agent automatically renames it with the suffix .COMPLETED.
3. Files with duplicate names must not be placed into the monitored directory.
- Sink component (sink): the HDFS file system, using the hdfs sink
- Channel component (channel): either a file channel or a memory channel can be used (a file channel sketch follows this list)
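If durability matters more than speed, the memory channel used below can be swapped for a file channel. A minimal sketch (the checkpoint and data directories are illustrative paths, not ones used elsewhere in this tutorial):
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /export/servers/flume-checkpoint
a1.channels.c1.dataDirs = /export/servers/flume-data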
Develop the Flume configuration file
Write the configuration file:
cd /export/servers/apache-flume-1.6.0-cdh5.14.0-bin/conf
mkdir -p /export/dir
vim spooldir.conf
# Name the components on this agent
a1.sources=r1
a1.channels=c1
a1.sinks=k1
# Describe/configure the source
## Note: do not put files with duplicate names into the monitored directory
a1.sources.r1.type=spooldir
a1.sources.r1.spoolDir=/export/dir
a1.sources.r1.fileHeader = true
# Describe the sink
a1.sinks.k1.type=hdfs
a1.sinks.k1.hdfs.path=hdfs://node01:8020/spooldir/
# Describe the channel
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100
# Bind the source and sink to the channel
a1.sources.r1.channels=c1
a1.sinks.k1.channel=c1
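With only hdfs.path set, the hdfs sink falls back to its defaults for rolling files, which tends to produce many small files on HDFS. The optional settings below are standard hdfs sink properties; the values are purely illustrative:
a1.sinks.k1.hdfs.filePrefix = spool-
a1.sinks.k1.hdfs.rollInterval = 60        # start a new file every 60 seconds
a1.sinks.k1.hdfs.rollSize = 134217728     # or once the file reaches 128 MB
a1.sinks.k1.hdfs.rollCount = 0            # never roll based on the event count
a1.sinks.k1.hdfs.fileType = DataStream    # write plain text instead of a SequenceFile
a1.sinks.k1.hdfs.writeFormat = Text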
Start flume
bin/flume-ng agent -c ./conf -f ./conf/spooldir.conf -n a1 -Dflume.root.logger=INFO,console
Upload files to the monitored directory
Upload files into the directory below; note that file names must not repeat.
cd /export/dir
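For example, a uniquely named test file can be dropped into the monitored directory like this (the file name is arbitrary, shown only as an illustration):
echo "hello spooldir" > /export/dir/test_$(date +%s).txt
Shortly afterwards the file should be renamed with the .COMPLETED suffix, and its contents should appear under /spooldir/ in HDFS.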
3. Collecting a file to HDFS
Collection requirement: a business system (for example one using log4j) keeps appending to its log file; the data appended to the log needs to be collected to HDFS in real time.
Based on this requirement, first define the following three key elements:
- Data source component (source): monitor file updates, using exec 'tail -F file'
- Sink component (sink): the HDFS file system, using the hdfs sink
- Channel component (channel): the transfer channel between source and sink; either a file channel or a memory channel can be used
Define the Flume configuration file
Develop the configuration file on node03
cd /export/servers/apache-flume-1.6.0-cdh5.14.0-bin/conf
vim tail-file.conf
Configuration file content:
a1.sources=r1
a1.channels=c1
a1.sinks=k1
# Describe/configure tail -F source1
a1.sources.r1.type=exec
a1.sources.r1.command =tail -F /export/taillogs/access_log
# Describe sink1
a1.sinks.k1.type=hdfs
a1.sinks.k1.hdfs.path=hdfs://node01:8020/spooldir/
# Use a channel which buffers events in memory
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100
# Bind the source and sink to the channel
a1.sources.r1.channels=c1
a1.sinks.k1.channel=c1
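Optionally, the hdfs.path can include time escape sequences so that collected data lands in per-date directories; this requires a timestamp, for example via hdfs.useLocalTimeStamp. A sketch with an illustrative path layout:
a1.sinks.k1.hdfs.path = hdfs://node01:8020/tailfile/%Y-%m-%d/%H
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.hdfs.round = true         # round the timestamp down...
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute   # ...to the nearest 10 minutes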
Start flume
cd /export/servers/apache-flume-1.6.0-cdh5.14.0-bin
bin/flume-ng agent -c conf -f conf/tail-file.conf -n a1 -Dflume.root.logger=INFO,console
Develop a shell script that periodically appends content to the file
mkdir -p /export/shells/
cd /export/shells/
vim tail-file.sh
Script content:
#!/bin/bash
while true
do
date >> /export/taillogs/access_log;
sleep 0.5;
done
Create the directory the script writes to:
mkdir -p /export/taillogs
Run the script:
sh /export/shells/tail-file.sh
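After the script has been running for a while, the collected data can be checked on HDFS (using the path configured in the sink above):
hdfs dfs -ls /spooldir
hdfs dfs -cat '/spooldir/*' | head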
4. Two cascaded agents
The first agent is responsible for collecting data from a file and sending it over the network to the second agent; the second agent receives the data sent by the first agent and saves it to HDFS.
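The data flow, sketched in plain text (matching the configurations developed below):
node02 agent: exec source (tail -F access_log) -> memory channel -> avro sink (sends to 192.168.52.120:4141)
node03 agent: avro source (listens on 4141) -> memory channel -> hdfs sink (writes to hdfs://node01:8020/avro)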
Step 1: Install Flume on node02
Copy the extracted Flume directory from node03 to the node02 machine:
cd /export/servers
scp -r apache-flume-1.6.0-cdh5.14.0-bin/ node02:$PWD
Step 2: Develop the Flume configuration file on node02
Configure Flume on the node02 machine:
cd /export/servers/apache-flume-1.6.0-cdh5.14.0-bin/conf
vim tail-avro-avro-logger.conf
##################
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /export/taillogs/access_log
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
## The avro sink here acts as a data sender
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = 192.168.52.120
a1.sinks.k1.port = 4141
#Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Step 3: Set up the script that writes data to the file on node02
Copy the script and the data directly from node03 to node02; run the following commands on the node03 machine:
cd /export
scp -r shells/ taillogs/ node02:$PWD
Step 5: Develop the Flume configuration file on node03
Develop the Flume configuration file on the node03 machine:
cd /export/servers/apache-flume-1.6.0-cdh5.14.0-bin/conf
vim avro-hdfs.conf
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
## The avro source here acts as a receiving server
a1.sources.r1.type = avro
a1.sources.r1.bind = 192.168.52.120
a1.sources.r1.port = 4141
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://node01:8020/avro
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Step 6: Startup order
Start the Flume process on the node03 machine first (the avro source must be listening before the upstream avro sink connects):
cd /export/servers/apache-flume-1.6.0-cdh5.14.0-bin
bin/flume-ng agent -c conf -f conf/avro-hdfs.conf -n a1 -Dflume.root.logger=INFO,console
Then start the Flume process on the node02 machine:
cd /export/servers/apache-flume-1.6.0-cdh5.14.0-bin/
bin/flume-ng agent -c conf -f conf/tail-avro-avro-logger.conf -n a1 -Dflume.root.logger=INFO,console
Start the shell script on the node02 machine to generate file data:
mkdir /export/taillogs/
cd /export/shells
sh tail-file.sh
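Once the script has generated some data, verify on HDFS that events are arriving (using the path configured on node03):
hdfs dfs -ls /avro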
More source and sink components
Flume official documentation:
http://archive.cloudera.com/cdh5/cdh/5/flume-ng-1.6.0-cdh5.14.0/FlumeUserGuide.html