Flume real-world cases (Flume collecting telnet data, collecting a directory to HDFS, collecting an appended file to HDFS, two cascaded agents)

First, Flume collects telnet data

Step one: develop the configuration file

vim   /export/servers/apache-flume-1.6.0-cdh5.14.0-bin/conf/netcat-logger.conf

# Name the components of this agent

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe and configure the source component: r1
a1.sources.r1.type = netcat
a1.sources.r1.bind = 192.168.52.120
a1.sources.r1.port = 44444

# Describe and configure the sink component: k1
a1.sinks.k1.type = logger

# Describe and configure the channel component; a memory channel is used here
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Describe and configure the connections between the source, channel, and sink
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Channel parameters explained:

capacity: the maximum number of events that can be stored in the channel

transactionCapacity: the maximum number of events the channel takes from a source, or gives to a sink, in a single transaction
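As a rough sketch with illustrative values (not from the original tutorial), the buffer can be enlarged by raising both numbers; transactionCapacity should not exceed capacity:

a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 1000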

Step two: start Flume with the configuration file

Specify the collection scheme in the configuration file, then start the flume agent on the corresponding node.

First, use this simplest example to test whether the environment works normally.

Start the agent to collect data:

bin/flume-ng agent -c conf -f conf/netcat-logger.conf -n a1  -Dflume.root.logger=INFO,console

-c conf specifies the directory of flume's own configuration files

-f conf/netcat-logger.conf specifies the collection scheme we just described

-n a1 specifies the name of our agent
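Optionally (this is not part of the original steps), the agent can be kept running in the background with a plain nohup invocation; the log file name here is only an example:

nohup bin/flume-ng agent -c conf -f conf/netcat-logger.conf -n a1 -Dflume.root.logger=INFO,console > flume-netcat.log 2>&1 &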

Step three: install telnet to prepare for the test

Install a telnet client on the node02 machine to send simulated data:

yum -y install telnet

telnet node03 44444   # use telnet to simulate sending data
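A rough sketch of the test (the exact log format may vary): type a line in the telnet session and press Enter; the agent started above then prints the received event to its console through the logger sink.

telnet node03 44444
# after connecting, type a line such as "hello flume" and press Enter
# the agent console (logger sink) then prints the received event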

Second, collecting a directory to HDFS

Schematic: (diagram not included)

Collection requirement: new files are continually generated in a particular directory on a server; whenever a new file appears, it needs to be collected into HDFS.

According to this requirement, first define the following three major elements:

  1. Data source component, i.e. the source, which monitors a file directory: spooldir

spooldir features:

   1. It monitors a directory; as soon as a new file appears in the directory, its contents are collected.

   2. Once a file has been collected, the agent automatically renames it with the suffix: COMPLETED

   3. The monitored directory must not receive files with duplicate names.

  2. Sink component, i.e. the sink, which writes to the HDFS file system: hdfs sink
  3. Channel component, i.e. the channel; either a file channel or a memory channel can be used (a minimal file channel sketch follows this list)
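If durability matters more than throughput, a file channel can replace the memory channel. A minimal sketch, assuming the two directories below exist (the paths are illustrative, not from the original):

a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /export/servers/flume-checkpoint
a1.channels.c1.dataDirs = /export/servers/flume-data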

Flume configuration file development

Write configuration file:

cd  /export/servers/apache-flume-1.6.0-cdh5.14.0-bin/conf

mkdir -p /export/dir

vim spooldir.conf

# Name the components on this agent

a1.sources=r1

a1.channels=c1

a1.sinks=k1

# Describe/configure the source

## Note: do not drop files with duplicate names into the monitored directory

a1.sources.r1.type=spooldir

a1.sources.r1.spoolDir=/export/dir

a1.sources.r1.fileHeader = true

# Describe the sink

a1.sinks.k1.type=hdfs

a1.sinks.k1.hdfs.path=hdfs://node01:8020/spooldir/

# Describe the channel

a1.channels.c1.type=memory

a1.channels.c1.capacity=1000

a1.channels.c1.transactionCapacity=100

# Bind the source and sink to the channel

a1.sources.r1.channels=c1

a1.sinks.k1.channel=c1
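The sink above relies on Flume's default file rolling. Optionally, properties like the following (values are illustrative, not from the original) can be appended to control file naming and rolling on HDFS:

a1.sinks.k1.hdfs.filePrefix = events
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.rollInterval = 60
a1.sinks.k1.hdfs.rollSize = 134217728
a1.sinks.k1.hdfs.rollCount = 0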

Start flume

bin/flume-ng agent -c ./conf -f ./conf/spooldir.conf -n a1 -Dflume.root.logger=INFO,console

Upload files to the specified directory

Upload files into the directory below; note that the file names must not repeat.

cd /export/dir
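For example (the file name is arbitrary), drop a test file into the directory and then check HDFS; the file left in /export/dir should be renamed with the .COMPLETED suffix:

echo "hello spooldir" > /export/dir/test_1.txt
hdfs dfs -ls /spooldir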

Third, collecting an appended file to HDFS

Collection requirement: a business system uses log4j to generate logs, and the log content keeps growing; the data appended to the log file needs to be collected to HDFS in real time.

According to this requirement, first define the following three major elements:

  1. Collection source, i.e. the source, which monitors file updates: exec 'tail -F file'
  2. Sink target, i.e. the sink, which writes to the HDFS file system: hdfs sink
  3. Channel, the transfer passage between the source and the sink; either a file channel or a memory channel can be used

Definition of the Flume configuration file

Develop the configuration file on node03:

cd /export/servers/apache-flume-1.6.0-cdh5.14.0-bin/conf

vim tail-file.conf

Configuration file content:

a1.sources=r1

a1.channels=c1

a1.sinks=k1

# Describe/configure tail -F source1

a1.sources.r1.type=exec

a1.sources.r1.command =tail -F /export/taillogs/access_log



# Describe sink1

a1.sinks.k1.type=hdfs

a1.sinks.k1.hdfs.path=hdfs://node01:8020/spooldir/



# Use a channel which buffers events in memory

a1.channels.c1.type=memory

a1.channels.c1.capacity=1000

a1.channels.c1.transactionCapacity=100



# Bind the source and sink to the channel

a1.sources.r1.channels=c1

a1.sinks.k1.channel=c1

Start flume

cd  /export/servers/apache-flume-1.6.0-cdh5.14.0-bin

bin/flume-ng agent -c conf -f conf/tail-file.conf -n a1  -Dflume.root.logger=INFO,console

Develop a shell script that appends content to the file at regular intervals

mkdir -p /export/shells/

cd  /export/shells/

vim tail-file.sh

 

Create a folder

mkdir -p /export/taillogs

Content of tail-file.sh:

#!/bin/bash
# Append the current date to the monitored log file every 0.5 seconds
while true
do
  date >> /export/taillogs/access_log
  sleep 0.5
done

Start the script:

sh /export/shells/tail-file.sh
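To confirm that data is flowing, check the growing log file and the HDFS output path configured above (the directory name matches the hdfs.path in tail-file.conf):

tail -n 5 /export/taillogs/access_log
hdfs dfs -ls /spooldir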

Fourth, two cascaded agents

 

The first agent is responsible for collecting data from a file and sending it over the network to a second agent; the second agent receives the data sent by the first agent and saves it to HDFS.

Step one: install Flume on node02

Copy the extracted Flume directory from the node03 machine to the node02 machine:

cd  /export/servers

scp -r apache-flume-1.6.0-cdh5.14.0-bin/ node02:$PWD

Step two: configure the Flume configuration file on node02

Configure Flume on the node02 machine:

cd /export/servers/apache-flume-1.6.0-cdh5.14.0-bin/conf

vim tail-avro-avro-logger.conf

##################

# Name the components on this agent

a1.sources = r1

a1.sinks = k1

a1.channels = c1



# Describe/configure the source

a1.sources.r1.type = exec

a1.sources.r1.command = tail -F /export/taillogs/access_log



# Use a channel which buffers events in memory

a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100



## The avro sink acts as a data sender

a1.sinks.k1.type = avro

a1.sinks.k1.hostname = 192.168.52.120

a1.sinks.k1.port = 4141



#Bind the source and sink to the channel

a1.sources.r1.channels = c1

a1.sinks.k1.channel = c1
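Once the receiving agent on node03 is running (it is started in the start-order step below), connectivity from node02 to port 4141 can be checked with the telnet client installed earlier; this quick check is not part of the original steps:

telnet 192.168.52.120 4141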

Step three: develop the script on node02 that writes data to the file

Copy the script and data directories from node03 directly to node02; run the following commands on the node03 machine:

cd  /export

scp -r shells/ taillogs/ node02:$PWD

Step four: develop the Flume configuration file on node03

Develop the Flume configuration file on the node03 machine:

cd /export/servers/apache-flume-1.6.0-cdh5.14.0-bin/conf

vim avro-hdfs.conf

# Name the components on this agent

a1.sources = r1

a1.sinks = k1

a1.channels = c1



## The avro source acts as a receiver service

a1.sources.r1.type = avro

a1.sources.r1.bind = 192.168.52.120

a1.sources.r1.port = 4141



# Use a channel which buffers events in memory

a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100



# Describe the sink

a1.sinks.k1.type = hdfs

a1.sinks.k1.hdfs.path = hdfs://node01:8020/avro



# Bind the source and sink to the channel

a1.sources.r1.channels = c1

a1.sinks.k1.channel = c1

Step five: start order

Start the Flume process on the node03 machine first, so that its avro source is already listening when node02's avro sink connects:

cd /export/servers/apache-flume-1.6.0-cdh5.14.0-bin

bin/flume-ng agent -c conf -f conf/avro-hdfs.conf -n a1  -Dflume.root.logger=INFO,console   

Then start the Flume process on the node02 machine:

cd /export/servers/apache-flume-1.6.0-cdh5.14.0-bin/

bin/flume-ng agent -c conf -f conf/tail-avro-avro-logger.conf -n a1  -Dflume.root.logger=INFO,console    

Start the shell script on the node02 machine to generate the file:

mkdir /export/taillogs/

cd /export/shells

sh tail-file.sh
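If everything is wired up, events should start appearing under the /avro path configured above; output file names will vary (Flume's HDFS sink uses the FlumeData prefix by default):

hdfs dfs -ls /avro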

More source and sink components

Flume official documentation:

http://archive.cloudera.com/cdh5/cdh/5/flume-ng-1.6.0-cdh5.14.0/FlumeUserGuide.html
