Flume - Spooling Directory Source: monitoring a directory for new files

Use Flume to monitor an entire directory of files and upload them to HDFS.

 

First, create a configuration file flume-dir-hdfs.conf

https://flume.apache.org/FlumeUserGuide.html#spooling-directory-source

# Name the components on this agent
a3.sources = r3
a3.sinks = k3
a3.channels = c3

# Describe/configure the source
a3.sources.r3.type = spooldir
a3.sources.r3.spoolDir = /tmp/upload
# Suffix appended to files in spoolDir once they are fully ingested,
# to distinguish ingested from pending files (files are renamed after ingestion)
a3.sources.r3.fileSuffix = .COMPLETED
a3.sources.r3.fileHeader = true
# Ignore all files ending in .tmp; they are never uploaded
a3.sources.r3.ignorePattern = ([^ ]*\.tmp)

# Describe the sink
a3.sinks.k3.type = hdfs
a3.sinks.k3.hdfs.path = hdfs://h136:9000/flume/upload/%Y%m%d/%H
# Prefix for uploaded files
a3.sinks.k3.hdfs.filePrefix = upload-
# Whether to roll folders according to time
a3.sinks.k3.hdfs.round = true
# How many time units before creating a new folder
a3.sinks.k3.hdfs.roundValue = 1
# The unit of time for rounding
a3.sinks.k3.hdfs.roundUnit = hour
# Whether to use the local timestamp
a3.sinks.k3.hdfs.useLocalTimeStamp = true
# Number of events to accumulate before flushing to HDFS
a3.sinks.k3.hdfs.batchSize = 100
# File type (DataStream = plain text; compressed types are also supported)
a3.sinks.k3.hdfs.fileType = DataStream
# How long before rolling to a new file, in seconds
a3.sinks.k3.hdfs.rollInterval = 60
# Roll when the file size reaches about 128 MB
a3.sinks.k3.hdfs.rollSize = 134217700
# File rolling is unrelated to the number of events
a3.sinks.k3.hdfs.rollCount = 0

# Use a channel which buffers events in memory
a3.channels.c3.type = memory
a3.channels.c3.capacity = 1000
a3.channels.c3.transactionCapacity = 100

# Bind the source and sink to the channel
a3.sources.r3.channels = c3
a3.sinks.k3.channel = c3
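The spool directory has to exist before the agent starts, since the Spooling Directory Source cannot read from a missing directory. A minimal preparation sketch, assuming the paths used in this post (the /tmp/flume-job directory matches the --conf-file path in the start command below):

mkdir -p /tmp/upload      # directory watched by the spooldir source
mkdir -p /tmp/flume-job   # directory holding flume-dir-hdfs.conf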

 

Second, start the agent

cd /opt/apache-flume-1.9.0-bin/
bin/flume-ng agent --conf conf/ --name a3 --conf-file /tmp/flume-job/flume-dir-hdfs.conf -Dflume.root.logger=INFO,console
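The command above runs the agent in the foreground with console logging. To keep it running after the shell exits, one option (an assumption about your setup, not something from the original post) is to background it with nohup:

nohup bin/flume-ng agent --conf conf/ --name a3 --conf-file /tmp/flume-job/flume-dir-hdfs.conf > /tmp/flume-dir-hdfs.log 2>&1 &
# logs go to /tmp/flume-dir-hdfs.log instead of the console (hypothetical path)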

 

Third, test

vim /tmp/123.txt

123
456
789

cp /tmp/123.txt /tmp/upload/
cp /tmp/123.txt /tmp/upload/456.txt
cp /tmp/123.txt /tmp/upload/789.txt

Files that have been ingested automatically get the suffix appended. If you copy in a file ending in .tmp, Flume will not ingest it, because it is excluded by the ignorePattern in the configuration.
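To see both behaviours at once, copy in a .tmp file and list the directory. A sketch of the expected state, assuming the test files above (the abc.tmp name is made up for illustration):

cp /tmp/123.txt /tmp/upload/abc.tmp   # matches ignorePattern, never ingested or renamed
ls /tmp/upload
# expected output, roughly:
# 123.txt.COMPLETED  456.txt.COMPLETED  789.txt.COMPLETED  abc.tmp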

Note: When using the Spooling Directory Source, do not create or keep modifying files inside the monitored directory. Completed uploads end with .COMPLETED, and the monitored folder is scanned for file changes once every 500 ms.

Files on HDFS
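To verify from the command line, list the sink's output path; the layout follows the %Y%m%d/%H pattern in the configuration, so names and timestamps will differ on your cluster:

hdfs dfs -ls -R /flume/upload
# files appear as upload-<timestamp> under /flume/upload/<yyyyMMdd>/<HH>/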
