Use Flume to monitor an entire directory and upload new files to HDFS.
First, create a configuration file flume-dir-hdfs.conf
https://flume.apache.org/FlumeUserGuide.html#spooling-directory-source
# Name the components on this agent
a3.sources = r3
a3.sinks = k3
a3.channels = c3

# Describe/configure the source
a3.sources.r3.type = spooldir
a3.sources.r3.spoolDir = /tmp/upload
# Suffix appended to a file in spoolDir once it has been fully ingested
# (already-suffixed files are not ingested again)
a3.sources.r3.fileSuffix = .COMPLETED
a3.sources.r3.fileHeader = true
# Ignore (do not upload) any file ending in .tmp
a3.sources.r3.ignorePattern = ([^ ]*\.tmp)

# Describe the sink
a3.sinks.k3.type = hdfs
a3.sinks.k3.hdfs.path = hdfs://h136:9000/flume/upload/%Y%m%d/%H
# Prefix for uploaded file names
a3.sinks.k3.hdfs.filePrefix = upload-
# Round the timestamp down so folders roll over on a time boundary
a3.sinks.k3.hdfs.round = true
# Create a new folder every 1 time unit
a3.sinks.k3.hdfs.roundValue = 1
# The time unit used for rounding
a3.sinks.k3.hdfs.roundUnit = hour
# Use the local timestamp rather than one from the event headers
a3.sinks.k3.hdfs.useLocalTimeStamp = true
# Number of events to accumulate before flushing to HDFS
a3.sinks.k3.hdfs.batchSize = 100
# File type; DataStream writes events uncompressed
a3.sinks.k3.hdfs.fileType = DataStream
# Roll to a new file after this many seconds
a3.sinks.k3.hdfs.rollInterval = 60
# Roll when a file reaches about 128 MB
a3.sinks.k3.hdfs.rollSize = 134217700
# Do not roll based on the number of events
a3.sinks.k3.hdfs.rollCount = 0

# Use a channel which buffers events in memory
a3.channels.c3.type = memory
a3.channels.c3.capacity = 1000
a3.channels.c3.transactionCapacity = 100

# Bind the source and sink to the channel
a3.sources.r3.channels = c3
a3.sinks.k3.channel = c3
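Two of the settings above can be sanity-checked locally before starting the agent. This is an illustration only: Flume applies the ignore pattern and the path escape sequences itself at runtime.

```shell
# The ignorePattern regex ([^ ]*\.tmp) matches names ending in .tmp,
# so only data.tmp is selected here:
printf '%s\n' 123.txt data.tmp | grep -E '[^ ]*\.tmp$'
# prints: data.tmp

# The %Y%m%d/%H escapes in hdfs.path expand like strftime patterns,
# producing hour-partitioned directories, e.g. 20240506/14:
date +%Y%m%d/%H
```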
Second, start the agent
cd /opt/apache-flume-1.9.0-bin/
bin/flume-ng agent --conf conf/ --name a3 --conf-file /tmp/flume-job/flume-dir-hdfs.conf -Dflume.root.logger=INFO,console
Third, test
Create a test file and copy it into the monitored directory:
vim /tmp/123.txt
123
456
789
cp /tmp/123.txt /tmp/upload/
cp /tmp/123.txt /tmp/upload/456.txt
cp /tmp/123.txt /tmp/upload/789.txt
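The same test can be scripted without an editor (printf instead of vim; same file names and contents as the steps above):

```shell
# Create the spool directory if it does not exist yet, write the test
# file non-interactively, and copy it in under three different names
mkdir -p /tmp/upload
printf '123\n456\n789\n' > /tmp/123.txt
cp /tmp/123.txt /tmp/upload/
cp /tmp/123.txt /tmp/upload/456.txt
cp /tmp/123.txt /tmp/upload/789.txt
ls /tmp/upload
```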
Files that have been ingested are automatically renamed with the configured suffix. If you copy a file ending in .tmp, Flume does not ingest it, because the ignorePattern in the configuration tells Flume to skip it.
Note: When using the Spooling Directory Source, do not create or modify files inside the monitored directory while it is being monitored. Files that have finished uploading are renamed with the .COMPLETED suffix, and the monitored folder is scanned for file changes every 500 ms.
The uploaded files can now be seen on HDFS, e.g. with hdfs dfs -ls -R /flume/upload