Collecting files and directories with Flume

spooldir: collects from a directory
exec: collects from a file

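Directory collection
The spooldir source watches a directory and ingests the contents of every new file that appears in it.
Once a file has been fully ingested, the agent renames it by appending the suffix .COMPLETED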
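
The spooling directory source expects each file to be complete and unchanged once it appears in the directory, and it does not support reusing a file name. One way to satisfy both conditions, sketched below, is to finish writing the file elsewhere and then move it in with a single rename. The function name deliver_to_spool and the timestamp suffix are assumptions for illustration, and the rename is only atomic when both paths are on the same filesystem.

```shell
# Sketch: hand a finished file to the spooling directory in one step.
# deliver_to_spool is a hypothetical helper, not part of Flume.
deliver_to_spool() {
  src="$1"        # finished file that is no longer being written
  spool_dir="$2"  # directory configured as spoolDir
  # Append a timestamp and the shell's PID so a file name is not reused.
  target="$spool_dir/$(basename "$src").$(date +%s).$$"
  # mv is a rename when source and target are on the same filesystem.
  mv "$src" "$target" && echo "$target"
}

# Usage (paths are examples):
#   deliver_to_spool /home/hadoop/staging/access.log /home/hadoop/flumelogs
```

The function prints the final path, so a caller can log which file was handed over.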

# Names of the three components
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1

Configure the source component

# Source type
agent1.sources.source1.type = spooldir
# Directory to monitor
agent1.sources.source1.spoolDir = /home/hadoop/flumelogs/
# Do not add a header holding the file's absolute path
agent1.sources.source1.fileHeader = false

Configure the sink component

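# Sink type
agent1.sinks.sink1.type = hdfs
# HDFS directory to write to
agent1.sinks.sink1.hdfs.path = hdfs://myha01/flume_log/%y-%m-%d/%H-%M
# Prefix for the files created in HDFS
agent1.sinks.sink1.hdfs.filePrefix = events
agent1.sinks.sink1.hdfs.maxOpenFiles = 5000
# Number of events written to a file before it is flushed to HDFS
agent1.sinks.sink1.hdfs.batchSize = 100
# File type: DataStream writes the data uncompressed
agent1.sinks.sink1.hdfs.fileType = DataStream
# Write format
agent1.sinks.sink1.hdfs.writeFormat = Text
# Roll to a new file once the current file reaches 102400 bytes
agent1.sinks.sink1.hdfs.rollSize = 102400
# Roll to a new file after this many events have been written
agent1.sinks.sink1.hdfs.rollCount = 1000000
# Roll to a new file after this many seconds
agent1.sinks.sink1.hdfs.rollInterval = 60
# Use the agent's local time, not a timestamp header, for the escape sequences in hdfs.path
agent1.sinks.sink1.hdfs.useLocalTimeStamp = true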

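agent1.channels.channel1.type = memory
# Timeout in seconds for adding an event to the channel or removing one from it
agent1.channels.channel1.keep-alive = 120
# Maximum number of events held in the channel
agent1.channels.channel1.capacity = 500000
# Maximum number of events per transaction
agent1.channels.channel1.transactionCapacity = 600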

Bind the source and sink to the channel

agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1
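
To try the configuration above, the agent can be started with the flume-ng launcher. The following is a sketch only: the wrapper name start_agent, the file name spooldir-hdfs.conf, and the conf directory are assumptions, and the agent name passed in has to match the agent1 prefix used in the properties.

```shell
# Sketch: start the agent described by a properties file.
# start_agent is a hypothetical wrapper; flume-ng must be on PATH.
start_agent() {
  conf_file="$1"             # properties file, e.g. spooldir-hdfs.conf (assumed name)
  agent_name="${2:-agent1}"  # must match the prefix used in the properties file
  flume-ng agent \
    --conf conf \
    --conf-file "$conf_file" \
    --name "$agent_name" \
    -Dflume.root.logger=INFO,console
}

# Usage:
#   start_agent spooldir-hdfs.conf agent1
```

Logging to the console with -Dflume.root.logger=INFO,console makes it easy to see whether files dropped into the monitored directory are being picked up.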

File collection

agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1

Describe/configure tail -F source1

agent1.sources.source1.type = exec
agent1.sources.source1.command = tail -F /home/hadoop/flumelogs/catalina.out
agent1.sources.source1.channels = channel1

Describe sink1

agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = hdfs://myha01/weblog/flume-event/%y-%m-%d/%H-%M
agent1.sinks.sink1.hdfs.filePrefix = tomcat_
agent1.sinks.sink1.hdfs.maxOpenFiles = 5000
agent1.sinks.sink1.hdfs.batchSize = 100
agent1.sinks.sink1.hdfs.fileType = DataStream
agent1.sinks.sink1.hdfs.writeFormat = Text
agent1.sinks.sink1.hdfs.rollSize = 102400
agent1.sinks.sink1.hdfs.rollCount = 1000000
agent1.sinks.sink1.hdfs.rollInterval = 60
agent1.sinks.sink1.hdfs.round = true
agent1.sinks.sink1.hdfs.roundValue = 10
agent1.sinks.sink1.hdfs.roundUnit = minute
agent1.sinks.sink1.hdfs.useLocalTimeStamp = true
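
With round = true, roundValue = 10 and roundUnit = minute, the timestamp used for the %H-%M part of hdfs.path is rounded down to a multiple of 10 minutes, so a new directory appears every 10 minutes instead of every minute. The arithmetic can be sketched as follows; bucket_for is a hypothetical helper written only to illustrate the rounding, not something Flume provides.

```shell
# Sketch: compute the %H-%M directory name after rounding the minute down.
# bucket_for is a hypothetical helper for illustration.
bucket_for() {
  hour="$1"
  minute="${2#0}"          # strip one leading zero so 08 is not read as octal
  minute="${minute:-0}"    # a bare "0" becomes empty above; restore it
  round_value="${3:-10}"   # corresponds to hdfs.roundValue
  printf '%s-%02d\n' "$hour" $(( minute - minute % round_value ))
}

bucket_for 14 37   # prints 14-30
bucket_for 09 08   # prints 09-00
```

An event that arrives at 14:37 is therefore written under the 14-30 directory.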

Use a channel which buffers events in memory

agent1.channels.channel1.type = memory
agent1.channels.channel1.keep-alive = 120
agent1.channels.channel1.capacity = 500000
agent1.channels.channel1.transactionCapacity = 600

Bind the source and sink to the channel

agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1

Reposted from blog.csdn.net/weixin_44701192/article/details/90896122