Environment: (installing Flume 1.7.0 on a high-availability Hadoop cluster)
OS: Ubuntu 16.04
master  vm01
slave1  vm02
slave2  vm03
Flume is installed on vm01.
Download Flume 1.7.0 and unpack it:
cd /tools
tar -xzvf apache-flume-1.7.0-bin.tar.gz
mv apache-flume-1.7.0-bin flume1.7.0
# chmod -R 777 /tools/flume1.7.0   # grant permissions on the directory
# The rest of this guide assumes Flume lives under /home/hadoop/app/flume:
mv /tools/flume1.7.0 /home/hadoop/app/flume
Configure the environment variables:
#FLUME_HOME
export FLUME_HOME=/home/hadoop/app/flume
export PATH=$PATH:$FLUME_HOME/bin
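To make these variables permanent, they can be appended to ~/.bashrc (a common convention; this assumes a bash login shell, adjust for your setup):

```shell
# Persist the Flume variables in ~/.bashrc (assumes a bash login shell)
cat >> ~/.bashrc <<'EOF'
# FLUME_HOME
export FLUME_HOME=/home/hadoop/app/flume
export PATH=$PATH:$FLUME_HOME/bin
EOF
# Apply to the current shell as well
export FLUME_HOME=/home/hadoop/app/flume
export PATH=$PATH:$FLUME_HOME/bin
# Sanity check: prints the version banner once flume-ng is on the PATH
command -v flume-ng >/dev/null && flume-ng version || echo "flume-ng not found yet"
```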
Configuration file:
cd /home/hadoop/app/flume/conf
cp flume-conf.properties.template flume-conf.properties
vim flume-conf.properties
flume-conf.properties is configured as follows:
# a.conf: A single-node Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/hadoop/data/flume
a1.sources.r1.fileHeader = true
a1.sources.r1.deserializer.outputCharset=UTF-8
# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://vm01:9000/log
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat=Text
a1.sinks.k1.hdfs.maxOpenFiles = 1
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.rollSize = 1000000
a1.sinks.k1.hdfs.batchSize = 100000
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000000
a1.channels.c1.transactionCapacity = 100000
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
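A note on the sink tuning above: rollCount = 0 and rollInterval = 0 disable rolling by event count and by time, so the sink rolls a file only once it reaches rollSize bytes (about 1 MB here), and maxOpenFiles = 1 keeps a single HDFS file open at a time. If time-based rolling is preferred instead, a variant might look like this (the values are illustrative, not from the original setup):

```properties
# Illustrative alternative: roll every 10 minutes or at ~128 MB, whichever comes first
a1.sinks.k1.hdfs.rollInterval = 600
a1.sinks.k1.hdfs.rollSize = 134217728
a1.sinks.k1.hdfs.rollCount = 0
```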
Create the spool directory (the same path as spoolDir above) and grant permissions:
mkdir -p /home/hadoop/data/flume
chmod -R 777 /home/hadoop/data/flume
Note: the /log directory on HDFS does not need to be created manually; the HDFS sink creates it automatically.
Run the agent:
cd /home/hadoop/app/flume
bin/flume-ng agent --conf conf --conf-file conf/flume-conf.properties --name a1 -Dflume.root.logger=INFO,console
Once this command is executed, a foreground process keeps running and continuously monitors the spool directory, so the terminal stays occupied.
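Since the foreground command blocks the terminal, the agent can also be started in the background, for example with nohup (a common pattern, not from the original write-up):

```shell
# Start the agent in the background and keep its PID for a later shutdown
FLUME_HOME=${FLUME_HOME:-/home/hadoop/app/flume}
nohup "$FLUME_HOME/bin/flume-ng" agent --conf "$FLUME_HOME/conf" \
    --conf-file "$FLUME_HOME/conf/flume-conf.properties" \
    --name a1 -Dflume.root.logger=INFO,console > flume.log 2>&1 &
echo $! > flume.pid
# Stop it later with: kill $(cat flume.pid)
```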
Create a new .txt file under /home/hadoop/data/flume, and a file is automatically created under the HDFS /log directory with the contents of the .txt copied into it. When another file is dropped in, Flume appends its contents to the file it already created on HDFS. If this Flume agent is stopped and a new one is started, a new file is created under /log on HDFS to continue collecting.
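The behaviour described above can be exercised with a quick smoke test (paths are the ones from this guide; the agent must already be running, and the hdfs commands assume the Hadoop client is on the PATH):

```shell
# Write the file outside the spool directory first, then move it in,
# so the spooling source never sees a partially written file.
SPOOL_DIR=${SPOOL_DIR:-/home/hadoop/data/flume}
echo "first test line" > /tmp/test1.txt
mv /tmp/test1.txt "$SPOOL_DIR/test1.txt"

# The source renames consumed files (by default to *.COMPLETED) once delivered:
ls "$SPOOL_DIR"

# Verify the events arrived in HDFS:
hdfs dfs -ls /log
hdfs dfs -cat /log/*
```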