Flume Environment Setup

Environment: Flume 1.7.0 on a highly available Hadoop cluster
OS: Ubuntu 16.04
master  vm01
slave1  vm02
slave2  vm03
Flume is installed on vm01.

Download Flume 1.7.0

cd /tools
# download from the Apache release archive if you don't already have the tarball
wget http://archive.apache.org/dist/flume/1.7.0/apache-flume-1.7.0-bin.tar.gz
tar -xzvf apache-flume-1.7.0-bin.tar.gz
mv apache-flume-1.7.0-bin flume1.7.0

chmod -R 777 /tools/flume1.7.0   # grant full permissions on the directory

Configure environment variables (append these lines to the shell profile, e.g. ~/.bashrc, then reload it with `source`; adjust FLUME_HOME to wherever you placed Flume):

#FLUME_HOME  
export FLUME_HOME=/home/hadoop/app/flume  
export PATH=$PATH:$FLUME_HOME/bin  

Configuration file:

cd $FLUME_HOME/conf
cp flume-conf.properties.template flume-conf.properties
vim flume-conf.properties

flume-conf.properties is configured as follows:

# a.conf: A single-node Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/hadoop/data/flume
a1.sources.r1.fileHeader = true
a1.sources.r1.deserializer.outputCharset = UTF-8
# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://vm01:9000/log
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.maxOpenFiles = 1
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.rollSize = 1000000
a1.sinks.k1.hdfs.batchSize = 100000
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000000
a1.channels.c1.transactionCapacity = 100000
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
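In this sink configuration, rollCount and rollInterval are set to 0, which disables those two roll triggers, so the sink rolls to a new HDFS file only after roughly rollSize (1,000,000) bytes have been written. A minimal Python sketch of that decision logic (function and parameter names are illustrative, not Flume's actual API):

```python
def should_roll(bytes_written, events_written, seconds_open,
                roll_size=1_000_000, roll_count=0, roll_interval=0):
    """Mimic the HDFS sink's roll triggers: a setting of 0 disables that trigger."""
    if roll_size and bytes_written >= roll_size:
        return True
    if roll_count and events_written >= roll_count:
        return True
    if roll_interval and seconds_open >= roll_interval:
        return True
    return False

# With the settings above, only file size ever triggers a roll:
print(should_roll(999_999, 10_000_000, 86_400))   # False: count/interval disabled
print(should_roll(1_000_000, 0, 0))               # True: size threshold reached
```

This is why the agent keeps appending to one open file (reinforced by maxOpenFiles = 1) instead of producing many small files on HDFS.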

Create the spool directory and grant permissions

mkdir -p /home/hadoop/data/flume   # must match spoolDir in the config above
chmod -R 777 /home/hadoop/data/flume

Note: the /log directory on HDFS does not need to be created manually; Flume will create it automatically.

Run

cd $FLUME_HOME
bin/flume-ng agent --conf conf --conf-file conf/flume-conf.properties --name a1 -Dflume.root.logger=INFO,console

Once the command runs, the agent stays in the foreground, continuously watching the spool directory, so leave this session open.
Create a txt file under /home/hadoop/data/flume and a file is automatically generated under the HDFS /log directory with the txt file's contents copied into it. When another file is dropped in, Flume appends its contents to that already-created HDFS file. If you stop this Flume agent and start a new one, a fresh file is generated under /log on HDFS for subsequent collection.
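The spooling directory source behind this behavior reads each completed file from the directory and then, by default, renames it with a .COMPLETED suffix so it is never re-read. A rough local Python sketch of one ingest pass (a plain local file stands in for the HDFS sink; all names here are illustrative):

```python
import os

def ingest_spool_dir(spool_dir, out_path):
    """One pass over the spool directory: append each unprocessed file's
    contents to the output file, then mark the source file .COMPLETED
    (as Flume's spooldir source does) so it is not processed again."""
    with open(out_path, "a", encoding="utf-8") as out:
        for name in sorted(os.listdir(spool_dir)):
            if name.endswith(".COMPLETED"):
                continue  # already ingested on a previous pass
            path = os.path.join(spool_dir, name)
            with open(path, encoding="utf-8") as f:
                out.write(f.read())
            os.rename(path, path + ".COMPLETED")
```

Dropping a second txt file into the directory and running another pass appends its contents to the same output file, mirroring the append behavior described above. Note this also explains a spooldir constraint: files must not be modified after being placed in the directory.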


Reprinted from blog.csdn.net/yangyang_yangqi/article/details/79935599