Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. It uses a simple extensible data model that allows for online analytic application.
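Download from the Apache site: choose the binary package, which is already compiled; the source package has to be compiled yourself.
Configure a collection scheme: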
vim netcat-logger.conf
#Name the components on this agent
#a1 agent name
a1.sources=r1
a1.sinks=k1
a1.channels=c1
#Describe/configure the source
#type netcat: receives data on a port
a1.sources.r1.type=netcat
a1.sources.r1.bind=localhost
a1.sources.r1.port=44444
#Describe the sink
a1.sinks.k1.type=logger
#Use a channel which buffers events in memory
#capacity: the maximum number of events the channel can hold (100 here)
#transactionCapacity: how many events are moved per transaction
a1.channels.c1.type=memory
a1.channels.c1.capacity=100
a1.channels.c1.transactionCapacity=100
#bind the sources and sink to the channel
a1.sources.r1.channels=c1
a1.sinks.k1.channel=c1
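Note that one property name ends in "s" and the other does not: a source uses channels (it can write to several channels), while a sink uses channel (it reads from exactly one).
a1.sources.r1.channels=c1
a1.sinks.k1.channel=c1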
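The channels/channel distinction is easy to get wrong, so a small check before starting the agent can help. The following is a minimal sketch in Python, not part of Flume: parse_properties and check_agent are hypothetical helper names, and the parser only handles plain key=value lines with # comments. It also checks that transactionCapacity does not exceed capacity when both are set, since the memory channel rejects that combination.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_agent(props, agent):
    """Return a list of wiring problems found for one agent (hypothetical helper)."""
    problems = []
    channels = props.get(f"{agent}.channels", "").split()

    # Sources use the plural property "channels".
    for source in props.get(f"{agent}.sources", "").split():
        bound = props.get(f"{agent}.sources.{source}.channels", "").split()
        if not bound:
            problems.append(f"source {source}: missing 'channels' property")
        for name in bound:
            if name not in channels:
                problems.append(f"source {source}: unknown channel {name}")

    # Sinks use the singular property "channel".
    for sink in props.get(f"{agent}.sinks", "").split():
        if f"{agent}.sinks.{sink}.channels" in props:
            problems.append(f"sink {sink}: use 'channel', not 'channels'")
        bound = props.get(f"{agent}.sinks.{sink}.channel")
        if bound is None:
            problems.append(f"sink {sink}: missing 'channel' property")
        elif bound not in channels:
            problems.append(f"sink {sink}: unknown channel {bound}")

    # Only compare the two sizes when both are set explicitly.
    for ch in channels:
        cap = props.get(f"{agent}.channels.{ch}.capacity")
        tx = props.get(f"{agent}.channels.{ch}.transactionCapacity")
        if cap is not None and tx is not None and int(tx) > int(cap):
            problems.append(
                f"channel {ch}: transactionCapacity {tx} > capacity {cap}"
            )
    return problems
```

For example, check_agent(parse_properties(open("conf/netcat-logger.conf").read()), "a1") returns an empty list when the wiring is consistent.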
Put the file under flume/conf.
Start the agent:
flume/bin/flume-ng agent --conf conf --conf-file conf/netcat-logger.conf --name a1 -Dflume.root.logger=INFO,console
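--conf points at the directory holding Flume's bundled configuration files, which are loaded first; --conf-file is the custom configuration, loaded after them. The agent name given with --name must match the one used in the configuration file (a1 here).
Use telnet to simulate sending data: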
telnet localhost 44444
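If telnet is not installed, the same test can be scripted. The following is a minimal sketch in Python, assuming the agent above is running and listening on localhost:44444; send_lines is a hypothetical helper name. It relies on the netcat source's default behaviour of replying with an acknowledgement line for each event; if acknowledgements are turned off in the source configuration, the read will time out instead.

```python
import socket


def send_lines(lines, host="localhost", port=44444, timeout=5.0):
    """Send each line newline-terminated and collect one reply line per event."""
    replies = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        reader = conn.makefile("r", encoding="utf-8", newline="\n")
        for line in lines:
            # The netcat source treats each newline-terminated line as one event.
            conn.sendall((line + "\n").encode("utf-8"))
            replies.append(reader.readline().strip())
    return replies


if __name__ == "__main__":
    print(send_lines(["hello flume", "second event"]))
```

Each line sent this way should show up as one event in the console output of the logger sink.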
Data can also be collected from a directory:
#Name the components on this agent
#a1 agent name
a1.sources=r1
a1.sinks=k1
a1.channels=c1
#Describe/configure the source
#type is the core parameter: it selects the kind of source
#spooldir watches a directory for new files
a1.sources.r1.type=spooldir
a1.sources.r1.spoolDir=/root/flumeSpool
a1.sources.r1.fileHeader=true
#Describe the sink
a1.sinks.k1.type=logger
#Use a channel which buffers events in memory
#capacity: the maximum number of events the channel can hold (100 here)
#transactionCapacity: how many events are moved per transaction
a1.channels.c1.type=memory
a1.channels.c1.capacity=100
a1.channels.c1.transactionCapacity=100
#bind the sources and sink to the channel
a1.sources.r1.channels=c1
a1.sinks.k1.channel=c1
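The spooling directory source expects each file to be complete and unchanged once it appears in the directory, and file names must not be reused. The following is a minimal sketch in Python of one way to respect that: write the file somewhere else first, then move it into place in a single rename. drop_into_spool is a hypothetical helper name, the default staging location (the parent of the spool directory) is an assumption and must be on the same filesystem as the spool directory, and the .COMPLETED check assumes the source's default suffix for processed files.

```python
import os
import tempfile


def drop_into_spool(content, name, spool_dir="/root/flumeSpool", staging_dir=None):
    """Write content to a staging file, then rename it into spool_dir."""
    target = os.path.join(spool_dir, name)
    # Refuse to reuse a name, whether the file is pending or already processed.
    if os.path.exists(target) or os.path.exists(target + ".COMPLETED"):
        raise FileExistsError(f"{name} was already used in {spool_dir}")

    # Stage outside the spool directory so Flume never sees a half-written file.
    if staging_dir is None:
        staging_dir = os.path.dirname(os.path.abspath(spool_dir))
    fd, tmp_path = tempfile.mkstemp(dir=staging_dir)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(content)

    # The rename is a single step when both paths are on the same filesystem.
    os.replace(tmp_path, target)
    return target
```

For example, drop_into_spool("first line\nsecond line\n", "app-0001.log") places a finished file in /root/flumeSpool for the agent to pick up.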