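Flume is written in Java and runs on the JVM, so install a JRE or JDK first.
This guide uses the CDH 5.7.0 build throughout: apache-flume-1.6.0-cdh5.7.0-bin, which can be downloaded from cdh5.
Add Flume to the environment variables: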
export FLUME_HOME=/home/hadoop/app/flume
export PATH=$FLUME_HOME/bin:$PATH
Run source on the file you edited so the settings take effect.
Copy $FLUME_HOME/conf/flume-env.sh.template and name the copy $FLUME_HOME/conf/flume-env.sh.
Edit $FLUME_HOME/conf/flume-env.sh and set JAVA_HOME:
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_201.jdk/Contents/Home
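The copy-and-edit step above can be scripted. The following is only a minimal sketch: the function name setup_flume_env and its two arguments are made up for illustration, and it assumes the template exists under the conf directory of the Flume home you pass in.

```shell
# Hypothetical helper (not part of Flume): create flume-env.sh from the
# template and append a JAVA_HOME export to it.
#   $1 = Flume home directory
#   $2 = JDK home directory
setup_flume_env() {
  flume_home="$1"
  java_home="$2"
  # Copy the template shipped with Flume to the file that flume-ng reads.
  cp "$flume_home/conf/flume-env.sh.template" "$flume_home/conf/flume-env.sh" || return 1
  # Append the JAVA_HOME setting at the end of the new file.
  printf 'export JAVA_HOME=%s\n' "$java_home" >> "$flume_home/conf/flume-env.sh"
}

# Example call, using the paths from this guide:
# setup_flume_env "$FLUME_HOME" /Library/Java/JavaVirtualMachines/jdk1.8.0_201.jdk/Contents/Home
```

Appending rather than editing in place keeps the template comments intact; if the file already sets JAVA_HOME earlier, the appended line is evaluated later and takes precedence.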
Verify the installation:
xiejundongdeMacBook-Pro:~ xiejundong$ flume-ng version
Flume 1.6.0-cdh5.7.0
Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
Revision: 8f5f5143ae30802fe79f9ab96f893e6c54a105d1
Compiled by jenkins on Wed Mar 23 11:38:48 PDT 2016
From source with checksum 50b533f0ffc32db9246405ac4431872e
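The key to using Flume is writing the configuration file:
A) Configure the Source
B) Configure the Channel
C) Configure the Sink
D) Wire the three components together
A simple example (http://archive.cloudera.com/cdh5/cdh/5/flume-ng-1.6.0-cdh5.7.0/FlumeUserGuide.html)
Listen on local port 44444 and print whatever is received to the console.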
# example.conf: A single-node Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
What the names mean:
a1: the agent name
r1: the source name
k1: the sink name
c1: the channel name
Start Flume:
./bin/flume-ng agent --name a1 --conf ./conf/ --conf-file ./conf/example.conf -Dflume.root.logger=INFO,console
You can test it with Telnet by connecting to localhost on port 44444 and typing a line of text.
Another test
Agent layout: exec source + memory channel + logger sink
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /Users/xiejundong/data.log
a1.sources.r1.shell = /bin/sh -c
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start:
flume-ng agent --name a1 --conf ./conf --conf-file ./conf/exec_memory_logger.conf -Dflume.root.logger=INFO,console
The a1 passed to --name must match the agent name used in the configuration file; it cannot be chosen arbitrarily.
Test:
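To see which value --name expects, you can read the agent name straight out of the configuration file. This is a rough sketch rather than a Flume feature: agent_names is a hypothetical helper, and it assumes every agent declares its sources on a line of the form "<agent>.sources = ...", as the examples in this guide do.

```shell
# Hypothetical helper: list the agent names declared in a Flume config file.
# A line such as "a1.sources = r1" declares the sources of agent a1, so the
# text before ".sources" is the agent name that --name must match.
agent_names() {
  sed -n 's/^[[:space:]]*\([^.#[:space:]]*\)\.sources[[:space:]]*=.*/\1/p' "$1" | sort -u
}

# Example call:
# agent_names ./conf/exec_memory_logger.conf
```

Lines such as a1.sources.r1.type = exec are skipped because ".sources" is followed by another dot there, not by the equals sign.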
xiejundongdeMacBook-Pro:~ xiejundong$ echo “hello flume hahahahahaahah” >> data.log
xiejundongdeMacBook-Pro:~ xiejundong$ echo “hello flume hahahahahaahah” >> data.log
xiejundongdeMacBook-Pro:~ xiejundong$ echo “hello flume hahahahahaahah” >> data.log
xiejundongdeMacBook-Pro:~ xiejundong$ echo “hello flume hahahahahaahah” >> data.log
xiejundongdeMacBook-Pro:~ xiejundong$ echo “hello flume hahahahahaahah” >> data.log
xiejundongdeMacBook-Pro:~ xiejundong$ echo “hello flume hahahahahaahah” >> data.log
xiejundongdeMacBook-Pro:~ xiejundong$ echo “hello flume hahahahahaahah” >> data.log
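Output (the leading bytes E2 80 9C are the UTF-8 encoding of the curly quote “ typed in the echo commands above):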
2019-04-29 17:38:11,652 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{} body: E2 80 9C 68 65 6C 6C 6F 20 66 6C 75 6D 65 20 68 ...hello flume h }
2019-04-29 17:38:11,652 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{} body: E2 80 9C 68 65 6C 6C 6F 20 66 6C 75 6D 65 20 68 ...hello flume h }
2019-04-29 17:38:11,652 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{} body: E2 80 9C 68 65 6C 6C 6F 20 66 6C 75 6D 65 20 68 ...hello flume h }
2019-04-29 17:38:11,652 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{} body: E2 80 9C 68 65 6C 6C 6F 20 66 6C 75 6D 65 20 68 ...hello flume h }
2019-04-29 17:38:11,653 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{} body: E2 80 9C 68 65 6C 6C 6F 20 66 6C 75 6D 65 20 68 ...hello flume h }
2019-04-29 17:38:11,653 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{} body: E2 80 9C 68 65 6C 6C 6F 20 66 6C 75 6D 65 20 68 ...hello flume h }
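The logger sink prints the event body as hexadecimal bytes followed by the readable text. To check what a given string looks like as bytes, a small sketch such as the following can help; to_hex is a hypothetical helper name, and it relies only on the standard printf, od and tr utilities.

```shell
# Hypothetical helper: print the bytes of a string as lowercase hex,
# without separators, so they can be compared with the logger sink output.
to_hex() {
  printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
  echo
}

to_hex 'hello flume'   # prints 68656c6c6f20666c756d65
```

The result matches the bytes 68 65 6C 6C 6F 20 66 6C 75 6D 65 that follow E2 80 9C in the log lines above.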
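One more example
Chain two Flume agents together to move data across nodes; the sending side typically uses an avro sink.
The left agent uses exec source + memory channel + avro sink.
The right agent uses avro source + memory channel + logger sink.
Configuration file for the left machine, exec-memory-avro.conf:
# The agent is named exec-memory-avro; declare the source, sink and channel names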
exec-memory-avro.sources = exec-source
exec-memory-avro.sinks = avro-sink
exec-memory-avro.channels = memory-channel
# Configure the source
exec-memory-avro.sources.exec-source.type = exec
exec-memory-avro.sources.exec-source.command = tail -F /Users/xiejundong/data.log
exec-memory-avro.sources.exec-source.shell = /bin/sh -c
# Configure the sink
exec-memory-avro.sinks.avro-sink.type = avro
exec-memory-avro.sinks.avro-sink.hostname = 127.0.0.1
exec-memory-avro.sinks.avro-sink.port = 44444
# Configure the channel
exec-memory-avro.channels.memory-channel.type = memory
# Bind the source and sink to the channel
exec-memory-avro.sources.exec-source.channels = memory-channel
exec-memory-avro.sinks.avro-sink.channel = memory-channel
Configuration file for the right machine, avro-memory-logger.conf:
avro-memory-logger.sources = avro-source
avro-memory-logger.sinks = logger-sink
avro-memory-logger.channels = memory-channel
avro-memory-logger.sources.avro-source.type = avro
avro-memory-logger.sources.avro-source.bind = 127.0.0.1
avro-memory-logger.sources.avro-source.port = 44444
avro-memory-logger.sinks.logger-sink.type = logger
avro-memory-logger.channels.memory-channel.type = memory
avro-memory-logger.sources.avro-source.channels = memory-channel
avro-memory-logger.sinks.logger-sink.channel = memory-channel
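For the two agents to connect, the port of the avro sink on the left must equal the port of the avro source on the right. The sketch below checks this before starting anything. Both function names (get_prop, ports_match) are made up for illustration, and the property keys are the ones used in the two configuration files above.

```shell
# Hypothetical helper: print the value of one property from a Flume
# properties file.  $1 = file, $2 = full property key.
get_prop() {
  awk -v key="$2" '
    /^[ \t]*#/ { next }                      # skip comment lines
    {
      i = index($0, "=")
      if (i == 0) next                       # not a key = value line
      k = substr($0, 1, i - 1)
      v = substr($0, i + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", k)         # trim the key
      gsub(/^[ \t]+|[ \t]+$/, "", v)         # trim the value
      if (k == key) { print v; exit }
    }' "$1"
}

# Hypothetical helper: succeed only if the avro sink port in the left config
# equals the avro source port in the right config.
#   $1 = exec-memory-avro.conf, $2 = avro-memory-logger.conf
ports_match() {
  sink_port=$(get_prop "$1" exec-memory-avro.sinks.avro-sink.port)
  source_port=$(get_prop "$2" avro-memory-logger.sources.avro-source.port)
  [ -n "$sink_port" ] && [ "$sink_port" = "$source_port" ]
}

# Example call:
# ports_match "$FLUME_HOME/conf/exec-memory-avro.conf" "$FLUME_HOME/conf/avro-memory-logger.conf" && echo "ports match"
```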
Start
Start the agent on the right machine first, then the one on the left:
./bin/flume-ng agent --name avro-memory-logger --conf $FLUME_HOME/conf --conf-file $FLUME_HOME/conf/avro-memory-logger.conf -Dflume.root.logger=INFO,console
Then start the left machine:
flume-ng agent --name exec-memory-avro --conf $FLUME_HOME/conf --conf-file $FLUME_HOME/conf/exec-memory-avro.conf -Dflume.root.logger=INFO,console
Test:
xiejundongdeMacBook-Pro:~ xiejundong$ echo "hahaha" >> data.log
xiejundongdeMacBook-Pro:~ xiejundong$ echo "hahaha" >> data.log
xiejundongdeMacBook-Pro:~ xiejundong$ echo "hahaha" >> data.log
Result:
2019-04-29 17:59:42,632 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{} body: 68 61 68 61 68 61 hahaha }
2019-04-29 17:59:42,632 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{} body: 68 61 68 61 68 61 hahaha }
2019-04-29 17:59:42,632 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{} body: 68 61 68 61 68 61 hahaha }