Flume in Practice

Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/u012865381/article/details/89678813

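Flume is written in Java and runs on the JVM, so install a JRE or JDK first.

This walkthrough uses the CDH 5.7.0 build throughout: apache-flume-1.6.0-cdh5.7.0-bin, downloaded from cdh5.

Add the environment variables: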

export FLUME_HOME=/home/hadoop/app/flume 

export PATH=$FLUME_HOME/bin:$PATH 

Source the file so the settings take effect.

Copy $FLUME_HOME/conf/flume-env.sh.template to $FLUME_HOME/conf/flume-env.sh.

Then set JAVA_HOME in $FLUME_HOME/conf/flume-env.sh:

export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_201.jdk/Contents/Home

Verify the installation:

xiejundongdeMacBook-Pro:~ xiejundong$ flume-ng version
Flume 1.6.0-cdh5.7.0
Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
Revision: 8f5f5143ae30802fe79f9ab96f893e6c54a105d1
Compiled by jenkins on Wed Mar 23 11:38:48 PDT 2016
From source with checksum 50b533f0ffc32db9246405ac4431872e

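The key to using Flume is writing the configuration file:

A) Configure the source
B) Configure the channel
C) Configure the sink
D) Wire the three components together

A simple example (http://archive.cloudera.com/cdh5/cdh/5/flume-ng-1.6.0-cdh5.7.0/FlumeUserGuide.html):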

Listen on local port 44444 and print whatever is received to the console:

# example.conf: A single-node Flume configuration

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

What the names mean:

a1: agent name
r1: source name
k1: sink name
c1: channel name
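
Step D is the easiest one to get wrong: a source lists its channels under the plural property `channels`, while a sink names exactly one under the singular `channel`, and both must refer to a channel declared on the agent. The sketch below checks that for a simple configuration. `check_wiring` is a hypothetical helper written for this post, not part of Flume, and it only understands plain `key = value` lines rather than the full Java properties syntax.

```python
def check_wiring(config_text, agent):
    """Return a list of wiring problems for one agent in a simple Flume config."""
    props = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not key = value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()

    channels = set(props.get(f"{agent}.channels", "").split())
    problems = []

    # A source may write to several channels (space-separated list).
    for src in props.get(f"{agent}.sources", "").split():
        used = props.get(f"{agent}.sources.{src}.channels", "").split()
        if not used or not set(used) <= channels:
            problems.append(f"source {src}: channels {used} not all declared")

    # A sink reads from exactly one channel.
    for sink in props.get(f"{agent}.sinks", "").split():
        used = props.get(f"{agent}.sinks.{sink}.channel")
        if used not in channels:
            problems.append(f"sink {sink}: channel {used!r} not declared")

    return problems


EXAMPLE = """\
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sinks.k1.type = logger
a1.channels.c1.type = memory
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
"""

print(check_wiring(EXAMPLE, "a1"))  # prints []
```

An empty list means every source and sink of agent a1 points at a declared channel.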

Start Flume:

./bin/flume-ng agent --name a1 --conf ./conf/ --conf-file ./conf/example.conf -Dflume.root.logger=INFO,console

You can test it with Telnet.
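
The same test can be scripted instead of typed into a Telnet session. The sketch below opens a plain TCP connection and sends newline-terminated lines, which is what the netcat source expects. It assumes the a1 agent from example.conf is running on localhost:44444 and that the source answers each line with one reply line (the Telnet session in the Flume User Guide shows each line answered with OK). `send_lines` is a hypothetical helper name, not a Flume API.

```python
import socket


def send_lines(lines, host="localhost", port=44444, timeout=5.0):
    """Send each line over TCP and return the reply line read back for it.

    host and port default to the values in example.conf. If the listener
    does not reply, the read raises a timeout error after `timeout` seconds.
    """
    replies = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("r", encoding="utf-8", newline="\n") as reader:
            for line in lines:
                # One event per newline-terminated line.
                sock.sendall((line + "\n").encode("utf-8"))
                replies.append(reader.readline().rstrip("\r\n"))
    return replies
```

With the agent running, calling `send_lines(["hello flume"])` should return one reply per line sent, and each event should show up on the agent's console through the logger sink.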

A second test

Agent design: exec source + memory channel + logger sink

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /Users/xiejundong/data.log
a1.sources.r1.shell = /bin/sh -c

# Describe the sink 
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory 
a1.channels.c1.type = memory

# Bind the source and sink to the channel 
a1.sources.r1.channels = c1 
a1.sinks.k1.channel = c1

Start:

flume-ng agent --name a1 --conf ./conf --conf-file ./conf/exec_memory_logger.conf -Dflume.root.logger=INFO,console

The a1 passed to --name must be the agent name used in the configuration file; it cannot be chosen arbitrarily.

Test:

xiejundongdeMacBook-Pro:~ xiejundong$ echo “hello flume hahahahahaahah” >> data.log
xiejundongdeMacBook-Pro:~ xiejundong$ echo “hello flume hahahahahaahah” >> data.log
xiejundongdeMacBook-Pro:~ xiejundong$ echo “hello flume hahahahahaahah” >> data.log
xiejundongdeMacBook-Pro:~ xiejundong$ echo “hello flume hahahahahaahah” >> data.log
xiejundongdeMacBook-Pro:~ xiejundong$ echo “hello flume hahahahahaahah” >> data.log
xiejundongdeMacBook-Pro:~ xiejundong$ echo “hello flume hahahahahaahah” >> data.log
xiejundongdeMacBook-Pro:~ xiejundong$ echo “hello flume hahahahahaahah” >> data.log

Output:

2019-04-29 17:38:11,652 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{} body: E2 80 9C 68 65 6C 6C 6F 20 66 6C 75 6D 65 20 68 ...hello flume h }
2019-04-29 17:38:11,652 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{} body: E2 80 9C 68 65 6C 6C 6F 20 66 6C 75 6D 65 20 68 ...hello flume h }
2019-04-29 17:38:11,652 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{} body: E2 80 9C 68 65 6C 6C 6F 20 66 6C 75 6D 65 20 68 ...hello flume h }
2019-04-29 17:38:11,652 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{} body: E2 80 9C 68 65 6C 6C 6F 20 66 6C 75 6D 65 20 68 ...hello flume h }
2019-04-29 17:38:11,653 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{} body: E2 80 9C 68 65 6C 6C 6F 20 66 6C 75 6D 65 20 68 ...hello flume h }
2019-04-29 17:38:11,653 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{} body: E2 80 9C 68 65 6C 6C 6F 20 66 6C 75 6D 65 20 68 ...hello flume h }
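
The body in these log lines starts with E2 80 9C rather than with the bytes for "hello", and the readable part is cut off after "hello flume h". Decoding the hex dump explains both. The snippet below is a small sketch that only uses the hex string copied from the output above.

```python
# Hex body copied from the logger sink output above.
hex_body = "E2 80 9C 68 65 6C 6C 6F 20 66 6C 75 6D 65 20 68"

body = bytes.fromhex(hex_body)  # spaces between byte pairs are skipped
text = body.decode("utf-8")

print(len(body))          # prints 16
print(hex(ord(text[0])))  # prints 0x201c
print(text[1:])           # prints hello flume h
```

The first three bytes are the UTF-8 encoding of U+201C, the curly opening quote that was typed in the echo commands instead of a straight ASCII quote, so the shell passed it through as data. The log line shows only the first 16 bytes of the body, which is why the rest of the message does not appear.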

Another example

Two agents communicating over Avro RPC

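Chain two Flume agents together. Transferring data across nodes is usually done with an avro sink.

The left agent uses exec source + memory channel + avro sink.

The right agent uses avro source + memory channel + logger sink.

Configuration file on the left machine, exec-memory-avro.conf: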

# The agent is named exec-memory-avro; declare its source, sink, and channel names
exec-memory-avro.sources = exec-source 
exec-memory-avro.sinks = avro-sink 
exec-memory-avro.channels = memory-channel

# Configure the source
exec-memory-avro.sources.exec-source.type = exec
exec-memory-avro.sources.exec-source.command = tail -F /Users/xiejundong/data.log
exec-memory-avro.sources.exec-source.shell = /bin/sh -c

# Configure the sink
exec-memory-avro.sinks.avro-sink.type = avro 
exec-memory-avro.sinks.avro-sink.hostname = 127.0.0.1
exec-memory-avro.sinks.avro-sink.port = 44444

# Configure the channel
exec-memory-avro.channels.memory-channel.type = memory

# Bind the source and sink to the channel
exec-memory-avro.sources.exec-source.channels = memory-channel 
exec-memory-avro.sinks.avro-sink.channel = memory-channel

Configuration file on the right machine, avro-memory-logger.conf:

avro-memory-logger.sources = avro-source 
avro-memory-logger.sinks = logger-sink 
avro-memory-logger.channels = memory-channel

avro-memory-logger.sources.avro-source.type = avro
avro-memory-logger.sources.avro-source.bind = 127.0.0.1
avro-memory-logger.sources.avro-source.port = 44444

avro-memory-logger.sinks.logger-sink.type = logger 

avro-memory-logger.channels.memory-channel.type = memory

avro-memory-logger.sources.avro-source.channels = memory-channel 
avro-memory-logger.sinks.logger-sink.channel = memory-channel
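
The only thing linking the two files is the network endpoint: the avro sink's hostname and port on the left must match the avro source's bind address and port on the right. A minimal sketch that checks this for the values above; `parse_props` is a hypothetical helper that handles only plain `key = value` lines.

```python
def parse_props(text):
    """Parse simple 'key = value' lines into a dict, skipping comments and blanks."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


# Endpoint-related lines copied from the two configuration files above.
LEFT = """\
exec-memory-avro.sinks.avro-sink.type = avro
exec-memory-avro.sinks.avro-sink.hostname = 127.0.0.1
exec-memory-avro.sinks.avro-sink.port = 44444
"""

RIGHT = """\
avro-memory-logger.sources.avro-source.type = avro
avro-memory-logger.sources.avro-source.bind = 127.0.0.1
avro-memory-logger.sources.avro-source.port = 44444
"""

left = parse_props(LEFT)
right = parse_props(RIGHT)

# Where the left agent sends events.
sink_target = (left["exec-memory-avro.sinks.avro-sink.hostname"],
               left["exec-memory-avro.sinks.avro-sink.port"])
# Where the right agent listens.
source_listen = (right["avro-memory-logger.sources.avro-source.bind"],
                 right["avro-memory-logger.sources.avro-source.port"])

print(sink_target == source_listen)  # prints True
```

Because both sides use 127.0.0.1 here, the two agents must run on the same machine. For a real cross-node setup the source would need to bind to an address reachable from the left machine, and the sink's hostname would point at that address.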

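Start

Start the right-hand agent first, so that its avro source is listening before the left-hand avro sink tries to connect: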

./bin/flume-ng agent --name avro-memory-logger --conf $FLUME_HOME/conf --conf-file $FLUME_HOME/conf/avro-memory-logger.conf -Dflume.root.logger=INFO,console

Then start the left-hand agent:

flume-ng agent --name exec-memory-avro --conf $FLUME_HOME/conf --conf-file $FLUME_HOME/conf/exec-memory-avro.conf -Dflume.root.logger=INFO,console

Test:

xiejundongdeMacBook-Pro:~ xiejundong$ echo "hahaha" >> data.log
xiejundongdeMacBook-Pro:~ xiejundong$ echo "hahaha" >> data.log
xiejundongdeMacBook-Pro:~ xiejundong$ echo "hahaha" >> data.log

Result:

2019-04-29 17:59:42,632 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{} body: 68 61 68 61 68 61                               hahaha }
2019-04-29 17:59:42,632 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{} body: 68 61 68 61 68 61                               hahaha }
2019-04-29 17:59:42,632 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{} body: 68 61 68 61 68 61                               hahaha }
