Flume Summary
Official user guide: http://flume.apache.org/FlumeUserGuide.html
Overview
Flume is a distributed, reliable, and highly available system for collecting, aggregating, and transporting large volumes of log data.
Flume can collect data from many kinds of sources, such as files and socket streams, and can deliver the collected data to many external storage systems, including HDFS, HBase, Hive, and Kafka.
Typical collection requirements can be met with a simple Flume configuration.
Flume also has good support for custom extensions in special scenarios, so it fits most day-to-day data collection needs.
How Flume Works
1. The core role in a Flume deployment is the agent; a Flume collection system is built by connecting agents together.
2. Each agent acts as a data courier with three internal components:
a) Source: the collection source, which connects to the data origin to acquire data.
b) Sink: the destination of the collected data, which forwards it to the next agent or to the final storage system.
c) Channel: the data transfer channel inside the agent, which moves data from the source to the sink.
Note:
Data travels from Source to Channel to Sink in the form of Events; an Event is the unit of the data flow.
Source + Channel + Sink = Agent
Collection with a single Agent
Multiple Agents chained together
Multiplexing Agents
Key architecture concepts
Flume's architecture is built on the following core concepts (a minimal property sketch follows the list):
Event: a unit of data, with an optional message header
Flow: an abstraction of an Event's journey from its origin to its destination
Client: produces Events at the point of origin and sends them to a Flume Agent
Agent: an independent Flume process containing the Source, Channel, and Sink components
Source: consumes the Events delivered to it
Channel: a temporary store that buffers the Events handed over by the Source
Sink: reads and removes Events from the Channel and passes them to the next Agent in the flow pipeline (if any)
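A minimal sketch of how these concepts map onto the property keys of an agent definition (complete, runnable examples follow below):
# one agent named a1 with one source, one channel and one sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1
# the source puts Events into the channel, and the sink takes them out of it
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1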
Flume Installation
Download the installation package:
Link: http://pan.baidu.com/s/1miycTfU  Password: 2ea4. Contact the author if the download does not work.
1-1)、Installation
[root@hadoop1 /]# tar -zxvf apache-flume-1.6.0-bin.tar.gz
[root@hadoop1 /]# mv apache-flume-1.6.0-bin /usr/local/flume
[root@hadoop1 /]# cd /usr/local/flume/conf/
[root@hadoop1 conf]# mv flume-env.sh.template flume-env.sh
1-2)、Modify the configuration file
[root@hadoop1 conf]# vi flume-env.sh
---- set JAVA_HOME to the path of the local JDK installation
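For example, matching the JDK path that shows up in the startup logs later in this document (adjust to the local JDK installation):
export JAVA_HOME=/home/jdk1.7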
1-3)、Configure environment variables
[root@hadoop1 /]# vi /etc/profile
export FLUME_HOME=/usr/local/flume
export PATH=$PATH:$FLUME_HOME/bin
[root@hadoop1 ~]# source /etc/profile
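After reloading the profile, the installation can be checked with the version command (one of the commands listed in the help output below):
flume-ng version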
1-4)、Common commands
[root@hadoop1 flume]# flume-ng -help
Error: Unknown or unspecified command '-help'
Usage: /usr/local/flume/bin/flume-ng <command> [options]...
commands:
help display this help text
agent run a Flume agent
avro-client run an avro Flume client
version show Flume version info
global options:
--conf,-c <conf> use configs in <conf> directory
--classpath,-C <cp> append to the classpath
--dryrun,-d do not actually start Flume, just print the command
--plugins-path <dirs> colon-separated list of plugins.d directories. See the
plugins.d section in the user guide for more details.
Default: $FLUME_HOME/plugins.d
-Dproperty=value sets a Java system property value
-Xproperty=value sets a Java -X option
agent options:
--name,-n <name> the name of this agent (required)
--conf-file,-f <file> specify a config file (required if -z missing)
--zkConnString,-z <str> specify the ZooKeeper connection to use (required if -f missing)
--zkBasePath,-p <path> specify the base path in ZooKeeper for agent configs
--no-reload-conf do not reload config file if changed
--help,-h display help text
avro-client options:
--rpcProps,-P <file> RPC client properties file with server connection params
--host,-H <host> hostname to which events will be sent
--port,-p <port> port of the avro source
--dirname <dir> directory to stream to avro source
--filename,-F <file> text file to stream to avro source (default: std input)
--headerFile,-R <file> File containing event headers as key/value pairs on each new line
--help,-h display help text
Either --rpcProps or both --host and --port must be specified.
Note that if <conf> directory is specified, then it is always included first
in the classpath.
1-5)、Starting an agent
flume-ng agent -c conf -f netcat-logger.conf -n a1 -Dflume.root.logger=INFO,console
flume-ng agent : run a Flume agent
-c conf : the directory holding Flume's own configuration files
-f netcat-logger.conf : the collection (flow) configuration file we wrote
-n a1 : the name of the agent defined in that file
-Dflume.root.logger=INFO,console : print INFO-level logs to the console
The same agent can also be started without console logging:
flume-ng agent -c conf -f netcat-logger.conf -n a1
Flume Usage Examples
1-1)、Local console (netcat → logger) example
Flume supports a large number of source and sink types; see the official user guide for details:
http://flume.apache.org/FlumeUserGuide.html
First create a file named nicate-logger.conf in the conf directory:
[root@hadoop1 conf]# vi nicate-logger.conf
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source component r1
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# Describe/configure the sink component k1
a1.sinks.k1.type = logger
# Describe/configure the channel; an in-memory channel is used here
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
[root@hadoop1 conf]# flume-ng agent -c conf -f /usr/local/flume/conf/nicate-logger.conf -n a1 -Dflume.root.logger=INFO,console
Info: Including Hadoop libraries found via (/usr/local/hadoop-2.6.4/bin/hadoop) for HDFS access
Info: Excluding /usr/local/hadoop-2.6.4/share/hadoop/common/lib/slf4j-api-1.7.5.jar from classpath
Info: Excluding /usr/local/hadoop-2.6.4/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar from classpath
Info: Including Hive libraries found via () for Hive access
+ exec /home/jdk1.7/bin/java -Xmx20m -Dflume.root.logger=INFO,console -cp 'conf:/usr/local/flume/lib/*:/usr/local/hadoop-2.6.4/etc/hadoop:/usr/local/hadoop-2.6.4/share/hadoop/common/lib/activation-1.1.jar:/usr/local/hadoop-2.6.4/share/hadoop/common/lib/apacheds-i18n-2.0.0-M15.jar:/usr/local/hadoop-2.6.4/share/hadoop/common/lib/apacheds-kerberos...................(other classpath entries omitted)............usr/local/hadoop-2.6.4/share/hadoop/mapreduce/lib:/usr/local/hadoop-2.6.4/share/hadoop/mapreduce/lib-examples:/usr/local/hadoop-2.6.4/share/hadoop/mapreduce/sources:/usr/local/hadoop-2.6.4/contrib/capacity-scheduler/*.jar:/lib/*' -Djava.library.path=:/usr/local/hadoop-2.6.4/lib/native org.apache.flume.node.Application -f /usr/local/flume/conf/nicate-logger.conf -n a1
16/09/25 10:53:34 INFO node.PollingPropertiesFileConfigurationProvider: Configuration provider starting
16/09/25 10:53:34 INFO node.PollingPropertiesFileConfigurationProvider: Reloading configuration file:/usr/local/flume/conf/nicate-logger.conf
16/09/25 10:53:34 INFO conf.FlumeConfiguration: Added sinks: k1 Agent: a1
16/09/25 10:53:34 INFO conf.FlumeConfiguration: Processing:k1
16/09/25 10:53:34 INFO conf.FlumeConfiguration: Processing:k1
16/09/25 10:53:34 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [a1]
16/09/25 10:53:34 INFO node.AbstractConfigurationProvider: Creating channels
16/09/25 10:53:34 INFO channel.DefaultChannelFactory: Creating instance of channel c1 type memory
16/09/25 10:53:34 INFO node.AbstractConfigurationProvider: Created channel c1
16/09/25 10:53:34 INFO source.DefaultSourceFactory: Creating instance of source r1, type netcat
16/09/25 10:53:34 INFO sink.DefaultSinkFactory: Creating instance of sink: k1, type: logger
16/09/25 10:53:34 INFO node.AbstractConfigurationProvider: Channel c1 connected to [r1, k1]
16/09/25 10:53:34 INFO node.Application: Starting new configuration:{ sourceRunners:{r1=EventDrivenSourceRunner: { source:org.apache.flume.source.NetcatSource{name:r1,state:IDLE} }} sinkRunners:{k1=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@6c62aa33 counterGroup:{ name:null counters:{} } }} channels:{c1=org.apache.flume.channel.MemoryChannel{name: c1}} }
16/09/25 10:53:34 INFO node.Application: Starting Channel c1
16/09/25 10:53:34 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: c1: Successfully registered new MBean.
16/09/25 10:53:34 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: c1 started
16/09/25 10:53:34 INFO node.Application: Starting Sink k1
16/09/25 10:53:34 INFO node.Application: Starting Source r1
16/09/25 10:53:34 INFO source.NetcatSource: Source starting
16/09/25 10:53:35 INFO source.NetcatSource: Created serverSocket:sun.nio.ch.ServerSocketChannelImpl[/127.0.0.1:44444]
The full startup output is shown here to illustrate in detail how the configuration is loaded.
Install telnet:
[root@hadoop1 flume]# yum list | grep telnet
telnet.x86_64 1:0.17-48.el6 base
telnet-server.x86_64 1:0.17-48.el6 base
[root@hadoop1 flume]# yum install telnet.x86_64
Test:
[root@hadoop1 ~]# telnet 127.0.0.1 44444
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
xiaozhang
OK
xiaowang
OK
Log output printed on the agent side:
16/09/25 10:53:34 INFO node.Application: Starting Sink k1
16/09/25 10:53:34 INFO node.Application: Starting Source r1
16/09/25 10:53:34 INFO source.NetcatSource: Source starting
16/09/25 10:53:35 INFO source.NetcatSource: Created serverSocket:sun.nio.ch.ServerSocketChannelImpl[/127.0.0.1:44444]
16/09/25 11:16:29 INFO sink.LoggerSink: Event: { headers:{} body: 78 69 61 6F 7A 68 61 6E 67 0D xiaozhang. }
16/09/25 11:16:31 INFO sink.LoggerSink: Event: { headers:{} body: 78 69 61 6F 77 61 6E 67 0D xiaowang. }
Note on whitespace:
The configuration lines are written with spaces, e.g. a1.sources = r1.
When connecting, the command telnet 127.0.0.1 44444 likewise separates its arguments with spaces.
In the printed log each Event shows its headers and body; the body is printed both as hex-encoded bytes and as readable text.
1-2)、Single-node HDFS sink test example
[root@hadoop1 conf]# vi tail-hdfs.conf
# tail-hdfs.conf
# Use the tail command to acquire data and sink it to HDFS
# Start command:
# bin/flume-ng agent -c conf -f conf/tail-hdfs.conf -n a1
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /root/logs/test.log
a1.sources.r1.channels = c1
# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /flume/tailout/%y-%m-%d/%H%M/
a1.sinks.k1.hdfs.filePrefix = events-
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
a1.sinks.k1.hdfs.rollInterval = 3
a1.sinks.k1.hdfs.rollSize = 20
a1.sinks.k1.hdfs.rollCount = 5
a1.sinks.k1.hdfs.batchSize = 1
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# Output file type; the default is SequenceFile, while DataStream writes plain text
a1.sinks.k1.hdfs.fileType = DataStream
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
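Before starting the agent, make sure the file tailed by the exec source exists and keeps receiving data; a simple sketch, mirroring the date loop used in the later examples:
mkdir -p /root/logs
touch /root/logs/test.log
# append a timestamp every second so there is always something to collect
while true; do date >> /root/logs/test.log; sleep 1; done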
[root@hadoop1 conf]# flume-ng agent -c conf -f /usr/local/flume/conf/tail-hdfs.conf -n a1 -Dflume.root.logger=INFO,console
16/07/30 20:29:28 INFO node.Application: Starting Channel c1
16/07/30 20:29:28 INFO node.Application: Waiting for channel: c1 to start. Sleeping for 500 ms
16/07/30 20:29:28 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: c1: Successfully registered new MBean.
16/07/30 20:29:28 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: c1 started
16/07/30 20:29:28 INFO node.Application: Starting Sink k1
16/07/30 20:29:28 INFO node.Application: Starting Source r1
16/07/30 20:29:28 INFO source.ExecSource: Exec source starting with command:tail -F /home/flumeLog/test.log
16/07/30 20:29:28 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: k1: Successfully registered new MBean.
16/07/30 20:29:28 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: k1 started
16/07/30 20:29:28 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: r1: Successfully registered new MBean.
16/07/30 20:29:28 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: r1 started
16/07/30 20:29:32 INFO hdfs.HDFSDataStream: Serializer = TEXT, UseRawLocalFileSystem = false
16/07/30 20:29:33 INFO hdfs.BucketWriter: Creating /flume/tailout/16-07-30/2020//events-.1469881772754.tmp
16/07/30 20:29:38 WARN hdfs.BucketWriter: Block Under-replication detected. Rotating file.
16/07/30 20:29:38 INFO hdfs.BucketWriter: Closing /flume/tailout/16-07-30/2020//events-.1469881772754.tmp
16/07/30 20:29:38 INFO hdfs.BucketWriter: Renaming /flume/tailout/16-07-30/2020/events-.1469881772754.tmp to /flume/tailout/16-07-30/2020/events-.1469881772754
16/07/30 20:29:38 INFO hdfs.BucketWriter: Creating /flume/tailout/16-07-30/2020//events-.1469881772755.tmp
16/07/30 20:50:22 INFO hdfs.BucketWriter: Closing /flume/tailout/16-07-30/2050//events-.1469883004544.tmp
16/07/30 20:50:23 INFO hdfs.BucketWriter: Renaming /flume/tailout/16-07-30/2050/events-.1469883004544.tmp to /flume/tailout/16-07-30/2050/events-.1469883004544
16/07/30 20:50:25 INFO hdfs.BucketWriter: Creating /flume/tailout/16-07-30/2050//events-.1469883004545.tmp
Watch the files appear in HDFS:
[root@hadoop1 ~]# hadoop fs -ls -R /flume/tailout/16-07-30/
-rw-r--r-- 3 root supergroup 0 2016-07-30 20:50 /flume/tailout/16-07-30/2050/events-.1469883004544.tmp
HDFS file contents:
[root@hadoop1 ~]# hadoop fs -cat /flume/tailout/16-07-30/2020/events-.1469881772754
Sat Jul 30 20:23:09 CST 2016
[root@hadoop1 ~]# hadoop fs -cat /flume/tailout/16-07-30/2030/events-.1469882131920
Sat Jul 30 20:35:45 CST 2016
Points worth noting:
1. The log appears to handle several messages at once: the sink first writes events to a temporary .tmp file, then closes and renames it on HDFS and immediately creates the next .tmp file, so the Create/Close/Rename lines show up almost simultaneously.
2. The 2050 bucket directory comes from rounding the timestamp down to the previous 10-minute boundary (round = true, roundValue = 10, roundUnit = minute), so events written between 20:50 and 20:59 land in .../2050/.
3. To write to HDFS, the agent just needs to run on a machine that can reach the HDFS cluster (i.e. with the Hadoop client libraries and configuration available), which is very convenient.
1-3)、Collecting files from a spooling directory
[root@hadoop1 ~]# cd /usr/local/flume/
[root@hadoop1 flume]# mkdir testData
[root@hadoop1 conf]# vi flume_dir.conf
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /usr/local/flume/testData
a1.sources.r1.fileHeader = true
# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H%M/
a1.sinks.k1.hdfs.filePrefix = events-
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
a1.sinks.k1.hdfs.rollInterval = 3
# rollSize can be tuned to control the size (in bytes) of each rolled file
a1.sinks.k1.hdfs.rollSize = 20
a1.sinks.k1.hdfs.rollCount = 5
a1.sinks.k1.hdfs.batchSize = 1
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.hdfs.fileType = DataStream
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
[root@hadoop1 flume]# vi a.text
[root@hadoop1 flume]# vi b.text
[root@hadoop1 flume]# mv a.text b.text testData/
[root@hadoop1 flume]# cd testData/
[root@hadoop1 testData]# ls
a.text.COMPLETED b.text.COMPLETED
Note: do not create subdirectories inside the monitored directory (they will not be collected), and do not place a file whose name has already been processed into it; a duplicate name makes Flume stop running, which is how duplicate collection is prevented. The .COMPLETED suffix marks a file that has already been collected.
[root@hadoop1 conf]# flume-ng agent -c conf -f ../conf/flume_dir.conf -n a1 -Dflume.root.logger=INFO,console
16/09/26 00:28:47 INFO hdfs.BucketWriter: Renaming /flume/events/16-09-26/0020/events-.1474874885229.tmp to /flume/events/16-09-26/0020/events-.1474874885229
16/09/26 00:28:47 INFO hdfs.BucketWriter: Creating /flume/events/16-09-26/0020//events-.1474874885230.tmp
16/09/26 00:28:47 INFO hdfs.BucketWriter: Closing /flume/events/16-09-26/0020//events-.1474874885230.tmp
16/09/26 00:28:47 INFO hdfs.BucketWriter: Renaming /flume/events/16-09-26/0020/events-.1474874885230.tmp to /flume/events/16-09-26/0020/events-.1474874885230
16/09/26 00:28:47 INFO hdfs.BucketWriter: Creating /flume/events/16-09-26/0020//events-.1474874885231.tmp
16/09/26 00:28:49 INFO hdfs.BucketWriter: Closing /flume/events/16-09-26/0020//events-.1474874885231.tmp
16/09/26 00:28:49 INFO hdfs.BucketWriter: Renaming /flume/events/16-09-26/0020/events-.1474874885231.tmp to /flume/events/16-09-26/0020/events-.1474874885231
16/09/26 00:28:49 INFO hdfs.BucketWriter: Creating /flume/events/16-09-26/0020//events-.1474874885232.tmp
16/09/26 00:28:52 INFO hdfs.BucketWriter: Closing /flume/events/16-09-26/0020//events-.1474874885232.tmp
16/09/26 00:28:52 INFO hdfs.BucketWriter: Renaming /flume/events/16-09-26/0020/events-.1474874885232.tmp to /flume/events/16-09-26/0020/events-.1474874885232
16/09/26 00:28:52 INFO hdfs.HDFSEventSink: Writer callback called.
[root@hadoop1 testData]# hadoop fs -ls /flume/events/16-09-26/0020
Found 4 items
-rw-r--r-- 3 root supergroup 18 2016-09-26 00:28 /flume/events/16-09-26/0020/events-.1474874885229
-rw-r--r-- 3 root supergroup 12 2016-09-26 00:28 /flume/events/16-09-26/0020/events-.1474874885230
-rw-r--r-- 3 root supergroup 13 2016-09-26 00:28 /flume/events/16-09-26/0020/events-.1474874885231
-rw-r--r-- 3 root supergroup 11 2016-09-26 00:28 /flume/events/16-09-26/0020/events-.1474874885232
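The collected contents can be read back from HDFS, using one of the file names from the listing above:
hadoop fs -cat /flume/events/16-09-26/0020/events-.1474874885229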
1-4)、Connecting two machines
hadoop1 acts as the sender: it tails a file and forwards events through an avro sink; hadoop2 receives them with an avro source and prints them to the console with a logger sink.
[root@hadoop1 flume]# mkdir agentConf
[root@hadoop1 flume]# vi tail-avro.properties
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /root/log/test.log
a1.sources.r1.channels = c1
# Describe the sink
## the avro sink here acts as the data sender
a1.sinks = k1
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = hadoop2
a1.sinks.k1.port = 8888
a1.sinks.k1.batch-size = 2
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
[root@hadoop2 flume]# mkdir agentConf
[root@hadoop2 flume]# vi avro-hdfs.properties
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 8888
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start hadoop2 first:
flume-ng agent -c conf -f avro-hdfs.properties -n a1 -Dflume.root.logger=INFO,console
Then start hadoop1:
flume-ng agent -c conf -f agentConf/tail-avro.properties -n a1 -Dflume.root.logger=INFO,console
Write test data:
[root@hadoop1 log]# while true; do date >> test.log ; sleep 1; done;
[root@hadoop1 log]# tail -f test.log
Mon Sep 26 02:07:29 PDT 2016
Mon Sep 26 02:07:30 PDT 2016
Mon Sep 26 02:07:31 PDT 2016
Mon Sep 26 02:07:32 PDT 2016
Mon Sep 26 02:07:33 PDT 2016
Mon Sep 26 02:07:34 PDT 2016
Mon Sep 26 02:07:35 PDT 2016
Mon Sep 26 02:07:36 PDT 2016
hadoop1 log:
*******
16/09/26 02:07:04 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: k1 started
16/09/26 02:07:04 INFO sink.AbstractRpcSink: Rpc sink k1: Building RpcClient with hostname: hadoop2, port: 4141
16/09/26 02:07:04 INFO sink.AvroSink: Attempting to create Avro Rpc client.
16/09/26 02:07:04 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: r1: Successfully registered new MBean.
16/09/26 02:07:04 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: r1 started
16/09/26 02:07:04 WARN api.NettyAvroRpcClient: Using default maxIOWorkers
16/09/26 02:07:05 INFO sink.AbstractRpcSink: Rpc sink k1 started.
hadoop2 log:
******
16/09/26 02:07:37 INFO sink.LoggerSink: Event: { headers:{} body: 4D 6F 6E 20 53 65 70 20 32 36 20 30 32 3A 30 37 Mon Sep 26 02:07 }
16/09/26 02:07:39 INFO sink.LoggerSink: Event: { headers:{} body: 4D 6F 6E 20 53 65 70 20 32 36 20 30 32 3A 30 37 Mon Sep 26 02:07 }
16/09/26 02:07:39 INFO sink.LoggerSink: Event: { headers:{} body: 4D 6F 6E 20 53 65 70 20 32 36 20 30 32 3A 30 37 Mon Sep 26 02:07 }
16/09/26 02:07:39 INFO sink.LoggerSink: Event: { headers:{} body: 4D 6F 6E 20 53 65 70 20 32 36 20 30 32 3A 30 37 Mon Sep 26 02:07 }
16/09/26 02:07:39 INFO sink.LoggerSink: Event: { headers:{} body: 4D 6F 6E 20 53 65 70 20 32 36 20 30 32 3A 30 37 Mon Sep 26 02:07 }
******
1-5)、Multi-machine example (high-availability / failover configuration)
hadoop1 collects data with an exec source and sends it to two downstream collectors: hadoop2 and hadoop3 receive the events over avro and write them to HDFS, with hadoop2 (priority 10) as the primary sink and hadoop3 (priority 1) as the standby.
1-1)、hadoop1 configuration
# Acquire data with the tail command and send it to an avro port
# The downstream nodes are configured with an avro source to relay the data to external storage
[root@hadoop1 conf]# vi sources-hdfs.conf
#agent1 name
agent1.channels = c1
agent1.sources = r1
agent1.sinks = k1 k2
#set group
agent1.sinkgroups = g1
#set channel
agent1.channels.c1.type = memory
agent1.channels.c1.capacity = 1000
agent1.channels.c1.transactionCapacity = 100
agent1.sources.r1.channels = c1
agent1.sources.r1.type = exec
agent1.sources.r1.command = tail -F /root/log/test.log
# set interceptors
agent1.sources.r1.interceptors = i1 i2
agent1.sources.r1.interceptors.i1.type = static
agent1.sources.r1.interceptors.i1.key = Type
agent1.sources.r1.interceptors.i1.value = LOGIN
agent1.sources.r1.interceptors.i2.type = timestamp
# set sink1
agent1.sinks.k1.channel = c1
agent1.sinks.k1.type = avro
agent1.sinks.k1.hostname = hadoop2
agent1.sinks.k1.port = 52020
# set sink2
agent1.sinks.k2.channel = c1
agent1.sinks.k2.type = avro
agent1.sinks.k2.hostname = hadoop3
agent1.sinks.k2.port = 52020
#set sink group
agent1.sinkgroups.g1.sinks = k1 k2
#set failover
agent1.sinkgroups.g1.processor.type = failover
agent1.sinkgroups.g1.processor.priority.k1 = 10
agent1.sinkgroups.g1.processor.priority.k2 = 1
agent1.sinkgroups.g1.processor.maxpenalty = 10000
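For comparison, the same sink group could do load balancing instead of failover; a minimal sketch using Flume's standard load_balance sink processor (round-robin selection is an assumption here, not part of the setup above):
#set load balancing instead of failover
agent1.sinkgroups.g1.sinks = k1 k2
agent1.sinkgroups.g1.processor.type = load_balance
agent1.sinkgroups.g1.processor.selector = round_robin
agent1.sinkgroups.g1.processor.backoff = true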
Create the file to be tailed:
[root@hadoop1 ~]# mkdir -p /root/log/
[root@hadoop1 log]# touch test.log
1-2)、hadoop2 configuration
# Collector configuration file
[root@hadoop2 conf]# vi sinks-hdfs.conf
#set Agent name
a1.sources = r1
a1.channels = c1
a1.sinks = k1
#set channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# avro source: receives the events relayed from the upstream agent (hadoop1)
a1.sources.r1.type = avro
a1.sources.r1.bind = hadoop2
a1.sources.r1.port = 52020
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = static
a1.sources.r1.interceptors.i1.key = Collector
a1.sources.r1.interceptors.i1.value = hadoop2
a1.sources.r1.channels = c1
#set sink to hdfs
a1.sinks.k1.type=hdfs
a1.sinks.k1.hdfs.path=/home/hdfs/flume/logdfs
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.writeFormat=TEXT
a1.sinks.k1.hdfs.rollInterval=10
a1.sinks.k1.channel=c1
a1.sinks.k1.hdfs.filePrefix=%Y-%m-%d
# To send test data directly to this collector, the avro-client command can also be used:
# $ bin/flume-ng avro-client -H hadoop2 -p 52020 -F /usr/logs/log.10
1-3)、hadoop3 configuration
# Collector configuration file
[root@hadoop3 conf]# vi sinks-hdfs.conf
#set Agent name
a1.sources = r1
a1.channels = c1
a1.sinks = k1
#set channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# avro source: receives the events relayed from the upstream agent (hadoop1)
a1.sources.r1.type = avro
a1.sources.r1.bind = hadoop3
a1.sources.r1.port = 52020
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = static
a1.sources.r1.interceptors.i1.key = Collector
a1.sources.r1.interceptors.i1.value = hadoop3
a1.sources.r1.channels = c1
#set sink to hdfs
a1.sinks.k1.type=hdfs
a1.sinks.k1.hdfs.path=/home/hdfs/flume/logdfs
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.writeFormat=TEXT
a1.sinks.k1.hdfs.rollInterval=10
a1.sinks.k1.channel=c1
a1.sinks.k1.hdfs.filePrefix=%Y-%m-%d
# To send test data directly to this collector, the avro-client command can also be used:
# $ bin/flume-ng avro-client -H hadoop3 -p 52020 -F /usr/logs/log.10
1-4)、hadoop3 startup log
[root@hadoop3 agentConf]# flume-ng agent -c conf -f ../conf/sinks-hdfs.conf -n a1 -Dflume.root.logger=INFO,console
*******
16/09/26 02:55:55 INFO node.Application: Starting Channel c1
16/09/26 02:55:55 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: c1: Successfully registered new MBean.
16/09/26 02:55:55 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: c1 started
16/09/26 02:55:55 INFO node.Application: Starting Sink k1
16/09/26 02:55:55 INFO node.Application: Starting Source r1
16/09/26 02:55:55 INFO source.AvroSource: Starting Avro source r1: { bindAddress: hadoop3, port: 52020 }...
16/09/26 02:55:55 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: k1: Successfully registered new MBean.
16/09/26 02:55:55 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: k1 started
16/09/26 02:55:55 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: r1: Successfully registered new MBean.
16/09/26 02:55:55 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: r1 started
******
1-5)、hadoop2 startup log
[root@hadoop2 agentConf]# flume-ng agent -c conf -f ../conf/sinks-hdfs.conf -n a1 -Dflume.root.logger=INFO,console
******
{k1=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@687090a8 counterGroup:{ name:null counters:{} } }} channels:{c1=org.apache.flume.channel.MemoryChannel{name: c1}} }
16/09/26 02:55:50 INFO node.Application: Starting Channel c1
16/09/26 02:55:50 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: c1: Successfully registered new MBean.
16/09/26 02:55:50 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: c1 started
16/09/26 02:55:50 INFO node.Application: Starting Sink k1
16/09/26 02:55:50 INFO node.Application: Starting Source r1
16/09/26 02:55:50 INFO source.AvroSource: Starting Avro source r1: { bindAddress: hadoop2, port: 52020 }...
16/09/26 02:55:50 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: k1: Successfully registered new MBean.
16/09/26 02:55:50 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: k1 started
16/09/26 02:55:50 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: r1: Successfully registered new MBean.
16/09/26 02:55:50 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: r1 started
*******
1-6)、hadoop1 startup log
[root@hadoop1 agentConf]# flume-ng agent -c conf -f ../conf/sources-hdfs.conf -n agent1 -Dflume.root.logger=INFO,console
*******
16/09/26 02:56:03 INFO conf.FlumeConfiguration: Processing:k2
16/09/26 02:56:03 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [agent1]
16/09/26 02:56:03 INFO node.AbstractConfigurationProvider: Creating channels
16/09/26 02:56:03 INFO channel.DefaultChannelFactory: Creating instance of channel c1 type memory
16/09/26 02:56:03 INFO node.AbstractConfigurationProvider: Created channel c1
16/09/26 02:56:03 INFO source.DefaultSourceFactory: Creating instance of source r1, type exec
16/09/26 02:56:03 INFO interceptor.StaticInterceptor: Creating StaticInterceptor: preserveExisting=true,key=Type,value=LOGIN
16/09/26 02:56:03 INFO sink.DefaultSinkFactory: Creating instance of sink: k1, type: avro
16/09/26 02:56:03 INFO sink.AbstractRpcSink: Connection reset is set to 0. Will not reset connection to next hop
16/09/26 02:56:03 INFO sink.DefaultSinkFactory: Creating instance of sink: k2, type: avro
16/09/26 02:56:03 INFO sink.AbstractRpcSink: Connection reset is set to 0. Will not reset connection to next hop
16/09/26 02:56:03 INFO node.AbstractConfigurationProvider: Channel c1 connected to [r1, k1, k2]
*********
1-7)、Write test data
[root@hadoop1 log]# while true; do date >> test.log ; sleep 1; done;
[root@hadoop1 log]# tail -f test.log
Mon Sep 26 02:57:13 PDT 2016
Mon Sep 26 02:57:14 PDT 2016
Mon Sep 26 02:57:15 PDT 2016
Mon Sep 26 02:57:16 PDT 2016
Mon Sep 26 02:57:17 PDT 2016
1-8)、hadoop3 output
16/09/26 02:56:49 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: k1 stopped
16/09/26 02:56:49 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SINK, name: k1. sink.start.time == 1474883755212
16/09/26 02:56:49 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SINK, name: k1. sink.stop.time == 1474883809719
16/09/26 02:56:49 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SINK, name: k1. sink.batch.complete == 0
16/09/26 02:56:49 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SINK, name: k1. sink.batch.empty == 9
16/09/26 02:56:49 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SINK, name: k1. sink.batch.underflow == 0
16/09/26 02:56:49 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SINK, name: k1. sink.connection.closed.count == 0
16/09/26 02:56:49 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SINK, name: k1. sink.connection.creation.count == 0
16/09/26 02:56:49 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SINK, name: k1. sink.connection.failed.count == 0
16/09/26 02:56:49 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SINK, name: k1. sink.event.drain.attempt == 0
16/09/26 02:56:49 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SINK, name: k1. sink.event.drain.sucess == 0
1-9)、hadoop2 output
16/09/26 02:56:12 INFO hdfs.HDFSDataStream: Serializer = TEXT, UseRawLocalFileSystem = false
16/09/26 02:56:13 INFO hdfs.BucketWriter: Creating /home/hdfs/flume/logdfs/2016-09-26.1474883772390.tmp
16/09/26 02:56:18 INFO hdfs.BucketWriter: Closing /home/hdfs/flume/logdfs/2016-09-26.1474883772390.tmp
16/09/26 02:56:18 INFO hdfs.BucketWriter: Renaming /home/hdfs/flume/logdfs/2016-09-26.1474883772390.tmp to /home/hdfs/flume/logdfs/2016-09-26.1474883772390
16/09/26 02:56:18 INFO hdfs.BucketWriter: Creating /home/hdfs/flume/logdfs/2016-09-26.1474883772391.tmp
16/09/26 02:56:26 INFO hdfs.BucketWriter: Closing /home/hdfs/flume/logdfs/2016-09-26.1474883772391.tmp
16/09/26 02:56:26 INFO hdfs.BucketWriter: Renaming /home/hdfs/flume/logdfs/2016-09-26.1474883772391.tmp to /home/hdfs/flume/logdfs/2016-09-26.1474883772391
16/09/26 02:56:26 INFO hdfs.BucketWriter: Creating /home/hdfs/flume/logdfs/2016-09-26.1474883772392.tmp
16/09/26 02:56:35 INFO hdfs.BucketWriter: Closing /home/hdfs/flume/logdfs/2016-09-26.1474883772392.tmp
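The files written by the collector can then be listed in HDFS (path taken from the hadoop2/hadoop3 sink configuration):
hadoop fs -ls /home/hdfs/flume/logdfs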
1-10)、hadoop1 output
16/09/26 02:56:04 INFO sink.AbstractRpcSink: Rpc sink k1 started.
16/09/26 02:56:04 INFO sink.AbstractRpcSink: Starting RpcSink k2 { host: hadoop3, port: 52020 }...
16/09/26 02:56:04 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: k2: Successfully registered new MBean.
16/09/26 02:56:04 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: k2 started
16/09/26 02:56:04 INFO sink.AbstractRpcSink: Rpc sink k2: Building RpcClient with hostname: hadoop3, port: 52020
16/09/26 02:56:04 INFO sink.AvroSink: Attempting to create Avro Rpc client.
16/09/26 02:56:04 WARN api.NettyAvroRpcClient: Using default maxIOWorkers
16/09/26 02:56:04 INFO sink.AbstractRpcSink: Rpc sink k2 started.
16/09/26 02:57:17 WARN sink.FailoverSinkProcessor: Sink k1 failed and has been sent to failover list
1-6)、Configuration examples in detail
The first configuration below tails a local file with an exec source and publishes the events to the Kafka topic kafkaTest through the KafkaSink; the second is the basic netcat-to-logger configuration used at the beginning of this guide.
a1.sources = r1
a1.channels = c1
a1.sinks = k1
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /usr/local/flume/configurationFile/test.text
a1.sources.r1.channels = c1
a1.channels.c1.type=memory
a1.channels.c1.capacity=10000
a1.channels.c1.transactionCapacity=100
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.topic = kafkaTest
a1.sinks.k1.brokerList = hadoop1:9092
a1.sinks.k1.requiredAcks = 1
a1.sinks.k1.batchSize = 20
a1.sinks.k1.channel = c1
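To verify that events reach Kafka, a console consumer can be attached to the topic; a sketch assuming a Kafka broker of that era on hadoop1 and ZooKeeper at hadoop1:2181 (the ZooKeeper address is an assumption; only the broker list hadoop1:9092 appears in the config):
# create the topic if it does not exist yet (ZooKeeper address assumed)
kafka-topics.sh --create --zookeeper hadoop1:2181 --replication-factor 1 --partitions 1 --topic kafkaTest
# consume the events published by the Flume KafkaSink
kafka-console-consumer.sh --zookeeper hadoop1:2181 --topic kafkaTest --from-beginning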
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1