Quick Learning Big Data -- Flume (Part 15)


Flume Summary

Official documentation: http://flume.apache.org/FlumeUserGuide.html

 

Overview

Flume is a distributed, reliable, and highly available system for collecting, aggregating, and moving large volumes of log data.

Flume can collect source data in many forms (files, socket packets, and so on) and write the collected data to many external storage systems such as HDFS, HBase, Hive, and Kafka.

Typical collection requirements can be met with a simple Flume configuration.

Flume also supports custom extensions for special scenarios, so it covers most day-to-day data collection needs.

How it works

1. The core role in a Flume deployment is the agent; a Flume collection system is built by connecting agents together.

2. Each agent acts as a data courier and contains three components:

a) Source: the collection source, which connects to the data source and obtains data.

b) Sink: the destination of the collected data, which forwards it to the next agent or to the final storage system.

c) Channel: the data transfer channel inside the agent, which carries data from the source to the sink.

 

Note:

Data passed from Source to Channel to Sink takes the form of Events; an Event is the unit of the data flow.

Source + Channel + Sink = Agent
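To make this composition concrete, the sketch below shows the bare skeleton that every configuration in this article follows: one agent (named a1) declares its source, channel, and sink and wires them together. The component names are placeholders.

# Skeleton of an agent definition (names are placeholders)
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# the source writes Events into the channel
a1.sources.r1.channels = c1
# the sink drains Events from the same channel
a1.sinks.k1.channel = c1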

 

Single-Agent collection

 

 

 

Chaining multiple Agents

 

 

Multiplexing Agents
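The original diagram for this topology is not reproduced here. As a rough sketch of what a multiplexing flow looks like in configuration, a source can route Events to different channels based on a header value through a multiplexing channel selector (the header name and values below are hypothetical):

a1.sources = r1
a1.channels = c1 c2
a1.sources.r1.channels = c1 c2

# route Events by the value of a header (header name and values are made up for illustration)
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = type
a1.sources.r1.selector.mapping.web = c1
a1.sources.r1.selector.mapping.app = c2
a1.sources.r1.selector.default = c1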

Key architectural concepts

The Flume architecture is built around the following core concepts:

Event: a unit of data, with an optional set of message headers

Flow: an abstraction of the movement of Events from a source point to a destination

Client: produces Events at the point of origin and sends them to a Flume Agent

Agent: an independent Flume process containing Source, Channel, and Sink components

Source: consumes the Events delivered to it

Channel: temporary storage that buffers the Events handed over by the Source

Sink: reads and removes Events from the Channel and passes them to the next Agent in the flow pipeline (if there is one)

 

 

Flume installation

Download the installation package:

Link: http://pan.baidu.com/s/1miycTfU  Password: 2ea4. If the download does not work, contact the author.

 

 

1-1) Installation

[root@hadoop1 /]#  tar -zxvf apache-flume-1.6.0-bin.tar.gz

[root@hadoop1 /]#  mv apache-flume-1.6.0-bin flume

[root@hadoop1 /]#  cd /flume/conf/

[root@hadoop1 /]#  mv flume-env.sh.template  flume-env.sh

1-2) Edit the configuration file

[root@hadoop1 /]#  vi flume-env.sh    

 ---- set JAVA_HOME to the path of your JDK

 

1-3) Set environment variables

[root@hadoop1 /]#  vi /etc/profile

export FLUME_HOME=/usr/local/flume
export PATH=$PATH:$FLUME_HOME/bin

 

[root@hadoop1 ~]# source /etc/profile
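To confirm the installation, the version command (listed in the help output below) can be run once the profile is loaded; the exact output depends on your build.

[root@hadoop1 ~]# flume-ng version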

1-4) Common commands

[root@hadoop1 flume]# flume-ng -help

Error: Unknown or unspecified command '-help'

 

Usage: /usr/local/flume/bin/flume-ng <command> [options]...

 

commands:

  help                      display this help text

  agent                     run a Flume agent

  avro-client               run an avro Flume client

  version                   show Flume version info

 

global options:

  --conf,-c <conf>          use configs in <conf> directory

  --classpath,-C <cp>       append to the classpath

  --dryrun,-d               do not actually start Flume, just print the command

  --plugins-path <dirs>     colon-separated list of plugins.d directories. See the

                            plugins.d section in the user guide for more details.

                            Default: $FLUME_HOME/plugins.d

  -Dproperty=value          sets a Java system property value

  -Xproperty=value          sets a Java -X option

 

agent options:

  --name,-n <name>          the name of this agent (required)

  --conf-file,-f <file>     specify a config file (required if -z missing)

  --zkConnString,-z <str>   specify the ZooKeeper connection to use (required if -f missing)

  --zkBasePath,-p <path>    specify the base path in ZooKeeper for agent configs

  --no-reload-conf          do not reload config file if changed

  --help,-h                 display help text

 

avro-client options:

  --rpcProps,-P <file>   RPC client properties file with server connection params

  --host,-H <host>       hostname to which events will be sent

  --port,-p <port>       port of the avro source

  --dirname <dir>        directory to stream to avro source

  --filename,-F <file>   text file to stream to avro source (default: std input)

  --headerFile,-R <file> File containing event headers as key/value pairs on each new line

  --help,-h              display help text

 

  Either --rpcProps or both --host and --port must be specified.

 

Note that if <conf> directory is specified, then it is always included first

in the classpath.

 

1-5) Starting an agent

A) Foreground start

flume-ng agent -c conf -f  netcat-logger.conf  -n a1  -Dflume.root.logger=INFO,console

 

flume-ng agent                        run a Flume agent

-c conf                               the directory containing Flume's own configuration files

-f conf/netcat-logger.conf            the collection (flow) configuration file we wrote

-n a1                                 the name of the agent defined in that file

-Dflume.root.logger=INFO,console      log level and destination (print to the console)

 

B) Background start

flume-ng agent -c conf -f  netcat-logger.conf  -n a1
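The command above still occupies the current shell; to keep the agent running after the terminal is closed, a common approach (not shown in the original) is to wrap it with nohup and send it to the background:

nohup flume-ng agent -c conf -f netcat-logger.conf -n a1 > flume.log 2>&1 &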

Flume examples

1-1) Local console example

Flume supports many source and sink types; the full reference is in the official user guide:

http://flume.apache.org/FlumeUserGuide.html

A) Configuration

First create a file named nicate-logger.conf under the conf directory:

 

 [root@hadoop1 conf]#  vi  nicate-logger.conf

 

# Name the components on this agent

a1.sources = r1  

a1.sinks = k1

a1.channels = c1  

 

# Describe/configure the source component r1

a1.sources.r1.type = netcat    

a1.sources.r1.bind = localhost

a1.sources.r1.port = 44444

 

# Describe/configure the sink component k1

a1.sinks.k1.type = logger

 

# Describe/configure the channel component; a memory channel is used here

a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100

 

# Bind the source and the sink to the channel

a1.sources.r1.channels = c1

a1.sinks.k1.channel = c1

B) Start flume-ng

[root@hadoop1 conf]# flume-ng agent -c conf -f /usr/local/flume/conf/nicate-logger.conf   -n a1  -Dflume.root.logger=INFO,console

Info: Including Hadoop libraries found via (/usr/local/hadoop-2.6.4/bin/hadoop) for HDFS access

Info: Excluding /usr/local/hadoop-2.6.4/share/hadoop/common/lib/slf4j-api-1.7.5.jar from classpath

Info: Excluding /usr/local/hadoop-2.6.4/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar from classpath

Info: Including Hive libraries found via () for Hive access

+ exec /home/jdk1.7/bin/java -Xmx20m -Dflume.root.logger=INFO,console -cp 'conf:/usr/local/flume/lib/*:/usr/local/hadoop-2.6.4/etc/hadoop:/usr/local/hadoop-2.6.4/share/hadoop/common/lib/activation-1.1.jar:/usr/local/hadoop-2.6.4/share/hadoop/common/lib/apacheds-i18n-2.0.0-M15.jar:/usr/local/hadoop-2.6.4/share/hadoop/common/lib/apacheds-kerberos...................省略其他的加载配置............usr/local/hadoop-2.6.4/share/hadoop/mapreduce/lib:/usr/local/hadoop-2.6.4/share/hadoop/mapreduce/lib-examples:/usr/local/hadoop-2.6.4/share/hadoop/mapreduce/sources:/usr/local/hadoop-2.6.4/contrib/capacity-scheduler/*.jar:/lib/*' -Djava.library.path=:/usr/local/hadoop-2.6.4/lib/native org.apache.flume.node.Application -f /usr/local/flume/conf/nicate-logger.conf -n a1

16/09/25 10:53:34 INFO node.PollingPropertiesFileConfigurationProvider: Configuration provider starting

16/09/25 10:53:34 INFO node.PollingPropertiesFileConfigurationProvider: Reloading configuration file:/usr/local/flume/conf/nicate-logger.conf

16/09/25 10:53:34 INFO conf.FlumeConfiguration: Added sinks: k1 Agent: a1

16/09/25 10:53:34 INFO conf.FlumeConfiguration: Processing:k1

16/09/25 10:53:34 INFO conf.FlumeConfiguration: Processing:k1

16/09/25 10:53:34 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [a1]

16/09/25 10:53:34 INFO node.AbstractConfigurationProvider: Creating channels

16/09/25 10:53:34 INFO channel.DefaultChannelFactory: Creating instance of channel c1 type memory

16/09/25 10:53:34 INFO node.AbstractConfigurationProvider: Created channel c1

16/09/25 10:53:34 INFO source.DefaultSourceFactory: Creating instance of source r1, type netcat

16/09/25 10:53:34 INFO sink.DefaultSinkFactory: Creating instance of sink: k1, type: logger

16/09/25 10:53:34 INFO node.AbstractConfigurationProvider: Channel c1 connected to [r1, k1]

16/09/25 10:53:34 INFO node.Application: Starting new configuration:{ sourceRunners:{r1=EventDrivenSourceRunner: { source:org.apache.flume.source.NetcatSource{name:r1,state:IDLE} }} sinkRunners:{k1=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@6c62aa33 counterGroup:{ name:null counters:{} } }} channels:{c1=org.apache.flume.channel.MemoryChannel{name: c1}} }

16/09/25 10:53:34 INFO node.Application: Starting Channel c1

16/09/25 10:53:34 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: c1: Successfully registered new MBean.

16/09/25 10:53:34 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: c1 started

16/09/25 10:53:34 INFO node.Application: Starting Sink k1

16/09/25 10:53:34 INFO node.Application: Starting Source r1

16/09/25 10:53:34 INFO source.NetcatSource: Source starting

16/09/25 10:53:35 INFO source.NetcatSource: Created serverSocket:sun.nio.ch.ServerSocketChannelImpl[/127.0.0.1:44444]

 

The full startup output is shown above to illustrate in detail how the configuration is loaded.

C) Test with telnet

Install telnet:

[root@hadoop1 flume]# yum list | grep telnet

telnet.x86_64                              1:0.17-48.el6                 base   

telnet-server.x86_64                       1:0.17-48.el6                 base

 

[root@hadoop1 flume]# yum install telnet.x86_64

 

Test:

[root@hadoop1 ~]# telnet 127.0.0.1 44444

Trying 127.0.0.1...

Connected to 127.0.0.1.

Escape character is '^]'.

xiaozhang

OK

xiaowang

OK

 

Log output printed on the server side:

16/09/25 10:53:34 INFO node.Application: Starting Sink k1

16/09/25 10:53:34 INFO node.Application: Starting Source r1

16/09/25 10:53:34 INFO source.NetcatSource: Source starting

16/09/25 10:53:35 INFO source.NetcatSource: Created serverSocket:sun.nio.ch.ServerSocketChannelImpl[/127.0.0.1:44444]

16/09/25 11:16:29 INFO sink.LoggerSink: Event: { headers:{} body: 78 69 61 6F 7A 68 61 6E 67 0D                   xiaozhang. }

16/09/25 11:16:31 INFO sink.LoggerSink: Event: { headers:{} body: 78 69 61 6F 77 61 6E 67 0D                      xiaowang. }

 

 

Notes on whitespace:

Keep the spaces in the configuration as shown, e.g.:

a1.sources = r1  

When connecting, separate the host and port with a space as well: [root@hadoop2 ~]# telnet 127.0.0.1 44444

In the printed log, each Event shows its headers and body; the body is displayed both as hex-encoded bytes and as decoded text.

 

1-2) Single-node tail-to-HDFS example

A) Configuration

[root@hadoop1 conf]#  vi  tail-hdfs.conf

 

# tail-hdfs.conf

# Use the tail command to obtain data and sink it to HDFS

# Start command:

# bin/flume-ng agent -c conf -f conf/tail-hdfs.conf -n a1

 

# Name the components on this agent

a1.sources = r1

a1.sinks = k1

a1.channels = c1

 

# Describe/configure the source

a1.sources.r1.type = exec

a1.sources.r1.command = tail -F /root/logs/test.log

a1.sources.r1.channels = c1

 

# Describe the sink

a1.sinks.k1.type = hdfs

a1.sinks.k1.channel = c1

a1.sinks.k1.hdfs.path = /flume/tailout/%y-%m-%d/%H%M/

a1.sinks.k1.hdfs.filePrefix = events-

a1.sinks.k1.hdfs.round = true

a1.sinks.k1.hdfs.roundValue = 10

a1.sinks.k1.hdfs.roundUnit = minute

a1.sinks.k1.hdfs.rollInterval = 3

a1.sinks.k1.hdfs.rollSize = 20

a1.sinks.k1.hdfs.rollCount = 5

a1.sinks.k1.hdfs.batchSize = 1

a1.sinks.k1.hdfs.useLocalTimeStamp = true

# File type of the generated files; the default is SequenceFile. Use DataStream for plain text.

a1.sinks.k1.hdfs.fileType = DataStream

 

# Use a channel which buffers events in memory

a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100

 

# Bind the source and sink to the channel

a1.sources.r1.channels = c1

a1.sinks.k1.channel = c1

B) Test

 [root@hadoop1 conf]#  flume-ng agent -c conf -f /usr/local/flume/conf/tail-hdfs.conf   -n a1  -Dflume.root.logger=INFO,console

 

 

16/07/30 20:29:28 INFO node.Application: Starting Channel c1

16/07/30 20:29:28 INFO node.Application: Waiting for channel: c1 to start. Sleeping for 500 ms

16/07/30 20:29:28 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: c1: Successfully registered new MBean.

16/07/30 20:29:28 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: c1 started

16/07/30 20:29:28 INFO node.Application: Starting Sink k1

16/07/30 20:29:28 INFO node.Application: Starting Source r1

16/07/30 20:29:28 INFO source.ExecSource: Exec source starting with command:tail -F /home/flumeLog/test.log

16/07/30 20:29:28 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: k1: Successfully registered new MBean.

16/07/30 20:29:28 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: k1 started

16/07/30 20:29:28 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: r1: Successfully registered new MBean.

16/07/30 20:29:28 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: r1 started

16/07/30 20:29:32 INFO hdfs.HDFSDataStream: Serializer = TEXT, UseRawLocalFileSystem = false

16/07/30 20:29:33 INFO hdfs.BucketWriter: Creating /flume/tailout/16-07-30/2020//events-.1469881772754.tmp

16/07/30 20:29:38 WARN hdfs.BucketWriter: Block Under-replication detected. Rotating file.

16/07/30 20:29:38 INFO hdfs.BucketWriter: Closing /flume/tailout/16-07-30/2020//events-.1469881772754.tmp

16/07/30 20:29:38 INFO hdfs.BucketWriter: Renaming /flume/tailout/16-07-30/2020/events-.1469881772754.tmp to /flume/tailout/16-07-30/2020/events-.1469881772754

16/07/30 20:29:38 INFO hdfs.BucketWriter: Creating /flume/tailout/16-07-30/2020//events-.1469881772755.tmp

  16/07/30 20:50:22 INFO hdfs.BucketWriter: Closing /flume/tailout/16-07-30/2050//events-.1469883004544.tmp

16/07/30 20:50:23 INFO hdfs.BucketWriter: Renaming /flume/tailout/16-07-30/2050/events-.1469883004544.tmp to /flume/tailout/16-07-30/2050/events-.1469883004544

16/07/30 20:50:25 INFO hdfs.BucketWriter: Creating /flume/tailout/16-07-30/2050//events-.1469883004545.tmp
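The timestamps that appear in HDFS below were produced by appending the output of date to the tailed file; a minimal generator loop (using the path from the configuration above, the same technique used in the later examples) looks like this:

[root@hadoop1 ~]# while true; do date >> /root/logs/test.log; sleep 1; done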

 

List the resulting files in HDFS:

[root@hadoop1 ~]# hadoop fs -ls -R   /flume/tailout/16-07-30/

-rw-r--r--   3 root supergroup          0 2016-07-30 20:50 /flume/tailout/16-07-30/2050/events-.1469883004544.tmp

 

View the HDFS file contents:

[root@hadoop1~]# hadoop fs -cat  /flume/tailout/16-07-30/2020/events-.1469881772754

Sat Jul 30 20:23:09 CST 2016

 

[root@hadoop1 ~]# hadoop fs -cat  /flume/tailout/16-07-30/2030/events-.1469882131920

Sat Jul 30 20:35:45 CST 2016

 

Notes on the output above:

1. The log appears to handle several files almost simultaneously: Flume first writes incoming data to a temporary .tmp file and then closes and renames it on HDFS; because the roll settings here are very aggressive, preparing the next file overlaps with finishing the previous one.

2. The 2050 directory comes from rounding the event time down to the previous 10-minute boundary (the round / roundValue / roundUnit settings).

3. To write to HDFS, the agent only needs to run on a node that can reach the HDFS cluster, i.e. with the Hadoop client libraries and configuration available on the classpath, which is very convenient.

1-3) Collecting files from a spooling directory

A) Configuration

[root@hadoop1 ~]# cd /usr/local/flume/

[root@hadoop1 flume]# mkdir testData

 

[root@hadoop1 conf]# vi flume_dir.conf

# Name the components on this agent

a1.sources = r1

a1.sinks = k1

a1.channels = c1

 

# Describe/configure the source

a1.sources.r1.type = spooldir

a1.sources.r1.spoolDir = /usr/local/flume/testData

a1.sources.r1.fileHeader = true

 

# Describe the sink

a1.sinks.k1.type = hdfs

a1.sinks.k1.channel = c1

a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H%M/

a1.sinks.k1.hdfs.filePrefix = events-

a1.sinks.k1.hdfs.round = true

a1.sinks.k1.hdfs.roundValue = 10

a1.sinks.k1.hdfs.roundUnit = minute

a1.sinks.k1.hdfs.rollInterval = 3

# The roll size (in bytes) can be tuned

a1.sinks.k1.hdfs.rollSize = 20

a1.sinks.k1.hdfs.rollCount = 5

a1.sinks.k1.hdfs.batchSize = 1

a1.sinks.k1.hdfs.useLocalTimeStamp = true

a1.sinks.k1.hdfs.fileType = DataStream

 

# Use a channel which buffers events in memory

a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100

 

# Bind the source and sink to the channel

a1.sources.r1.channels = c1

a1.sinks.k1.channel = c1

 

[root@hadoop1 flume]# vi a.text

[root@hadoop1 flume]# vi b.text

[root@hadoop1 flume]# mv a.text  b.text  testData/

[root@hadoop1 flume]# cd testData/

[root@hadoop1 testData]# ls

a.text.COMPLETED  b.text.COMPLETED

 

 

Note: do not create subdirectories inside the monitored directory (they will not be collected), and do not drop in a file with the same name as one already processed, or Flume will stop; this is how it avoids collecting duplicates. A file whose name ends in .COMPLETED has already been collected.
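The behaviour described above can be tuned with a few optional spooling-directory source settings; the sketch below lists the relevant ones with their defaults (not part of the original configuration):

a1.sources.r1.fileSuffix = .COMPLETED   # suffix appended to fully ingested files (default)
a1.sources.r1.deletePolicy = never      # "immediate" deletes files after ingestion instead
a1.sources.r1.ignorePattern = ^$        # regex of file names to skip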

 

B) Start

[root@hadoop1 conf]# flume-ng agent -c conf -f  ../conf/flume_dir.conf  -n a1  -Dflume.root.logger=INFO,console

 

 

16/09/26 00:28:47 INFO hdfs.BucketWriter: Renaming /flume/events/16-09-26/0020/events-.1474874885229.tmp to /flume/events/16-09-26/0020/events-.1474874885229

16/09/26 00:28:47 INFO hdfs.BucketWriter: Creating /flume/events/16-09-26/0020//events-.1474874885230.tmp

16/09/26 00:28:47 INFO hdfs.BucketWriter: Closing /flume/events/16-09-26/0020//events-.1474874885230.tmp

16/09/26 00:28:47 INFO hdfs.BucketWriter: Renaming /flume/events/16-09-26/0020/events-.1474874885230.tmp to /flume/events/16-09-26/0020/events-.1474874885230

16/09/26 00:28:47 INFO hdfs.BucketWriter: Creating /flume/events/16-09-26/0020//events-.1474874885231.tmp

16/09/26 00:28:49 INFO hdfs.BucketWriter: Closing /flume/events/16-09-26/0020//events-.1474874885231.tmp

16/09/26 00:28:49 INFO hdfs.BucketWriter: Renaming /flume/events/16-09-26/0020/events-.1474874885231.tmp to /flume/events/16-09-26/0020/events-.1474874885231

16/09/26 00:28:49 INFO hdfs.BucketWriter: Creating /flume/events/16-09-26/0020//events-.1474874885232.tmp

16/09/26 00:28:52 INFO hdfs.BucketWriter: Closing /flume/events/16-09-26/0020//events-.1474874885232.tmp

16/09/26 00:28:52 INFO hdfs.BucketWriter: Renaming /flume/events/16-09-26/0020/events-.1474874885232.tmp to /flume/events/16-09-26/0020/events-.1474874885232

16/09/26 00:28:52 INFO hdfs.HDFSEventSink: Writer callback called.

 

C) Check the result

[root@hadoop1 testData]# hadoop fs -ls /flume/events/16-09-26/0020

Found 4 items

-rw-r--r--   3 root supergroup         18 2016-09-26 00:28 /flume/events/16-09-26/0020/events-.1474874885229

-rw-r--r--   3 root supergroup         12 2016-09-26 00:28 /flume/events/16-09-26/0020/events-.1474874885230

-rw-r--r--   3 root supergroup         13 2016-09-26 00:28 /flume/events/16-09-26/0020/events-.1474874885231

-rw-r--r--   3 root supergroup         11 2016-09-26 00:28 /flume/events/16-09-26/0020/events-.1474874885232

 

 

 

1-4) Connecting two machines

hadoop1 acts as the source side and hadoop2 as the sink side (printing to the console).

A) hadoop1 configuration

[root@hadoop1 flume]#  mkdir agentConf

[root@hadoop1 flume]# vi tail-avro.properties

a1.sources = r1

a1.sinks = k1

a1.channels = c1

 

# Describe/configure the source

a1.sources.r1.type = exec

a1.sources.r1.command = tail -F /root/log/test.log

a1.sources.r1.channels = c1

 

# Describe the sink

## The avro sink acts as a data sender (to the next hop)

a1.sinks = k1

a1.sinks.k1.type = avro

a1.sinks.k1.channel = c1

a1.sinks.k1.hostname = hadoop2

a1.sinks.k1.port = 8888

a1.sinks.k1.batch-size = 2

 

# Use a channel which buffers events in memory

a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100

 

# Bind the source and sink to the channel

a1.sources.r1.channels = c1

a1.sinks.k1.channel = c1

B) hadoop2 configuration

[root@hadoop2 flume]#  mkdir agentConf

[root@hadoop2 flume]#  vi avro-hdfs.properties

# Name the components on this agent

a1.sources = r1

a1.sinks = k1

a1.channels = c1

 

# Describe/configure the source

a1.sources.r1.type = avro

a1.sources.r1.channels = c1

a1.sources.r1.bind = 0.0.0.0

a1.sources.r1.port = 8888

 

# Describe the sink

a1.sinks.k1.type = logger

 

# Use a channel which buffers events in memory

a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100

 

# Bind the source and sink to the channel

a1.sources.r1.channels = c1

a1.sinks.k1.channel = c1

 

C) Test

Start hadoop2 first:

 flume-ng agent -c conf -f avro-hdfs.properties   -n a1 -Dflume.root.logger=INFO,console

 

Then start hadoop1:

flume-ng agent -c conf -f  agentConf/tail-avro.properties  -n a1  -Dflume.root.logger=INFO,console
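As an alternative quick test of the avro source (using the avro-client command from the help output earlier), a file can be sent directly to hadoop2; this assumes the avro source above is listening on port 8888:

[root@hadoop1 flume]# flume-ng avro-client -H hadoop2 -p 8888 -F /root/log/test.log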

 

Write data:

[root@hadoop1 log]# while true; do date >> test.log ; sleep 1; done;

[root@hadoop1 log]# tail -f test.log

Mon Sep 26 02:07:29 PDT 2016

Mon Sep 26 02:07:30 PDT 2016

Mon Sep 26 02:07:31 PDT 2016

Mon Sep 26 02:07:32 PDT 2016

Mon Sep 26 02:07:33 PDT 2016

Mon Sep 26 02:07:34 PDT 2016

Mon Sep 26 02:07:35 PDT 2016

Mon Sep 26 02:07:36 PDT 2016

 

hadoop1 log:

*******

16/09/26 02:07:04 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: k1 started

16/09/26 02:07:04 INFO sink.AbstractRpcSink: Rpc sink k1: Building RpcClient with hostname: hadoop2, port: 4141

16/09/26 02:07:04 INFO sink.AvroSink: Attempting to create Avro Rpc client.

16/09/26 02:07:04 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: r1: Successfully registered new MBean.

16/09/26 02:07:04 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: r1 started

16/09/26 02:07:04 WARN api.NettyAvroRpcClient: Using default maxIOWorkers

16/09/26 02:07:05 INFO sink.AbstractRpcSink: Rpc sink k1 started.

 

 

hadoop2 log:

******

16/09/26 02:07:37 INFO sink.LoggerSink: Event: { headers:{} body: 4D 6F 6E 20 53 65 70 20 32 36 20 30 32 3A 30 37 Mon Sep 26 02:07 }

16/09/26 02:07:39 INFO sink.LoggerSink: Event: { headers:{} body: 4D 6F 6E 20 53 65 70 20 32 36 20 30 32 3A 30 37 Mon Sep 26 02:07 }

16/09/26 02:07:39 INFO sink.LoggerSink: Event: { headers:{} body: 4D 6F 6E 20 53 65 70 20 32 36 20 30 32 3A 30 37 Mon Sep 26 02:07 }

16/09/26 02:07:39 INFO sink.LoggerSink: Event: { headers:{} body: 4D 6F 6E 20 53 65 70 20 32 36 20 30 32 3A 30 37 Mon Sep 26 02:07 }

16/09/26 02:07:39 INFO sink.LoggerSink: Event: { headers:{} body: 4D 6F 6E 20 53 65 70 20 32 36 20 30 32 3A 30 37 Mon Sep 26 02:07 }

******

1-5) Multi-machine example (failover / high-availability configuration)

hadoop1 runs the source; hadoop2 and hadoop3 act as collectors that write the data to HDFS.

A) Configuration

1-1) hadoop1 configuration

# Obtain data with the tail command and send it to an avro port

# Another node can configure an avro source to relay the data on to external storage

 

[root@hadoop1 conf]#  vi  sources-hdfs.conf

#agent1 name

agent1.channels = c1

agent1.sources = r1

agent1.sinks = k1 k2

 

#set gruop

agent1.sinkgroups = g1

 

#set channel

agent1.channels.c1.type = memory

agent1.channels.c1.capacity = 1000

agent1.channels.c1.transactionCapacity = 100

 

agent1.sources.r1.channels = c1

agent1.sources.r1.type = exec

agent1.sources.r1.command = tail -F /root/log/test.log

 

# set  interceptors

agent1.sources.r1.interceptors = i1 i2

agent1.sources.r1.interceptors.i1.type = static

agent1.sources.r1.interceptors.i1.key = Type

agent1.sources.r1.interceptors.i1.value = LOGIN

agent1.sources.r1.interceptors.i2.type = timestamp

 

# set sink1

agent1.sinks.k1.channel = c1

agent1.sinks.k1.type = avro

agent1.sinks.k1.hostname = hadoop2

agent1.sinks.k1.port = 52020

 

# set sink2

agent1.sinks.k2.channel = c1

agent1.sinks.k2.type = avro

agent1.sinks.k2.hostname = hadoop3

agent1.sinks.k2.port = 52020

 

#set sink group

agent1.sinkgroups.g1.sinks = k1 k2

 

#set failover

agent1.sinkgroups.g1.processor.type = failover

agent1.sinkgroups.g1.processor.priority.k1 = 10

agent1.sinkgroups.g1.processor.priority.k2 = 1

agent1.sinkgroups.g1.processor.maxpenalty = 10000
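For comparison, the same sink group could be configured for load balancing instead of failover; a sketch (not used in this setup) looks like this:

#set load_balance instead of failover (alternative sketch)
agent1.sinkgroups.g1.processor.type = load_balance
agent1.sinkgroups.g1.processor.backoff = true
agent1.sinkgroups.g1.processor.selector = round_robin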

 

 

 

Create the log file:

[root@hadoop1 ~]# mkdir -p /root/log/

[root@hadoop1 log]# touch test.log

1-2) hadoop2 configuration

# Collector configuration

[root@hadoop2 conf]#  vi sinks-hdfs.conf

 

#set Agent name

a1.sources = r1

a1.channels = c1

a1.sinks = k1

 

#set channel

a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100

 

# other node,nna to nns

a1.sources.r1.type = avro

a1.sources.r1.bind = hadoop2

a1.sources.r1.port = 52020

a1.sources.r1.interceptors = i1

a1.sources.r1.interceptors.i1.type = static

a1.sources.r1.interceptors.i1.key = Collector

a1.sources.r1.interceptors.i1.value = hadoop2

a1.sources.r1.channels = c1

 

#set sink to hdfs

a1.sinks.k1.type=hdfs

a1.sinks.k1.hdfs.path=/home/hdfs/flume/logdfs

a1.sinks.k1.hdfs.fileType=DataStream

a1.sinks.k1.hdfs.writeFormat=TEXT

a1.sinks.k1.hdfs.rollInterval=10

a1.sinks.k1.channel=c1

a1.sinks.k1.hdfs.filePrefix=%Y-%m-%d

# Note: the %Y-%m-%d in filePrefix is resolved from the event timestamp header, which the timestamp interceptor (i2) on agent1 supplies.

# To send test data manually:

# $ bin/flume-ng avro-client -H localhost -p 4141 -F /usr/logs/log.10

1-3) hadoop3 configuration

# Collector configuration

[root@hadoop3 conf]#  vi sinks-hdfs.conf

 

#set Agent name

a1.sources = r1

a1.channels = c1

a1.sinks = k1

 

#set channel

a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100

 

# other node,nna to nns

a1.sources.r1.type = avro

a1.sources.r1.bind = hadoop3

a1.sources.r1.port = 52020

a1.sources.r1.interceptors = i1

a1.sources.r1.interceptors.i1.type = static

a1.sources.r1.interceptors.i1.key = Collector

a1.sources.r1.interceptors.i1.value = hadoop3

a1.sources.r1.channels = c1

 

#set sink to hdfs

a1.sinks.k1.type=hdfs

a1.sinks.k1.hdfs.path=/home/hdfs/flume/logdfs

a1.sinks.k1.hdfs.fileType=DataStream

a1.sinks.k1.hdfs.writeFormat=TEXT

a1.sinks.k1.hdfs.rollInterval=10

a1.sinks.k1.channel=c1

a1.sinks.k1.hdfs.filePrefix=%Y-%m-%d

# To send test data manually:

# $ bin/flume-ng avro-client -H localhost -p 4141 -F /usr/logs/log.10

 

B) Start

1-1) hadoop3 log

[root@hadoop3 agentConf]# flume-ng agent -c conf -f sinks-hdfs.conf   -n a1 -Dflume.root.logger=INFO,console

 

*******

16/09/26 02:55:55 INFO node.Application: Starting Channel c1

16/09/26 02:55:55 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: c1: Successfully registered new MBean.

16/09/26 02:55:55 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: c1 started

16/09/26 02:55:55 INFO node.Application: Starting Sink k1

16/09/26 02:55:55 INFO node.Application: Starting Source r1

16/09/26 02:55:55 INFO source.AvroSource: Starting Avro source r1: { bindAddress: hadoop3, port: 52020 }...

16/09/26 02:55:55 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: k1: Successfully registered new MBean.

16/09/26 02:55:55 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: k1 started

16/09/26 02:55:55 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: r1: Successfully registered new MBean.

16/09/26 02:55:55 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: r1 started

******

1-2) hadoop2 log

[root@hadoop2 agentConf]# flume-ng agent -c conf -f sinks-hdfs.conf   -n a1 -Dflume.root.logger=INFO,console

******

{k1=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@687090a8 counterGroup:{ name:null counters:{} } }} channels:{c1=org.apache.flume.channel.MemoryChannel{name: c1}} }

16/09/26 02:55:50 INFO node.Application: Starting Channel c1

16/09/26 02:55:50 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: c1: Successfully registered new MBean.

16/09/26 02:55:50 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: c1 started

16/09/26 02:55:50 INFO node.Application: Starting Sink k1

16/09/26 02:55:50 INFO node.Application: Starting Source r1

16/09/26 02:55:50 INFO source.AvroSource: Starting Avro source r1: { bindAddress: hadoop2, port: 52020 }...

16/09/26 02:55:50 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: k1: Successfully registered new MBean.

16/09/26 02:55:50 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: k1 started

16/09/26 02:55:50 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: r1: Successfully registered new MBean.

16/09/26 02:55:50 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: r1 started

*******

1-3) hadoop1 log

[root@hadoop1 agentConf]# flume-ng agent -c conf -f sources-hdfs.conf  -n  agent1  -Dflume.root.logger=INFO,console

*******

16/09/26 02:56:03 INFO conf.FlumeConfiguration: Processing:k2

16/09/26 02:56:03 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [agent1]

16/09/26 02:56:03 INFO node.AbstractConfigurationProvider: Creating channels

16/09/26 02:56:03 INFO channel.DefaultChannelFactory: Creating instance of channel c1 type memory

16/09/26 02:56:03 INFO node.AbstractConfigurationProvider: Created channel c1

16/09/26 02:56:03 INFO source.DefaultSourceFactory: Creating instance of source r1, type exec

16/09/26 02:56:03 INFO interceptor.StaticInterceptor: Creating StaticInterceptor: preserveExisting=true,key=Type,value=LOGIN

16/09/26 02:56:03 INFO sink.DefaultSinkFactory: Creating instance of sink: k1, type: avro

16/09/26 02:56:03 INFO sink.AbstractRpcSink: Connection reset is set to 0. Will not reset connection to next hop

16/09/26 02:56:03 INFO sink.DefaultSinkFactory: Creating instance of sink: k2, type: avro

16/09/26 02:56:03 INFO sink.AbstractRpcSink: Connection reset is set to 0. Will not reset connection to next hop

16/09/26 02:56:03 INFO node.AbstractConfigurationProvider: Channel c1 connected to [r1, k1, k2]

*********

C) Test

1-1) Write data

[root@hadoop1 log]# while true; do date >> test.log ; sleep 1; done;

[root@hadoop1 log]# tail -f test.log

Mon Sep 26 02:57:13 PDT 2016

Mon Sep 26 02:57:14 PDT 2016

Mon Sep 26 02:57:15 PDT 2016

Mon Sep 26 02:57:16 PDT 2016

Mon Sep 26 02:57:17 PDT 2016

 

1-2) hadoop3 output

16/09/26 02:56:49 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: k1 stopped

16/09/26 02:56:49 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SINK, name: k1. sink.start.time == 1474883755212

16/09/26 02:56:49 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SINK, name: k1. sink.stop.time == 1474883809719

16/09/26 02:56:49 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SINK, name: k1. sink.batch.complete == 0

16/09/26 02:56:49 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SINK, name: k1. sink.batch.empty == 9

16/09/26 02:56:49 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SINK, name: k1. sink.batch.underflow == 0

16/09/26 02:56:49 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SINK, name: k1. sink.connection.closed.count == 0

16/09/26 02:56:49 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SINK, name: k1. sink.connection.creation.count == 0

16/09/26 02:56:49 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SINK, name: k1. sink.connection.failed.count == 0

16/09/26 02:56:49 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SINK, name: k1. sink.event.drain.attempt == 0

16/09/26 02:56:49 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SINK, name: k1. sink.event.drain.sucess == 0

 

1-3) hadoop2 output

16/09/26 02:56:12 INFO hdfs.HDFSDataStream: Serializer = TEXT, UseRawLocalFileSystem = false

16/09/26 02:56:13 INFO hdfs.BucketWriter: Creating /home/hdfs/flume/logdfs/2016-09-26.1474883772390.tmp

16/09/26 02:56:18 INFO hdfs.BucketWriter: Closing /home/hdfs/flume/logdfs/2016-09-26.1474883772390.tmp

16/09/26 02:56:18 INFO hdfs.BucketWriter: Renaming /home/hdfs/flume/logdfs/2016-09-26.1474883772390.tmp to /home/hdfs/flume/logdfs/2016-09-26.1474883772390

16/09/26 02:56:18 INFO hdfs.BucketWriter: Creating /home/hdfs/flume/logdfs/2016-09-26.1474883772391.tmp

16/09/26 02:56:26 INFO hdfs.BucketWriter: Closing /home/hdfs/flume/logdfs/2016-09-26.1474883772391.tmp

16/09/26 02:56:26 INFO hdfs.BucketWriter: Renaming /home/hdfs/flume/logdfs/2016-09-26.1474883772391.tmp to /home/hdfs/flume/logdfs/2016-09-26.1474883772391

16/09/26 02:56:26 INFO hdfs.BucketWriter: Creating /home/hdfs/flume/logdfs/2016-09-26.1474883772392.tmp

16/09/26 02:56:35 INFO hdfs.BucketWriter: Closing /home/hdfs/flume/logdfs/2016-09-26.1474883772392.tmp

 

 

1-4) hadoop1 output

16/09/26 02:56:04 INFO sink.AbstractRpcSink: Rpc sink k1 started.

16/09/26 02:56:04 INFO sink.AbstractRpcSink: Starting RpcSink k2 { host: hadoop3, port: 52020 }...

16/09/26 02:56:04 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: k2: Successfully registered new MBean.

16/09/26 02:56:04 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: k2 started

16/09/26 02:56:04 INFO sink.AbstractRpcSink: Rpc sink k2: Building RpcClient with hostname: hadoop3, port: 52020

16/09/26 02:56:04 INFO sink.AvroSink: Attempting to create Avro Rpc client.

16/09/26 02:56:04 WARN api.NettyAvroRpcClient: Using default maxIOWorkers

16/09/26 02:56:04 INFO sink.AbstractRpcSink: Rpc sink k2 started.

16/09/26 02:57:17 WARN sink.FailoverSinkProcessor: Sink k1 failed and has been sent to failover list

 

1-6) Configuration reference

A) Exec source to a Kafka sink

a1.sources = r1

a1.channels = c1

a1.sinks = k1

 

a1.sources.r1.type = exec

a1.sources.r1.command = tail -F /usr/local/flume/configurationFile/test.text

a1.sources.r1.channels = c1

 

a1.channels.c1.type=memory

a1.channels.c1.capacity=10000

a1.channels.c1.transactionCapacity=100

 

a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink

a1.sinks.k1.topic = kafkaTest

a1.sinks.k1.brokerList = hadoop1:9092

a1.sinks.k1.requiredAcks = 1

a1.sinks.k1.batchSize = 20

a1.sinks.k1.channel = c1
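To verify that events reach the topic, a console consumer can be attached to it; the sketch below assumes a Kafka installation on hadoop1 with ZooKeeper at hadoop1:2181 (the older ZooKeeper-based consumer that matches this Flume/Kafka era):

kafka-console-consumer.sh --zookeeper hadoop1:2181 --topic kafkaTest --from-beginning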

B) Netcat mode

# Name the components on this agent

a1.sources = r1

a1.sinks = k1

a1.channels = c1

 

# Describe/configure the source

a1.sources.r1.type = netcat

a1.sources.r1.bind = localhost

a1.sources.r1.port = 44444

 

# Describe the sink

a1.sinks.k1.type = logger

 

# Use a channel which buffers events in memory

a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100

 

# Bind the source and sink to the channel

a1.sources.r1.channels = c1

a1.sinks.k1.channel = c1
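Both examples above use a memory channel, which loses buffered events if the agent dies. For durability, a file channel can be substituted; a sketch follows (the directory paths are placeholders, not from the original):

# File channel as a durable alternative to the memory channel
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /usr/local/flume/checkpoint
a1.channels.c1.dataDirs = /usr/local/flume/data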
