Flume Summary
Official user guide: http://flume.apache.org/FlumeUserGuide.html
Overview
Flume is a distributed, reliable, and highly available system for collecting, aggregating, and transporting large volumes of log data.
Flume can collect data from many kinds of sources, such as files and socket streams, and can deliver the collected data to many external storage systems, including HDFS, HBase, Hive, and Kafka.
Typical collection requirements can be met with a simple Flume configuration.
Flume also has good support for custom extensions in special scenarios, so it fits most day-to-day data collection needs.
How Flume Works
1. The core role in a Flume deployment is the agent; a Flume collection system is built by connecting agents together.
2. Each agent acts as a data courier with three internal components:
a) Source: the collection source, which connects to the data origin to acquire data.
b) Sink: the destination of the collected data, which forwards it to the next agent or to the final storage system.
c) Channel: the data transfer channel inside the agent, which moves data from the source to the sink.
Note:
Data travels from Source to Channel to Sink in the form of Events; an Event is the unit of the data flow.
Source + Channel + Sink = Agent
Collection with a single Agent
Multiple Agents chained together
Multiplexing Agents
Key architecture concepts
Flume's architecture is built on the following core concepts (a minimal property sketch follows the list):
Event: a unit of data, with an optional message header
Flow: an abstraction of an Event's journey from its origin to its destination
Client: produces Events at the point of origin and sends them to a Flume Agent
Agent: an independent Flume process containing the Source, Channel, and Sink components
Source: consumes the Events delivered to it
Channel: a temporary store that buffers the Events handed over by the Source
Sink: reads and removes Events from the Channel and passes them to the next Agent in the flow pipeline (if any)
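A minimal sketch of how these concepts map onto the property keys of an agent definition (complete, runnable examples follow below):
# one agent named a1 with one source, one channel and one sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1
# the source puts Events into the channel, and the sink takes them out of it
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1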
Flume Installation
Download the installation package:
Link: http://pan.baidu.com/s/1miycTfU  Password: 2ea4. Contact the author if the download does not work.
1-1)、Installation
[root@hadoop1 /]# tar -zxvf apache-flume-1.6.0-bin.tar.gz
[root@hadoop1 /]# mv apache-flume-1.6.0-bin /usr/local/flume
[root@hadoop1 /]# cd /usr/local/flume/conf/
[root@hadoop1 conf]# mv flume-env.sh.template flume-env.sh
1-2)、Modify the configuration file
[root@hadoop1 conf]# vi flume-env.sh
---- set JAVA_HOME to the path of the local JDK installation
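For example, matching the JDK path that shows up in the startup logs later in this document (adjust to the local JDK installation):
export JAVA_HOME=/home/jdk1.7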
1-3)、Configure environment variables
[root@hadoop1 /]# vi /etc/profile
export FLUME_HOME=/usr/local/flume
export PATH=$PATH:$FLUME_HOME/bin
[root@hadoop1 ~]# source /etc/profile
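After reloading the profile, the installation can be checked with the version command (one of the commands listed in the help output below):
flume-ng version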
1-4)、Common commands
[root@hadoop1 flume]# flume-ng -help
Error: Unknown or unspecified command '-help'
Usage: /usr/local/flume/bin/flume-ng <command> [options]...
commands:
help display this help text
agent run a Flume agent
avro-client run an avro Flume client
version show Flume version info
global options:
--conf,-c <conf> use configs in <conf> directory
--classpath,-C <cp> append to the classpath
--dryrun,-d do not actually start Flume, just print the command
--plugins-path <dirs> colon-separated list of plugins.d directories. See the
plugins.d section in the user guide for more details.
Default: $FLUME_HOME/plugins.d
-Dproperty=value sets a Java system property value
-Xproperty=value sets a Java -X option
agent options:
--name,-n <name> the name of this agent (required)
--conf-file,-f <file> specify a config file (required if -z missing)
--zkConnString,-z <str> specify the ZooKeeper connection to use (required if -f missing)
--zkBasePath,-p <path> specify the base path in ZooKeeper for agent configs
--no-reload-conf do not reload config file if changed
--help,-h display help text
avro-client options:
--rpcProps,-P <file> RPC client properties file with server connection params
--host,-H <host> hostname to which events will be sent
--port,-p <port> port of the avro source
--dirname <dir> directory to stream to avro source
--filename,-F <file> text file to stream to avro source (default: std input)
--headerFile,-R <file> File containing event headers as key/value pairs on each new line
--help,-h display help text
Either --rpcProps or both --host and --port must be specified.
Note that if <conf> directory is specified, then it is always included first
in the classpath.
1-5)、Starting an agent
flume-ng agent -c conf -f netcat-logger.conf -n a1 -Dflume.root.logger=INFO,console
flume-ng agent : run a Flume agent
-c conf : the directory holding Flume's own configuration files
-f netcat-logger.conf : the collection (flow) configuration file we wrote
-n a1 : the name of the agent defined in that file
-Dflume.root.logger=INFO,console : print INFO-level logs to the console
The same agent can also be started without console logging:
flume-ng agent -c conf -f netcat-logger.conf -n a1
Flume Usage Examples
1-1)、Local console (netcat → logger) example
Flume supports a large number of source and sink types; see the official user guide for details:
http://flume.apache.org/FlumeUserGuide.html
First create a file named nicate-logger.conf in the conf directory:
[root@hadoop1 conf]# vi nicate-logger.conf
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source component r1
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# Describe/configure the sink component k1
a1.sinks.k1.type = logger
# Describe/configure the channel; an in-memory channel is used here
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
[root@hadoop1 conf]# flume-ng agent -c conf -f /usr/local/flume/conf/nicate-logger.conf -n a1 -Dflume.root.logger=INFO,console
Info: Including Hadoop libraries found via (/usr/local/hadoop-2.6.4/bin/hadoop) for HDFS access
Info: Excluding /usr/local/hadoop-2.6.4/share/hadoop/common/lib/slf4j-api-1.7.5.jar from classpath
Info: Excluding /usr/local/hadoop-2.6.4/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar from classpath
Info: Including Hive libraries found via () for Hive access
+ exec /home/jdk1.7/bin/java -Xmx20m -Dflume.root.logger=INFO,console -cp 'conf:/usr/local/flume/lib/*:/usr/local/hadoop-2.6.4/etc/hadoop:/usr/local/hadoop-2.6.4/share/hadoop/common/lib/activation-1.1.jar:/usr/local/hadoop-2.6.4/share/hadoop/common/lib/apacheds-i18n-2.0.0-M15.jar:/usr/local/hadoop-2.6.4/share/hadoop/common/lib/apacheds-kerberos...................(other classpath entries omitted)............usr/local/hadoop-2.6.4/share/hadoop/mapreduce/lib:/usr/local/hadoop-2.6.4/share/hadoop/mapreduce/lib-examples:/usr/local/hadoop-2.6.4/share/hadoop/mapreduce/sources:/usr/local/hadoop-2.6.4/contrib/capacity-scheduler/*.jar:/lib/*' -Djava.library.path=:/usr/local/hadoop-2.6.4/lib/native org.apache.flume.node.Application -f /usr/local/flume/conf/nicate-logger.conf -n a1
16/09/25 10:53:34 INFO node.PollingPropertiesFileConfigurationProvider: Configuration provider starting
16/09/25 10:53:34 INFO node.PollingPropertiesFileConfigurationProvider: Reloading configuration file:/usr/local/flume/conf/nicate-logger.conf
16/09/25 10:53:34 INFO conf.FlumeConfiguration: Added sinks: k1 Agent: a1
16/09/25 10:53:34 INFO conf.FlumeConfiguration: Processing:k1
16/09/25 10:53:34 INFO conf.FlumeConfiguration: Processing:k1
16/09/25 10:53:34 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [a1]
16/09/25 10:53:34 INFO node.AbstractConfigurationProvider: Creating channels
16/09/25 10:53:34 INFO channel.DefaultChannelFactory: Creating instance of channel c1 type memory
16/09/25 10:53:34 INFO node.AbstractConfigurationProvider: Created channel c1
16/09/25 10:53:34 INFO source.DefaultSourceFactory: Creating instance of source r1, type netcat
16/09/25 10:53:34 INFO sink.DefaultSinkFactory: Creating instance of sink: k1, type: logger
16/09/25 10:53:34 INFO node.AbstractConfigurationProvider: Channel c1 connected to [r1, k1]
16/09/25 10:53:34 INFO node.Application: Starting new configuration:{ sourceRunners:{r1=EventDrivenSourceRunner: { source:org.apache.flume.source.NetcatSource{name:r1,state:IDLE} }} sinkRunners:{k1=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@6c62aa33 counterGroup:{ name:null counters:{} } }} channels:{c1=org.apache.flume.channel.MemoryChannel{name: c1}} }
16/09/25 10:53:34 INFO node.Application: Starting Channel c1
16/09/25 10:53:34 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: c1: Successfully registered new MBean.
16/09/25 10:53:34 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: c1 started
16/09/25 10:53:34 INFO node.Application: Starting Sink k1
16/09/25 10:53:34 INFO node.Application: Starting Source r1
16/09/25 10:53:34 INFO source.NetcatSource: Source starting
16/09/25 10:53:35 INFO source.NetcatSource: Created serverSocket:sun.nio.ch.ServerSocketChannelImpl[/127.0.0.1:44444]
The full startup output is shown here to illustrate in detail how the configuration is loaded.
Install telnet:
[root@hadoop1 flume]# yum list | grep telnet
telnet.x86_64 1:0.17-48.el6 base
telnet-server.x86_64 1:0.17-48.el6 base
[root@hadoop1 flume]# yum install telnet.x86_64
Test:
[root@hadoop1 ~]# telnet 127.0.0.1 44444
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
xiaozhang
OK
xiaowang
OK
Log output printed on the agent side:
16/09/25 10:53:34 INFO node.Application: Starting Sink k1
16/09/25 10:53:34 INFO node.Application: Starting Source r1
16/09/25 10:53:34 INFO source.NetcatSource: Source starting
16/09/25 10:53:35 INFO source.NetcatSource: Created serverSocket:sun.nio.ch.ServerSocketChannelImpl[/127.0.0.1:44444]
16/09/25 11:16:29 INFO sink.LoggerSink: Event: { headers:{} body: 78 69 61 6F 7A 68 61 6E 67 0D xiaozhang. }
16/09/25 11:16:31 INFO sink.LoggerSink: Event: { headers:{} body: 78 69 61 6F 77 61 6E 67 0D xiaowang. }
Note on whitespace:
The configuration lines are written with spaces, e.g. a1.sources = r1.
When connecting, the command telnet 127.0.0.1 44444 likewise separates its arguments with spaces.
In the printed log each Event shows its headers and body; the body is printed both as hex-encoded bytes and as readable text.
1-2)、Single-node HDFS sink test example
[root@hadoop1 conf]# vi tail-hdfs.conf
# tail-hdfs.conf
# Use the tail command to acquire data and sink it to HDFS
# Start command:
# bin/flume-ng agent -c conf -f conf/tail-hdfs.conf -n a1
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /root/logs/test.log
a1.sources.r1.channels = c1
# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /flume/tailout/%y-%m-%d/%H%M/
a1.sinks.k1.hdfs.filePrefix = events-
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
a1.sinks.k1.hdfs.rollInterval = 3
a1.sinks.k1.hdfs.rollSize = 20
a1.sinks.k1.hdfs.rollCount = 5
a1.sinks.k1.hdfs.batchSize = 1
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# Output file type; the default is SequenceFile, while DataStream writes plain text
a1.sinks.k1.hdfs.fileType = DataStream
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
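Before starting the agent, make sure the file tailed by the exec source exists and keeps receiving data; a simple sketch, mirroring the date loop used in the later examples:
mkdir -p /root/logs
touch /root/logs/test.log
# append a timestamp every second so there is always something to collect
while true; do date >> /root/logs/test.log; sleep 1; done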
[root@hadoop1 conf]# flume-ng agent -c conf -f /usr/local/flume/conf/tail-hdfs.conf -n a1 -Dflume.root.logger=INFO,console
16/07/30 20:29:28 INFO node.Application: Starting Channel c1
16/07/30 20:29:28 INFO node.Application: Waiting for channel: c1 to start. Sleeping for 500 ms
16/07/30 20:29:28 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: c1: Successfully registered new MBean.
16/07/30 20:29:28 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: c1 started
16/07/30 20:29:28 INFO node.Application: Starting Sink k1
16/07/30 20:29:28 INFO node.Application: Starting Source r1
16/07/30 20:29:28 INFO source.ExecSource: Exec source starting with command:tail -F /home/flumeLog/test.log
16/07/30 20:29:28 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: k1: Successfully registered new MBean.
16/07/30 20:29:28 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: k1 started
16/07/30 20:29:28 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: r1: Successfully registered new MBean.
16/07/30 20:29:28 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: r1 started
16/07/30 20:29:32 INFO hdfs.HDFSDataStream: Serializer = TEXT, UseRawLocalFileSystem = false
16/07/30 20:29:33 INFO hdfs.BucketWriter: Creating /flume/tailout/16-07-30/2020//events-.1469881772754.tmp
16/07/30 20:29:38 WARN hdfs.BucketWriter: Block Under-replication detected. Rotating file.
16/07/30 20:29:38 INFO hdfs.BucketWriter: Closing /flume/tailout/16-07-30/2020//events-.1469881772754.tmp
16/07/30 20:29:38 INFO hdfs.BucketWriter: Renaming /flume/tailout/16-07-30/2020/events-.1469881772754.tmp to /flume/tailout/16-07-30/2020/events-.1469881772754
16/07/30 20:29:38 INFO hdfs.BucketWriter: Creating /flume/tailout/16-07-30/2020//events-.1469881772755.tmp
16/07/30 20:50:22 INFO hdfs.BucketWriter: Closing /flume/tailout/16-07-30/2050//events-.1469883004544.tmp
16/07/30 20:50:23 INFO hdfs.BucketWriter: Renaming /flume/tailout/16-07-30/2050/events-.1469883004544.tmp to /flume/tailout/16-07-30/2050/events-.1469883004544
16/07/30 20:50:25 INFO hdfs.BucketWriter: Creating /flume/tailout/16-07-30/2050//events-.1469883004545.tmp
Watch the files appear in HDFS:
[root@hadoop1 ~]# hadoop fs -ls -R /flume/tailout/16-07-30/
-rw-r--r-- 3 root supergroup 0 2016-07-30 20:50 /flume/tailout/16-07-30/2050/events-.1469883004544.tmp
HDFS file contents:
[root@hadoop1 ~]# hadoop fs -cat /flume/tailout/16-07-30/2020/events-.1469881772754
Sat Jul 30 20:23:09 CST 2016
[root@hadoop1 ~]# hadoop fs -cat /flume/tailout/16-07-30/2030/events-.1469882131920
Sat Jul 30 20:35:45 CST 2016
Points worth noting:
1. The log appears to handle several messages at once: the sink first writes events to a temporary .tmp file, then closes and renames it on HDFS and immediately creates the next .tmp file, so the Create/Close/Rename lines show up almost simultaneously.
2. The 2050 bucket directory comes from rounding the timestamp down to the previous 10-minute boundary (round = true, roundValue = 10, roundUnit = minute), so events written between 20:50 and 20:59 land in .../2050/.
3. To write to HDFS, the agent just needs to run on a machine that can reach the HDFS cluster (i.e. with the Hadoop client libraries and configuration available), which is very convenient.
1-3)、Collecting files from a spooling directory
[root@hadoop1 ~]# cd /usr/local/flume/
[root@hadoop1 flume]# mkdir testData
[root@hadoop1 conf]# vi flume_dir.conf
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /usr/local/flume/testData
a1.sources.r1.fileHeader = true
# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H%M/
a1.sinks.k1.hdfs.filePrefix = events-
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
a1.sinks.k1.hdfs.rollInterval = 3
# rollSize can be tuned to control the size (in bytes) of each rolled file
a1.sinks.k1.hdfs.rollSize = 20
a1.sinks.k1.hdfs.rollCount = 5
a1.sinks.k1.hdfs.batchSize = 1
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.hdfs.fileType = DataStream
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
[root@hadoop1 flume]# vi a.text
[root@hadoop1 flume]# vi b.text
[root@hadoop1 flume]# mv a.text b.text testData/
[root@hadoop1 flume]# cd testData/
[root@hadoop1 testData]# ls
a.text.COMPLETED b.text.COMPLETED
Note: do not create subdirectories inside the monitored directory (they will not be collected), and do not place a file whose name has already been processed into it; a duplicate name makes Flume stop running, which is how duplicate collection is prevented. The .COMPLETED suffix marks a file that has already been collected.
[root@hadoop1 conf]# flume-ng agent -c conf -f ../conf/flume_dir.conf -n a1 -Dflume.root.logger=INFO,console
16/09/26 00:28:47 INFO hdfs.BucketWriter: Renaming /flume/events/16-09-26/0020/events-.1474874885229.tmp to /flume/events/16-09-26/0020/events-.1474874885229
16/09/26 00:28:47 INFO hdfs.BucketWriter: Creating /flume/events/16-09-26/0020//events-.1474874885230.tmp
16/09/26 00:28:47 INFO hdfs.BucketWriter: Closing /flume/events/16-09-26/0020//events-.1474874885230.tmp
16/09/26 00:28:47 INFO hdfs.BucketWriter: Renaming /flume/events/16-09-26/0020/events-.1474874885230.tmp to /flume/events/16-09-26/0020/events-.1474874885230
16/09/26 00:28:47 INFO hdfs.BucketWriter: Creating /flume/events/16-09-26/0020//events-.1474874885231.tmp
16/09/26 00:28:49 INFO hdfs.BucketWriter: Closing /flume/events/16-09-26/0020//events-.1474874885231.tmp
16/09/26 00:28:49 INFO hdfs.BucketWriter: Renaming /flume/events/16-09-26/0020/events-.1474874885231.tmp to /flume/events/16-09-26/0020/events-.1474874885231
16/09/26 00:28:49 INFO hdfs.BucketWriter: Creating /flume/events/16-09-26/0020//events-.1474874885232.tmp
16/09/26 00:28:52 INFO hdfs.BucketWriter: Closing /flume/events/16-09-26/0020//events-.1474874885232.tmp
16/09/26 00:28:52 INFO hdfs.BucketWriter: Renaming /flume/events/16-09-26/0020/events-.1474874885232.tmp to /flume/events/16-09-26/0020/events-.1474874885232
16/09/26 00:28:52 INFO hdfs.HDFSEventSink: Writer callback called.
[root@hadoop1 testData]# hadoop fs -ls /flume/events/16-09-26/0020
Found 4 items
-rw-r--r-- 3 root supergroup 18 2016-09-26 00:28 /flume/events/16-09-26/0020/events-.1474874885229
-rw-r--r-- 3 root supergroup 12 2016-09-26 00:28 /flume/events/16-09-26/0020/events-.1474874885230
-rw-r--r-- 3 root supergroup 13 2016-09-26 00:28 /flume/events/16-09-26/0020/events-.1474874885231
-rw-r--r-- 3 root supergroup 11 2016-09-26 00:28 /flume/events/16-09-26/0020/events-.1474874885232
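The collected contents can be read back from HDFS, using one of the file names from the listing above:
hadoop fs -cat /flume/events/16-09-26/0020/events-.1474874885229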
1-4)、Connecting two machines
hadoop1 acts as the sender: it tails a file and forwards events through an avro sink; hadoop2 receives them with an avro source and prints them to the console with a logger sink.
[root@hadoop1 flume]# mkdir agentConf
[root@hadoop1 flume]# vi tail-avro.properties
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /root/log/test.log
a1.sources.r1.channels = c1
# Describe the sink
## the avro sink here acts as the data sender
a1.sinks = k1
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = hadoop2
a1.sinks.k1.port = 8888
a1.sinks.k1.batch-size = 2
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
[root@hadoop2 flume]# mkdir agentConf
[root@hadoop2 flume]# vi avro-hdfs.properties
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 8888
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start hadoop2 first:
flume-ng agent -c conf -f avro-hdfs.properties -n a1 -Dflume.root.logger=INFO,console
Then start hadoop1:
flume-ng agent -c conf -f agentConf/tail-avro.properties -n a1 -Dflume.root.logger=INFO,console
Write test data:
[root@hadoop1 log]# while true; do date >> test.log ; sleep 1; done;
[root@hadoop1 log]# tail -f test.log
Mon Sep 26 02:07:29 PDT 2016
Mon Sep 26 02:07:30 PDT 2016
Mon Sep 26 02:07:31 PDT 2016
Mon Sep 26 02:07:32 PDT 2016
Mon Sep 26 02:07:33 PDT 2016
Mon Sep 26 02:07:34 PDT 2016
Mon Sep 26 02:07:35 PDT 2016
Mon Sep 26 02:07:36 PDT 2016
hadoop1 log:
*******
16/09/26 02:07:04 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: k1 started
16/09/26 02:07:04 INFO sink.AbstractRpcSink: Rpc sink k1: Building RpcClient with hostname: hadoop2, port: 4141
16/09/26 02:07:04 INFO sink.AvroSink: Attempting to create Avro Rpc client.
16/09/26 02:07:04 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: r1: Successfully registered new MBean.
16/09/26 02:07:04 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: r1 started
16/09/26 02:07:04 WARN api.NettyAvroRpcClient: Using default maxIOWorkers
16/09/26 02:07:05 INFO sink.AbstractRpcSink: Rpc sink k1 started.
hadoop2 log:
******
16/09/26 02:07:37 INFO sink.LoggerSink: Event: { headers:{} body: 4D 6F 6E 20 53 65 70 20 32 36 20 30 32 3A 30 37 Mon Sep 26 02:07 }
16/09/26 02:07:39 INFO sink.LoggerSink: Event: { headers:{} body: 4D 6F 6E 20 53 65 70 20 32 36 20 30 32 3A 30 37 Mon Sep 26 02:07 }
16/09/26 02:07:39 INFO sink.LoggerSink: Event: { headers:{} body: 4D 6F 6E 20 53 65 70 20 32 36 20 30 32 3A 30 37 Mon Sep 26 02:07 }
16/09/26 02:07:39 INFO sink.LoggerSink: Event: { headers:{} body: 4D 6F 6E 20 53 65 70 20 32 36 20 30 32 3A 30 37 Mon Sep 26 02:07 }
16/09/26 02:07:39 INFO sink.LoggerSink: Event: { headers:{} body: 4D 6F 6E 20 53 65 70 20 32 36 20 30 32 3A 30 37 Mon Sep 26 02:07 }
******
1-5)、Multi-machine example (high-availability / failover configuration)
hadoop1 collects data with an exec source and sends it to two downstream collectors: hadoop2 and hadoop3 receive the events over avro and write them to HDFS, with hadoop2 (priority 10) as the primary sink and hadoop3 (priority 1) as the standby.
1-1)、hadoop1 configuration
# Acquire data with the tail command and send it to an avro port
# The downstream nodes are configured with an avro source to relay the data to external storage
[root@hadoop1 conf]# vi sources-hdfs.conf
#agent1 name
agent1.channels = c1
agent1.sources = r1
agent1.sinks = k1 k2
#set group
agent1.sinkgroups = g1
#set channel
agent1.channels.c1.type = memory
agent1.channels.c1.capacity = 1000
agent1.channels.c1.transactionCapacity = 100
agent1.sources.r1.channels = c1
agent1.sources.r1.type = exec
agent1.sources.r1.command = tail -F /root/log/test.log
# set interceptors
agent1.sources.r1.interceptors = i1 i2
agent1.sources.r1.interceptors.i1.type = static
agent1.sources.r1.interceptors.i1.key = Type
agent1.sources.r1.interceptors.i1.value = LOGIN
agent1.sources.r1.interceptors.i2.type = timestamp
# set sink1
agent1.sinks.k1.channel = c1
agent1.sinks.k1.type = avro
agent1.sinks.k1.hostname = hadoop2
agent1.sinks.k1.port = 52020
# set sink2
agent1.sinks.k2.channel = c1
agent1.sinks.k2.type = avro
agent1.sinks.k2.hostname = hadoop3
agent1.sinks.k2.port = 52020
#set sink group
agent1.sinkgroups.g1.sinks = k1 k2
#set failover
agent1.sinkgroups.g1.processor.type = failover
agent1.sinkgroups.g1.processor.priority.k1 = 10
agent1.sinkgroups.g1.processor.priority.k2 = 1
agent1.sinkgroups.g1.processor.maxpenalty = 10000
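For comparison, the same sink group could do load balancing instead of failover; a minimal sketch using Flume's standard load_balance sink processor (round-robin selection is an assumption here, not part of the setup above):
#set load balancing instead of failover
agent1.sinkgroups.g1.sinks = k1 k2
agent1.sinkgroups.g1.processor.type = load_balance
agent1.sinkgroups.g1.processor.selector = round_robin
agent1.sinkgroups.g1.processor.backoff = true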
Create the file to be tailed:
[root@hadoop1 ~]# mkdir -p /root/log/
[root@hadoop1 log]# touch test.log
1-2)、hadoop2 configuration
# Collector configuration file
[root@hadoop2 conf]# vi sinks-hdfs.conf
#set Agent name
a1.sources = r1
a1.channels = c1
a1.sinks = k1
#set channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# avro source: receives the events relayed from the upstream agent (hadoop1)
a1.sources.r1.type = avro
a1.sources.r1.bind = hadoop2
a1.sources.r1.port = 52020
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = static
a1.sources.r1.interceptors.i1.key = Collector
a1.sources.r1.interceptors.i1.value = hadoop2
a1.sources.r1.channels = c1
#set sink to hdfs
a1.sinks.k1.type=hdfs
a1.sinks.k1.hdfs.path=/home/hdfs/flume/logdfs
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.writeFormat=TEXT
a1.sinks.k1.hdfs.rollInterval=10
a1.sinks.k1.channel=c1
a1.sinks.k1.hdfs.filePrefix=%Y-%m-%d
# To send test data directly to this collector, the avro-client command can also be used:
# $ bin/flume-ng avro-client -H hadoop2 -p 52020 -F /usr/logs/log.10
1-3)、hadoop3 configuration
# Collector configuration file
[root@hadoop3 conf]# vi sinks-hdfs.conf
#set Agent name
a1.sources = r1
a1.channels = c1
a1.sinks = k1
#set channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# avro source: receives the events relayed from the upstream agent (hadoop1)
a1.sources.r1.type = avro
a1.sources.r1.bind = hadoop3
a1.sources.r1.port = 52020
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = static
a1.sources.r1.interceptors.i1.key = Collector
a1.sources.r1.interceptors.i1.value = hadoop3
a1.sources.r1.channels = c1
#set sink to hdfs
a1.sinks.k1.type=hdfs
a1.sinks.k1.hdfs.path=/home/hdfs/flume/logdfs
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.writeFormat=TEXT
a1.sinks.k1.hdfs.rollInterval=10
a1.sinks.k1.channel=c1
a1.sinks.k1.hdfs.filePrefix=%Y-%m-%d
# To send test data directly to this collector, the avro-client command can also be used:
# $ bin/flume-ng avro-client -H hadoop3 -p 52020 -F /usr/logs/log.10
1-4)、hadoop3 startup log
[root@hadoop3 agentConf]# flume-ng agent -c conf -f ../conf/sinks-hdfs.conf -n a1 -Dflume.root.logger=INFO,console
*******
16/09/26 02:55:55 INFO node.Application: Starting Channel c1
16/09/26 02:55:55 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: c1: Successfully registered new MBean.
16/09/26 02:55:55 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: c1 started
16/09/26 02:55:55 INFO node.Application: Starting Sink k1
16/09/26 02:55:55 INFO node.Application: Starting Source r1
16/09/26 02:55:55 INFO source.AvroSource: Starting Avro source r1: { bindAddress: hadoop3, port: 52020 }...
16/09/26 02:55:55 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: k1: Successfully registered new MBean.
16/09/26 02:55:55 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: k1 started
16/09/26 02:55:55 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: r1: Successfully registered new MBean.
16/09/26 02:55:55 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: r1 started
******
1-5)、hadoop2 startup log
[root@hadoop2 agentConf]# flume-ng agent -c conf -f ../conf/sinks-hdfs.conf -n a1 -Dflume.root.logger=INFO,console
******
{k1=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@687090a8 counterGroup:{ name:null counters:{} } }} channels:{c1=org.apache.flume.channel.MemoryChannel{name: c1}} }
16/09/26 02:55:50 INFO node.Application: Starting Channel c1
16/09/26 02:55:50 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: c1: Successfully registered new MBean.
16/09/26 02:55:50 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: c1 started
16/09/26 02:55:50 INFO node.Application: Starting Sink k1
16/09/26 02:55:50 INFO node.Application: Starting Source r1
16/09/26 02:55:50 INFO source.AvroSource: Starting Avro source r1: { bindAddress: hadoop2, port: 52020 }...
16/09/26 02:55:50 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: k1: Successfully registered new MBean.
16/09/26 02:55:50 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: k1 started
16/09/26 02:55:50 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: r1: Successfully registered new MBean.
16/09/26 02:55:50 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: r1 started
*******
1-6)、hadoop1 startup log
[root@hadoop1 agentConf]# flume-ng agent -c conf -f ../conf/sources-hdfs.conf -n agent1 -Dflume.root.logger=INFO,console
*******
16/09/26 02:56:03 INFO conf.FlumeConfiguration: Processing:k2
16/09/26 02:56:03 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [agent1]
16/09/26 02:56:03 INFO node.AbstractConfigurationProvider: Creating channels
16/09/26 02:56:03 INFO channel.DefaultChannelFactory: Creating instance of channel c1 type memory
16/09/26 02:56:03 INFO node.AbstractConfigurationProvider: Created channel c1
16/09/26 02:56:03 INFO source.DefaultSourceFactory: Creating instance of source r1, type exec
16/09/26 02:56:03 INFO interceptor.StaticInterceptor: Creating StaticInterceptor: preserveExisting=true,key=Type,value=LOGIN
16/09/26 02:56:03 INFO sink.DefaultSinkFactory: Creating instance of sink: k1, type: avro
16/09/26 02:56:03 INFO sink.AbstractRpcSink: Connection reset is set to 0. Will not reset connection to next hop
16/09/26 02:56:03 INFO sink.DefaultSinkFactory: Creating instance of sink: k2, type: avro
16/09/26 02:56:03 INFO sink.AbstractRpcSink: Connection reset is set to 0. Will not reset connection to next hop
16/09/26 02:56:03 INFO node.AbstractConfigurationProvider: Channel c1 connected to [r1, k1, k2]
*********
1-7)、Write test data
[root@hadoop1 log]# while true; do date >> test.log ; sleep 1; done;
[root@hadoop1 log]# tail -f test.log
Mon Sep 26 02:57:13 PDT 2016
Mon Sep 26 02:57:14 PDT 2016
Mon Sep 26 02:57:15 PDT 2016
Mon Sep 26 02:57:16 PDT 2016
Mon Sep 26 02:57:17 PDT 2016
1-8)、hadoop3 output
16/09/26 02:56:49 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: k1 stopped
16/09/26 02:56:49 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SINK, name: k1. sink.start.time == 1474883755212
16/09/26 02:56:49 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SINK, name: k1. sink.stop.time == 1474883809719
16/09/26 02:56:49 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SINK, name: k1. sink.batch.complete == 0
16/09/26 02:56:49 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SINK, name: k1. sink.batch.empty == 9
16/09/26 02:56:49 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SINK, name: k1. sink.batch.underflow == 0
16/09/26 02:56:49 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SINK, name: k1. sink.connection.closed.count == 0
16/09/26 02:56:49 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SINK, name: k1. sink.connection.creation.count == 0
16/09/26 02:56:49 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SINK, name: k1. sink.connection.failed.count == 0
16/09/26 02:56:49 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SINK, name: k1. sink.event.drain.attempt == 0
16/09/26 02:56:49 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SINK, name: k1. sink.event.drain.sucess == 0
1-9)、hadoop2 output
16/09/26 02:56:12 INFO hdfs.HDFSDataStream: Serializer = TEXT, UseRawLocalFileSystem = false
16/09/26 02:56:13 INFO hdfs.BucketWriter: Creating /home/hdfs/flume/logdfs/2016-09-26.1474883772390.tmp
16/09/26 02:56:18 INFO hdfs.BucketWriter: Closing /home/hdfs/flume/logdfs/2016-09-26.1474883772390.tmp
16/09/26 02:56:18 INFO hdfs.BucketWriter: Renaming /home/hdfs/flume/logdfs/2016-09-26.1474883772390.tmp to /home/hdfs/flume/logdfs/2016-09-26.1474883772390
16/09/26 02:56:18 INFO hdfs.BucketWriter: Creating /home/hdfs/flume/logdfs/2016-09-26.1474883772391.tmp
16/09/26 02:56:26 INFO hdfs.BucketWriter: Closing /home/hdfs/flume/logdfs/2016-09-26.1474883772391.tmp
16/09/26 02:56:26 INFO hdfs.BucketWriter: Renaming /home/hdfs/flume/logdfs/2016-09-26.1474883772391.tmp to /home/hdfs/flume/logdfs/2016-09-26.1474883772391
16/09/26 02:56:26 INFO hdfs.BucketWriter: Creating /home/hdfs/flume/logdfs/2016-09-26.1474883772392.tmp
16/09/26 02:56:35 INFO hdfs.BucketWriter: Closing /home/hdfs/flume/logdfs/2016-09-26.1474883772392.tmp
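The files written by the collector can then be listed in HDFS (path taken from the hadoop2/hadoop3 sink configuration):
hadoop fs -ls /home/hdfs/flume/logdfs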
1-10)、hadoop1 output
16/09/26 02:56:04 INFO sink.AbstractRpcSink: Rpc sink k1 started.
16/09/26 02:56:04 INFO sink.AbstractRpcSink: Starting RpcSink k2 { host: hadoop3, port: 52020 }...
16/09/26 02:56:04 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: k2: Successfully registered new MBean.
16/09/26 02:56:04 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: k2 started
16/09/26 02:56:04 INFO sink.AbstractRpcSink: Rpc sink k2: Building RpcClient with hostname: hadoop3, port: 52020
16/09/26 02:56:04 INFO sink.AvroSink: Attempting to create Avro Rpc client.
16/09/26 02:56:04 WARN api.NettyAvroRpcClient: Using default maxIOWorkers
16/09/26 02:56:04 INFO sink.AbstractRpcSink: Rpc sink k2 started.
16/09/26 02:57:17 WARN sink.FailoverSinkProcessor: Sink k1 failed and has been sent to failover list
1-6)、Configuration examples in detail
The first configuration below tails a local file with an exec source and publishes the events to the Kafka topic kafkaTest through the KafkaSink; the second is the basic netcat-to-logger configuration used at the beginning of this guide.
a1.sources = r1
a1.channels = c1
a1.sinks = k1
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /usr/local/flume/configurationFile/test.text
a1.sources.r1.channels = c1
a1.channels.c1.type=memory
a1.channels.c1.capacity=10000
a1.channels.c1.transactionCapacity=100
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.topic = kafkaTest
a1.sinks.k1.brokerList = hadoop1:9092
a1.sinks.k1.requiredAcks = 1
a1.sinks.k1.batchSize = 20
a1.sinks.k1.channel = c1
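To verify that events reach Kafka, a console consumer can be attached to the topic; a sketch assuming a Kafka broker of that era on hadoop1 and ZooKeeper at hadoop1:2181 (the ZooKeeper address is an assumption; only the broker list hadoop1:9092 appears in the config):
# create the topic if it does not exist yet (ZooKeeper address assumed)
kafka-topics.sh --create --zookeeper hadoop1:2181 --replication-factor 1 --partitions 1 --topic kafkaTest
# consume the events published by the Flume KafkaSink
kafka-console-consumer.sh --zookeeper hadoop1:2181 --topic kafkaTest --from-beginning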
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1