Hadoop: Flume in Depth

1. Flume features
A service for collecting, moving, and aggregating large volumes of log data.
Stream-based architecture, suited to online log analysis.
Event-based. Plays a coordinating role between producers and consumers. Provides transactional guarantees to ensure events are delivered. Offers many Source types and many Sink types, and supports multi-hop flows.

Source: receives data; many types are available.
Channel: a temporary staging area that buffers data coming from the Source until the Sink consumes it.
Sink: pulls data from the Channel and writes it to centralized storage (Hadoop / HBase).

2. Installing and configuring Flume

1. Download
2. Untar
3. Set environment variables (a sketch is shown below)
4. Verify the installation
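
Step 3 could look like the following (a sketch, assuming Flume was untarred to /soft/flume, the path used throughout this article):

$>export FLUME_HOME=/soft/flume
$>export PATH=$PATH:$FLUME_HOME/bin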

$>flume-ng version         //"ng" stands for next generation

Configuring Flume

1. Create a configuration file

[/soft/flume/conf/hello.conf]
#declare the three components
a1.sources = r1
a1.channels = c1
a1.sinks = k1

#define the source
a1.sources.r1.type=netcat
a1.sources.r1.bind=localhost
a1.sources.r1.port=8888

#define the sink
a1.sinks.k1.type=logger

#define the channel
a1.channels.c1.type=memory

#bind them together
a1.sources.r1.channels=c1
a1.sinks.k1.channel=c1

2. Run

a) Start the Flume agent
$>bin/flume-ng agent -f ../conf/hello.conf -n a1 -Dflume.root.logger=INFO,console

b) Start an nc client
$>nc localhost 8888
$nc>hello world

c) The Flume terminal prints hello world.

Installing nc

$>sudo yum install nmap-ncat.x86_64

Clearing the repository cache

$>rename the ali.repo file to ali.repo.bak
$>sudo yum clean all
$>sudo yum makecache

#e.g. the Aliyun base repo
$>sudo wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo 

#the Aliyun EPEL repo
$>sudo wget -O /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo

3. Flume sources

3.1 netcat

[/soft/flume/conf/hello.conf]
#declare the three components
a1.sources = r1
a1.channels = c1
a1.sinks = k1

#define the source
a1.sources.r1.type=netcat
a1.sources.r1.bind=localhost
a1.sources.r1.port=8888

#define the sink
a1.sinks.k1.type=logger

#define the channel
a1.channels.c1.type=memory

#bind them together
a1.sources.r1.channels=c1
a1.sinks.k1.channel=c1

3.2 exec

Real-time log collection: the source runs a shell command (here tail -F) and turns each line of its output into an event.

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type=exec
a1.sources.r1.command=tail -F /home/centos/test.txt

a1.sinks.k1.type=logger

a1.channels.c1.type=memory

a1.sources.r1.channels=c1
a1.sinks.k1.channel=c1
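
To test it (a sketch): start the agent with this config, then append lines to the tailed file and watch them appear on the logger sink:

$>echo "hello exec" >> /home/centos/test.txt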

3.3 Spooling directory (batch log collection)
Watches a directory for static files.
Once a file has been fully ingested, it is renamed with the .COMPLETED suffix.

a) Configuration file

[spooldir_r.conf]
a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type=spooldir
a1.sources.r1.spoolDir=/home/centos/spool
a1.sources.r1.fileHeader=true

a1.sinks.k1.type=logger

a1.channels.c1.type=memory

a1.sources.r1.channels=c1
a1.sinks.k1.channel=c1

b) Create the directory

$>mkdir ~/spool

c) Start Flume

$>bin/flume-ng agent -f ../conf/spooldir_r.conf -n a1 -Dflume.root.logger=INFO,console
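
To test it (a sketch; any_file.log is a hypothetical file name): drop a file into the spool directory, and after ingestion it gets the .COMPLETED suffix:

$>cp any_file.log ~/spool/
$>ls ~/spool          #any_file.log.COMPLETED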

3.4 Sequence source

Generates a series of numbered test events (0, 1, 2, ...); useful for smoke-testing a pipeline.

[seq.conf]
a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type=seq
a1.sources.r1.totalEvents=1000

a1.sinks.k1.type=logger

a1.channels.c1.type=memory

a1.sources.r1.channels=c1
a1.sinks.k1.channel=c1

[Run]

$>bin/flume-ng agent -f ../conf/seq.conf -n a1 -Dflume.root.logger=INFO,console

3.5 StressSource

A load-generating source for stress testing: each event carries a payload of size bytes (10240 here), and maxTotalEvents caps the total number of events.

a1.sources = stresssource-1
a1.channels = memoryChannel-1
a1.sources.stresssource-1.type = org.apache.flume.source.StressSource
a1.sources.stresssource-1.size = 10240
a1.sources.stresssource-1.maxTotalEvents = 1000000
a1.sources.stresssource-1.channels = memoryChannel-1
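
This fragment declares only the source; a runnable agent also needs the channel defined and a sink attached, e.g. (a sketch reusing the logger sink from the earlier examples):

a1.sinks = loggerSink-1
a1.channels.memoryChannel-1.type = memory
a1.channels.memoryChannel-1.capacity = 10000
a1.sinks.loggerSink-1.type = logger
a1.sinks.loggerSink-1.channel = memoryChannel-1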

4. Flume sinks

4.1 hdfs

        a1.sources = r1
        a1.channels = c1
        a1.sinks = k1

        a1.sources.r1.type = netcat
        a1.sources.r1.bind = localhost
        a1.sources.r1.port = 8888

        a1.sinks.k1.type = hdfs
        a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H/%M/%S
        a1.sinks.k1.hdfs.filePrefix = events-

        #Whether to round down the event timestamp when building the directory path;
        #with roundValue=10 and roundUnit=second below, a new directory is created
        #every ten seconds (the %y-%m-%d/%H/%M/%S escapes in hdfs.path control the layout).

        a1.sinks.k1.hdfs.round = true           
        a1.sinks.k1.hdfs.roundValue = 10
        a1.sinks.k1.hdfs.roundUnit = second

        a1.sinks.k1.hdfs.useLocalTimeStamp=true

        #When to roll to a new file: by time interval (seconds), file size (bytes), or event count.
        a1.sinks.k1.hdfs.rollInterval=10
        a1.sinks.k1.hdfs.rollSize=10
        a1.sinks.k1.hdfs.rollCount=3

        a1.channels.c1.type=memory

        a1.sources.r1.channels = c1
        a1.sinks.k1.channel = c1
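
To try it (a sketch; hdfs_k.conf is a hypothetical file name, and HDFS must be running): start the agent, feed it a few lines via nc, then list the output files:

$>bin/flume-ng agent -f ../conf/hdfs_k.conf -n a1 -Dflume.root.logger=INFO,console
$>nc localhost 8888
$>hdfs dfs -ls -R /flume/events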

4.2 hive
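
A minimal Hive sink sketch (the metastore URI thrift://localhost:9083, database default, table weblogs, and the field names are all placeholder assumptions; the Hive sink also requires the target table to be bucketed and transactional):

        a1.sinks.k1.type = hive
        a1.sinks.k1.hive.metastore = thrift://localhost:9083
        a1.sinks.k1.hive.database = default
        a1.sinks.k1.hive.table = weblogs
        a1.sinks.k1.serializer = DELIMITED
        a1.sinks.k1.serializer.delimiter = ","
        a1.sinks.k1.serializer.fieldnames = id,msg
        a1.sinks.k1.channel = c1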

4.3 hbase

        a1.sources = r1
        a1.channels = c1
        a1.sinks = k1

        a1.sources.r1.type = netcat
        a1.sources.r1.bind = localhost
        a1.sources.r1.port = 8888

        a1.sinks.k1.type = hbase
        a1.sinks.k1.table = ns1:t12
        a1.sinks.k1.columnFamily = f1
        a1.sinks.k1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer

        a1.channels.c1.type=memory

        a1.sources.r1.channels = c1
        a1.sinks.k1.channel = c1
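
The target table and column family must exist before the agent starts (a sketch using the HBase shell):

$hbase>create_namespace 'ns1'
$hbase>create 'ns1:t12', 'f1'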

4.4 kafka
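
A minimal Kafka sink sketch (the broker address localhost:9092 and the topic flume-events are placeholder assumptions; the property names below follow Flume 1.7+):

        a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
        a1.sinks.k1.kafka.bootstrap.servers = localhost:9092
        a1.sinks.k1.kafka.topic = flume-events
        a1.sinks.k1.kafka.flumeBatchSize = 20
        a1.sinks.k1.channel = c1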

5. Multi-hop agents with Avro source and Avro sink

1. Create the configuration file

    [avro_hop.conf]
        #a1
        a1.sources = r1
        a1.sinks= k1
        a1.channels = c1

        a1.sources.r1.type=netcat
        a1.sources.r1.bind=localhost
        a1.sources.r1.port=8888

        a1.sinks.k1.type = avro
        a1.sinks.k1.hostname=localhost
        a1.sinks.k1.port=9999

        a1.channels.c1.type=memory

        a1.sources.r1.channels = c1
        a1.sinks.k1.channel = c1

        #a2
        a2.sources = r2
        a2.sinks= k2
        a2.channels = c2

        a2.sources.r2.type=avro
        a2.sources.r2.bind=localhost
        a2.sources.r2.port=9999

        a2.sinks.k2.type = logger

        a2.channels.c2.type=memory

        a2.sources.r2.channels = c2
        a2.sinks.k2.channel = c2

2. Start a2

$>flume-ng agent -f /soft/flume/conf/avro_hop.conf -n a2 -Dflume.root.logger=INFO,console

3. Verify a2

$>netstat -anop | grep 9999

4. Start a1

$>flume-ng agent -f /soft/flume/conf/avro_hop.conf -n a1

5. Verify a1

$>netstat -anop | grep 8888
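
6. Test end to end: send a line into the first hop; it should appear on a2's logger console:

$>nc localhost 8888
$nc>hello hop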

6. Channels

1. MemoryChannel
All of the examples above use the memory channel, so it is not repeated here.
2. FileChannel

a1.sources = r1
a1.sinks= k1
a1.channels = c1

a1.sources.r1.type=netcat
a1.sources.r1.bind=localhost
a1.sources.r1.port=8888

a1.sinks.k1.type=logger

a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /home/centos/flume/fc_check
a1.channels.c1.dataDirs = /home/centos/flume/fc_data

a1.sources.r1.channels=c1
a1.sinks.k1.channel=c1

Spillable memory channel

a1.channels = c1
a1.channels.c1.type = SPILLABLEMEMORY
#memoryCapacity = 0 disables the in-memory queue, making this equivalent to a file channel
a1.channels.c1.memoryCapacity = 0
#overflowCapacity = 0 disables the file-backed overflow, making this equivalent to a memory channel
a1.channels.c1.overflowCapacity = 2000

a1.channels.c1.byteCapacity = 800000
a1.channels.c1.checkpointDir = /home/centos/flume/fc_check
a1.channels.c1.dataDirs = /home/centos/flume/fc_data
