Big Data with Flume (1) --- Introduction to Flume; Source, Channel, Sink; Installing Flume; Configuring and Using Flume

I. Introduction to Flume
----------------------------------------------------------
    1. A service for collecting, aggregating, and moving large volumes of log data
    2. Built on a streaming-data architecture, used for online log analysis and processing
    3. Acts as a buffer between producers and consumers
    4. Provides transactional guarantees, ensuring every event is delivered and processed
    5. Accepts data from diverse input sources and writes output in a variety of formats
    6. Multi-hop flows -- the output of one flume agent can serve as the input of another


II. Source, Channel, Sink
----------------------------------------------------------
    1.Source
        The component that receives incoming data; many source types are supported

    2.Channel
        Temporarily stores and buffers received data until a sink consumes it

    3.Sink
        Pulls data from the channel and writes it to centralized storage (HBase, HDFS)
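The three roles above can be sketched as a tiny in-process pipeline (a hypothetical illustration of the data flow, not actual Flume code): the source produces events, the channel buffers them, and the sink drains the channel into storage.

```python
from queue import Queue

# Hypothetical sketch of Flume's three roles; not a real Flume API.
def source(events, channel):
    """Source: push incoming events into the channel (the buffer)."""
    for e in events:
        channel.put(e)

def sink(channel):
    """Sink: drain the channel and 'store' events (here, collect into a list)."""
    stored = []
    while not channel.empty():
        stored.append(channel.get())
    return stored

channel = Queue()  # Channel: buffers between producer and consumer
source(["log line 1", "log line 2"], channel)
print(sink(channel))  # -> ['log line 1', 'log line 2']
```

The queue in the middle is what gives Flume its buffering role between producers and consumers.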


III. Installing Flume
-----------------------------------------------------------
    1. Download apache-flume-1.7.0-bin.tar.gz

    2. Extract the tarball

    3. Create a symbolic link

    4. Configure environment variables
        FLUME_HOME="/soft/flume"
        PATH=......:/soft/flume/bin

    5. Verify the installation
        $> flume-ng version
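Steps 2-4 might look like the following (a sketch; the download location and the /soft layout are assumptions taken from this guide's paths):

```shell
# 2. Extract the tarball (assumes it was downloaded to /soft)
tar -xzvf apache-flume-1.7.0-bin.tar.gz -C /soft

# 3. Create a symbolic link so paths stay stable across versions
ln -s /soft/apache-flume-1.7.0-bin /soft/flume

# 4. Environment variables, e.g. appended to ~/.bashrc
export FLUME_HOME=/soft/flume
export PATH=$PATH:$FLUME_HOME/bin

# 5. Verify
flume-ng version
```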


IV. Configuring and Using Flume
-----------------------------------------------------------------
    1. Create the configuration file [/soft/flume/conf/hello.conf]
        #Declare the three components
        a1.sources = r1
        a1.channels = c1
        a1.sinks = k1

        #Define the source
        a1.sources.r1.type=netcat
        a1.sources.r1.bind=localhost
        a1.sources.r1.port=8888

        #Define the sink
        a1.sinks.k1.type=logger

        #Define the channel
        a1.channels.c1.type=memory

        #Bind the components together
        a1.sources.r1.channels=c1
        a1.sinks.k1.channel=c1

    2. Run -- using nc as the data source
        a) Start the flume agent
            $flume/bin> ./flume-ng agent -f ../conf/hello.conf -n a1 -Dflume.root.logger=INFO,console
        b) Start an nc client
            $>nc localhost 8888
            $nc>hello world

        c) "hello world" is printed in the flume terminal.
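Flume agent files are plain Java-properties key/value pairs, so a quick way to sanity-check one before starting the agent is to parse it yourself. The parser below is an illustrative sketch, not part of Flume:

```python
def parse_flume_conf(text):
    """Parse 'key=value' lines, skipping blank lines and # comments."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        conf[key.strip()] = value.strip()
    return conf

conf = parse_flume_conf("""
#Define the source
a1.sources = r1
a1.sources.r1.type=netcat
a1.sources.r1.port=8888
""")
print(conf["a1.sources.r1.port"])  # -> 8888
```

Note that both `key=value` and `key = value` appear in these examples; the `strip()` calls accept either spelling, just as Flume does.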


V. Flume Sources
----------------------------------------------------------------
    1.netcat

    2. exec -- real-time log collection; monitors a log file for newly appended lines
       a) Configuration file
            [/soft/flume/conf/exec.conf]
            a1.sources = r1
            a1.sinks = k1
            a1.channels = c1

            a1.sources.r1.type=exec
            a1.sources.r1.command=tail -F /home/ubuntu/test.log

            a1.sinks.k1.type=logger

            a1.channels.c1.type=memory

            a1.sources.r1.channels=c1
            a1.sinks.k1.channel=c1

        b)$>bin/flume-ng agent -f ../conf/exec.conf -n a1 -Dflume.root.logger=INFO,console
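To see the exec source pick up events, append lines to the monitored file while the agent runs; `tail -F` reports appends, not rewrites. The helper below is a hypothetical sketch that uses a temporary file in place of /home/ubuntu/test.log:

```python
import os
import tempfile

def append_log_line(path, line):
    """Append one line; tail -F (and thus the exec source) sees appended lines."""
    with open(path, "a") as f:
        f.write(line + "\n")

path = os.path.join(tempfile.mkdtemp(), "test.log")
append_log_line(path, "first event")
append_log_line(path, "second event")
with open(path) as f:
    print(f.read().splitlines())  # -> ['first event', 'second event']
```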

    3. Batch collection (spooldir source)
        Monitors a directory; files placed there must be static (fully written, never appended to afterwards)
        After a file is collected, it is renamed with the .COMPLETED suffix
        a) Configuration file
            [/soft/flume/conf/spooldir.conf]
            a1.sources = r1
            a1.channels = c1
            a1.sinks = k1

            a1.sources.r1.type=spooldir
            a1.sources.r1.spoolDir=/home/ubuntu/spool
            a1.sources.r1.fileHeader=true

            a1.sinks.k1.type=logger

            a1.channels.c1.type=memory

            a1.sources.r1.channels=c1
            a1.sinks.k1.channel=c1

        b) Create the spool directory
            $>mkdir ~/spool

        c) Start flume
            $>bin/flume-ng agent -f ../conf/spooldir.conf -n a1 -Dflume.root.logger=INFO,console
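Because the spooling source requires files to be immutable once they appear, a safe pattern is to write to a staging name first and then rename into place (rename is atomic on the same filesystem, and Flume ignores hidden dot-files by default). A sketch, with a temporary directory standing in for ~/spool:

```python
import os
import tempfile

def drop_into_spool(spool_dir, name, data):
    """Write to a hidden staging file, then atomically rename it into the spool dir."""
    staging = os.path.join(spool_dir, "." + name + ".tmp")
    with open(staging, "w") as f:
        f.write(data)
    final = os.path.join(spool_dir, name)
    os.rename(staging, final)  # atomic on the same filesystem
    return final

spool = tempfile.mkdtemp()
drop_into_spool(spool, "events.log", "line1\nline2\n")
print(sorted(os.listdir(spool)))  # -> ['events.log']
```

Dropping a half-written file directly into the spool directory risks an error once Flume starts reading it, which is why the rename step matters.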



    4. seq source

        a. Create the configuration file [/soft/flume/conf/seq.conf]
            a1.sources = r1
            a1.channels = c1
            a1.sinks = k1

            #The input type is seq; emit 1000 events in total, with the counter incrementing by 1
            a1.sources.r1.type=seq
            a1.sources.r1.totalEvents=1000

            a1.sinks.k1.type=logger

            a1.channels.c1.type=memory

            a1.sources.r1.channels=c1
            a1.sinks.k1.channel=c1

        b. [Run]
            $>bin/flume-ng agent -f ../conf/seq.conf -n a1 -Dflume.root.logger=INFO,console
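Conceptually, the seq source just emits an incrementing counter as the event body until totalEvents is reached; a sketch (not Flume code) of that behavior:

```python
def seq_events(total_events, start=0, step=1):
    """Mimic the seq source: yield total_events bodies counting up by step."""
    value = start
    for _ in range(total_events):
        yield str(value)
        value += step

events = list(seq_events(5))
print(events)  # -> ['0', '1', '2', '3', '4']
```

This makes it a convenient source for smoke-testing a channel/sink pipeline without any external input.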


    5. stress source -- stress testing; generates a large volume of events in a burst

        a. Create the configuration file [/soft/flume/conf/stress.conf]
            a1.sources = r1
            a1.channels = c1
            a1.sinks = k1

            a1.sources.r1.type = org.apache.flume.source.StressSource
            a1.sources.r1.size = 10240
            a1.sources.r1.maxTotalEvents = 1000000
            a1.sources.r1.channels = c1

            a1.channels.c1.type=memory

            a1.sinks.k1.channel=c1
            a1.sinks.k1.type=logger

        b. [Run]
            $>bin/flume-ng agent -f ../conf/stress.conf -n a1 -Dflume.root.logger=INFO,console


VI. Sinks
----------------------------------------------------------------
    1. HDFS -- collect logs (e.g. from tomcat) and write them to HDFS

        a. Create the configuration file [/soft/flume/conf/hdfs.conf]
            a1.sources = r1
            a1.channels = c1
            a1.sinks = k1

            a1.sources.r1.type = netcat
            a1.sources.r1.bind = localhost
            a1.sources.r1.port = 8888

            a1.sinks.k1.type = hdfs
            a1.sinks.k1.hdfs.path = /user/ubuntu/flume/events/%y-%m-%d/%H/%M/%S
            a1.sinks.k1.hdfs.filePrefix = events-

            #Whether to round down the event timestamp when resolving the path escapes;
            #with roundValue = 10 and roundUnit = second, events are bucketed into a
            #new directory every 10 seconds (e.g. .../17-12-12/10/30/50)

            a1.sinks.k1.hdfs.round = true
            a1.sinks.k1.hdfs.roundValue = 10
            a1.sinks.k1.hdfs.roundUnit = second

            a1.sinks.k1.hdfs.useLocalTimeStamp=true

            #File roll policy: start a new file every 10 seconds, every 10 bytes, or every 3 events, whichever is hit first
            a1.sinks.k1.hdfs.rollInterval=10
            a1.sinks.k1.hdfs.rollSize=10
            a1.sinks.k1.hdfs.rollCount=3

            a1.channels.c1.type=memory

            a1.sources.r1.channels = c1
            a1.sinks.k1.channel = c1

        b. [Run]
            $>bin/flume-ng agent -f ../conf/hdfs.conf -n a1
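The escape sequences in hdfs.path plus round/roundValue/roundUnit determine the bucket directory: the event timestamp is rounded down to a 10-second boundary before being substituted into the pattern. A sketch of that computation (illustrative, not Flume's internal code):

```python
from datetime import datetime

def bucket_path(ts, prefix="/user/ubuntu/flume/events", round_value=10):
    """Round ts down to a round_value-second boundary, then format it
    with the %y-%m-%d/%H/%M/%S pattern from hdfs.path."""
    rounded = ts.replace(second=ts.second - ts.second % round_value)
    return rounded.strftime(prefix + "/%y-%m-%d/%H/%M/%S")

print(bucket_path(datetime(2017, 12, 12, 10, 30, 57)))
# -> /user/ubuntu/flume/events/17-12-12/10/30/50
```

useLocalTimeStamp=true tells the sink to use the agent's clock for this substitution instead of requiring a timestamp header on each event.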


    2. hive
        (omitted)

    3. hbase
        a. Create the configuration file [/soft/flume/conf/hbase.conf]

            #Declare the three components
            a1.sources = r1
            a1.channels = c1
            a1.sinks = k1

            #Configure the source
            a1.sources.r1.type = netcat
            a1.sources.r1.bind = localhost
            a1.sources.r1.port = 8888

            #Configure the channel
            a1.channels.c1.type=memory

            #Configure the sink
            a1.sinks.k1.type = hbase
            a1.sinks.k1.table = ns1:t12
            a1.sinks.k1.columnFamily = f1
            a1.sinks.k1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer

            #Bind source and sink through the channel
            a1.sources.r1.channels = c1
            a1.sinks.k1.channel = c1

        b. [Run]
            $>bin/flume-ng agent -f ../conf/hbase.conf -n a1
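The sink writes into an existing table, so ns1:t12 with column family f1 must be created before the agent starts (commands for the HBase shell; the namespace and table names are taken from the config above):

```shell
# In the HBase shell: create the namespace (if missing) and the target table
hbase shell
> create_namespace 'ns1'
> create 'ns1:t12', 'f1'
```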

    4. kafka
        (omitted)


VII. Multi-hop Agents with AvroSource and AvroSink
-------------------------------------------------------------
    1. Create the configuration file
    [avro_hop.conf]
        #a1: netcat source / avro sink
        a1.sources = r1
        a1.sinks= k1
        a1.channels = c1

        a1.sources.r1.type=netcat
        a1.sources.r1.bind=localhost
        a1.sources.r1.port=8888

        a1.sinks.k1.type = avro
        a1.sinks.k1.hostname=localhost
        a1.sinks.k1.port=9999

        a1.channels.c1.type=memory

        a1.sources.r1.channels = c1
        a1.sinks.k1.channel = c1

        #a2: avro source / logger sink
        a2.sources = r2
        a2.sinks= k2
        a2.channels = c2

        a2.sources.r2.type=avro
        a2.sources.r2.bind=localhost
        a2.sources.r2.port=9999

        a2.sinks.k2.type = logger

        a2.channels.c2.type=memory

        a2.sources.r2.channels = c2
        a2.sinks.k2.channel = c2

    2. Start a2
        $>flume-ng agent -f /soft/flume/conf/avro_hop.conf -n a2 -Dflume.root.logger=INFO,console

    3. Verify a2 (port 9999 should be listening)
        $>netstat -anop | grep 9999

    4. Start a1
        $>flume-ng agent -f /soft/flume/conf/avro_hop.conf -n a1

    5. Verify a1 (port 8888 should be listening)
        $>netstat -anop | grep 8888


VIII. Channels
----------------------------------------------------------------
    1.MemoryChannel

    2.FileChannel
        a. Create the configuration file [file.conf]
            a1.sources = r1
            a1.sinks= k1
            a1.channels = c1

            a1.sources.r1.type=netcat
            a1.sources.r1.bind=localhost
            a1.sources.r1.port=8888

            a1.sinks.k1.type=logger

            a1.channels.c1.type = file
            a1.channels.c1.checkpointDir = /home/ubuntu/flume/fc_check
            a1.channels.c1.dataDirs = /home/ubuntu/flume/fc_data

            a1.sources.r1.channels=c1
            a1.sinks.k1.channel=c1

         b. [Run]
            $>flume-ng agent -f /soft/flume/conf/file.conf -n a1 -Dflume.root.logger=INFO,console

    3. Spillable memory channel
        a. Create the configuration file [spilt.conf]
            a1.sources = r1
            a1.sinks= k1
            a1.channels = c1

            a1.sources.r1.type=netcat
            a1.sources.r1.bind=localhost
            a1.sources.r1.port=8888

            a1.sinks.k1.type=logger

            a1.channels.c1.type = SPILLABLEMEMORY
            #memoryCapacity = 0 disables the in-memory queue, making this equivalent to a pure file channel
            a1.channels.c1.memoryCapacity = 0
            #overflowCapacity = 0 would disable spilling to disk, making this equivalent to a pure memory channel
            a1.channels.c1.overflowCapacity = 2000

            a1.channels.c1.byteCapacity = 800000
            a1.channels.c1.checkpointDir = /home/ubuntu/flume/fc_check
            a1.channels.c1.dataDirs = /home/ubuntu/flume/fc_data

            a1.sources.r1.channels=c1
            a1.sinks.k1.channel=c1

         b. [Run]
            $>flume-ng agent -f /soft/flume/conf/spilt.conf -n a1 -Dflume.root.logger=INFO,console

Reposted from blog.csdn.net/xcvbxv01/article/details/82773863