Flume Log Collection Framework (1)

Flume Introduction

1: Overview

Flume is a distributed, highly reliable, and highly available system provided by Cloudera for massive log collection, aggregation, and transmission.

Flume can collect data from sources such as files, socket packets, and folders, and it can deliver the collected data to external storage systems such as HDFS, HBase, Hive, and Kafka.

For common collection requirements, a simple Flume configuration is enough.

Flume also has good support for custom extension in special scenarios, so it can be used for most routine data collection scenarios.

 

2: Operating Mechanism

The core role in a Flume distributed system is the agent; a Flume collection system is built by connecting agents together.

Each agent acts as a data-passing unit and contains three components, wired together as sketched after this list:

           a) Source: the collection source, which connects to the data source and obtains data.

           b) Sink: the delivery destination, which passes collected data on to the next-stage agent or to the final storage system.

           c) Channel: the data transmission channel inside the agent, which carries data from the source to the sink.
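In a Flume configuration file, these three components are declared by name and wired together. A minimal skeleton follows; the agent name a1 and the component names r1, c1, k1 are just placeholders, and a complete working example appears in the hands-on section below:

# Declare one source, one channel, and one sink for an agent named "a1"
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Wire them together: the source writes into the channel, the sink reads from it
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1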

 

3: Complex Structures

Multiple agents can be chained in series:

(1) First: two agents connected in series (a configuration sketch of this form follows the list)

(2) Second: several collection agents aggregated into one agent

(3) Third: the collected data delivered to different downstream systems
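As a sketch of the first form, two agents chained with an Avro sink/source pair; the host names, port, and log file path here are assumptions, and in practice each agent would get its own configuration file on its own node:

# Agent 1 (collection node): tail a log file and forward events over Avro RPC
a1.sources = r1
a1.channels = c1
a1.sinks = k1
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app.log
a1.sources.r1.channels = c1
a1.channels.c1.type = memory
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop02
a1.sinks.k1.port = 4141
a1.sinks.k1.channel = c1

# Agent 2 (aggregation node): receive Avro events and print them via the logger sink
a2.sources = r2
a2.channels = c2
a2.sinks = k2
a2.sources.r2.type = avro
a2.sources.r2.bind = 0.0.0.0
a2.sources.r2.port = 4141
a2.sources.r2.channels = c2
a2.channels.c2.type = memory
a2.sinks.k2.type = logger
a2.sinks.k2.channel = c2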


Flume Practical Example

1: Flume Installation and Deployment

a) Installing Flume is very simple: just unpack the archive. Upload the installation package to the node where data is to be collected (a node with a Hadoop environment).

b) Extract it: tar -zxvf apache-flume-1.6.0-bin.tar.gz

c) Enter the Flume directory, edit flume-env.sh under conf, and configure JAVA_HOME in it (see the sketch after these steps).

Then describe a collection scheme in a configuration file according to the data collection requirements (the file name can be chosen freely).

Finally, start the Flume agent on the corresponding node, specifying the configuration file of the collection scheme.
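For step c), a sketch of the flume-env.sh change; the JDK path below is an assumption, so use the path of your own installation (if conf only contains flume-env.sh.template, copy it to flume-env.sh first):

# conf/flume-env.sh
export JAVA_HOME=/usr/local/jdk1.8.0_65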

 

2: Simple Case

a) First create a new configuration file in Flume's conf directory:

vi netcat-logger.conf

# Name the components of this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe and configure the source component: r1
a1.sources.r1.type = netcat
a1.sources.r1.bind = hadoop01
a1.sources.r1.port = 44444

# Describe and configure the sink component: k1
a1.sinks.k1.type = logger

# Describe and configure the channel component; memory buffering is used here
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Describe the connections between source, channel, and sink
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

b) Start the agent to begin collecting data:

bin/flume-ng agent -c conf -f conf/netcat-logger.conf -n a1  -Dflume.root.logger=INFO,console

-c conf specifies the directory of Flume's own configuration files

-f conf/netcat-logger.conf specifies the collection scheme file we described

-n a1 specifies the name of our agent

3: Test

First, send data to the port the agent is listening on so that the agent has data to collect. From any machine on the network that can reach the agent's node, run:

telnet agent-hostname port   (for example: telnet hadoop01 44444)

 

Source Components

Source Type                 Explanation
Avro Source                 Supports the Avro protocol (actually Avro RPC); built in.
Exec Source                 Runs a given Unix command and produces data from its standard output.
Spooling Directory Source   Monitors a specified directory for newly added files and reads data from them.
Netcat Source               Listens on a port and turns each line of text received on it into an Event.
Thrift Source               Supports the Thrift protocol; built in.
JMS Source                  Reads data (messages, topics) from a JMS system; tested with ActiveMQ.
Sequence Generator Source   A sequence generator source that produces sequential data.
Syslog Source               Reads syslog data and generates Events; supports both TCP and UDP.
HTTP Source                 A source based on HTTP POST or GET; supports JSON and BLOB representations.
Legacy Source               Compatible with Sources from the old Flume OG (0.9.x versions).
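For example, a Spooling Directory Source can be configured roughly as follows; the directory path is an assumption:

# Watch a directory for newly added files and turn their contents into Events
a1.sources = r1
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /var/log/flume-spool
a1.sources.r1.fileHeader = true
a1.sources.r1.channels = c1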

Channel Components

Channel Type                Explanation
Memory Channel              Event data is stored in memory.
File Channel                Event data is stored in files on disk.
JDBC Channel                Event data is stored in a persistent store; currently Flume has built-in support for the Derby database.
Spillable Memory Channel    Event data is stored in memory; when the in-memory queue is full, it spills over to files on disk.
Pseudo Transaction Channel  For testing purposes only.
Custom Channel              A custom Channel implementation.
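For durability beyond the memory channel used in the simple case above, a File Channel can be sketched as follows; the directories are assumptions:

# Buffer events on disk so they survive an agent restart
a1.channels = c1
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /data/flume/checkpoint
a1.channels.c1.dataDirs = /data/flume/data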

Sink Components

Sink Type              Explanation
HDFS Sink              Writes data to HDFS.
Avro Sink              Converts data into Avro Events and sends them to the configured RPC port.
Thrift Sink            Converts data into Thrift Events and sends them to the configured RPC port.
IRC Sink               Replays data on IRC.
File Roll Sink         Stores data on the local file system.
Null Sink              Discards all data.
HBase Sink             Writes data to an HBase database.
Morphline Solr Sink    Sends data to a Solr search server.
ElasticSearch Sink     Sends data to an Elasticsearch server (cluster).
Custom Sink            A custom Sink implementation.
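As a sketch of the most common case, an HDFS Sink writing events into date-partitioned directories; the path and roll settings are assumptions:

# Write events to HDFS, rolling to a new file every 60 seconds
a1.sinks = k1
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/
a1.sinks.k1.hdfs.filePrefix = events-
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.rollInterval = 60
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.channel = c1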

Flume supports a large number of source, channel, and sink types; for details, see the official documentation: http://flume.apache.org/FlumeUserGuide.html


Origin blog.csdn.net/WandaZw/article/details/83687548