Source, Channel, and Sink in Flume

Flume Overview

    Flume is a real-time log collection system developed by Cloudera (the company behind CDH). It is a distributed, reliable, and highly available system for collecting, aggregating, and transporting large volumes of log data.
    Flume can receive data of different types as needed and forward it to any destination capable of receiving or processing it, such as Kafka, HDFS, or local files. For example, tracking logs collected from an app can be sent to Flume, which can store them as files on HDFS for later parsing or offline analysis, and can also (at the same time) send them to Kafka for real-time analysis. Flume acts much like a conduit here: it routes the data it receives to wherever it is needed. It does not store the data itself, although it does provide some buffering; this distinguishes it from a message queue such as Kafka, which persists data.
    Flume has three important components: Source, Channel, and Sink.

Source

   The Flume source is the component that collects data. It can handle log data of many types and formats (including avro, thrift, exec, jms, spooling directory, netcat, sequence generator, syslog, http, legacy, and custom sources), and other data sources can be added by writing your own source class, for example one that automatically reads data from a database (flume-ng-sql-source). The source wraps the received data into events and delivers them to one or more channels.
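
For instance, a minimal sketch of an exec source that tails a local log file (the command and file path here are placeholders, not taken from the original example):

a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app.log
a1.sources.r1.channels = c1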

Channel

   The channel is the pipe that sits between the source and the sink. It can be thought of as the water pipe between a tap and a water tower: it does not store the data itself, it is only a short-lived temporary container. The channel caches the events received from the source until they are consumed by a sink, so it acts as a bridge between the source and the sink.
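
Besides the memory channel used in the example below, Flume also ships a file channel that persists events to disk so they survive an agent restart. A minimal sketch, with placeholder directories:

a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /data/flume/checkpoint
a1.channels.c1.dataDirs = /data/flume/data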

Sink

   The sink is the final destination of the data flow. It takes the events delivered through the channel and writes them to the specified storage component, including local text files, HDFS, databases, Kafka, HBase, network streams, and so on.
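
For instance, a minimal sketch of an HDFS sink (the path and roll settings are placeholders, not taken from the original example):

a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/logs/%Y-%m-%d
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.rollInterval = 3600
# useLocalTimeStamp lets the %Y-%m-%d escapes work without requiring a timestamp header on each event
a1.sinks.k1.hdfs.useLocalTimeStamp = true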

Configuration Example
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source: receive data in avro format and define the IP and port to listen on. Typically log4j or logback can send logs here directly; ready-made logback appenders for this already exist.
a1.sources.r1.channels =  c1
a1.sources.r1.type = avro
a1.sources.r1.bind = 192.168.1.100
a1.sources.r1.port = 44444

# Describe sink k1, writing to a local file: define the sink type and its settings; file_roll writes events to files in a local directory
a1.sinks.k1.channel = c1
a1.sinks.k1.type = file_roll  
a1.sinks.k1.sink.directory = /data/log/test
a1.sinks.k1.sink.file.prefix=flume-
a1.sinks.k1.sink.rollInterval=3600
a1.sinks.k1.sink.batchSize=10
a1.sinks.k1.sink.serializer=text
a1.sinks.k1.sink.serializer.appendNewline = true

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
# capacity: the maximum number of events stored in the channel
a1.channels.c1.capacity = 10240
# transactionCapacity: the maximum number of events the source or sink can put or take per transaction (must not exceed capacity)
a1.channels.c1.transactionCapacity = 10240
An Event is the smallest unit of data in a Flume flow; for example, one log line is one Event, and in code you can apply different processing depending on the event.
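
As a side note (not part of the original configuration), simple per-event processing can also be done inside the agent through interceptors attached to a source; for example, the built-in timestamp interceptor stamps each event with a timestamp header that sinks can use:

a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp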

Origin blog.csdn.net/huxin008/article/details/80961171