Flume Agent Internals - Yu-fei

Why not write directly from the source systems to the Hadoop cluster? Because the source side consists of tens of thousands of machines; if they all wrote to HDFS in real time, the NameNode would end up tracking a huge number of small files, putting heavy pressure on the Hadoop cluster. Hence an intermediate system, Flume, is introduced. What Flume really does is push events in real time, for data flows that are continuous and very large in volume.

Flume interprets the data it transports as a series of events; each event carries a byte-array body and an optional set of string headers.

Each Flume Agent contains three main components: a Source, a Channel, and a Sink. The figure below shows the structure of a Flume Agent.

[Figure: Flume Agent structure]

1. The Source Component

The Source receives data generated by other applications. A few sources can generate data by themselves, but these are generally used only for testing. A source can listen on one or more network ports to receive data, or it can read data from the local file system. Each source must be connected to at least one channel.
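As a minimal sketch (the names agent1, source1, and channel1 are carried over from the configuration example later in this article, and the port number is an illustrative assumption), a netcat source that listens on a TCP port and feeds one channel could be configured as follows:

agent1.sources.source1.type = netcat
agent1.sources.source1.bind = 0.0.0.0
agent1.sources.source1.port = 41414
agent1.sources.source1.channels = channel1

Note that the channels key on a source is plural: a source can fan out to several channels.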

The figure below shows how a source interacts with channel selectors and interceptors.

[Figure: Source, channel selector, and interceptor interaction]
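As a sketch of both mechanisms (the header name priority and the mapping values are assumptions made up for illustration), an interceptor chain can stamp each event, and a multiplexing channel selector can then route events to different channels based on a header value:

agent1.sources.source1.interceptors = i1
agent1.sources.source1.interceptors.i1.type = timestamp
agent1.sources.source1.selector.type = multiplexing
agent1.sources.source1.selector.header = priority
agent1.sources.source1.selector.mapping.high = channel1
agent1.sources.source1.selector.default = channel2

The default selector type is replicating, which simply copies every event to all of the source's channels.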

2. The Channel Component

In general, the channel is a passive component: sources write events into the channel, and sinks read events out of it. The channel buffers events so that sources and sinks can operate at different rates.
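A minimal sketch of a memory channel (the capacity values are illustrative assumptions):

agent1.channels.channel1.type = memory
agent1.channels.channel1.capacity = 10000
agent1.channels.channel1.transactionCapacity = 100

A file channel (type = file) writes events to disk instead, so buffered events survive an agent restart, at the cost of some throughput.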

3. The Sink Component

The sink continuously polls its channel, removes events from it in batches, and writes them out to the destination, for example HDFS, or forwards them to a source in the next Flume agent. Each sink reads from exactly one channel.
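A hedged sketch of an HDFS sink (the output path is an illustrative assumption):

agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.channel = channel1
agent1.sinks.sink1.hdfs.path = /flume/events/%Y-%m-%d

Unlike a source, a sink is bound to a single channel, so the key here is channel, not channels. The %Y-%m-%d escape sequences require a timestamp header on each event, which the timestamp interceptor shown earlier provides.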

4. Configuring a Flume Agent

Flume configuration uses the Java properties file format:

k1 = v1

k2 = v2

A Flume Agent may contain several instances of each kind of component, such as sources, sinks, and channels, and every component needs a name. The configuration file must list the names of the sources, sinks, sink groups, and channels in the following format; this list is called the active list:

agent1.sources = source1 source2

agent1.sinks = sink1 sink2 sink3 sink4

agent1.sinkgroups = sg1 sg2

agent1.channels = channel1 channel2

The configuration segment above defines a Flume Agent named agent1 with two sources, four sinks, two sink groups, and two channels.
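Assuming the full configuration is saved in a file named agent1.conf (the file name is an assumption for this example), the agent can be started with the flume-ng script shipped with Flume:

bin/flume-ng agent --conf conf --conf-file agent1.conf --name agent1

The --name argument must match the agent name used as the prefix of every property key; components that are configured but not listed in the active list are ignored when the agent starts.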
