Flume and HDFS

 

Flume definition:

Flume is a highly available, highly reliable, distributed system from Cloudera for collecting, aggregating, and transporting massive amounts of log data. Flume is based on a streaming architecture, which makes it flexible and simple.

 

Why choose Flume

  Main function: read data from a server's local disk in real time and write it to HDFS.
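
  As a sketch of this use case, a minimal single-agent configuration (source -> channel -> sink) might look like the following; the agent name a1, the tailed file path, and the HDFS URL are placeholder assumptions, not values from this article:

    # a1.conf -- tail a local log file and write it to HDFS (illustrative values)
    a1.sources = r1
    a1.channels = c1
    a1.sinks = k1

    # exec source: read new lines from the local disk in real time
    a1.sources.r1.type = exec
    a1.sources.r1.command = tail -F /var/log/app/app.log
    a1.sources.r1.channels = c1

    # memory channel: buffers events between source and sink
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 10000
    a1.channels.c1.transactionCapacity = 100

    # HDFS sink: destination path is a placeholder
    a1.sinks.k1.type = hdfs
    a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/logs/%Y%m%d
    a1.sinks.k1.hdfs.fileType = DataStream
    a1.sinks.k1.hdfs.useLocalTimeStamp = true
    a1.sinks.k1.channel = c1

  Such an agent would then be started with flume-ng, e.g. bin/flume-ng agent -n a1 -c conf -f conf/a1.conf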

 

 

 Flume architecture

  1. The simplest architecture: a single agent (source -> channel -> sink)

 

  2. Flume event flow through an agent

 

 

  Description:

    source: the component that receives data into the agent

      Common types: spooling directory, exec, syslog, avro, netcat, etc.

    channel: a buffer between the source and the sink

      memory: an in-memory queue; fast, but events can be lost if the agent process dies

      file: a disk-backed, persistent channel; events survive a system crash (see the file channel example after this list)

    sink: the component that delivers events to their destination

      Common destinations: HDFS, Kafka, logger, avro, file, custom sinks

    Put transaction flow (source -> channel):

      doPut: write a batch of events to the temporary buffer putList

      doCommit: check whether the channel's queue has enough free space; if so, move the events from putList into it

      doRollback: if the channel's queue does not have enough space, roll the batch back

    Take transaction flow (channel -> sink), sketched in code after this list:

      doTake: pull a batch of events from the channel into the temporary buffer takeList

      doCommit: if all events are sent successfully, clear the temporary buffer takeList

      doRollback: if an exception occurs while sending, return the events in takeList to the channel
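
    The same put/commit/rollback contract shows up in Flume's public Channel/Transaction API. Below is a minimal conceptual sketch (in Java, Flume's own language) of a sink-side take transaction; it is not Flume's internal sink code, and sendDownstream is a hypothetical delivery method:

      import org.apache.flume.Channel;
      import org.apache.flume.Event;
      import org.apache.flume.Transaction;

      public class TakeTransactionSketch {
          // Take one event from the channel inside a transaction.
          static void takeOne(Channel channel) {
              Transaction tx = channel.getTransaction();
              tx.begin();
              try {
                  Event event = channel.take();  // doTake: stage an event (takeList)
                  if (event != null) {
                      sendDownstream(event);     // attempt delivery to the destination
                  }
                  tx.commit();                   // doCommit: success, staged events are dropped
              } catch (Exception e) {
                  tx.rollback();                 // doRollback: staged events go back to the channel
                  throw new RuntimeException(e);
              } finally {
                  tx.close();
              }
          }

          static void sendDownstream(Event event) {
              // hypothetical delivery logic (e.g. write to HDFS)
          }
      }

    A source follows the mirror-image put pattern: channel.put(event) between begin() and commit(), with rollback() on failure.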

 
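To get the crash-safety of the file channel described above, only the channel definition in the agent configuration changes; the checkpoint and data directories below are hypothetical paths:

    # swap the memory channel for a persistent, disk-backed file channel
    a1.channels.c1.type = file
    a1.channels.c1.checkpointDir = /var/flume/checkpoint
    a1.channels.c1.dataDirs = /var/flume/data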
