Several common Flume log collection scenarios

  Here we mainly introduce several common log source types, including monitoring a directory for new files (spool), monitoring file content increments (exec), TCP, and HTTP.

Spool type

  Used to monitor data changes in a specified directory: when a new file appears, its contents are read and uploaded.

  This case was already covered at the end of the earlier article on building a Flume distributed logging system step by step.
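  Since that case is not repeated here, below is a minimal sketch of what a spooling-directory source configuration could look like. The watch directory /usr/local/flume170/spool is a hypothetical path chosen for illustration, not taken from the original case:

a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Spooling-directory source: watch a directory and ingest any new, completed files
a1.sources.r1.type = spooldir
# Hypothetical directory to watch; replace with your own path
a1.sources.r1.spoolDir = /usr/local/flume170/spool
a1.sources.r1.fileHeader = true
a1.sources.r1.channels = c1

# Memory channel, same settings as the other examples in this article
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Log events to the console for testing
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1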

Exec

  The exec source executes a given command and uses its output as the data source. If you use the tail command, make sure the monitored file grows enough for output to appear.

Create agent configuration file   

# vi /usr/local/flume170/conf/exec_tail.conf

a1.sources = r1
a1.channels = c1 c2
a1.sinks = k1 k2

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.channels = c1 c2
a1.sources.r1.command = tail -F /var/log/haproxy.log

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.channels.c2.type = file
a1.channels.c2.checkpointDir = /usr/local/flume170/checkpoint
a1.channels.c2.dataDirs = /usr/local/flume170/data

# Describe the sink
a1.sinks.k1.type = logger
a1.sinks.k1.channel =c1

a1.sinks.k2.type = FILE_ROLL
a1.sinks.k2.channel = c2
a1.sinks.k2.sink.directory = /usr/local/flume170/files
a1.sinks.k2.sink.rollInterval = 0

 Start flume agent a1

  # /usr/local/flume170/bin/flume-ng agent -c . -f /usr/local/flume170/conf/exec_tail.conf -n a1 -Dflume.root.logger=INFO,console
  Generate enough content in the monitored file
  # for i in {1..100};do echo "exec tail$i" >> /var/log/haproxy.log;echo $i;sleep 0.1;done
  In the H32 console, you can see the following information:

HTTP

JSONHandler type

A data source based on HTTP POST or GET that supports JSON and BLOB representations.

Create agent configuration file

# vi /usr/local/flume170/conf/post_json.conf

a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Describe/configure the source
a1.sources.r1.type = org.apache.flume.source.http.HTTPSource
a1.sources.r1.port = 5142
a1.sources.r1.channels = c1

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Describe the sink
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1

 Start flume agent a1

# /usr/local/flume170/bin/flume-ng agent -c . -f /usr/local/flume170/conf/post_json.conf -n a1 -Dflume.root.logger=INFO,console
Generate a POST request in JSON format
# curl -X POST -d '[{"headers":{"a":"a1","b":"b1"},"body":"idoall.org_body"}]' http://localhost:5142
In the H32 console, you can see the following information:

 

 

TCP

Syslogtcp monitors the TCP port as the data source

Create agent configuration file

# vi /usr/local/flume170/conf/syslog_tcp.conf

a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5140
a1.sources.r1.host = H32
a1.sources.r1.channels = c1

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Describe the sink
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1

Start flume agent a1

# /usr/local/flume170/bin/flume-ng agent -c . -f /usr/local/flume170/conf/syslog_tcp.conf -n a1 -Dflume.root.logger=INFO,console
Test by generating a syslog message
# echo "hello idoall.org syslog" | nc localhost 5140
In the console of H32, you can see the following information:

 

Flume Sink Processors and Avro types

  Avro can send a given file to Flume; the Avro source uses the Avro RPC mechanism.

  With a failover sink processor, events are always sent to one of the sinks; when that sink becomes unavailable, they are automatically sent to the next sink. Note that the channel's transactionCapacity must not be smaller than the sink's batch size.
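  As a minimal, hypothetical illustration of that constraint (these values are examples and are not part of the configuration created below): if an avro sink takes batches of 100 events, the memory channel it drains needs a transactionCapacity of at least 100:

# Hypothetical example: sink batch size vs. channel transaction capacity
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = H32
a1.sinks.k1.port = 5141
# the avro sink takes up to 100 events from the channel per transaction
a1.sinks.k1.batch-size = 100

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
# must be >= the sink's batch size, otherwise sink transactions will fail
a1.channels.c1.transactionCapacity = 100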
  Create the Flume_Sink_Processors configuration file in H32
  # vi /usr/local/flume170/conf/Flume_Sink_Processors.conf

a1.sources = r1
a1.channels = c1 c2
a1.sinks = k1 k2

# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5140
a1.sources.r1.channels = c1 c2
a1.sources.r1.selector.type = replicating

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100

# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = H32
a1.sinks.k1.port = 5141

a1.sinks.k2.type = avro
a1.sinks.k2.channel = c2
a1.sinks.k2.hostname = H33
a1.sinks.k2.port = 5141

# The key here is the failover configuration, which requires a sink group
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
# The processor type is failover
a1.sinkgroups.g1.processor.type = failover
# Priority: the higher the number, the higher the priority; each sink's priority must be different
a1.sinkgroups.g1.processor.priority.k1 = 5
a1.sinkgroups.g1.processor.priority.k2 = 10
# Set to 10 seconds; you can make this faster or slower according to your actual situation
a1.sinkgroups.g1.processor.maxpenalty = 10000
  

   Create Flume_Sink_Processors_avro configuration file in H32

  # vi /usr/local/flume170/conf/Flume_Sink_Processors_avro.conf

a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 5141

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Describe the sink
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1

  Copy the two configuration files to H33, which was set up earlier

  /usr/local/flume170# scp -r /usr/local/flume170/conf/Flume_Sink_Processors.conf H33:/usr/local/flume170/conf/Flume_Sink_Processors.conf
  /usr/local/flume170# scp -r /usr/local/flume170/conf/Flume_Sink_Processors_avro.conf H33:/usr/local/flume170/conf/Flume_Sink_Processors_avro.conf
  Open 4 windows and start the two flume agents on both H32 and H33 at the same time
  # /usr/local/flume170/bin/flume-ng agent -c . -f /usr/local/flume170/conf/Flume_Sink_Processors_avro.conf -n a1 -Dflume.root.logger=INFO,console
  # /usr/local/flume170/bin/flume-ng agent -c . -f /usr/local/flume170/conf/Flume_Sink_Processors.conf -n a1 -Dflume.root.logger=INFO,console
  Then, on either H32 or H33, generate a test log
  # echo "idoall.org test1 failover" | nc H32 5140


  Because H33 has the higher priority, you can see the following information in the sink window of H33, but not in H32's:

  Now stop the sink on the H33 machine (Ctrl+C) and send test data again
  # echo "idoall.org test2 failover" | nc localhost 5140
  You can see the two test messages just sent in the sink window of H32:

  Next, restart the sink in H33's sink window:
  # /usr/local/flume170/bin/flume-ng agent -c . -f /usr/local/flume170/conf/Flume_Sink_Processors_avro.conf -n a1 -Dflume.root.logger=INFO,console
  Then input two more batches of test data:
  # echo "idoall.org test3 failover" | nc localhost 5140 && echo "idoall.org test4 failover" | nc localhost 5140
  In the sink window of H33, you can see the following information; because of the priority, log messages fall on H33 again:

Load balancing Sink Processor

  The difference between the load balance type and failover is that load balance has two selection modes: round-robin polling and random. In both cases, if the selected sink is unavailable, it automatically tries to send to the next available sink.
  Create Load_balancing_Sink_Processors configuration file in H32
  # vi /usr/local/flume170/conf/Load_balancing_Sink_Processors.conf

a1.sources = r1
a1.channels = c1
a1.sinks = k1 k2

# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5140
a1.sources.r1.channels = c1

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = H32
a1.sinks.k1.port = 5141

a1.sinks.k2.type = avro
a1.sinks.k2.channel = c1
a1.sinks.k2.hostname = H33
a1.sinks.k2.port = 5141

# The key here is to configure a sink group
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
# The processor type is load_balance
a1.sinkgroups.g1.processor.type = load_balance
a1.sinkgroups.g1.processor.backoff = true
a1.sinkgroups.g1.processor.selector = round_robin
  


  Create Load_balancing_Sink_Processors_avro configuration file in H32

  # vi /usr/local/flume170/conf/Load_balancing_Sink_Processors_avro.conf

a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 5141

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Describe the sink
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1

  Copy the two configuration files to H33, which was set up earlier

/usr/local/flume170# scp -r /usr/local/flume170/conf/Load_balancing_Sink_Processors.conf H33:/usr/local/flume170/conf/Load_balancing_Sink_Processors.conf
/usr/local/flume170# scp -r /usr/local/flume170/conf/Load_balancing_Sink_Processors_avro.conf H33:/usr/local/flume170/conf/Load_balancing_Sink_Processors_avro.conf


Open 4 windows and start the two flume agents on both H32 and H33 at the same time
# /usr/local/flume170/bin/flume-ng agent -c . -f /usr/local/flume170/conf/Load_balancing_Sink_Processors_avro.conf -n a1 -Dflume.root.logger=INFO,console
# /usr/local/flume170/bin/flume-ng agent -c . -f /usr/local/flume170/conf/Load_balancing_Sink_Processors.conf -n a1 -Dflume.root.logger=INFO,console


Then, on either H32 or H33, generate test logs. Enter them line by line; if input is too fast, the events can easily all land on one machine.
# echo "idoall.org test1" | nc H32 5140
# echo "idoall.org test2" | nc H32 5140
# echo "idoall.org test3" | nc H32 5140
# echo "idoall.org test4" | nc H32 5140


In the sink window of H32, you can see the following information:
14/08/10 15:35:29 INFO sink.LoggerSink: Event: {headers:{Severity=0, flume.syslog.status=Invalid, Facility=0} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 32 idoall.org test2}
14/08/10 15:35:33 INFO sink.LoggerSink: Event: {headers:{Severity=0, flume.syslog.status=Invalid, Facility=0} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 34 idoall.org test4}


In the sink window of H33, you can see the following information:
14/08/10 15:35:27 INFO sink.LoggerSink: Event: {headers:{Severity=0, flume.syslog.status=Invalid, Facility=0} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 31 idoall.org test1}
14/08/10 15:35:29 INFO sink.LoggerSink: Event: {headers:{Severity=0, flume.syslog.status=Invalid, Facility=0} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 33 idoall.org test3}
This shows that round-robin polling is taking effect.

   The above examples all assume that H32 and H33 can reach each other and that Flume is configured correctly; they are all very simple scenarios. It is worth noting that although Flume is described as a log collection tool, "log" here can be understood broadly: any information stream can be treated as a log, not just logs in the conventional sense.


Reprinted: http://www.cnblogs.com/zhangs1986/p/6897388.html
