Flume data acquisition preparation

 


  Flume is a highly available, highly reliable, distributed system for collecting, aggregating, and transporting massive amounts of log data, provided by Cloudera. Flume supports custom data senders in the logging system for collecting data; at the same time, it provides the ability to do simple processing on the data and write it out to a variety of (customizable) data receivers.

 

1. Flume node service design

  Flume agents on bigdata-pro02.kfk.com and bigdata-pro03.kfk.com each collect the application server logs on their own node and forward them over Avro to a Flume agent on bigdata-pro01.kfk.com, which merges the two log streams.

2. Download Flume and install it

  1) Download the Apache version of Flume.

  2) Alternatively, download the Cloudera version of Flume.

  3) Here, choose to download the Apache version, apache-flume-1.7.0-bin.tar.gz, and then upload it to the /opt/softwares/ directory on the bigdata-pro01.kfk.com node.

  4) Unzip Flume

[kfk@bigdata-pro01 softwares]$ tar -zxf apache-flume-1.7.0-bin.tar.gz -C ../modules/

[kfk@bigdata-pro01 softwares]$ cd ../modules/

[kfk@bigdata-pro01 modules]$ ls

apache-flume-1.7.0-bin  hadoop-2.6.0  hbase-0.98.6-cdh5.3.0  jdk1.8.0_60  kafka_2.11-0.8.2.1  zookeeper-3.4.5-cdh5.10.0

[kfk@bigdata-pro01 modules]$ mv apache-flume-1.7.0-bin/ flume-1.7.0-bin/

  5) Distribute flume to the other two nodes

[kfk@bigdata-pro01 modules]$ scp -r flume-1.7.0-bin bigdata-pro02.kfk.com:/opt/modules/

[kfk@bigdata-pro01 modules]$ scp -r flume-1.7.0-bin bigdata-pro03.kfk.com:/opt/modules/

 

 

3. Flume agent-1 collection node service configuration

 

1) Configure Flume on the bigdata-pro02.kfk.com node so that it collects data and sends it to the bigdata-pro01.kfk.com node.

  Create a new connection from Notepad++ to the second node, then rename all the files under conf to remove the .template suffix.
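  For example, the same renaming can be done from the shell on the second node (the exact set of .template files may vary slightly between Flume releases):

[kfk@bigdata-pro02 ~]$ cd /opt/modules/flume-1.7.0-bin/conf
[kfk@bigdata-pro02 conf]$ mv flume-env.sh.template flume-env.sh
[kfk@bigdata-pro02 conf]$ mv flume-conf.properties.template flume-conf.properties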

 

 

  First configure the Java environment variable:
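  A minimal flume-env.sh entry, assuming the JDK is the one unpacked under /opt/modules as shown in the directory listing above:

export JAVA_HOME=/opt/modules/jdk1.8.0_60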

 

 

  Then configure the flume-conf.properties file, focusing on the three components shown in the flow above: source, channel, and sink.

  Since the template configuration that ships with Flume is incomplete and its format is a bit messy, clear it all out and fill in the following content.

 

agent2.sources = r1
agent2.channels = c1
agent2.sinks = k1

agent2.sources.r1.type = exec
agent2.sources.r1.command = tail -F /opt/datas/weblogs.log
agent2.sources.r1.channels = c1

agent2.channels.c1.type = memory
agent2.channels.c1.capacity = 10000
agent2.channels.c1.transactionCapacity = 10000
agent2.channels.c1.keep-alive = 5

agent2.sinks.k1.type = avro
agent2.sinks.k1.channel = c1
agent2.sinks.k1.hostname = bigdata-pro01.kfk.com
agent2.sinks.k1.port = 5555

  Nodes 2 and 3 are responsible for collecting the application server logs. The source they use is exec (the standard output of a command, here tail -F), and the events are then pushed through an avro sink to node 1, where the logs are merged.
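  For reference, the receiving side on bigdata-pro01.kfk.com would use an avro source bound to port 5555. A minimal sketch (the agent name agent1 is assumed here; the full collector configuration on node 1, including its channel and sink, is set up separately):

agent1.sources = r1
agent1.channels = c1
agent1.sources.r1.type = avro
agent1.sources.r1.bind = bigdata-pro01.kfk.com
agent1.sources.r1.port = 5555
agent1.sources.r1.channels = c1
agent1.channels.c1.type = memory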

 

 

The configuration reference given on the Flume official website is also very comprehensive; you can read it and learn to customize the configuration according to the official guide.
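  For reference, an agent defined this way can later be started with the flume-ng script from the Flume home directory; the value passed to --name must match the property prefix used in the file (agent2 here):

[kfk@bigdata-pro02 flume-1.7.0-bin]$ bin/flume-ng agent --conf conf --conf-file conf/flume-conf.properties --name agent2 -Dflume.root.logger=INFO,console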

 

2) Send the above configuration to node 3.

scp -r flume-1.7.0-bin/ bigdata-pro03.kfk.com:/opt/modules/

  Then change every agent2 in the configuration file to agent3, so that node 3 likewise collects data and sends it to the bigdata-pro01.kfk.com node.
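  A quick way to do the replacement, assuming Flume was unpacked to the same path on node 3:

[kfk@bigdata-pro03 ~]$ sed -i 's/agent2/agent3/g' /opt/modules/flume-1.7.0-bin/conf/flume-conf.properties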

  Remember to create the weblogs.log file that the exec source tails!

[kfk@bigdata-pro03 ~]$ cd /opt/datas/

[kfk@bigdata-pro03 datas]$ touch weblogs.log

[kfk@bigdata-pro03 datas]$ ls

weblogs.log
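  The same file is needed on bigdata-pro02.kfk.com as well, since agent2 tails the same path there:

[kfk@bigdata-pro02 ~]$ mkdir -p /opt/datas
[kfk@bigdata-pro02 ~]$ touch /opt/datas/weblogs.log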

 


The above is the main content of this section. This is the blogger's own learning process, and I hope it can give you some guidance. If it is useful, I hope you will support it; if it is not useful to you, please forgive me and point out any mistakes. If you are looking forward to more, you can follow the blogger to get updates as soon as possible, thank you! Reprinting is also welcome, but the original address must be marked in a prominent position of the post, and the right of interpretation belongs to the blogger!
