Deployment and testing of Flume (log collection system)

Flume is a distributed, highly reliable, and highly available log collection system. It efficiently collects, aggregates, and moves large volumes of log data from many kinds of data sources to many kinds of destinations. Flume has a slogan: "We do not produce data, we are the porters of data."

Official website documentation:  http://flume.apache.org/FlumeUserGuide.html

Related sample project: Leek - Simple Real-Time Smart Stock Picking Platform

Flume components:

The core of Flume is the agent, which collects data from a data source and sends it to a destination. To make delivery reliable, data is buffered before it is sent, and the buffered copy is deleted only after the data has actually reached the destination. The basic unit of data that Flume transfers is the Event, which is also the basic unit of a transaction. For a text file, an event is usually one line (one record).

  • Event: consists of headers (a key/value map) and a body (a byte array). It travels from Source to Channel to Sink.
  • Source: collects data, wraps it in events within a transaction, and writes the events to the Channel.
  • Channel: works like a pipeline (queue). It receives the output of the Source and holds it for the Sink to consume. An event is not deleted until it has entered the next agent's Channel or reached the final destination. In other words, the Channel temporarily stores events in transit and connects Sources to Sinks.
  • Sink: takes events out of the Channel and sends them to an external store (such as HDFS or HBase) or to the Source of another agent.
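
As an illustration of how the three components are wired together, the following sketch writes a minimal one-agent configuration: a netcat source, a memory channel, and a logger sink, modeled on the single-node example in the Flume User Guide. The function name write_minimal_conf, the agent name a1, and the component names r1, c1, and k1 are illustrative assumptions and are not part of the two-server setup described in this article.

```shell
# Write a minimal one-agent config: netcat source -> memory channel -> logger sink.
# The agent name "a1" and the component names r1/c1/k1 are arbitrary labels.
write_minimal_conf() {
  cat > "$1" <<'EOF'
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: turns each line received on the TCP port into one Event
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1

# Channel: buffers Events in memory until the sink takes them
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Sink: writes each Event (headers and body) to the agent's log
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
EOF
}
```

Note that a source takes a "channels" property, because it may feed several channels, while a sink takes a single "channel". Once Flume is installed, the file could be tried with something like write_minimal_conf /tmp/minimal.properties followed by flume-ng agent -n a1 -c conf -f /tmp/minimal.properties -Dflume.root.logger=INFO,console; lines sent to port 44444 should then show up in the console log.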

The role of Flume:

A data acquisition tool that feeds both real-time and offline computing.

1. Flume installation (a JDK must be installed before installing Flume)

cd /usr/local
wget http://mirrors.cnnic.cn/apache/flume/1.6.0/apache-flume-1.6.0-bin.tar.gz
tar -xvzf apache-flume-1.6.0-bin.tar.gz
mv apache-flume-1.6.0-bin apache-flume-1.6.0

2. Environment variable settings

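vim /etc/profile
# Append the following lines to /etc/profile:
export FLUME_HOME=/usr/local/apache-flume-1.6.0
export FLUME_CONF_DIR=$FLUME_HOME/conf
export PATH=$PATH:$FLUME_HOME/bin
# Reload the profile and confirm that the variable is set:
source /etc/profile
echo $FLUME_HOME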
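
Printing the variable only shows that it is set, not that it points at a working installation. As a slightly stronger check, a small helper along the following lines could be used. This is a sketch; the function name check_flume_env is hypothetical and not part of Flume.

```shell
# Verify that the Flume environment is usable in the current shell.
# Returns 0 when FLUME_HOME is set and contains an executable bin/flume-ng.
check_flume_env() {
  if [ -z "${FLUME_HOME:-}" ]; then
    echo "FLUME_HOME is not set" >&2
    return 1
  fi
  if [ ! -x "$FLUME_HOME/bin/flume-ng" ]; then
    echo "flume-ng not found or not executable under $FLUME_HOME/bin" >&2
    return 1
  fi
  echo "Flume found at $FLUME_HOME"
}
```

A typical use would be check_flume_env && flume-ng version, which should print the installed Flume version if PATH was extended correctly.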

3. Flume configuration

cd /usr/local/apache-flume-1.6.0/conf/
cp flume-conf.properties.template flume-conf.properties
vim flume-conf.properties

The goal of the configuration is as follows:

Server B forwards its nginx access log to server A. Server A writes both its own nginx access log and the log received from server B to the /tmp/flume folder.

# Configuration for server A:
# agent1 below is the agent name
agent1.sources = s1 s2
agent1.channels = c1
agent1.sinks = k1

# Configure the data sources
# s1 receives the events forwarded by server B over Avro
agent1.sources.s1.type = avro
agent1.sources.s1.bind = 0.0.0.0
agent1.sources.s1.port = 44444
agent1.sources.s1.channels = c1

# s2 reads the local nginx access log
agent1.sources.s2.type = exec
agent1.sources.s2.channels = c1
agent1.sources.s2.command = tail -f /usr/local/nginx/logs/access.log
agent1.sources.s2.batchSize = 1000
agent1.sources.s2.batchTimeout = 1000


# Configure the memory channel
agent1.channels.c1.type = memory
agent1.channels.c1.capacity = 50000
agent1.channels.c1.transactionCapacity = 10000

# Configure the sinks
agent1.sinks.k1.channel = c1
agent1.sinks.k1.type = file_roll
agent1.sinks.k1.sink.directory = /tmp/flume
# The default is 30, i.e. a new file is created every 30 seconds
# 0 means all data is stored in a single file
agent1.sinks.k1.sink.rollInterval = 0
# The larger the value, the higher the throughput, but also the higher the latency
agent1.sinks.k1.sink.batchSize = 100

 

# Configuration for server B:
# agent1 below is the agent name
agent1.sources = s1
agent1.channels = c1
agent1.sinks = k1

# Configure the data source
agent1.sources.s1.type = exec
agent1.sources.s1.channels = c1
agent1.sources.s1.command = tail -f /usr/local/nginx/logs/access.log

# Configure the memory channel
agent1.channels.c1.type = memory
agent1.channels.c1.capacity = 10000
agent1.channels.c1.transactionCapacity = 10000

# Configure the sinks
# Events are converted to Avro events and sent to the configured host and port.
agent1.sinks.k1.channel = c1
agent1.sinks.k1.type = avro
# Address of server A
agent1.sinks.k1.hostname = 192.168.xx.xxx
agent1.sinks.k1.port = 44444
agent1.sinks.k1.batch-size = 50
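
An easy mistake in configurations like the two above is a source or sink that points at a channel name that was never declared for the agent, which only shows up once the agent starts. The following is a minimal sketch of a pre-flight check, not a Flume feature: the function name check_channel_refs is hypothetical, and it only understands the simple "key = value" layout used in this article.

```shell
# Check that every channel referenced by a source or sink of the given agent
# is declared on the "<agent>.channels = ..." line.
# Prints each undeclared name and returns 1 if any are found.
check_channel_refs() {
  conf=$1
  agent=$2
  # Channels declared for the agent, e.g. "c1"
  declared=$(sed -n "s/^${agent}\.channels[[:space:]]*=[[:space:]]*//p" "$conf")
  # Channels referenced via "sources.<name>.channels" or "sinks.<name>.channel"
  referenced=$(sed -n -E "s/^${agent}\.(sources|sinks)\.[^.]+\.channels?[[:space:]]*=[[:space:]]*//p" "$conf")
  rc=0
  for ref in $referenced; do
    found=no
    for ch in $declared; do
      if [ "$ref" = "$ch" ]; then
        found=yes
      fi
    done
    if [ "$found" = no ]; then
      echo "undeclared channel: $ref"
      rc=1
    fi
  done
  return $rc
}
```

For the files in this article, the call would be check_channel_refs conf/flume-conf.properties agent1 on each server; no output and a zero exit status mean that every reference resolves.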

4. Start the agent on each server in turn (server A first, then server B)

cd /usr/local/apache-flume-1.6.0/
nohup flume-ng agent -n agent1 -c conf -f conf/flume-conf.properties &

5. View the combined log output on server A

tail -f  /tmp/flume/xxxxxx
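
To confirm the whole chain without watching the tail output by hand, one option is to send nginx on server B a request that carries a unique marker and then wait for that marker to show up in the output file on server A. The sketch below implements only the waiting part; the function name wait_for_line and the marker value are illustrative assumptions.

```shell
# Poll a file until a line containing the marker appears, or give up after
# the given number of seconds (default 10). Returns 0 when found, 1 on timeout.
wait_for_line() {
  file=$1
  marker=$2
  timeout=${3:-10}
  elapsed=0
  while :; do
    # -F treats the marker as a fixed string rather than a regular expression
    if grep -qF -- "$marker" "$file" 2>/dev/null; then
      return 0
    fi
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
}
```

For example, on server B one could run curl -s "http://localhost/?flume_test=marker123" (assuming nginx listens on port 80 and records the request line in its access log), and then on server A run wait_for_line /tmp/flume/xxxxxx marker123 30 && echo "log arrived".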
