An example of Flume high-availability, distributed, tiered (chained) collection of data to HDFS

1. Case introduction

The logs in the /home/hadoop/access, /home/hadoop/order, and /home/hadoop/login directories on three log servers (IPs: 192.168.100.9, 192.168.100.13, 192.168.100.100) need to be sunk to a separate agent cluster.

The agent cluster consists of 2 machines (IPs: 192.168.100.11 and 192.168.100.12), with 192.168.100.11 as the master (priority 10) and 192.168.100.12 as the slave (priority 5).

The data collected by the agent cluster sinks into the HDFS system.

The collected data needs to be classified by log type (access, order, login) and by host IP, and written to the HDFS file system accordingly.

The distributed architecture is shown in the figure below:

[Figure: three log servers -> Flume agent cluster (failover master/slave) -> HDFS]

See the configuration files below.

2. Configuration

  1. Agent configuration for the log clients (the three log servers). The configuration file name is flume-collect-local-log.conf:
a1.sources=r1 r2 r3
a1.sinks=k1 k2
a1.channels=c1

a1.sinkgroups = g1

# r1
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/hadoop/access
a1.sources.r1.fileHeader = false

# r2
a1.sources.r2.type = spooldir
a1.sources.r2.spoolDir = /home/hadoop/order
a1.sources.r2.fileHeader = false

# r3
a1.sources.r3.type = spooldir
a1.sources.r3.spoolDir = /home/hadoop/login
a1.sources.r3.fileHeader = false

# r1 interceptors
a1.sources.r1.interceptors = i1 i2
a1.sources.r1.interceptors.i1.type=static
a1.sources.r1.interceptors.i1.preserveExisting = true
a1.sources.r1.interceptors.i1.key = source
a1.sources.r1.interceptors.i1.value = access
a1.sources.r1.interceptors.i2.type=host
a1.sources.r1.interceptors.i2.hostHeader = hostname

# r2 interceptors
a1.sources.r2.interceptors = i1 i2
a1.sources.r2.interceptors.i1.type=static
a1.sources.r2.interceptors.i1.preserveExisting = true
a1.sources.r2.interceptors.i1.key = source
a1.sources.r2.interceptors.i1.value = order
a1.sources.r2.interceptors.i2.type=host
a1.sources.r2.interceptors.i2.hostHeader = hostname

# r3 interceptors
a1.sources.r3.interceptors = i1 i2
a1.sources.r3.interceptors.i1.type=static
a1.sources.r3.interceptors.i1.preserveExisting = true
a1.sources.r3.interceptors.i1.key = source
a1.sources.r3.interceptors.i1.value = login
a1.sources.r3.interceptors.i2.type=host
a1.sources.r3.interceptors.i2.hostHeader = hostname

# k1
a1.sinks.k1.type=avro
a1.sinks.k1.hostname = 192.168.100.11  
a1.sinks.k1.port = 11111

# k2
a1.sinks.k2.type = avro  
a1.sinks.k2.hostname = 192.168.100.12  
a1.sinks.k2.port = 11111 

# c1
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# set sink group failover priorities
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = failover  
a1.sinkgroups.g1.processor.priority.k1 = 10  
a1.sinkgroups.g1.processor.priority.k2 = 5  
a1.sinkgroups.g1.processor.maxpenalty = 10000  

# bind sources r1, r2, r3 and sinks k1, k2 to channel c1
a1.sources.r1.channels = c1
a1.sources.r2.channels = c1
a1.sources.r3.channels = c1
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c1

A static interceptor is configured on each source to attach a log-type identifier (the source header), and a host interceptor attaches the host IP (the hostname header); the HDFS sink in the agent-cluster configuration below uses these headers to partition the data (see the path sketch after that configuration).

  2. Agent configuration for the agent cluster. The configuration file name is flume-collect-hdfs.conf:
a1.sources = r1  
a1.channels = c1  
a1.sinks = k1  

# r1
a1.sources.r1.type = avro  
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 11111

# interceptor: adds a timestamp header so the HDFS sink can resolve its time-based escape sequences
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp  

# k1
a1.sinks.k1.type=hdfs
a1.sinks.k1.hdfs.path=/flume-log/%{source}/%{hostname}/%y%m%d
a1.sinks.k1.hdfs.fileType=DataStream  
a1.sinks.k1.hdfs.writeFormat=TEXT  
a1.sinks.k1.hdfs.filePrefix=%Y-%m-%d  
a1.sinks.k1.hdfs.fileSuffix=.txt  
a1.sinks.k1.hdfs.rollSize = 1024
a1.sinks.k1.hdfs.rollCount = 10
a1.sinks.k1.hdfs.rollInterval = 60

# c1  
a1.channels.c1.type = memory  
a1.channels.c1.capacity = 1000  
a1.channels.c1.transactionCapacity = 100  


# bind r1 and k1 to channel c1
a1.sources.r1.channels = c1  
a1.sinks.k1.channel = c1
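
To make the partitioning concrete, here is an illustrative sketch based on the two configurations above: an event read from /home/hadoop/access on 192.168.100.9 arrives with the headers source=access and hostname=192.168.100.9 (added by the client interceptors), plus the timestamp added by the interceptor here, so the HDFS sink resolves its escape sequences and writes the event to a path of the form:

/flume-log/access/192.168.100.9/<yymmdd>/<YYYY-MM-DD>.<counter>.txt

where the counter in the file name is generated internally by the HDFS sink.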

3. Execution

  1. Write test files to each log server's access, order, and login directories. For example:

access.log

access 192.168.100.9

order.log

order 192.168.100.9

login.log

login 192.168.100.9

Write different content to the corresponding directories on each machine so that the HDFS partitioning can be verified; for example, the files can be created with the commands shown below.
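
A minimal sketch of creating these test files on 192.168.100.9 (assuming the spooling directories from the configuration above already exist; the spooling directory source picks a file up once it is placed in the directory):

echo "access 192.168.100.9" > /home/hadoop/access/access.log
echo "order 192.168.100.9" > /home/hadoop/order/order.log
echo "login 192.168.100.9" > /home/hadoop/login/login.log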

  2. Start the Flume services on the agent cluster machines (192.168.100.11 and 192.168.100.12):
bin/flume-ng agent -c conf -f conf/flume-collect-hdfs.conf -name a1 -Dflume.root.logger=INFO,console
  3. Start the Flume service on each log server:
bin/flume-ng agent -c conf -f conf/flume-collect-local-log.conf -name a1 -Dflume.root.logger=INFO,console

4. View the results

[Figure: HDFS directory listing showing the partitioned output]

The data in HDFS is partitioned by log type, host IP, and date, as expected.
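
To inspect the output from the command line (an example; /flume-log comes from the hdfs.path setting above):

hdfs dfs -ls -R /flume-log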

Addendum: a Flume load balancing example

The conf file for the Flume client that collects the logs. Load balancing is implemented by setting a1.sinkgroups.g1.processor.type = load_balance.

#a1 name
a1.channels = c1
a1.sources = r1
a1.sinks = k1 k2

#set group
a1.sinkgroups = g1

#set channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

#set source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /home/hadoop/logs/test.log

# set sink1
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = 192.168.100.11
a1.sinks.k1.port = 11111

# set sink2
a1.sinks.k2.type = avro
a1.sinks.k2.hostname = 192.168.100.12
a1.sinks.k2.port = 11111

#set sink group
a1.sinkgroups.g1.sinks = k1 k2

#set load-balance
a1.sinkgroups.g1.processor.type = load_balance
# the default is round_robin; random is also available
a1.sinkgroups.g1.processor.selector = round_robin
# if backoff is enabled, the sink processor temporarily blacklists failed sinks
a1.sinkgroups.g1.processor.backoff = true


a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c1

The Flume server-side (agent cluster) configuration is the same as above.
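
To exercise the load balancer, you can append lines to the tailed file and watch events alternate between the two agents (a minimal sketch; the path matches the exec source above):

echo "load-balance test $(date)" >> /home/hadoop/logs/test.log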
