Table of contents
Apache Hadoop Ecology - Directory Summary - Continuous Update
1: Logic
2: Case requirements - implement failover
2.1: Implement flume1.conf
2.2: Implement flume2.conf - port 4141
2.3: Implement flume3.conf - port 4142
3: Start the transmission link
4: Realize load balancing
System environment: CentOS 7
Java environment: Java 8
This case only demonstrates the transmission pipeline; the Source, Channel, and Sink types can be adjusted as needed.
A sink group uses either the failover processor or the load-balancing processor; the two cannot be configured at the same time.
1: Logic
Flume can logically group multiple sinks into a sink group; paired with different sink processors, the group provides load balancing or failover.
Load balancing: the sinks in the group take turns (round-robin or random) delivering events.
Failover: the sink with the highest priority handles all events; when it fails, the next-highest-priority sink takes over.
2: Case requirements - implement failover
Flume1 monitors a port; the two sinks in its sink group connect to Flume2 and Flume3 respectively, and a FailoverSinkProcessor provides failover.
Architecture: Flume1 is wired to both Flume2 and Flume3, but events flow only to the currently active (highest-priority) sink.
Failover: the higher-priority sink takes the traffic; if it goes down, events are redirected to the other sink.
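A sketch of the resulting data flow, based on the configurations below (flume1 runs on worker213 as in the test step, flume2 on worker214, flume3 on worker215):
nc (localhost:44444 on worker213)
  -> flume1 / a1: netcat source r1 -> memory channel c1 -> sink group g1
       k1: avro sink -> worker214:4141 -> flume2 / a2: avro source -> logger sink
       k2: avro sink -> worker215:4142 -> flume3 / a3: avro source -> logger sink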
Implementation steps:
2.1: Implement flume1.conf
Configure 1 netcat source, 1 channel, and 1 sink group (2 sinks); the two sinks send to flume2 on port 4141 and flume3 on port 4142 respectively.
vim flume1.conf
# 1: Define the components
a1.sources = r1
a1.channels = c1
a1.sinks = k1 k2
# 2: Define the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# 3: Define the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# 4: Define the sinks
# Define sink group g1 (keep comments on their own line; Flume reads the value verbatim)
a1.sinkgroups = g1
# Failover configuration start
a1.sinkgroups.g1.processor.type = failover
a1.sinkgroups.g1.processor.priority.k1 = 5
a1.sinkgroups.g1.processor.priority.k2 = 10
a1.sinkgroups.g1.processor.maxpenalty = 10000
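# Note: k2 has the higher priority (10 > 5), so k2 is the active sink; k1 only
# receives events after k2 fails. maxpenalty is the maximum backoff time (ms)
# applied to a failed sink before it is retried.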
# Failover configuration end
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = worker214
a1.sinks.k1.port = 4141
a1.sinks.k2.type = avro
a1.sinks.k2.hostname = worker215
a1.sinks.k2.port = 4142
# 5: Define the relationships
a1.sources.r1.channels = c1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c1
2.2: Implement flume2.conf - port 4141
The avro source listens on port 4141.
vim flume2.conf
# 1: Define the components
a2.sources = r1
a2.sinks = k1
a2.channels = c1
# 2: Define the source
a2.sources.r1.type = avro
a2.sources.r1.bind = worker214
a2.sources.r1.port = 4141
# 3: Define the channel
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100
# 4: Define the sink
a2.sinks.k1.type = logger
# 5: Define the relationships
a2.sources.r1.channels = c1
a2.sinks.k1.channel = c1
2.3: Implement flume3.conf - port 4142
The avro source listens on port 4142.
vim flume3.conf
# 1: Define the components
a3.sources = r1
a3.sinks = k1
a3.channels = c2
# 2: Define the source
a3.sources.r1.type = avro
a3.sources.r1.bind = worker215
a3.sources.r1.port = 4142
# 3: Define the channel
a3.channels.c2.type = memory
a3.channels.c2.capacity = 1000
a3.channels.c2.transactionCapacity = 100
# 4: Define the sink
a3.sinks.k1.type = logger
# 5: Define the relationships
a3.sources.r1.channels = c2
a3.sinks.k1.channel = c2
3: Start the transmission link
1) Start flume2 (on worker214), listening on port 4141
flume-ng agent --name a2 --conf-file flume2.conf -Dflume.root.logger=INFO,console
2) Start flume3 (on worker215), listening on port 4142
flume-ng agent --name a3 --conf-file flume3.conf -Dflume.root.logger=INFO,console
3) Start flume1 (on worker213), whose sinks point to flume2 on port 4141 and flume3 on port 4142
flume-ng agent --name a1 --conf-file flume1.conf -Dflume.root.logger=INFO,console
# Start flume1 only after the downstream avro sources are up, because its sinks connect to their ports
Test
On worker213, send data to port 44444:
nc localhost 44444
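A quick way to check the failover behavior (a sketch; flume3 should be the active sink because k2 has the higher priority):
# stop the flume3 agent (Ctrl+C in its terminal, or kill its JVM process)
# then, in the nc session on worker213, type another line, e.g.
hello-after-failover
# the event should now appear on flume2's logger console; once flume3 is restarted
# and its backoff (maxpenalty) expires, traffic moves back to flume3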
4: Realize load balancing
Adjust flume1.conf to use the load-balancing strategy; everything else (flume2.conf, flume3.conf, and the startup steps) remains unchanged.
vim flume1.conf
# 1: Define the components
a1.sources = r1
a1.channels = c1
a1.sinks = k1 k2
# 2: Define the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# 3: Define the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# 4: Define the sinks
# Define sink group g1
a1.sinkgroups = g1
# Load balancing configuration start
a1.sinkgroups.g1.processor.type = load_balance
a1.sinkgroups.g1.processor.backoff = true
a1.sinkgroups.g1.processor.selector = random
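# selector can be round_robin (the default) or random; backoff = true temporarily
# blacklists a failed sink with an exponentially growing timeout, so traffic
# shifts to the remaining sink until it recovers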
# Load balancing configuration end
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = worker214
a1.sinks.k1.port = 4141
a1.sinks.k2.type = avro
a1.sinks.k2.hostname = worker215
a1.sinks.k2.port = 4142
# 5: Define the relationships
a1.sources.r1.channels = c1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c1
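To verify load balancing (a sketch, reusing the startup steps from section 3): restart flume1 with the updated configuration and send several events from worker213; with the random selector, the events should be spread across flume2's and flume3's logger consoles.
# restart flume1 with the updated configuration
flume-ng agent --name a1 --conf-file flume1.conf -Dflume.root.logger=INFO,console
# on worker213, send several events and watch both logger consoles
nc localhost 44444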
Flume series
Apache Hadoop Ecological Deployment - Flume Collection Node Installation
Flume series: Flume component architecture
Flume series: use of Flume Source
Flume series: use of Flume Channel
Flume series: use of Flume Sink
Flume series: Flume custom Interceptor interceptor
Flume series: Flume channel topology
Flume Series: Cases of Flume Common Acquisition Channels
Flume series: case-Flume replication (Replicating) and multiplexing (Multiplexing)
Flume series: case-Flume load balancing and failover
Flume series: case-Flume aggregation topology (common log collection structure)
Flume series: Flume data monitoring Ganglia