Flume series: case-Flume load balancing and failover

Table of contents

Apache Hadoop Ecology - Directory Summary - Continuous Update

1: Logic

2: Case requirements - implement failover

3: Implementation steps

3.1: Implement flume1.conf

3.2: Implement flume2.conf - port 4141

3.3: Implement flume3.conf - port 4142

4: Start the transmission link

5: Implement load balancing



System environment: CentOS 7

Java environment: Java 8

This case only demonstrates the pipeline flow; the Source, Channel, and Sink types can be swapped as needed.

Failover and load balancing are mutually exclusive for a sink group: configure one or the other, never both at the same time.

1: Logic

Flume supports grouping multiple sinks into a logical sink group; paired with different SinkProcessors, a sink group provides load balancing or failover.

Load balancing: the sinks in the group take events in turn (or at random).

Failover: the sink with the highest priority handles all events; if it fails, a lower-priority sink takes over.
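The failover behaviour can be sketched as a priority pick (a plain-Python illustration of the idea, not Flume source code; `pick_sink` is a hypothetical helper):

```python
# Illustrative sketch: a failover sink processor sends events to the live sink
# with the highest priority; a lower-priority sink only takes over on failure.

def pick_sink(priorities, live):
    """Pick the highest-priority sink among those still alive."""
    candidates = [sink for sink in priorities if sink in live]
    if not candidates:
        raise RuntimeError("no live sinks in the group")
    return max(candidates, key=lambda sink: priorities[sink])

# Priorities match the flume1.conf example: k2 (10) outranks k1 (5).
priorities = {"k1": 5, "k2": 10}
print(pick_sink(priorities, {"k1", "k2"}))  # k2 is active while both are up
print(pick_sink(priorities, {"k1"}))        # k2 down -> events fail over to k1
```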

2: Case requirements - implement failover

Flume1 monitors a port; the two sinks in its sink group connect to Flume2 and Flume3 respectively, and a FailoverSinkProcessor provides failover.

Architecture: Flume1 is wired to both Flume2 and Flume3, but events flow to only one of them at a time.

Failover: the higher-priority sink takes the events; when it fails, traffic switches to the other sink.

3: Implementation steps

3.1: Implement flume1.conf

Configure one netcat source, one channel, and one sink group with two sinks, which send to flume2 (port 4141) and flume3 (port 4142) respectively.

vim flume1.conf

# 1: Define components
a1.sources = r1
a1.channels = c1
a1.sinks = k1 k2

# 2: Define the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# 3: Define the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100


# 4: Define the sinks
# Define sink group g1 (Flume properties do not allow inline comments)
a1.sinkgroups = g1
# Failover configuration - start
a1.sinkgroups.g1.processor.type = failover
# Higher priority wins: k2 (10) is the active sink, k1 (5) is the standby
a1.sinkgroups.g1.processor.priority.k1 = 5
a1.sinkgroups.g1.processor.priority.k2 = 10
# Maximum backoff (ms) for a failed sink before it is retried
a1.sinkgroups.g1.processor.maxpenalty = 10000
# Failover configuration - end

a1.sinks.k1.type = avro
a1.sinks.k1.hostname = worker214
a1.sinks.k1.port = 4141

a1.sinks.k2.type = avro
a1.sinks.k2.hostname = worker215
a1.sinks.k2.port = 4142

# 5: Wire the components together
a1.sources.r1.channels = c1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c1

3.2: Implement flume2.conf - port 4141

Its avro source listens on port 4141.

vim flume2.conf

# 1: Define components
a2.sources = r1
a2.sinks = k1
a2.channels = c1

# 2: Define the source
a2.sources.r1.type = avro
a2.sources.r1.bind = worker214
a2.sources.r1.port = 4141

# 3: Define the channel
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100

# 4: Define the sink
a2.sinks.k1.type = logger

# 5: Wire the components together
a2.sources.r1.channels = c1
a2.sinks.k1.channel = c1

3.3: Implement flume3.conf - port 4142

Its avro source listens on port 4142.

vim flume3.conf

# 1: Define components
a3.sources = r1
a3.sinks = k1
a3.channels = c2

# 2: Define the source
a3.sources.r1.type = avro
a3.sources.r1.bind = worker215
a3.sources.r1.port = 4142

# 3: Define the channel
a3.channels.c2.type = memory
a3.channels.c2.capacity = 1000
a3.channels.c2.transactionCapacity = 100

# 4: Define the sink
a3.sinks.k1.type = logger

# 5: Wire the components together
a3.sources.r1.channels = c2
a3.sinks.k1.channel = c2

4: Start the transmission link

Note: the value of --name must match the agent name used inside each config file (a1, a2, a3), not the file name.

1) Start flume2 (agent a2), listening on port 4141
flume-ng agent --name a2 --conf-file flume2.conf -Dflume.root.logger=INFO,console

2) Start flume3 (agent a3), listening on port 4142
flume-ng agent --name a3 --conf-file flume3.conf -Dflume.root.logger=INFO,console

3) Start flume1 (agent a1); its sinks connect to flume2 (port 4141) and flume3 (port 4142)
flume-ng agent --name a1 --conf-file flume1.conf -Dflume.root.logger=INFO,console
# Start the downstream avro sources (flume2, flume3) first; flume1 connects to their ports

Test

On worker213, send data to port 44444 (flume1's netcat source):
nc localhost 44444
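For scripted tests, the `nc` step can be replaced with a small sender (a sketch; `send_event` is a hypothetical helper, and it assumes flume1's netcat source is listening on localhost:44444):

```python
import socket

def send_event(message, host="localhost", port=44444):
    """Open a TCP connection (like `nc`) and send one newline-terminated event."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall((message + "\n").encode("utf-8"))

# Example (requires flume1 to be running):
# send_event("hello flume")
```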

5: Implement load balancing

Adjust flume1.conf to use the load-balancing strategy; every other step and configuration stays the same.

vim flume1.conf

# 1: Define components
a1.sources = r1
a1.channels = c1
a1.sinks = k1 k2

# 2: Define the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# 3: Define the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100


# 4: Define the sinks
# Define sink group g1 (Flume properties do not allow inline comments)
a1.sinkgroups = g1
# Load balancing - start
a1.sinkgroups.g1.processor.type = load_balance
# Back off from a failed sink for a while instead of retrying it immediately
a1.sinkgroups.g1.processor.backoff = true
# Selector policy: random or round_robin
a1.sinkgroups.g1.processor.selector = random
# Load balancing - end

a1.sinks.k1.type = avro
a1.sinks.k1.hostname = worker214
a1.sinks.k1.port = 4141

a1.sinks.k2.type = avro
a1.sinks.k2.hostname = worker215
a1.sinks.k2.port = 4142

# 5: Wire the components together
a1.sources.r1.channels = c1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c1
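How the two selector policies distribute events can be illustrated in plain Python (a simulation of the idea, not Flume code):

```python
import itertools
import random

sinks = ["k1", "k2"]

# round_robin: cycle through the sinks in order, one batch each
rr = itertools.cycle(sinks)
round_robin_picks = [next(rr) for _ in range(4)]
print(round_robin_picks)  # ['k1', 'k2', 'k1', 'k2']

# random: each batch goes to a sink chosen uniformly at random
random_picks = [random.choice(sinks) for _ in range(4)]
print(random_picks)
```

With backoff enabled, a sink that throws on delivery is temporarily removed from this rotation, so the other sink absorbs its share until the failed one recovers.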

Origin blog.csdn.net/web_snail/article/details/130918650