[Flume for Big Data] Five, Flume Advanced: Custom Interceptor

(1) Requirements:
  Use Flume to collect local server logs and send different types of logs to different analysis systems according to the log type.

(2) Analysis:
  A server may generate many types of logs, and different types of logs may need to be sent to different analysis systems.
  This is where the Multiplexing structure in the Flume topology comes in. Multiplexing sends different events to different Channels according to the value of a key in the event's Header, so we need a custom Interceptor that assigns different values to that key for different types of events.
  Here, port data simulates the logs, and whether the data contains "atguigu" simulates the different log types: the custom interceptor checks whether "atguigu" appears in the event body and routes the event to the corresponding analysis system (Channel).
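  In other words, the interceptor writes a key/value pair into each event's Header, and the multiplexing channel selector routes on that value. A minimal sketch of the selector side (the header name and mapping values here are placeholders; the concrete configuration appears in step (3) below):

a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = <header key set by the interceptor>
a1.sources.r1.selector.mapping.<value1> = c1
a1.sources.r1.selector.mapping.<value2> = c2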
Steps:
(1) Create a Maven project and add the following dependency.

<dependency>
    <groupId>org.apache.flume</groupId>
    <artifactId>flume-ng-core</artifactId>
    <version>1.9.0</version>
</dependency>

(2) Define a TypeInterceptor class that implements the Interceptor interface, package it, and put the jar in the /opt/module/flume-1.9.0/lib directory.

package com.study.interceptor;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class TypeInterceptor implements Interceptor {

    // Collection holding the processed events
    private List<Event> addHeaderEvents;

    @Override
    public void initialize() {
        // Initialize the collection of events
        addHeaderEvents = new ArrayList<>();
    }

    // Single-event interception
    @Override
    public Event intercept(Event event) {
        // 1. Get the header of the event
        Map<String, String> headers = event.getHeaders();

        // 2. Get the body of the event
        String body = new String(event.getBody());

        // 3. Decide which header value to add depending on whether the body contains "atguigu"
        if (body.contains("atguigu")) {
            // 4. Add header
            headers.put("type", "first");
        } else {
            // 4. Add header
            headers.put("type", "second");
        }

        return event;
    }

    // Batch-event interception
    @Override
    public List<Event> intercept(List<Event> events) {
        // 1. Clear the collection
        addHeaderEvents.clear();

        // 2. Iterate over events
        for (Event event : events) {
            // 3. Add header info to every event
            addHeaderEvents.add(intercept(event));
        }

        // 4. Return the result
        return addHeaderEvents;
    }

    @Override
    public void close() {
    }

    public static class Builder implements Interceptor.Builder {

        @Override
        public Interceptor build() {
            return new TypeInterceptor();
        }

        @Override
        public void configure(Context context) {
        }
    }
}
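  Before packaging, a quick local check can confirm the header assignment. A minimal sketch (the demo class and its sample strings are hypothetical and not part of the tutorial):

package com.study.interceptor;

import org.apache.flume.Event;
import org.apache.flume.event.EventBuilder;

import java.nio.charset.StandardCharsets;

public class TypeInterceptorDemo {
    public static void main(String[] args) {
        TypeInterceptor interceptor = new TypeInterceptor();
        interceptor.initialize();

        // Build two sample events: one containing "atguigu", one not
        Event hit = EventBuilder.withBody("hello atguigu", StandardCharsets.UTF_8);
        Event miss = EventBuilder.withBody("hello flume", StandardCharsets.UTF_8);

        // Expect {type=first} for the first event and {type=second} for the second
        System.out.println(interceptor.intercept(hit).getHeaders());
        System.out.println(interceptor.intercept(miss).getHeaders());

        interceptor.close();
    }
}

  Once this behaves as expected, build the jar with mvn clean package and copy it into /opt/module/flume-1.9.0/lib as described above.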

(3) Edit the Flume configuration file.
  Configure a netcat source and two avro sinks for flume1.conf under the /opt/module/flume-1.9.0/job/group4 directory on hadoop102, and configure the corresponding ChannelSelector and interceptor.

# Name the components on this agent 
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2

# Describe/configure the source 
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444 

# Interceptor
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = com.study.interceptor.TypeInterceptor$Builder

# Multiplexing channel selector (mapping values must match the "type" header values set by the interceptor)
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = type
a1.sources.r1.selector.mapping.first = c1
a1.sources.r1.selector.mapping.second = c2


# Describe the sink 
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop103
a1.sinks.k1.port = 4141

a1.sinks.k2.type = avro
a1.sinks.k2.hostname = hadoop104
a1.sinks.k2.port = 4242

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Use a channel which buffers events in memory
a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1 c2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2

(4) Configure an avro source and a logger sink for flume2.conf under the /opt/module/flume-1.9.0/job/group4 directory on hadoop103.

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = avro
a1.sources.r1.bind = hadoop103
a1.sources.r1.port = 4141

a1.sinks.k1.type = logger

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sinks.k1.channel = c1
a1.sources.r1.channels = c1

(5) Configure an avro source and a logger sink for flume3.conf under the /opt/module/flume-1.9.0/job/group4 directory on hadoop104.

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = avro
a1.sources.r1.bind = hadoop104
a1.sources.r1.port = 4242

a1.sinks.k1.type = logger

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sinks.k1.channel = c1
a1.sources.r1.channels = c1

(6) Start the Flume processes on hadoop103, hadoop104 and hadoop102, in that order: the avro sources on hadoop103 and hadoop104 must be listening before the avro sinks of the agent on hadoop102 try to connect to them.
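  For example, run from the Flume home directory on each host (these are the standard flume-ng options; sending the logger output to the console is optional):

# on hadoop103
bin/flume-ng agent -c conf/ -n a1 -f job/group4/flume2.conf -Dflume.root.logger=INFO,console

# on hadoop104
bin/flume-ng agent -c conf/ -n a1 -f job/group4/flume3.conf -Dflume.root.logger=INFO,console

# on hadoop102, after the other two are up
bin/flume-ng agent -c conf/ -n a1 -f job/group4/flume1.conf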

(7) Use netcat on hadoop102 to send messages to localhost:44444, and observe the logs printed by hadoop103 and hadoop104.
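  For example, on hadoop102:

nc localhost 44444
hello atguigu
hello flume

  Lines containing "atguigu" should show up in the logger output on hadoop103, and all other lines in the logger output on hadoop104.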

Origin: blog.csdn.net/qq_18625571/article/details/131783701