Flink complex event processing CEP

Flink CEP

Flink CEP (Complex Event Processing) is a library that ships with Flink for detecting complex event sequences in a data stream. A complex event usually consists of several simple events that occur in a specific order within a specific time window. CEP detects such sequences and processes them according to predefined patterns.

Flink's CEP library provides a flexible and powerful programming model that enables users to specify the relationship patterns between different events and define the conditions for event triggering. It is capable of handling complex event patterns based on timing, sequence, and other properties, and supports streaming and real-time data. CEP can be used to build event-based applications, such as financial transaction monitoring, network traffic analysis, IoT data processing, etc.

Application scenarios

Flink CEP (Complex Event Processing) is a technology for processing complex event patterns in data streams. It is suitable for a variety of real-time data processing scenarios, including:

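Financial transaction monitoring: monitor transaction streams in real time to identify potential fraud, for example by detecting abnormal transaction sequences or abnormal fund-flow patterns.

Network security analysis: analyze network logs in real time to detect attacks, abnormal behavior, or security threats, for example by recognizing a specific attack pattern or an abnormal sequence of network communications.

Internet of Things (IoT) data processing: process real-time data from sensors and devices to identify device failures and abnormal events or to predict maintenance needs, for example by spotting a sequence of device states that hints at a latent problem.

Marketing and personalized recommendation: analyze real-time customer behavior to identify specific purchase patterns or behavior sequences, and use them to drive personalized product recommendations or marketing strategies.

Production process monitoring: monitor sensor and production data on industrial lines to detect production anomalies, predict equipment failures, or optimize production scheduling.

Healthcare monitoring: monitor patient health data or medical device data in real time to detect potential health crises, predict changes in a patient's condition, or provide real-time health monitoring services.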

Basic use

Add dependencies

Add Flink CEP dependencies to pom.xml

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-java</artifactId>
            <version>${flink.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java</artifactId>
            <version>${flink.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-clients</artifactId>
            <version>${flink.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-cep</artifactId>
            <version>${flink.version}</version>
        </dependency>

Define matching pattern

The events in the DataStream that pattern matching is applied to must implement correct equals() and hashCode() methods, because Flink CEP uses them to compare and match events.

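    // Imports needed by this snippet (the method is meant to live inside a class):
    import org.apache.flink.api.common.eventtime.SerializableTimestampAssigner;
    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.cep.CEP;
    import org.apache.flink.cep.PatternStream;
    import org.apache.flink.cep.pattern.Pattern;
    import org.apache.flink.cep.pattern.conditions.SimpleCondition;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.windowing.time.Time;

    public static void main(String[] args) throws Exception {
        // Set up the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Prepare the event stream; f1 doubles as the event time in seconds
        DataStream<Tuple2<String, Integer>> inputEventStream = env.fromElements(
                        new Tuple2<>("event", 1),
                        new Tuple2<>("event", 2),
                        new Tuple2<>("event", 3),
                        new Tuple2<>("event", 4),
                        new Tuple2<>("event", 5),
                        new Tuple2<>("event", 6),
                        new Tuple2<>("event", 7),
                        new Tuple2<>("event", 8)
                ).assignTimestampsAndWatermarks(
                        WatermarkStrategy
                                .<Tuple2<String, Integer>>forMonotonousTimestamps()
                                .withTimestampAssigner(new SerializableTimestampAssigner<Tuple2<String, Integer>>() {
                                    @Override
                                    public long extractTimestamp(Tuple2<String, Integer> event, long recordTimestamp) {
                                        return event.f1 * 1000;
                                    }
                                })
                )
                .keyBy(event -> event.f0);

        /**
         * Define the complex event pattern:
         * first match an event whose value is even, then two consecutive events
         * whose value is > 3, then an event whose value is 8.
         */
        // Declare the pattern to detect in the stream. It matches (String, Integer) tuples;
        // begin("start") defines the starting point of the pattern.
        Pattern<Tuple2<String, Integer>, ?> pattern = Pattern.<Tuple2<String, Integer>>begin("start")
                // Condition on the starting point: the second field of the event is even
                .where(new SimpleCondition<Tuple2<String, Integer>>() {
                    @Override
                    public boolean filter(Tuple2<String, Integer> event) {
                        return event.f1 % 2 == 0;
                    }
                })
                // The pattern that must directly follow the starting point is named "middle"
                .next("middle")
                // Restrict "middle" to events of type Tuple2
                .subtype(Tuple2.class)
                // Simple condition on "middle": the second field is greater than 3
                .where(new SimpleCondition<Tuple2>() {
                    @Override
                    public boolean filter(Tuple2 event) {
                        // return (Integer) event.f1 > 5;
                        return (Integer) event.f1 > 3;
                    }
                })
                // "middle" must occur exactly 2 times
                .times(2)
                // and those occurrences must be strictly consecutive
                .consecutive()
                // The pattern that follows (relaxed contiguity) is named "end" and closes the sequence
                .followedBy("end")
                // Condition on "end": the second field equals 8; the whole match must fit in 5 seconds
                .where(SimpleCondition.of((Tuple2<String, Integer> event) -> event.f1 == 8)).within(Time.seconds(5));

        // Apply the pattern to the event stream
        PatternStream<Tuple2<String, Integer>> patternStream = CEP.pattern(inputEventStream.keyBy(event -> event.f0), pattern);
        // Select the matches and emit them
        // DataStream<String> result = patternStream.select(new MyPatternSelectFunction());
        DataStream<String> result = patternStream.process(new MyPatternProcessFunction());

        result.print();

        // Run the job
        env.execute("CEP Example");
    }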

Define matching results

    /**
     * Handles matches with a PatternSelectFunction
     */
    public static class MyPatternSelectFunction implements PatternSelectFunction<Tuple2<String, Integer>, String> {
        @Override
        public String select(Map<String, List<Tuple2<String, Integer>>> pattern) throws Exception {
            StringBuilder builder = new StringBuilder();

            builder.append("Match found: ");
            pattern.forEach((key, value) -> builder.append(key).append(" => ").append(value).append("; "));

            return builder.toString();
        }
    }

    /**
     * Handles matches with a PatternProcessFunction
     */
    public static class MyPatternProcessFunction extends PatternProcessFunction<Tuple2<String, Integer>, String> {
        @Override
        public void processMatch(Map<String, List<Tuple2<String, Integer>>> pattern, Context context, Collector<String> collector) throws Exception {
            StringBuilder builder = new StringBuilder();

            builder.append("Match found: ");
            pattern.forEach((key, value) -> builder.append(key).append(" => ").append(value).append("; "));

            collector.collect(builder.toString());
        }
    }

Verification

1> Match found: start => [(event,4)]; middle => [(event,5), (event,6)]; end => [(event,8)]; 
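
To see why exactly one match is printed, the pattern can be replayed by hand. The sketch below is a plain-Java model written for this article, not Flink code: the class name FirstExampleWalkthrough and its method are hypothetical, and it assumes the semantics described above (an even "start", two directly following events greater than 3, a later event equal to 8, and less than the window length between the first and last event).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hand-rolled replay of the example pattern over plain integers (not Flink code). */
final class FirstExampleWalkthrough {

    /** Each value v is treated as an event with timestamp v * 1000 ms, as in the example. */
    static List<List<Integer>> findMatches(int[] values, long windowMillis) {
        List<List<Integer>> matches = new ArrayList<>();
        for (int i = 0; i + 2 < values.length; i++) {
            if (values[i] % 2 != 0) {
                continue; // "start" must be even
            }
            if (values[i + 1] <= 3 || values[i + 2] <= 3) {
                continue; // "middle": the two directly following events must both be > 3
            }
            // "end": relaxed contiguity, so skip events until the first one equal to 8
            for (int k = i + 3; k < values.length; k++) {
                if (values[k] == 8) {
                    long elapsed = (values[k] - values[i]) * 1000L;
                    if (elapsed < windowMillis) {
                        matches.add(Arrays.asList(values[i], values[i + 1], values[i + 2], values[k]));
                    }
                    break;
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        int[] values = {1, 2, 3, 4, 5, 6, 7, 8};
        System.out.println(findMatches(values, 5000L)); // prints [[4, 5, 6, 8]]
    }
}
```

Starting at 2 fails because the next event, 3, is not greater than 3. Starting at 6 takes 7 and 8 as "middle" and leaves no event for "end". Only the sequence that starts at 4 completes.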

Pattern API

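The Pattern API lets you define the complex pattern sequences that should be extracted from the input stream.

Each complex pattern sequence consists of multiple simple patterns, that is, patterns that look for individual events with the same properties.

Each pattern must have a unique name, which is used to identify the matched events.

Pattern names cannot contain the character ":".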

Individual patterns

Each step of a complex rule is an individual pattern. For an individual pattern we can define how many times a given event must occur (a quantifier) and a condition that decides whether an incoming event is accepted into the pattern (a condition).

Quantifiers

By default a pattern is a singleton pattern that accepts exactly one event. Quantifiers turn it into a looping pattern.

API                                    Description
pattern.oneOrMore()                    The pattern occurs one or more times
pattern.times(#ofTimes)                The pattern occurs exactly #ofTimes times
pattern.times(#fromTimes, #toTimes)    The pattern occurs between #fromTimes and #toTimes times
pattern.timesOrMore(#times)            The pattern occurs at least #times times
pattern.greedy()                       The pattern becomes greedy and repeats as many times as possible
pattern.optional()                     The pattern is optional and may not occur at all

Usage example:

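// expects 4 occurrences
pattern.times(4);

// expects 0 or 4 occurrences
pattern.times(4).optional();

// expects 2, 3 or 4 occurrences
pattern.times(2, 4);

// expects 2, 3 or 4 occurrences, repeating as many times as possible
pattern.times(2, 4).greedy();

// expects 0, 2, 3 or 4 occurrences
pattern.times(2, 4).optional();

// expects 0, 2, 3 or 4 occurrences, repeating as many times as possible
pattern.times(2, 4).optional().greedy();

// expects 1 or more occurrences
pattern.oneOrMore();

// expects 1 or more occurrences, repeating as many times as possible
pattern.oneOrMore().greedy();

// expects 0 or more occurrences
pattern.oneOrMore().optional();

// expects 0 or more occurrences, repeating as many times as possible
pattern.oneOrMore().optional().greedy();

// expects 2 or more occurrences
pattern.timesOrMore(2);

// expects 2 or more occurrences, repeating as many times as possible
pattern.timesOrMore(2).greedy();

// expects 0, 2 or more occurrences
pattern.timesOrMore(2).optional();

// expects 0, 2 or more occurrences, repeating as many times as possible
pattern.timesOrMore(2).optional().greedy();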
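
The occurrence counts in the comments above follow one simple rule, which the sketch below spells out. It is a plain-Java model for illustration only: QuantifierModel, accepts and UNBOUNDED are hypothetical names, not part of the Flink API, and greedy() is left out because it affects which events are consumed rather than which counts are allowed.

```java
/** Models which occurrence counts a quantifier combination allows (not Flink code). */
final class QuantifierModel {

    /** Stands in for "or more", as in oneOrMore() and timesOrMore(n). */
    static final int UNBOUNDED = Integer.MAX_VALUE;

    /**
     * @param occurrences how many events the looping pattern accepted
     * @param from        lower bound, e.g. 2 for times(2, 4) or timesOrMore(2)
     * @param to          upper bound, e.g. 4 for times(2, 4), UNBOUNDED for "or more"
     * @param optional    whether optional() was applied, which also allows 0 occurrences
     */
    static boolean accepts(int occurrences, int from, int to, boolean optional) {
        if (occurrences == 0 && optional) {
            return true;
        }
        return occurrences >= from && occurrences <= to;
    }

    public static void main(String[] args) {
        // times(2, 4).optional(): 0, 2, 3 or 4 occurrences are allowed, 1 and 5 are not
        for (int n = 0; n <= 5; n++) {
            System.out.println(n + " -> " + accepts(n, 2, 4, true));
        }
    }
}
```

In this model times(4) corresponds to accepts(n, 4, 4, false) and timesOrMore(2).optional() to accepts(n, 2, UNBOUNDED, true).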

Conditions

For each pattern you can specify the conditions an incoming event must meet in order to be accepted into the pattern.

pattern.where()
    Description: defines a condition for the current pattern. An event must satisfy the condition to match the pattern. Multiple consecutive where() clauses are ANDed together.
    Example: pattern.where(SimpleCondition.of((Tuple2<String, Integer> event) -> event.f1 == 8))
    Effect: matches events with f1 == 8.

pattern.or()
    Description: adds a new condition that is ORed with the existing one. An event matches the pattern if it satisfies at least one of the conditions.
    Example: pattern.where(SimpleCondition.of((Tuple2<String, Integer> event) -> event.f1 == 1)).or(SimpleCondition.of((Tuple2<String, Integer> event) -> event.f1 == 2))
    Effect: matches events with f1 == 1 or f1 == 2.

pattern.until()
    Description: specifies a stop condition for a looping pattern. Once an event that matches the given condition occurs, the pattern accepts no more events. Only applicable in conjunction with oneOrMore(). Note: it allows the state of the corresponding pattern to be cleared based on an event condition.
    Example: pattern.until(SimpleCondition.of((Tuple2<String, Integer> event) -> event.f1 == 2))
    Effect: matches one or more times until f1 == 2.

A condition can also depend on the events that were accepted earlier. The following iterative condition accepts the next event into the pattern named "middle" if its name starts with "foo" and the sum of the prices of the events already accepted for that pattern plus the price of the current event stays below 5.0:

middle.oneOrMore()
    .subtype(SubEvent.class)
    .where(new IterativeCondition<SubEvent>() {
        @Override
        public boolean filter(SubEvent value, Context<SubEvent> ctx) throws Exception {
            if (!value.getName().startsWith("foo")) {
                return false;
            }

            double sum = value.getPrice();
            for (Event event : ctx.getEventsForPattern("middle")) {
                sum += event.getPrice();
            }
            return Double.compare(sum, 5.0) < 0;
        }
    });
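
The running-sum logic of this condition can be tried without a Flink job. The following is a minimal plain-Java sketch, assuming a single partial match is followed from start to end (Flink's matcher may keep several partial matches in parallel). The class RunningSumCondition, its Event type and the sample prices are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java model of the iterative condition above (not Flink code). */
final class RunningSumCondition {

    static final class Event {
        final String name;
        final double price;

        Event(String name, double price) {
            this.name = name;
            this.price = price;
        }
    }

    /** Returns the names of the events one partial match would accept, in arrival order. */
    static List<String> accept(List<Event> stream, double limit) {
        List<Event> accepted = new ArrayList<>();
        List<String> names = new ArrayList<>();
        for (Event current : stream) {
            if (!current.name.startsWith("foo")) {
                continue; // the name check fails, so the event is not accepted
            }
            double sum = current.price;
            for (Event previous : accepted) {
                sum += previous.price; // plays the role of ctx.getEventsForPattern("middle")
            }
            if (Double.compare(sum, limit) < 0) {
                accepted.add(current);
                names.add(current.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<Event> stream = Arrays.asList(
                new Event("foo1", 2.0),
                new Event("bar", 1.0),
                new Event("foo2", 2.5),
                new Event("foo3", 1.0),
                new Event("foo4", 0.25));
        System.out.println(accept(stream, 5.0)); // prints [foo1, foo2, foo4]
    }
}
```

Here foo3 is rejected because 2.0 + 2.5 + 1.0 reaches 5.5, while foo4 still fits at 4.75.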

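Combining patterns

Combining several individual patterns forms a combined pattern (a pattern sequence). Flink CEP supports the following forms of contiguity between events:

Strict contiguity: all matching events are expected to appear strictly one after the other, with no non-matching events in between.

Relaxed contiguity: non-matching events that appear between the matching events are ignored.

Non-deterministic relaxed contiguity: relaxes contiguity further and allows additional matches that ignore some matching events.

To apply them between consecutive patterns, you can use:

next(): for strict contiguity
followedBy(): for relaxed contiguity
followedByAny(): for non-deterministic relaxed contiguity
notNext(): if you do not want an event type to directly follow another
notFollowedBy(): if you do not want an event type to appear anywhere between two other event types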
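
The difference between the three contiguity modes is easiest to see on a tiny stream. The sketch below is a plain-Java model rather than Flink code: it evaluates the two-step pattern "a, then b" over the stream a, c, b1, b2. ContiguityDemo and its Mode enum are hypothetical names that stand in for next(), followedBy() and followedByAny().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java model of contiguity for the two-step pattern "a, then b" (not Flink code). */
final class ContiguityDemo {

    enum Mode { STRICT, RELAXED, NON_DETERMINISTIC_RELAXED }

    /** Returns every "a b" match found in the stream under the given contiguity mode. */
    static List<String> match(List<String> stream, Mode mode) {
        List<String> matches = new ArrayList<>();
        for (int i = 0; i < stream.size(); i++) {
            if (!stream.get(i).equals("a")) {
                continue; // only an "a" event can start a match
            }
            for (int j = i + 1; j < stream.size(); j++) {
                boolean isB = stream.get(j).startsWith("b");
                if (isB) {
                    matches.add(stream.get(i) + " " + stream.get(j));
                }
                // STRICT looks only at the directly following event;
                // RELAXED stops at the first matching "b";
                // NON_DETERMINISTIC_RELAXED keeps scanning for further "b" events.
                if (mode == Mode.STRICT || (isB && mode == Mode.RELAXED)) {
                    break;
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> stream = Arrays.asList("a", "c", "b1", "b2");
        for (Mode mode : Mode.values()) {
            System.out.println(mode + " -> " + match(stream, mode));
        }
        // STRICT -> []
        // RELAXED -> [a b1]
        // NON_DETERMINISTIC_RELAXED -> [a b1, a b2]
    }
}
```

Strict contiguity finds nothing because c sits directly after a. Relaxed contiguity skips c and stops at b1. The non-deterministic variant additionally reports the match that ignores b1 and takes b2.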

A pattern sequence must start with an initial pattern, which is either a single named pattern or a nested pattern sequence:

Pattern<Event, ?> start = Pattern.<Event>begin("start");

Pattern<Event, ?> start = Pattern.<Event>begin(
    Pattern.<Event>begin("start").where(...).followedBy("middle").where(...)
);

begin(#name)
    Description: defines a starting pattern.
    Example: Pattern<Event, ?> start = Pattern.<Event>begin("start");

begin(#pattern_sequence)
    Description: defines a starting pattern from a pattern sequence.
    Example: Pattern.<Event>begin(Pattern.<Event>begin("start").where(...).followedBy("middle").where(...));

next(#name)
    Description: appends a new pattern. A matching event must directly follow the previous matching event (strict contiguity).
    Example: Pattern<Event, ?> next = start.next("middle");

next(#pattern_sequence)
    Description: appends a new pattern. A sequence of matching events must directly follow the previous matching event (strict contiguity).
    Example: start.next(Pattern.<Event>begin("start").where(...).followedBy("middle").where(...));

followedBy(#name)
    Description: appends a new pattern. Other events can occur between a matching event and the previous matching event (relaxed contiguity).
    Example: Pattern<Event, ?> followedBy = start.followedBy("middle");

followedBy(#pattern_sequence)
    Description: appends a new pattern. Other events can occur between a sequence of matching events and the previous matching event (relaxed contiguity).
    Example: start.followedBy(Pattern.<Event>begin("start").where(...).followedBy("middle").where(...));

followedByAny(#name)
    Description: appends a new pattern. Other events can occur between a matching event and the previous matching event, and an alternative match is produced for every alternative matching event (non-deterministic relaxed contiguity).
    Example: Pattern<Event, ?> followedByAny = start.followedByAny("middle");

followedByAny(#pattern_sequence)
    Description: appends a new pattern. Other events can occur between a sequence of matching events and the previous matching event, and an alternative match is produced for every alternative matching sequence (non-deterministic relaxed contiguity).
    Example: start.followedByAny(Pattern.<Event>begin("start").where(...).followedBy("middle").where(...));

notNext()
    Description: appends a new negative pattern. If a matching (negative) event directly follows the previous matching event (strict contiguity), the partial match is discarded.
    Example: Pattern<Event, ?> notNext = start.notNext("not");

notFollowedBy()
    Description: appends a new negative pattern. The partial match is discarded even if other events occur between the matching (negative) event and the previous matching event (relaxed contiguity).
    Example: Pattern<Event, ?> notFollowedBy = start.notFollowedBy("not");

within(time)
    Description: defines the maximum time span in which an event sequence may match the pattern. An unfinished sequence that exceeds this time is discarded.
    Example: pattern.within(Time.seconds(10));

Usage example:

// strict contiguity
Pattern<Event, ?> strict = start.next("middle").where(...);

// relaxed contiguity
Pattern<Event, ?> relaxed = start.followedBy("middle").where(...);

// non-deterministic relaxed contiguity
Pattern<Event, ?> nonDetermin = start.followedByAny("middle").where(...);

// NOT pattern with strict contiguity
Pattern<Event, ?> strictNot = start.notNext("not").where(...);

// NOT pattern with relaxed contiguity
Pattern<Event, ?> relaxedNot = start.notFollowedBy("not").where(...);

Skip strategy

For a given pattern, the same event can be assigned to multiple successful matches. To control how many matches an event is assigned to, specify a skip strategy with AfterMatchSkipStrategy.

There are five skip strategies:

API                                                Description
AfterMatchSkipStrategy.noSkip()                    Creates the NO_SKIP strategy: every possible match is emitted
AfterMatchSkipStrategy.skipToNext()                Creates the SKIP_TO_NEXT strategy: discards partial matches that started with the same event as the emitted match
AfterMatchSkipStrategy.skipPastLastEvent()         Creates the SKIP_PAST_LAST_EVENT strategy: discards partial matches that started between the start and the end of the emitted match
AfterMatchSkipStrategy.skipToFirst(patternName)    Creates the SKIP_TO_FIRST strategy for the referenced pattern name patternName
AfterMatchSkipStrategy.skipToLast(patternName)     Creates the SKIP_TO_LAST strategy for the referenced pattern name patternName

Note:

When using the SKIP_TO_FIRST and SKIP_TO_LAST strategies, a valid pattern name must also be specified.

SkipToFirstStrategy skipToFirstStrategy = AfterMatchSkipStrategy.skipToFirst("patternName");
Pattern.begin("patternName", skipToFirstStrategy);
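
The effect of a skip strategy can be approximated as a filter over candidate matches. The sketch below is a simplified plain-Java model, not how Flink implements it (Flink prunes partial matches inside its matcher while the stream is processed). SkipStrategyModel and its types are hypothetical, only three of the five strategies are modeled, and the candidates are assumed to be ordered by start position.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Simplified model of skip strategies as a filter over candidate matches (not Flink code). */
final class SkipStrategyModel {

    enum Strategy { NO_SKIP, SKIP_TO_NEXT, SKIP_PAST_LAST_EVENT }

    /** A candidate match covering the stream positions start..end (inclusive). */
    static final class Match {
        final int start;
        final int end;

        Match(int start, int end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ".." + end + "]";
        }
    }

    /** Keeps the candidates that survive the strategy; candidates must be ordered by start. */
    static List<Match> apply(List<Match> candidates, Strategy strategy) {
        List<Match> kept = new ArrayList<>();
        for (Match candidate : candidates) {
            if (kept.isEmpty() || strategy == Strategy.NO_SKIP) {
                kept.add(candidate);
                continue;
            }
            Match last = kept.get(kept.size() - 1);
            boolean sameStart = candidate.start == last.start;
            boolean startsInsideLast = candidate.start <= last.end;
            boolean drop = (strategy == Strategy.SKIP_TO_NEXT && sameStart)
                    || (strategy == Strategy.SKIP_PAST_LAST_EVENT && startsInsideLast);
            if (!drop) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Candidates for a pattern such as "one or more b, then c" on the stream b1 b2 b3 c:
        // b1 b2 b3 c, b2 b3 c and b3 c, expressed as position ranges.
        List<Match> candidates = Arrays.asList(new Match(0, 3), new Match(1, 3), new Match(2, 3));
        for (Strategy strategy : Strategy.values()) {
            System.out.println(strategy + " -> " + apply(candidates, strategy));
        }
    }
}
```

With these candidates NO_SKIP and SKIP_TO_NEXT keep all three matches, because no two of them start at the same event, while SKIP_PAST_LAST_EVENT keeps only the first one.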

Pattern groups

A pattern group is a pattern sequence that is nested inside another pattern, where it is used like a single pattern.

Pattern<Event, ?> start = Pattern.begin(
    Pattern.begin("start").where(...).followedBy("start_middle").where(...)
);

// strict contiguity
Pattern<Event, ?> strict = start.next(
    Pattern.begin("next_start").where(...).followedBy("next_middle").where(...)
).times(3);

// relaxed contiguity
Pattern<Event, ?> relaxed = start.followedBy(
    Pattern.begin("followedby_start").where(...).followedBy("followedby_middle").where(...)
).oneOrMore();

// non-deterministic relaxed contiguity
Pattern<Event, ?> nonDetermin = start.followedByAny(
    Pattern.begin("followedbyany_start").where(...).followedBy("followedbyany_middle").where(...)
).optional();

Matching results

Once the pattern sequence to look for has been specified, it can be applied to the input stream to detect potential matches.

To run an event stream against a pattern sequence, you must create a PatternStream. CEP.pattern takes an input stream input, a pattern pattern, and an optional comparator that is used to sort events that have the same timestamp (in event time) or that arrive at the same instant.

Matches can be handled with a PatternProcessFunction or with the older API, for example PatternSelectFunction.

1. PatternProcessFunction

PatternProcessFunction has a processMatch method that is called for every matching event sequence.

PatternStream<Event> patternStream = CEP.pattern(input, pattern, comparator);

    public static class MyPatternProcessFunction extends PatternProcessFunction<Tuple2<String, Integer>, String> {
        /**
         * @param pattern a Map<String, List<IN>> whose keys are the names of the individual patterns in the
         *                sequence and whose values are the lists of events accepted for each pattern
         *                (IN is the type of the input elements)
         */
        @Override
        public void processMatch(Map<String, List<Tuple2<String, Integer>>> pattern, Context context, Collector<String> collector) throws Exception {
            StringBuilder builder = new StringBuilder();

            builder.append("Match found: ");
            pattern.forEach((key, value) -> builder.append(key).append(" => ").append(value).append("; "));

            collector.collect(builder.toString());
        }
    }

Usage:

DataStream<String> result = patternStream.process(new MyPatternProcessFunction());

2. TimedOutPartialMatchHandler

Whenever a pattern has a window length attached through the within keyword, partial event sequences may be discarded because they exceed the window length. To act on a timed-out partial match, use the TimedOutPartialMatchHandler interface.

TimedOutPartialMatchHandler provides the additional processTimedOutMatch method, which is called for every timed-out partial match. This method has no Collector for the main output, so the example below simply prints the timed-out events.

    public static class MyPatternProcessFunction extends PatternProcessFunction<Tuple2<String, Integer>, String> implements TimedOutPartialMatchHandler<Tuple2<String, Integer>> {
        /**
         * @param pattern a Map<String, List<IN>> whose keys are the names of the individual patterns in the
         *                sequence and whose values are the lists of events accepted for each pattern
         *                (IN is the type of the input elements)
         */
        @Override
        public void processMatch(Map<String, List<Tuple2<String, Integer>>> pattern, Context context, Collector<String> collector) throws Exception {
            StringBuilder builder = new StringBuilder();

            builder.append("Match found: ");
            pattern.forEach((key, value) -> builder.append(key).append(" => ").append(value).append("; "));

            collector.collect(builder.toString());
        }

        @Override
        public void processTimedOutMatch(Map<String, List<Tuple2<String, Integer>>> map, Context context) throws Exception {
            StringBuilder builder = new StringBuilder();
            builder.append("Timed-out partial match: ");
            map.forEach((key, value) -> builder.append(key).append(" => ").append(value).append("; "));

            System.out.println(builder.toString());
        }
    }

3. PatternSelectFunction

    public static class MyPatternSelectFunction implements PatternSelectFunction<Tuple2<String, Integer>, String> {
        @Override
        public String select(Map<String, List<Tuple2<String, Integer>>> pattern) throws Exception {
            StringBuilder builder = new StringBuilder();

            builder.append("Match found: ");
            pattern.forEach((key, value) -> builder.append(key).append(" => ").append(value).append("; "));

            return builder.toString();
        }
    }

Usage:

DataStream<String> result = patternStream.select(new MyPatternSelectFunction());

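Application example

This example finds users with at least three consecutive failed logins within 5 seconds.

Custom event

import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

@Data
@AllArgsConstructor
@NoArgsConstructor
public class LoginEvent {
    /**
     * User id
     */
    private Integer uid;
    /**
     * Whether the login succeeded
     */
    private Boolean success;
    /**
     * Timestamp in milliseconds
     */
    private Long timeStamp;
}

Custom pattern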

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        // Login events keyed by user id, so the pattern is evaluated per user
        DataStream<LoginEvent> streamSource = env
                .fromElements(
                        new LoginEvent(1, false, 1000L),
                        new LoginEvent(2, true, 2000L),
                        new LoginEvent(3, true, 3000L),
                        new LoginEvent(1, false, 4000L),
                        new LoginEvent(1, false, 5000L),
                        new LoginEvent(4, false, 5000L)
                )
                .assignTimestampsAndWatermarks(
                        WatermarkStrategy
                                .<LoginEvent>forMonotonousTimestamps()
                                .withTimestampAssigner(new SerializableTimestampAssigner<LoginEvent>() {
                                    @Override
                                    public long extractTimestamp(LoginEvent loginEvent, long l) {
                                        return loginEvent.getTimeStamp();
                                    }
                                })
                )
                .keyBy(r -> r.getUid());

        // Three failed logins that directly follow each other, all within 5 seconds
        Pattern<LoginEvent, LoginEvent> pattern = Pattern
                .<LoginEvent>begin("first")
                .where(new SimpleCondition<LoginEvent>() {
                    @Override
                    public boolean filter(LoginEvent loginEvent) throws Exception {
                        return !loginEvent.getSuccess();
                    }
                })
                .next("second")
                .where(new SimpleCondition<LoginEvent>() {
                    @Override
                    public boolean filter(LoginEvent loginEvent) throws Exception {
                        return !loginEvent.getSuccess();
                    }
                })
                .next("third")
                .where(new SimpleCondition<LoginEvent>() {
                    @Override
                    public boolean filter(LoginEvent loginEvent) throws Exception {
                        return !loginEvent.getSuccess();
                    }
                })
                .within(Time.seconds(5));

        PatternStream<LoginEvent> patternedStream = CEP.pattern(streamSource, pattern);

        patternedStream.select(new PatternSelectFunction<LoginEvent, String>() {
                    @Override
                    public String select(Map<String, List<LoginEvent>> map) throws Exception {
                        LoginEvent first = map.get("first").iterator().next();
                        LoginEvent second = map.get("second").iterator().next();
                        LoginEvent third = map.get("third").iterator().next();
                        return String.format("uid:%d failed to log in 3 times in a row within 5 seconds, login times: first:%d, second:%d, third:%d", first.getUid(), first.getTimeStamp(), second.getTimeStamp(), third.getTimeStamp());
                    }
                })
                .print();

        env.execute();
    }

Test

uid:1 failed to log in 3 times in a row within 5 seconds, login times: first:1000, second:4000, third:5000
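
The result depends on the stream being keyed by user id: user 1's three failures count as consecutive even though events of users 2 and 3 arrive in between. The sketch below reproduces that reasoning in plain Java, without Flink. LoginFailureModel and its Login type are hypothetical, and it assumes the window check means that less than the window length lies between the first and the third failure.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of the per-user "three consecutive failures" check (not Flink code). */
final class LoginFailureModel {

    static final class Login {
        final int uid;
        final boolean success;
        final long timestamp;

        Login(int uid, boolean success, long timestamp) {
            this.uid = uid;
            this.success = success;
            this.timestamp = timestamp;
        }
    }

    static List<String> detect(List<Login> events, long windowMillis) {
        // Group by user id first, which is the role keyBy plays in the job above
        Map<Integer, List<Login>> byUser = new LinkedHashMap<>();
        for (Login event : events) {
            byUser.computeIfAbsent(event.uid, k -> new ArrayList<>()).add(event);
        }
        List<String> alerts = new ArrayList<>();
        for (List<Login> logins : byUser.values()) {
            // Slide over three directly adjacent logins of the same user (strict contiguity)
            for (int i = 0; i + 2 < logins.size(); i++) {
                Login first = logins.get(i);
                Login second = logins.get(i + 1);
                Login third = logins.get(i + 2);
                boolean allFailed = !first.success && !second.success && !third.success;
                if (allFailed && third.timestamp - first.timestamp < windowMillis) {
                    alerts.add("uid:" + first.uid + " first:" + first.timestamp
                            + " second:" + second.timestamp + " third:" + third.timestamp);
                }
            }
        }
        return alerts;
    }

    public static void main(String[] args) {
        List<Login> events = Arrays.asList(
                new Login(1, false, 1000L),
                new Login(2, true, 2000L),
                new Login(3, true, 3000L),
                new Login(1, false, 4000L),
                new Login(1, false, 5000L),
                new Login(4, false, 5000L));
        System.out.println(detect(events, 5000L));
        // prints [uid:1 first:1000 second:4000 third:5000]
    }
}
```

User 4 has only a single failure, and users 2 and 3 logged in successfully, so only user 1 is reported.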


Origin blog.csdn.net/qq_38628046/article/details/134335304