Introduction to Flink CEP: Complex Event Processing in Flink

1. What is Complex Event Processing (CEP)?

Complex event processing matches one or more event streams composed of simple events against certain rules, and then outputs the complex events that satisfy those rules, i.e., the data the user actually wants.

Features:

Goal: discover high-level features from an ordered stream of simple events
Input: one or more event streams composed of simple events
Processing: identify the internal connections between simple events; multiple simple events that satisfy certain rules constitute a complex event
Output: complex events that match the rules

CEP is used to analyze low-latency, frequently generated event streams from different sources. It can help find meaningful patterns and complex relationships in otherwise unrelated event streams, so that notifications can be delivered in near real time and certain behaviors can be prevented.

CEP supports pattern matching on streams. Depending on the pattern's conditions, matching may require contiguous or non-contiguous events; a pattern may also carry a time constraint, and if the conditions are not met within that time range, the pattern match times out.

It looks simple, but it covers many different capabilities:

(1) Process input stream data and produce results as soon as possible
(2) Perform time-based aggregate calculations across two event streams
(3) Provide real-time/quasi-real-time warnings and notifications
(4) Generate correlation and analysis patterns across multiple data sources
(5) Process with high throughput and low latency

There are many CEP solutions on the market, such as Spark, Samza, Beam, etc., but none of them provide a dedicated library for it. Flink, however, provides a dedicated CEP library.

2. Flink CEP

For CEP, Flink provides a dedicated library, Flink CEP, which contains the following components:

(1) Event Stream
(2) Pattern definition
(3) Pattern detection
(4) Generate Alert

First, the developer defines pattern conditions on a DataStream; the Flink CEP engine then performs pattern detection and generates alerts when necessary.
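
Put together, a minimal sketch of these four components might look as follows. It assumes the flink-cep dependency shown below; the LoginEvent type, the pattern conditions, and the alert message are illustrative and are developed step by step in the following sections.

import org.apache.flink.cep.scala.CEP
import org.apache.flink.cep.scala.pattern.Pattern
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

// The simple event type used throughout this article
case class LoginEvent(userId: String, ip: String, eventType: String, eventTime: String)

object CepSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

    // (1) Event stream
    val events = env.fromCollection(List(
      LoginEvent("1", "192.168.0.1", "fail", "1558430842"),
      LoginEvent("1", "192.168.0.2", "fail", "1558430843")
    )).assignAscendingTimestamps(_.eventTime.toLong)

    // (2) Pattern definition: two consecutive login failures within 10 seconds
    val pattern = Pattern.begin[LoginEvent]("begin")
      .where(_.eventType == "fail")
      .next("next")
      .where(_.eventType == "fail")
      .within(Time.seconds(10))

    // (3) Pattern detection on the stream keyed by user
    val patternStream = CEP.pattern(events.keyBy(_.userId), pattern)

    // (4) Generate alert: one message per matched pattern
    val alerts = patternStream.select(
      (matched: Map[String, Iterable[LoginEvent]]) =>
        s"repeated login failures for user ${matched("begin").head.userId}")
    alerts.print()

    env.execute("flink-cep sketch")
  }
}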

In order to use Flink CEP, we need to import dependencies:

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-cep_${scala.binary.version}</artifactId>
    <version>${flink.version}</version>
</dependency>
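
If the project is built with sbt rather than Maven, a roughly equivalent dependency would be the following sketch (the version number is only an illustrative assumption; use the Flink version of your cluster):

// build.sbt (sketch)
val flinkVersion = "1.10.1"  // illustrative; match your Flink version
libraryDependencies += "org.apache.flink" %% "flink-cep" % flinkVersion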

2.1 Event Streams

Take the login event flow as an example:

import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala._

case class LoginEvent(userId: String, ip: String, eventType: String, eventTime: String)

val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
env.setParallelism(1)

val loginEventStream = env.fromCollection(List(
  LoginEvent("1", "192.168.0.1", "fail", "1558430842"),
  LoginEvent("1", "192.168.0.2", "fail", "1558430843"),
  LoginEvent("1", "192.168.0.3", "fail", "1558430844"),
  LoginEvent("2", "192.168.10.10", "success", "1558430845")
)).assignAscendingTimestamps(_.eventTime.toLong)

2.2 Pattern API

Each Pattern consists of several steps, or states. To move from one state to another, we usually need to define conditions, as in the following code:

val loginFailPattern = Pattern.begin[LoginEvent]("begin")
  .where(_.eventType.equals("fail"))
  .next("next")
  .where(_.eventType.equals("fail"))
  .within(Time.seconds(10))

Each state needs a label, for example "begin" in Pattern.begin[LoginEvent]("begin"). Each state must have a unique name, and it needs a filter condition that defines what an event must satisfy to match that state, for example:

.where(_.eventType.equals("fail"))

We can also restrict the event to a subtype via subtype:

start.subtype(classOf[SubEvent]).where(...)

In fact, you can call subtype and where multiple times; and if the where conditions are independent of each other, you can use or to specify an alternative filter condition:

pattern.where(...).or(...)
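
For example, with the LoginEvent type from the example above, a small sketch of two independent conditions combined with or might look like this (the IP condition is purely illustrative):

// Matches events that either failed or came from a specific (illustrative) address
val failOrSuspiciousIp = Pattern.begin[LoginEvent]("begin")
  .where(_.eventType == "fail")
  .or(_.ip == "192.168.0.1")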

After that, based on such conditions we can move to the next state via the next or followedBy method. next requires the matching element to be the one immediately following the element matched in the previous step, while followedBy does not require it to be the very next element. The two are referred to as strict and non-strict contiguity, respectively.

val strictNext = start.next("middle")
val nonStrictNext = start.followedBy("middle")

Finally, we can limit all the Pattern conditions to a certain time range:

next.within(Time.seconds(10))

This time can be Processing Time or Event Time.

2.3 Pattern detection

Through an input DataStream and the Pattern we just defined, we can create a PatternStream:

val input = ...
val pattern = ...
val patternStream = CEP.pattern(input, pattern)

val patternStream = CEP.pattern(loginEventStream.keyBy(_.userId), loginFailPattern)

Once we have a PatternStream, we can extract the warning information we need from the matched Map sequences via select or flatSelect.

2.4 select

The select method requires us to implement a PatternSelectFunction, whose select method produces the required warnings. It receives a Map of string to event(s), where the key is the state name and the value holds the actual matched events.

val loginFailDataStream = patternStream
  .select((pattern: Map[String, Iterable[LoginEvent]]) => {
    val first = pattern.getOrElse("begin", null).iterator.next()
    val second = pattern.getOrElse("next", null).iterator.next()
    Warning(first.userId, first.eventTime, second.eventTime, "warning")
  })

select returns exactly one record per matched pattern.
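
The Warning type used above is not defined in the snippet; a minimal case class consistent with how it is constructed there might be:

// Hypothetical alert type matching the fields used in the select example
case class Warning(userId: String, firstFailTime: String, lastFailTime: String, warningMsg: String)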

2.5 flatSelect

Implementing a PatternFlatSelectFunction provides functionality similar to select. The only difference is that flatSelect can return multiple records per match; it emits its output downstream through a Collector[OUT] parameter.
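
As a hedged sketch, the warning from the select example could be produced with flatSelect roughly like this, reusing the LoginEvent and Warning types assumed above:

import org.apache.flink.util.Collector

// Emits zero or more records per match via the Collector; here exactly one Warning
val loginFailFlatStream = patternStream.flatSelect(
  (pattern: Map[String, Iterable[LoginEvent]], out: Collector[Warning]) => {
    val first = pattern("begin").iterator.next()
    val second = pattern("next").iterator.next()
    out.collect(Warning(first.userId, first.eventTime, second.eventTime, "warning"))
  })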

2.6 Handling of timeout events

Through the within method, our pattern rule limits matching to a certain time window. If matching events arrive only after that window has passed, we can handle the timed-out partial matches by supplying a PatternTimeoutFunction or PatternFlatTimeoutFunction to select or flatSelect.

val patternStream: PatternStream[Event] = CEP.pattern(input, pattern)

val outputTag = OutputTag[TimeoutEvent]("side-output")

val result: DataStream[ComplexEvent] = patternStream.select(outputTag) {
  (pattern: Map[String, Iterable[Event]], timestamp: Long) => TimeoutEvent()
} {
  pattern: Map[String, Iterable[Event]] => ComplexEvent()
}

val timeoutResult: DataStream[TimeoutEvent] = result.getSideOutput(outputTag)
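
Applied to the login-fail PatternStream built in section 2.3, a sketch (reusing the Warning case class assumed earlier; the side-output messages are illustrative) might look like:

// Complete matches become Warning records; partial matches that exceed the
// 10-second window are emitted to a side output as plain strings
val timeoutTag = OutputTag[String]("login-fail-timeout")

val loginWarnings = patternStream.select(timeoutTag) {
  (pattern: Map[String, Iterable[LoginEvent]], timeoutTimestamp: Long) =>
    s"pattern for user ${pattern("begin").iterator.next().userId} timed out at $timeoutTimestamp"
} {
  (pattern: Map[String, Iterable[LoginEvent]]) => {
    val first = pattern("begin").iterator.next()
    val second = pattern("next").iterator.next()
    Warning(first.userId, first.eventTime, second.eventTime, "warning")
  }
}

val loginTimeouts: DataStream[String] = loginWarnings.getSideOutput(timeoutTag)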
