Flink Big Data Computing Engine: Complex Event Processing (CEP)

Original address: Flink Big Data Computing Engine — FlinkCEP Complex Event Processing
Complex event processing (CEP) is a technique built on stream processing. It treats the data flowing through a system as different types of events, analyzes the relationships between those events, and builds event sequences from them. By applying filtering, correlation, and aggregation, it derives higher-level (complex) events from simple ones, tracks important information through regular-expression-like patterns, and extracts valuable insights from real-time data. Complex event processing is mainly used in areas such as online fraud prevention, device failure detection, risk avoidance, and intelligent marketing. Mainstream CEP tools include Esper, JBoss Drools, and Microsoft StreamInsight. Flink provides the FlinkCEP component stack on top of the DataStream API, designed for complex event processing and for helping users discover valuable information in streaming data.

Basic concepts

FlinkCEP Description

A complex event is produced when one or more primitive events in an event stream match certain rules; the matched result that satisfies the rules is then output as the data the user wants. It has the following characteristics:

  • Goal: discover higher-order features hidden in ordered streams of simple events
  • Input: one or more event streams composed of simple events
  • Processing: identify the intrinsic relationships between simple events; several simple events that satisfy certain rules constitute a complex event
  • Output: the complex events that satisfy the rules


CEP is used for low-latency analysis of event streams that are frequently generated by different sources. It helps identify meaningful patterns and complex relationships across otherwise unrelated events, so that notifications can be issued and certain behaviors prevented in (near) real time.

CEP supports pattern matching on streams. Depending on the pattern, conditions can be contiguous or non-contiguous. A pattern can also carry a time constraint; if the condition is not satisfied within that time window, the pattern match times out.


Preparing the Environment

First, we need to add the FlinkCEP dependency to the project:

      <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-cep-scala_${scala.binary.version}</artifactId>
        <version>1.9.0</version>
      </dependency>

Basic Concepts

Event Definition

Simple Event

A simple event exists in real-world scenarios. Its main characteristic is that a single event can be handled on its own: the event can be observed directly, no relationships between multiple events need to be considered, and the result can be obtained by simple data processing.

Complex Event

In contrast to a simple event, complex event processing deals not with a single event but with compound events made up of multiple events. Complex event processing monitors and analyzes event streams (event streaming) and triggers certain actions when a specific event occurs.

Event Relationships

Complex events involve various relationships between events; common ones are timing relationships, aggregation relationships, hierarchical relationships, dependency relationships, and causal relationships.

Timing relationship

Between actions and events, between action events, and between action events and state-change events there is a chronological order. Timing relationships determine most event-ordering rules, for example: while the status of event A remains 1, the status of event B stays 0.

Aggregation relationship

Between actions and events, and between states and events, there can be an aggregation relationship, i.e., a set of individual events is aggregated into a whole. For example: a warning is triggered once the number of events whose status is 1 reaches 10.

Hierarchical relationship

Between actions and events, and between states and events, there can be a hierarchical relationship, i.e., a parent-event / child-event hierarchy: going from parent to child is specialization, going from child to parent is generalization. This is comparable to inheritance in Java.

Dependency relationship

The attributes and states of different things can depend on and constrain each other. For example, if a precondition of event A is that event B has already been triggered, a dependency relationship is formed between A and B.

Causal relationship

The full course of an action is the cause, and the resulting state is the effect; the initial state together with the action can be seen as producing the result.

Event Processing

The purpose of complex event processing is to process real-time data with the appropriate strategies. These strategies include event inference, event cause investigation, event decision-making, and event prediction.

Event inference

Event inference mainly uses the constraints between the states of things: from the attribute values of one part of the state, the attribute values of another part can be inferred. For example: given 1, 1, 2, 3, 5, 8, ..., we can infer that the following values are 13, 21, ...

Event cause investigation

When the resulting state is known and knowledge about the earlier state is available, the action that caused it can be identified; likewise, knowing the result and the process, the initial state can be identified. This is analogous to f(x) = kx + b: knowing f(x) and the form kx + b, we can solve for x.

Event decision-making

To reach a desired resulting state, given the initial state, decide which action to take. This process is similar to a rules engine: for example, once the conditions of a rule are satisfied after the job has started, an alert action is executed.

Event prediction

Knowing the initial state of an event and the action that will be taken, predict the state that will occur in the future. For example: a weather forecast.

Pattern API

FlinkCEP provides the Pattern API for defining complex event rules over an input data stream and for extracting matched event results from the stream.


Each Pattern consists of several steps, also called states. Moving from one state to another looks like this, for example:

    val loginFailPattern = Pattern.begin[LoginEvent]("begin")
      .where(_.eventType.equals("fail"))
      .next("next")
      .where(_.eventType.equals("fail"))
      .within(Time.seconds(5))

    // or, alternatively
    Pattern.begin[Event]("start")
      .where(_.typeEvent.equals("temperature"))
      .next("middle")
      .subtype(classOf[TempEvent])
      .where(_.temp > 35.0)
      .followedBy("end")
      .where(_.name.equals("end"))

Description:

  1. Every state should have an identifier, for example: in begin[LoginEvent]("begin") it is "begin", and in begin[Event]("start") it is "start" (the LoginEvent and Event types used in these examples are sketched after this list).
  2. Each state needs a unique name, and it also needs a filter condition that defines which events the state accepts, for example: .where(_.eventType.equals("fail")).
  3. We can also restrict the event's subtype through subtype, for example: .subtype(classOf[TempEvent]).
  4. In fact, subtype and where can be called multiple times; and if the where conditions are unrelated, you can specify separate filter functions with or: pattern.where(...).or(...).
  5. On the basis of these conditions, we can then switch to the next state via next or followedBy: next means the qualifying element must immediately follow the previous one, while with followedBy the qualifying element does not need to come right after. These are called strict and relaxed (non-strict) contiguity, respectively.
  6. Finally, we can require that all conditions of the Pattern be satisfied within a certain time frame: within(Time.seconds(5)).
  7. The time can be Processing Time or Event Time.
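
The snippets above assume event types roughly like the following. This is only an illustrative sketch; the field names (userID, eventTime, eventType, typeEvent, name, temp) are placeholders used throughout this post, not anything prescribed by FlinkCEP:

    // illustrative event types assumed by the snippets in this post (not part of FlinkCEP)
    case class LoginEvent(userID: String, eventTime: Long, eventType: String)

    class Event(val name: String, val typeEvent: String)
    class TempEvent(name: String, typeEvent: String, val temp: Double)
      extends Event(name, typeEvent)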

Pattern Detection

    val patternStream: PatternStream[LoginEvent] =
      CEP.pattern(loginEventSource.keyBy(_.userID), loginFailPattern)

Once we have a PatternStream, we can extract the warnings we need from the matched event sequences via select or flatSelect.

select

The select method requires an implementation of PatternSelectFunction and outputs a warning for each match. It receives a Map of string/event pairs, where the key is the state name and the value holds the actual matched event(s).

    val loginFailWarnings = patternStream.select(
      (pattern: Map[String, Iterable[LoginEvent]]) => {
        val first = pattern("begin").iterator.next()
        val second = pattern("next").iterator.next()

        Warning(first.userID, first.eventTime, second.eventTime, "warning")
      })

Each invocation of the select function returns exactly one record.

flatSelect

Implemented via PatternFlatSelectFunction, flatSelect works much like select. The only difference is that flatSelect may return multiple records per call; the data is emitted downstream through a Collector[OUT] parameter.
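
As a minimal sketch, assuming the same patternStream, LoginEvent and Warning types as in the select example above (Collector here is org.apache.flink.util.Collector), a flatSelect could emit one warning per event matched by the "next" pattern:

    val warnings = patternStream.flatSelect(
      (pattern: Map[String, Iterable[LoginEvent]], out: Collector[Warning]) => {
        val first = pattern("begin").iterator.next()
        // flatSelect may emit any number of records through the Collector
        pattern("next").foreach { second =>
          out.collect(Warning(first.userID, first.eventTime, second.eventTime, "warning"))
        }
      })
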
Processing timeout events

Through the within method, the events matched by our pattern rule are restricted to a certain time window. When a matching event arrives after that window has expired, we can handle the situation in select or flatSelect by implementing PatternTimeoutFunction and PatternFlatTimeoutFunction.

    val out: OutputTag[TimeoutEvent] = OutputTag[TimeoutEvent]("side-output")

    patternStream.select(out) {
      // timed-out partial matches go to the side output
      (pattern: Map[String, Iterable[Event]], timestamp: Long) => TimeoutEvent()
    } {
      // complete matches
      (pattern: Map[String, Iterable[Event]]) => ComplexEvent()
    }

That, roughly, is what FlinkCEP programming looks like. Now let's go through the details.

Pattern Definition

A Pattern can be an individual (singleton) pattern or a looping pattern. A singleton pattern accepts only one event, while a looping pattern can accept multiple events. Typically, a singleton pattern can be turned into a looping pattern by specifying how many times it should repeat. Each pattern can apply multiple conditions to the same event; conditions are combined by chaining where methods.

An individual Pattern is defined with the begin method; for example, Pattern.begin[Event] creates a Pattern over the event type Event, and the argument specifies the pattern name:

val start = Pattern.begin[Event]("start_pattern")

The condition of the Pattern is then specified with the where method; only when the condition is satisfied will the current Pattern accept the event:

start.where(_.typeEvent.equals("temperature"))

Specifying the number of repetitions

For an already created Pattern, you can specify how many times it should repeat, turning it into a looping pattern. There are several ways to specify the looping behavior; a combined sketch follows this list.

  • times: a fixed number of repetitions can be specified with times
// trigger exactly 4 times
start.times(4)
// or specify a range of repetitions
start.times(2, 4)
  • optional: with optional, the pattern either does not trigger at all or triggers the specified number of times
// trigger 0 or 4 times
start.times(4).optional()
// trigger 0 times, or 2 to 4 times
start.times(2, 4).optional()
  • greedy: a looping pattern can be marked as greedy; when the pattern matches, it repeats as many times as possible
// trigger 2, 3 or 4 times, repeating as often as possible
start.times(2, 4).greedy()
// trigger 0, 2, 3 or 4 times, repeating as often as possible
start.times(2, 4).optional().greedy()
  • oneOrMore: oneOrMore specifies that the pattern triggers one or more times
// trigger once or more
start.oneOrMore()
// trigger once or more, repeating as often as possible
start.oneOrMore().greedy()
// trigger 0 or more times
start.oneOrMore().optional()
// trigger 0 or more times, repeating as often as possible
start.oneOrMore().optional().greedy()
  • timesOrMore: timesOrMore specifies triggering at least a fixed number of times, for example two or more:
// trigger two or more times
start.timesOrMore(2)
// trigger two or more times, repeating as often as possible
start.timesOrMore(2).greedy()
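
As a combined sketch (assuming the LoginEvent type from earlier and the usual Time import), quantifiers are typically chained with a condition and a time bound:

    // match three consecutive login failures within ten seconds (illustrative)
    val failPattern: Pattern[LoginEvent, LoginEvent] = Pattern
      .begin[LoginEvent]("fail")
      .where(_.eventType.equals("fail"))
      .times(3)
      .within(Time.seconds(10))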

Pattern conditions

Each pattern needs a trigger condition, which serves as the basis for deciding whether an incoming event is accepted; only when the event satisfies the condition does matching proceed to the next step. In FlinkCEP, conditions are specified for a Pattern through pattern.where(), pattern.or(), and pattern.until(), and they fall into three types: Iterative Conditions, Simple Conditions, and Combining Conditions.

Iterative conditions

An Iterative Condition can take into account all the events that the pattern has previously accepted: a statistic is computed from the accepted events and used as an input to the matching condition for the current event. For example:

    .oneOrMore
      .subtype(classOf[TempEvent])
      .where(
        (value, ctx) => {
          // your condition here; ctx gives access to previously accepted events
        }
      )

Here subtype converts the events to TempEvent. Inside the where condition of the "middle" pattern, ctx.getEventsForPattern(...) returns all events that the pattern has already accepted; from these the average temperature can be computed, and the condition then checks whether the temperature of the current event is below that average.
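
A fuller sketch of such an iterative condition, assuming the Event/TempEvent types sketched earlier (the average-temperature logic itself is only an example, not something prescribed by FlinkCEP):

    Pattern.begin[Event]("start")
      .where(_.typeEvent.equals("temperature"))
      .followedBy("middle")
      .oneOrMore
      .subtype(classOf[TempEvent])
      .where(
        (value, ctx) => {
          // temperatures of the TempEvents already accepted by the "middle" pattern
          val temps = ctx.getEventsForPattern("middle").map(_.temp).toSeq
          val avg = if (temps.isEmpty) value.temp else temps.sum / temps.size
          // accept the current event only if it is not above the running average
          value.temp <= avg
        }
      )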

Simple conditions

A Simple Condition inherits from the IterativeCondition class; it decides whether to accept an event based only on the fields of that event, as follows:

start.where(event => event.typeEvent.equals("temperature"))

Similarly, we can convert the event to a subclass type with subtype and then define where conditions for that subclass.

Combined conditions

Combining conditions simply merge simple conditions. Conditions are usually combined by chaining where methods, and each condition is connected with AND logic by default. If OR logic is needed, use or, for example:

pattern.where(event => event.name.startsWith("foo")).or(event => event.typeEvent.equals("temperature"))

Termination conditions

If oneOrMore or oneOrMore().optional() is used in the pattern, a termination condition must be specified; otherwise the looping pattern would keep matching indefinitely, for example:

pattern.oneOrMore().until(event => event.name.equals("end"))

Note: in the iterative condition shown earlier, calling ctx.getEventsForPattern("middle") retrieves the events already accepted by the "middle" pattern.

Pattern Sequences

Individual patterns are combined into a pattern sequence. A pattern sequence is written in the same way as individual patterns, with the patterns linked through contiguity conditions. There are three kinds of contiguity: strict contiguity, relaxed contiguity, and non-deterministic relaxed contiguity. A sequence starts from an initial pattern, for example:

val start: Pattern[Event, Event] = Pattern.begin[Event]("start")

Strict contiguity

Under strict contiguity, matching events must follow one another directly; no non-matching event may appear in between. As follows: the next method appends a pattern after the start pattern, producing a strictly contiguous Pattern.

val strict : Pattern[Event,_] = start.next("middle").where(...)

Relaxed contiguity

Under relaxed contiguity, non-matching events that appear between matching events are ignored; the requirement is not as strict as with strict contiguity. As follows:

val relaxed : Pattern[Event, _] = start.followedBy("middle").where(...)

Non-deterministic relaxed contiguity

Compared with relaxed contiguity, non-deterministic relaxed contiguity further relaxes the requirement: events that have already been matched may also be used for additional, alternative matches. As follows:

val nonDetermin : Pattern[Event, _] = start.followedByAny("middle").where(...)

In addition to the contiguity conditions above, Flink also provides notNext() and notFollowedBy(). notNext() expresses that one pattern must not be directly followed by another; notFollowedBy() expresses that a certain pattern must not occur anywhere between two other patterns.
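
A small sketch of both negative patterns, assuming the Event type from earlier (the state names are illustrative):

    // "error" must not be immediately followed by a "rollback" event
    val noRollback = Pattern.begin[Event]("error")
      .where(_.name.equals("error"))
      .notNext("rollback")
      .where(_.name.equals("rollback"))

    // no "warning" event may occur anywhere between "start" and "end"
    val noWarning = Pattern.begin[Event]("start")
      .where(_.name.equals("start"))
      .notFollowedBy("warning")
      .where(_.name.equals("warning"))
      .followedBy("end")
      .where(_.name.equals("end"))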

Note: a pattern sequence cannot end with notFollowedBy(), and a NOT pattern cannot be combined with the optional keyword.

Pattern Groups

A pattern sequence can itself be passed as the input parameter of begin, followedBy, followedByAny, next and so on, forming a GroupPattern. Loop conditions such as oneOrMore, times, and optional can be applied to a GroupPattern. When a GroupPattern is used in a pattern sequence, each sub-sequence first matches its own internal conditions, and the results are then combined at the group level. For example:

    val value: GroupPattern[Event, _] = Pattern.begin(Pattern.begin[Event]("start")
      .where(_.name.equals("name"))
      .followedBy("start_middle")
      .where(_.name.equals("yang")))

    val value1: Pattern[Event, _] = Pattern.begin(Pattern.begin[Event]("start")
      .next("next_start")
      .where(_.name.equals("name"))
      .followedBy("next_middle")
      .where(_.name.equals("yang"))).times(3)

AfterMatchSkipStrategy

In a given Pattern, when the same event can satisfy several combinations of pattern conditions, an AfterMatchSkipStrategy needs to be specified to control how already matched events are handled. AfterMatchSkipStrategy provides five strategies: NO_SKIP, SKIP_PAST_LAST_EVENT, SKIP_TO_FIRST, SKIP_TO_LAST and SKIP_TO_NEXT. Their definitions and usage are listed below; note that SKIP_TO_FIRST and SKIP_TO_LAST require a valid PatternName.

  • NO_SKIP: every possible match is emitted; nothing is skipped.
AfterMatchSkipStrategy.noSkip()
  • SKIP_PAST_LAST_EVENT: after a match is emitted, partial matches that overlap with the events of that match are discarded; matching resumes after the last event of the match.
AfterMatchSkipStrategy.skipPastLastEvent()
  • SKIP_TO_FIRST: partial matches that started before the first event matching the specified PatternName are discarded.
AfterMatchSkipStrategy.skipToFirst(patternName)
  • SKIP_TO_LAST: partial matches that started before the last event matching the specified PatternName are discarded.
AfterMatchSkipStrategy.skipToLast(patternName)
  • SKIP_TO_NEXT: partial matches that started with the same event as the emitted match are discarded.
AfterMatchSkipStrategy.skipToNext()

After choosing an AfterMatchSkipStrategy, pass it when creating the Pattern: the begin method accepts the skip strategy as its second argument, which applies it to that Pattern.

val skipStrategy = AfterMatchSkipStrategy.noSkip()
Pattern.begin[Event]("pattern_name", skipStrategy)

Obtaining matched events

A previously defined pattern or pattern sequence (group) must be applied to an input data stream before potentially matching event relationships can be found. For example:

val input : DataStream[Event] = ...
val pattern : Pattern[Event, _] = ...
var comparator : EventComparator[Event] = ... // optional

val patternStream: PatternStream[Event] = CEP.pattern(input, pattern, comparator)

FlinkCEP provides the CEP.pattern method to apply a Pattern to a DataStream, which yields a data set of type PatternStream; all subsequent event extraction is based on this PatternStream. Optionally, an EventComparator can be created and passed in to sort incoming events: when events arrive at the same time, or carry the same Event Time timestamp, the EventComparator supplies a secondary ordering for the event sequence.
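
As a minimal sketch of such a comparator, assuming the Event type from earlier and ordering simultaneous events by their name field (purely illustrative):

    import org.apache.flink.cep.EventComparator

    val comparator: EventComparator[Event] = new EventComparator[Event] {
      // decides the order of events that arrive with the same timestamp
      override def compare(e1: Event, e2: Event): Int = e1.name.compareTo(e2.name)
    }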

When CEP.pattern is executed, the resulting PatternStream data set contains all matched events. FlinkCEP currently offers two ways to extract result events from a PatternStream: select and flatSelect.

Extracting normal events with a Select Function

By passing a custom Select Function to the select method of PatternStream, matched events are converted and output. The input parameter of the Select Function is Map[String, Iterable[IN]], where the Map key is the name of a pattern in the pattern sequence, the value is the set of events accepted by that pattern, and IN is the data type of the input events. Note that a Select Function outputs exactly one result per invocation, as follows:

  def selectFunction(pattern: Map[String, Iterable[IN]]): OUT = {
    // get the startEvent from the pattern
    val startEvent = pattern.get("start_pattern").get.iterator.next()
    // get the middleEvent from the pattern
    val middleEvent = pattern.get("middle_pattern").get.iterator.next()
    // return the result
    OUT(startEvent, middleEvent)
  }

Extracting timed-out events with a Select Function

  val patternStream: PatternStream[LoginEvent] = CEP.pattern(loginEventSource, pattern)
  // create an OutputTag named timeout-output
  val timeoutTag: OutputTag[TimeOutEvent] = OutputTag[TimeOutEvent]("timeout-output")
  // call PatternStream.select() with the timeoutTag
  val result = patternStream.select(timeoutTag) {
    // timed-out partial matches
    (pattern: Map[String, Iterable[LoginEvent]], timestamp: Long) => TimeOutEvent()
  } {
    // complete matches
    (pattern: Map[String, Iterable[LoginEvent]]) => NormalEvent()
  }
  // call getSideOutput with the timeoutTag to obtain the timed-out events
  val timeoutResult: DataStream[TimeOutEvent] = result.getSideOutput(timeoutTag)

Extracting normal events with a Flat Select Function

A Flat Select Function is similar to a Select Function, but each invocation of a Flat Select Function may return any number of results. The Flat Select Function uses a Collector as the container for its results, so every event that should be output is placed into the Collector, as follows:

  def flatSelectFunction(pattern: Map[String, Iterable[IN]], collector: Collector[OUT]) = {
    // get the startEvent from the pattern
    val startEvent = pattern.get("start_pattern").get.iterator.next()
    // get the middleEvent from the pattern
    val middleEvent = pattern.get("middle_pattern").get.iterator.next()
    // emit results based on startEvent
    for (i <- 0 to startEvent.value) {
      collector.collect(OUT(startEvent, middleEvent))
    }
  }

Extracting timed-out events with a Flat Select Function

  val patternStream: PatternStream[LoginEvent] = CEP.pattern(loginEventSource, pattern)
  // create an OutputTag named timeout-output
  val timeoutTag: OutputTag[TimeOutEvent] = OutputTag[TimeOutEvent]("timeout-output")
  // call PatternStream.flatSelect() with the timeoutTag
  val result = patternStream.flatSelect(timeoutTag) {
    // timed-out partial matches
    (pattern: Map[String, Iterable[LoginEvent]], timestamp: Long, out: Collector[TimeOutEvent]) =>
      out.collect(TimeOutEvent())
  } {
    // complete matches
    (pattern: Map[String, Iterable[LoginEvent]], out: Collector[NormalEvent]) =>
      out.collect(NormalEvent())
  }
  // call getSideOutput with the timeoutTag to obtain the timed-out events
  val timeoutResult: DataStream[TimeOutEvent] = result.getSideOutput(timeoutTag)


Source: www.cnblogs.com/sun-iot/p/12102603.html