Multiple Apache Flink windows validations

Jesus Zuñiga :

I'm just getting started with stream processing in Apache Flink. I'm receiving a stream of JSON objects that look like this:

{
  "token_id": "tok_afgtryuo",
  "ip_address": "128.123.45.1",
  "device_fingerprint": "abcghift",
  "card_hash": "hgtyuigash",
  "bin_number": "424242",
  "last4": "4242",
  "name": "Seu Jorge"
}
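
The code in this thread refers to a Token class that is never shown. A minimal sketch of what it might look like follows, assuming the getter names used in the rule code (getIpAddress, getDeviceFingerprint, getCardHash, get_tokenId). The field names and constructors are assumptions, and bin_number, last4 and name are left out for brevity.

```java
// Hypothetical Token class: not from the original post, only inferred from
// the getters the rule code calls. Declare it public in its own file
// (Token.java) when using it in a real job.
class Token {
    private String tokenId;
    private String ipAddress;
    private String deviceFingerprint;
    private String cardHash;

    // No-argument constructor, commonly needed by JSON mappers and POJO serializers.
    Token() {
    }

    Token(String tokenId, String ipAddress, String deviceFingerprint, String cardHash) {
        this.tokenId = tokenId;
        this.ipAddress = ipAddress;
        this.deviceFingerprint = deviceFingerprint;
        this.cardHash = cardHash;
    }

    // Getter names follow the calls made in the rule code, including the
    // unusual underscore in get_tokenId().
    public String get_tokenId() {
        return tokenId;
    }

    public String getIpAddress() {
        return ipAddress;
    }

    public String getDeviceFingerprint() {
        return deviceFingerprint;
    }

    public String getCardHash() {
        return cardHash;
    }
}
```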

I was asked whether I could implement the following business rules:

  • Decline if number of tokens > 5 for this IP in last 10 seconds

  • Decline if number of tokens > 15 for this IP in last minute

  • Decline if number of tokens > 60 for this IP in last hour
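
All three rules have the same shape: count events per key over a trailing time span and decline once the count exceeds a limit. Independent of Flink, that check could be sketched in plain Java roughly as below. The class name TrailingCountRule and the method shouldDecline are hypothetical, and the sketch assumes timestamps arrive in non-decreasing order and that all state fits in memory.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not from the original post): counts events per key
// over a trailing time span. Assumes non-decreasing timestamps.
class TrailingCountRule {
    private final long spanMillis; // length of the trailing span, e.g. 10_000 for 10 seconds
    private final int maxEvents;   // decline once the count exceeds this limit
    private final Map<String, Deque<Long>> seen = new HashMap<>();

    TrailingCountRule(long spanMillis, int maxEvents) {
        this.spanMillis = spanMillis;
        this.maxEvents = maxEvents;
    }

    // Records one event for the key and reports whether it should be declined.
    boolean shouldDecline(String key, long timestampMillis) {
        Deque<Long> times = seen.computeIfAbsent(key, k -> new ArrayDeque<>());
        // Drop events that have fallen out of the trailing span.
        while (!times.isEmpty() && times.peekFirst() <= timestampMillis - spanMillis) {
            times.pollFirst();
        }
        times.addLast(timestampMillis);
        return times.size() > maxEvents;
    }

    public static void main(String[] args) {
        // Rule 1: more than 5 tokens for the same IP in the last 10 seconds.
        TrailingCountRule rule = new TrailingCountRule(10_000, 5);
        for (int i = 1; i <= 6; i++) {
            boolean declined = rule.shouldDecline("128.123.45.1", i * 1_000L);
            System.out.println("token " + i + " declined: " + declined);
        }
    }
}
```

With one token per second, only the sixth token is declined. Note that this counts over a span ending at each event. Fixed, non-overlapping (tumbling) windows count differently: a burst that straddles a window boundary is split across two windows and may stay under the limit in both.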

I made two classes. The main class creates a RuleMaker instance and calls the window function with different parameters for each rule, to avoid duplicating code:

Main.java

    public static void main(String[] args) throws Exception {

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Reads lines from the socket and converts each JSON line to a Token object
        DataStream<Token> baseStream =
                env.addSource(new SocketTextStreamFunction("localhost",
                        9999,
                        "\n",
                        1))
                        .map(new MapTokens());


        // 1- First rule Decline if number of tokens > 5 for this IP in last 10 seconds
       DataStreamSink<String> response1 =  new RuleMaker().getStreamKeyCount(baseStream, "ip", Time.seconds(10),
               5, "seconds").print();

        // 2- Decline if number of tokens > 15 for this IP in last minute
        DataStreamSink<String> response2 = new RuleMaker().getStreamKeyCount(baseStream, "ip", Time.minutes(1),
                15, "minutes").print();

        //3- Decline if number of tokens > 60 for this IP in last hour
        DataStreamSink<String> response3  = new RuleMaker().getStreamKeyCount(baseStream, "ip", Time.hours(1),
                60, "Hours").print();

        env.execute("Job2");
    }

The other class holds all the logic for the rules. It counts how many times an IP address appears and, if that count exceeds the allowed number within the time window, emits a message with some information:

RuleMaker.java

public class RuleMaker {


    public DataStream<String> getStreamKeyCount(DataStream<Token> stream, 
                                                String tokenProp,
                                                Time time, 
                                                Integer maxPetitions, 
                                                String ruleType){

        return
               stream
                .flatMap(new FlatMapFunction<Token, Tuple3<String, Integer, String>>() {
                    @Override
                    public void flatMap(Token token, Collector<Tuple3<String, Integer, String>> collector) throws Exception {

                         String tokenSelection = "";
                        switch (tokenProp)
                        {
                            case "ip":
                                tokenSelection = token.getIpAddress();
                                break;
                            case "device":
                                tokenSelection = token.getDeviceFingerprint();
                                break;
                            case "cardHash":
                                tokenSelection = token.getCardHash();
                                break;
                        }
                        collector.collect(new Tuple3<>(tokenSelection, 1, token.get_tokenId()));
                    }
                })
                .keyBy(0)
                .timeWindow(time)
                .process(new MyProcessWindowFunction(maxPetitions, ruleType));
    }

    //Class to process the elements from the window
    private static class MyProcessWindowFunction extends ProcessWindowFunction<
            Tuple3<String, Integer, String>,
            String,
            Tuple,
            TimeWindow
            > {

        private Integer _maxPetitions;
        private String  _ruleType;


        public MyProcessWindowFunction(Integer maxPetitions, String ruleType) {
            this._maxPetitions = maxPetitions;
            this._ruleType = ruleType;
        }

        @Override
        public void process(Tuple tuple, Context context, Iterable<Tuple3<String, Integer, String>> iterable, Collector<String> out) throws Exception {

            Integer counter = 0;
            for (Tuple3<String, Integer, String> element : iterable) {
                // Sum the per-element count (always 1) without modifying the element
                counter += element.f1;
                if(counter > _maxPetitions){
                    out.collect("The element has been declined: " + element.f2 + " Num elements: " + counter + " rule type: " +  _ruleType + " token: " + element.f0 );
                    counter = 0;
                }
            }
        }
    }
}

So far I think this code is working, but I'm a beginner with Apache Flink, and I'd really appreciate it if you could tell me whether anything is wrong with my approach and point me in the right direction.

Thanks a lot.

Arvid Heise :

The general approach looks very good, although I would have thought the Table API, which supports JSON out of the box, would be powerful enough for this and more concise.

If you want to stick to the DataStream API, the switch on tokenProp in getStreamKeyCount should be replaced by passing a key extractor to getStreamKeyCount, so that there is only one place to add new rules.

public DataStream<String> getStreamKeyCount(DataStream<Token> stream, 
                                            KeySelector<Token, String> keyExtractor,
                                            Time time, 
                                            Integer maxPetitions, 
                                            String ruleType){

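    return stream
            .map(token -> new Tuple3<>(keyExtractor.getKey(token), 1, token.get_tokenId()))
            // A lambda loses the Tuple3 type arguments to erasure, so declare them explicitly
            .returns(Types.TUPLE(Types.STRING, Types.INT, Types.STRING))
            .keyBy(0)
            .timeWindow(time)
            .process(new MyProcessWindowFunction(maxPetitions, ruleType));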
}

Then the invocation becomes:

DataStreamSink<String> response2 = ruleMaker.getStreamKeyCount(baseStream,
    Token::getIpAddress, Time.minutes(1), 15, "minutes").print();
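
To illustrate the "only one place to add new rules" point, the rules themselves could be declared as data, each carrying its own key extractor. The following is a plain-Java sketch only: RuleTable, Rule and the nested Token stand-in are hypothetical names, java.util.function.Function stands in for Flink's KeySelector, and the limits are taken from the business rules in the question.

```java
import java.time.Duration;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: rules declared as data instead of a switch statement.
class RuleTable {

    // Minimal stand-in for the Token class used in the question.
    static class Token {
        private final String ipAddress;
        private final String deviceFingerprint;

        Token(String ipAddress, String deviceFingerprint) {
            this.ipAddress = ipAddress;
            this.deviceFingerprint = deviceFingerprint;
        }

        String getIpAddress() { return ipAddress; }
        String getDeviceFingerprint() { return deviceFingerprint; }
    }

    // One rule: which property to key by, the window length, and the limit.
    static class Rule {
        final String ruleType;
        final Function<Token, String> keyExtractor;
        final Duration window;
        final int maxPetitions;

        Rule(String ruleType, Function<Token, String> keyExtractor, Duration window, int maxPetitions) {
            this.ruleType = ruleType;
            this.keyExtractor = keyExtractor;
            this.window = window;
            this.maxPetitions = maxPetitions;
        }
    }

    // Adding a rule means adding one entry here; there is no switch to extend.
    static final List<Rule> RULES = Arrays.asList(
            new Rule("seconds", Token::getIpAddress, Duration.ofSeconds(10), 5),
            new Rule("minutes", Token::getIpAddress, Duration.ofMinutes(1), 15),
            new Rule("hours", Token::getIpAddress, Duration.ofHours(1), 60));

    public static void main(String[] args) {
        Token token = new Token("128.123.45.1", "abcghift");
        for (Rule rule : RULES) {
            System.out.println(rule.ruleType + " rule: key " + rule.keyExtractor.apply(token)
                    + ", limit " + rule.maxPetitions + " per " + rule.window);
        }
    }
}
```

In a Flink job, a loop over such a list could call getStreamKeyCount once per entry. A rule keyed by device would then only need another entry with Token::getDeviceFingerprint.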

