Item-based collaborative filtering with Flink

Flink has been getting a lot of attention recently, so I tried using it to build a recommendation feature. It turns out that flink-ml is fairly shallow: it contains relatively few algorithms, only supports the Scala API, and was removed in Flink 1.9, which suggests a bigger overhaul is on the way. Still, for real-time recommendation later on, Flink will come in handy. Fortunately, item-based collaborative filtering is a relatively simple algorithm and not hard to implement by hand. First, a look at the overall architecture of the current recommendation setup.

(figure: overall architecture of the recommendation pipeline)

Let me first describe the similarity measure used. For two vectors X = (x1, x2, x3, ..., xn) and Y = (y1, y2, y3, ..., yn), the Euclidean distance is:

d(X, Y) = sqrt((x1 - y1)^2 + (x2 - y2)^2 + ... + (xn - yn)^2)

Obviously, the larger this distance, the lower the similarity; if the two vectors are identical, the distance is zero.
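Just to make the formula concrete, here is a tiny standalone helper (not part of the job itself) that computes this distance for two equal-length score vectors:

public static double euclideanDistance(float[] x, float[] y) {
    // sum of squared component differences, then the square root
    double sum = 0.0;
    for (int i = 0; i < x.length; i++) {
        double d = x[i] - y[i];
        sum += d * d;
    }
    return Math.sqrt(sum);
}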

The first step is preparing the data. The raw data format is as follows:

(figure: sample of the raw behavior log)

actionObject is the listing (house) ID, and actionType is the user's behavior: exposure without a click, click, favorite, and so on.
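For illustration, a raw log line might look roughly like this (the values are made up; only the userId, actionObject and actionType fields are used below):

{"userId": "u_10001", "actionObject": "BJCY56167779_03", "actionType": "click"}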

The following code reads the data from HDFS, cleans it, and converts views, clicks, and other actions into scores:


public static DataSet<Tuple2<Tuple2<String, String>, Float>> getData(ExecutionEnvironment env, String path) {
    DataSet<Tuple2<Tuple2<String, String>, Float>> res = env.readTextFile(path)
            .map(new MapFunction<String, Tuple2<Tuple2<String, String>, Float>>() {
                @Override
                public Tuple2<Tuple2<String, String>, Float> map(String value) throws Exception {
                    // each line is one JSON log record
                    JSONObject jj = JSON.parseObject(value);
                    if (RecommendUtils.getValidAction(jj.getString("actionType"))) {
                        // key: (userId, itemId), value: score derived from the action type
                        return new Tuple2<>(
                                new Tuple2<>(jj.getString("userId"), jj.getString("actionObject")),
                                RecommendUtils.getScore(jj.getString("actionType")));
                    } else {
                        return null;
                    }
                }
            })
            .filter(new FilterFunction<Tuple2<Tuple2<String, String>, Float>>() {
                @Override
                public boolean filter(Tuple2<Tuple2<String, String>, Float> value) throws Exception {
                    // drop records whose action type we do not care about
                    return value != null;
                }
            });

    return res;
}
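RecommendUtils is a project helper that the post does not show; it presumably decides which action types count and how much each one is worth. A minimal sketch, where the action names and weights are my assumptions, not the original values:

public class RecommendUtils {

    // only these behaviors contribute to the item vectors (assumed action names)
    public static boolean getValidAction(String actionType) {
        return "view".equals(actionType) || "click".equals(actionType) || "favorite".equals(actionType);
    }

    // heavier actions get higher scores (assumed weights)
    public static Float getScore(String actionType) {
        switch (actionType) {
            case "view":     return 1.0f;
            case "click":    return 2.0f;
            case "favorite": return 3.0f;
            default:         return 0.0f;
        }
    }
}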

After this simple cleaning, the data looks like this:

(figure: cleaned data in ((userId, itemId), score) form)

Next, aggregate by the first field, i.e. the (userId, itemId) pair, summing the scores:

groupBy(0).reduce(new ReduceFunction<Tuple2<Tuple2<String, String>, Float>>() {

    @Override
    public Tuple2<Tuple2<String, String>, Float> reduce(Tuple2<Tuple2<String, String>, Float> value1,
            Tuple2<Tuple2<String, String>, Float> value2) throws Exception {
        // same (userId, itemId) key, so just add up the scores
        return new Tuple2<>(new Tuple2<>(value1.f0.f0, value1.f0.f1), value1.f1 + value2.f1);
    }

})

The structure then becomes:

(figure: aggregated scores per (userId, itemId))

At this point, in theory, the distance between BJCY56167779_03 and BJCY56167779_04 is (4 - 3)^2 + (5 - 2)^2 followed by a square root, i.e. sqrt(10) ≈ 3.16, and so on.

Dropping the first column (the userId), the format becomes:

(figure: (itemId, score) pairs after dropping userId)

Since:
(x1 - y1)^2 + (x2 - y2)^2 = x1^2 + x2^2 + y1^2 + y2^2 - 2(x1y1 + x2y2),
we first compute the x1^2 + x2^2 part for every item and register it as the item table.


.map(new MapFunction<Tuple2<String, Float>, Tuple2<String, Float>>() {
    @Override
    public Tuple2<String, Float> map(Tuple2<String, Float> value) throws Exception {
        // square each user's score for the item
        return new Tuple2<>(value.f0, value.f1 * value.f1);
    }
}).groupBy(0).reduce(new ReduceFunction<Tuple2<String, Float>>() {

    @Override
    public Tuple2<String, Float> reduce(Tuple2<String, Float> value1, Tuple2<String, Float> value2)
            throws Exception {
        // sum of squared scores per item: x1^2 + x2^2 + ...
        return new Tuple2<>(value1.f0, value1.f1 + value2.f1);
    }

}).map(new MapFunction<Tuple2<String, Float>, ItemDTO>() {

    @Override
    public ItemDTO map(Tuple2<String, Float> value) throws Exception {
        // wrap into a POJO so it can be registered as a table
        ItemDTO nd = new ItemDTO();
        nd.setItemId(value.f0);
        nd.setScore(value.f1);
        return nd;
    }

});

tableEnv.registerDataSet("item", itemdto); // 注册表信息

After the transformation above, the first half of the formula is ready; next we need the (x1y1 + x2y2) term.

Transform the original table once more into the following format:

(figure: per-user item lists, i.e. (userId, [(itemId, score), ...]))
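The post does not show the step that regroups the cleaned ((userId, itemId), score) records into one (userId, list of (itemId, score)) record per user, which is what the map below consumes. A possible sketch of that step (the userItemScores variable name is mine):

// sketch: collect each user's (itemId, score) entries into a single list per user
DataSet<Tuple2<String, List<Tuple2<String, Float>>>> userItems = userItemScores
        .groupBy("f0.f0")  // group by userId
        .reduceGroup(new GroupReduceFunction<Tuple2<Tuple2<String, String>, Float>,
                Tuple2<String, List<Tuple2<String, Float>>>>() {
            @Override
            public void reduce(Iterable<Tuple2<Tuple2<String, String>, Float>> values,
                    Collector<Tuple2<String, List<Tuple2<String, Float>>>> out) throws Exception {
                String userId = null;
                List<Tuple2<String, Float>> items = new ArrayList<>();
                for (Tuple2<Tuple2<String, String>, Float> v : values) {
                    userId = v.f0.f0;
                    items.add(new Tuple2<>(v.f0.f1, v.f1));
                }
                out.collect(new Tuple2<>(userId, items));
            }
        });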

The code for generating the pairwise score products is as follows:

.map(new MapFunction<Tuple2<String, List<Tuple2<String, Float>>>, List<Tuple2<Tuple2<String, String>, Float>>>() {

    @Override
    public List<Tuple2<Tuple2<String, String>, Float>> map(Tuple2<String, List<Tuple2<String, Float>>> value) throws Exception {
        List<Tuple2<String, Float>> ll = value.f1;
        List<Tuple2<Tuple2<String, String>, Float>> list = new ArrayList<>();
        // for every pair of items this user has touched, emit the product of their scores
        for (int i = 0; i < ll.size(); i++) {
            for (int j = 0; j < ll.size(); j++) {
                list.add(new Tuple2<>(new Tuple2<>(ll.get(i).f0, ll.get(j).f0),
                        ll.get(i).f1 * ll.get(j).f1));
            }
        }
        return list;
    }

})
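The post then jumps straight to registering item_relation, whose columns (firstItem, secondItem, relationScore) appear in the SQL query below. Presumably the per-user pair lists are first flattened, the products are summed over all users for each item pair, and the result is mapped to a POJO; a sketch of that bridging step, where ItemRelationDTO and pairLists are names I am assuming:

DataSet<ItemRelationDTO> itemRelation = pairLists  // pairLists: output of the map above (name assumed)
        // flatten each user's list of ((firstItem, secondItem), product) entries
        .flatMap(new FlatMapFunction<List<Tuple2<Tuple2<String, String>, Float>>,
                Tuple2<Tuple2<String, String>, Float>>() {
            @Override
            public void flatMap(List<Tuple2<Tuple2<String, String>, Float>> pairs,
                    Collector<Tuple2<Tuple2<String, String>, Float>> out) throws Exception {
                for (Tuple2<Tuple2<String, String>, Float> p : pairs) {
                    out.collect(p);
                }
            }
        })
        // sum x_i * y_i over all users for each (firstItem, secondItem) pair
        .groupBy("f0.f0", "f0.f1")
        .reduce(new ReduceFunction<Tuple2<Tuple2<String, String>, Float>>() {
            @Override
            public Tuple2<Tuple2<String, String>, Float> reduce(Tuple2<Tuple2<String, String>, Float> a,
                    Tuple2<Tuple2<String, String>, Float> b) throws Exception {
                return new Tuple2<>(a.f0, a.f1 + b.f1);
            }
        })
        // hypothetical POJO with fields firstItem, secondItem, relationScore
        .map(new MapFunction<Tuple2<Tuple2<String, String>, Float>, ItemRelationDTO>() {
            @Override
            public ItemRelationDTO map(Tuple2<Tuple2<String, String>, Float> t) throws Exception {
                ItemRelationDTO dto = new ItemRelationDTO();
                dto.setFirstItem(t.f0.f0);
                dto.setSecondItem(t.f0.f1);
                dto.setRelationScore(t.f1);
                return dto;
            }
        });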

tableEnv.registerDataSet("item_relation", itemRelation); // 注册表信息

Now put the whole formula together to complete the final calculation:

Table similarity = tableEnv.sqlQuery("select ta.firstItem, ta.secondItem, "
        + "sqrt(tb.score + tc.score - 2 * ta.relationScore) as similarScore from item tb "
        + "inner join item_relation ta on tb.itemId = ta.firstItem and ta.firstItem <> ta.secondItem "
        + "inner join item tc on tc.itemId = ta.secondItem"
        );

DataSet<ItemSimilarDTO> ds=tableEnv.toDataSet(similarity, ItemSimilarDTO.class);
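ItemSimilarDTO is also a project class that is not shown; judging from the SQL aliases and the getters used below, it is roughly:

// sketch: field names match the SQL aliases; similarScore is a Double
// because the code below calls getSimilarScore().floatValue()
public class ItemSimilarDTO {
    private String firstItem;
    private String secondItem;
    private Double similarScore;

    public String getFirstItem() { return firstItem; }
    public void setFirstItem(String firstItem) { this.firstItem = firstItem; }
    public String getSecondItem() { return secondItem; }
    public void setSecondItem(String secondItem) { this.secondItem = secondItem; }
    public Double getSimilarScore() { return similarScore; }
    public void setSimilarScore(Double similarScore) { this.similarScore = similarScore; }
}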

Now the structure becomes

(figure: item-pair similarity rows with firstItem, secondItem, similarScore)

It feels like we are still some way from the finish: the structure above is not yet what we want. We want something clearer, in the following format:

(figure: target format, one item mapped to its sorted list of similar items)

The code is as follows:

DataSet<RedisDataModel> redisResult= ds.map(new MapFunction<ItemSimilarDTO, Tuple2<String, Tuple2<String, Float>>> (){

            @Override
            public Tuple2<String, Tuple2<String, Float>> map(ItemSimilarDTO value) throws Exception {               
                return new Tuple2<String, Tuple2<String, Float>>(value.getFirstItem(), new Tuple2<>(value.getSecondItem(), value.getSimilarScore().floatValue()));
            }
        }).groupBy(0).reduceGroup(new GroupReduceFunction<Tuple2<String, Tuple2<String, Float>> , Tuple2<String, List<RoomModel>>>() { 

            @Override
            public void reduce(Iterable<Tuple2<String, Tuple2<String, Float>>> values,
                    Collector<Tuple2<String, List<RoomModel>>> out) throws Exception {

                List<RoomModel> list=new ArrayList<>();
                String key=null;
                for (Tuple2<String, Tuple2<String, Float>> t : values) {
                    key=t.f0;
                    RoomModel rm=new RoomModel();
                    rm.setRoomCode(t.f1.f0);
                    rm.setScore(t.f1.f1);
                    list.add(rm);
                }       

                // sort ascending by distance: the smaller the distance, the more similar the item
                Collections.sort(list, new Comparator<RoomModel>() {
                    @Override
                    public int compare(RoomModel o1, RoomModel o2) {
                        return o1.getScore().compareTo(o2.getScore());
                    }
                });

                out.collect(new Tuple2<>(key,list));            
            }

        }).map(new MapFunction<Tuple2<String, List<RoomModel>>, RedisDataModel>(){

            @Override
            public RedisDataModel map(Tuple2<String, List<RoomModel>> value) throws Exception {
                RedisDataModel m=new RedisDataModel();
                m.setExpire(-1); 
                m.setKey(JobConstants.REDIS_FLINK_ITEMCF_KEY_PREFIX+value.f0);      
                m.setGlobal(true);
                m.setValue(JSON.toJSONString(value.f1));
                return m;
            }

        });
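RoomModel and RedisDataModel are likewise project classes the post does not include; reconstructed from the setters and getters used above, they are roughly:

public class RoomModel {
    private String roomCode;  // recommended item (house) id
    private Float score;      // distance to the source item (smaller = more similar)
    // getters and setters omitted
}

public class RedisDataModel {
    private String key;       // redis key, e.g. prefix + itemId
    private String value;     // JSON-serialized list of RoomModel
    private int expire;       // -1 presumably means no expiry
    private boolean global;
    // getters and setters omitted
}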

These results are finally written to Redis for convenient lookup:

RedisOutputFormat redisOutput = RedisOutputFormat.buildRedisOutputFormat()
        .setHostMaster(AppConfig.getProperty(JobConstants.REDIS_HOST_MASTER))
        .setHostSentinel(AppConfig.getProperty(JobConstants.REDIS_HOST_SENTINELS))
        .setMaxIdle(Integer.parseInt(AppConfig.getProperty(JobConstants.REDIS_MAXIDLE)))
        .setMaxTotal(Integer.parseInt(AppConfig.getProperty(JobConstants.REDIS_MAXTOTAL)))
        .setMaxWaitMillis(Integer.parseInt(AppConfig.getProperty(JobConstants.REDIS_MAXWAITMILLIS)))
        .setTestOnBorrow(Boolean.parseBoolean(AppConfig.getProperty(JobConstants.REDIS_TESTONBORROW)))
        .finish();

redisResult.output(redisOutput);

env.execute("itemcf");
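To show how the stored recommendations might be read back, here is an illustrative lookup. It uses a direct Jedis connection purely for simplicity (the job itself writes through a sentinel-backed pool), and the host and port are placeholders:

// illustrative read-back of the recommendations written above
try (Jedis jedis = new Jedis("redis-host", 6379)) {
    String json = jedis.get(JobConstants.REDIS_FLINK_ITEMCF_KEY_PREFIX + "BJCY56167779_03");
    List<RoomModel> recommendations = JSON.parseArray(json, RoomModel.class);
    for (RoomModel r : recommendations) {
        System.out.println(r.getRoomCode() + " -> " + r.getScore());
    }
}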

Done. It is really not as difficult as it might seem. Of course, this is only a demo; a real deployment would still need proper data filtering and optimization of the multi-table joins.


Origin: blog.51cto.com/12597095/2433875