User Portrait Series - Application of HBase in Portrait Tag Expiration Policy

1. Background

The previous articles in this series introduced the concept of user portraits, how portrait tags are processed, and how portraits are applied. This article covers some technical details of portraits, so that readers can understand the storage and processing logic of portrait data more closely.

Here are some real examples:

Example 1: Because of the epidemic, a platform related to the epidemic was launched, and users followed it. A tag indicates whether a user is concerned about the epidemic. Once the policy was relaxed and the epidemic restrictions were lifted, this tag no longer had any value for the company, yet it still occupies a separate field and wastes storage.

Example 2: An account logs in from many devices when shopping or watching videos, such as the web, mobile (Android, iOS), or pad version of an e-commerce platform. Users also change devices over time (a new phone, computer, or pad). Storing the old device information is then meaningless: the old phone or computer may no longer be in use, or at least is no longer linked to this account.

Example 3: A user is a non-member, and a tag predicts the probability that the user will become a member. Once the user does become a member, this data is worthless and should be cleared.

The examples above show that some tags lose their business value over time and waste storage space. Because they never expire, they can even lead to misinterpretation.

2. Solutions

Is there a way to make such tags expire?

For example: if a device under an account, or any other tag, has not been updated for half a year, delete that tag.

The flow chart above describes the tag writing process and the tag expiration process.

Tag expiration: read all the portrait data and check each tag. If (current time - tag update time) > tag TTL, the tag must be deleted.

In other words, the process needs TTL support at the column level of the database, and it must be able to obtain the time at which a tag was last updated, that is, the update time of the column.
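
The expiration rule can be written as a small predicate. The following is a minimal sketch, not the author's implementation: the class name `TagExpiry`, the method `isExpired`, and the choice of days as the TTL unit are assumptions made for illustration.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: decides whether a tag column has outlived its TTL. */
final class TagExpiry {
    private TagExpiry() {}

    /**
     * @param nowMillis       current time, epoch milliseconds
     * @param updatedAtMillis last update time of the tag column, epoch milliseconds
     * @param ttlDays         configured TTL of the tag, in days (assumed unit)
     * @return true when (now - update time) is strictly greater than the TTL
     */
    static boolean isExpired(long nowMillis, long updatedAtMillis, long ttlDays) {
        return nowMillis - updatedAtMillis > TimeUnit.DAYS.toMillis(ttlDays);
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long sevenMonthsAgo = now - TimeUnit.DAYS.toMillis(210);
        long lastWeek = now - TimeUnit.DAYS.toMillis(7);
        // A device tag untouched for about seven months, checked against a half-year TTL
        System.out.println(isExpired(now, sevenMonthsAgo, 180));
        // A tag updated last week, checked against the same TTL
        System.out.println(isExpired(now, lastWeek, 180));
    }
}
```

The comparison is strict, so a tag whose age equals its TTL exactly is kept until the next pass.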

3. Realization

At present, the whole solution is implemented with HBase + MySQL. HBase lets the writer set the update time (the cell timestamp) when updating a column, and returns that timestamp when the column is read. The TTL of each tag is configured in MySQL, which completes the overall process.
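
The article does not show the MySQL side, so the following is only a sketch of how the TTL configuration might look once it has been loaded into memory. The class `TagTtlConfig`, the `Row` fields, and the convention that a TTL of zero or less means "never expire" are all assumptions; the real table schema may differ.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;

/** Hypothetical in-memory view of the tag TTL configuration kept in MySQL. */
final class TagTtlConfig {
    /** One configuration row; the field names are assumptions, not the real schema. */
    static final class Row {
        final String qualifier; // HBase column qualifier that stores the tag
        final long ttlDays;     // TTL in days; <= 0 means "never expire" in this sketch

        Row(String qualifier, long ttlDays) {
            this.qualifier = qualifier;
            this.ttlDays = ttlDays;
        }
    }

    private final Map<String, Long> ttlMillisByQualifier = new HashMap<>();

    /** Builds the lookup from rows that were already read from the database. */
    TagTtlConfig(List<Row> rows) {
        for (Row row : rows) {
            if (row.ttlDays > 0) {
                ttlMillisByQualifier.put(row.qualifier, TimeUnit.DAYS.toMillis(row.ttlDays));
            }
        }
    }

    /** Returns the TTL in milliseconds, or -1 when the tag has no TTL configured. */
    long ttlMillis(String qualifier) {
        return ttlMillisByQualifier.getOrDefault(qualifier, -1L);
    }
}
```

Keeping the configuration outside HBase means a tag's TTL can be changed without rewriting any portrait data; the expiration job only needs to reload the lookup before each pass.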

Setting the column timestamp when writing to HBase
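import java.io.IOException;
import java.time.LocalDate;
import java.time.ZoneId;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.Test; // assuming JUnit 4

@Test
public void insert() throws IOException {
    // Midnight at the start of yesterday (system time zone), in epoch milliseconds
    long preZero = LocalDate.now().minusDays(1)
            .atStartOfDay(ZoneId.systemDefault()).toInstant().toEpochMilli();
    System.out.println(preZero);

    // createConnection() is the author's helper (not shown) that opens an HBase Connection
    try (Connection connection = createConnection();
         Table table = connection.getTable(TableName.valueOf("tmp_test_info"))) {
        List<Put> puts = new ArrayList<>();
        Put put4 = new Put(Bytes.toBytes("0005"));
        // Explicit timestamp: the update time of this cell is set to preZero
        put4.addColumn(Bytes.toBytes("f1"), Bytes.toBytes("name"), preZero, Bytes.toBytes("小杰"));
        // No timestamp given: HBase uses the current time
        put4.addColumn(Bytes.toBytes("f1"), Bytes.toBytes("age"), Bytes.toBytes(24));
        puts.add(put4);
        table.put(puts);
    }
}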
// Additional imports: java.util.HashMap, java.util.Map, java.util.concurrent.TimeUnit,
// org.apache.hadoop.hbase.Cell, org.apache.hadoop.hbase.CellUtil,
// org.apache.hadoop.hbase.client.Delete, Result, ResultScanner, Scan
@Test
public void scan() throws IOException {
    Map<String, Long> cellTTL = new HashMap<>();
    cellTTL.put("name", 1L); // unit: days
    cellTTL.put("age", 2L);  // unit: days
    List<Delete> deleteList = new ArrayList<>();
    long currentTime = System.currentTimeMillis();
    Scan scan = new Scan()
            .withStartRow(Bytes.toBytes("0001"))
            .withStopRow(Bytes.toBytes("0008"));
    // try-with-resources closes the scanner, the table and the connection
    try (Connection connection = createConnection();
         Table table = connection.getTable(TableName.valueOf("tmp_test_info"));
         ResultScanner scanner = table.getScanner(scan)) {
        for (Result result : scanner) {
            for (Cell cell : result.listCells()) {
                String rk = Bytes.toString(CellUtil.cloneRow(cell));
                String family = Bytes.toString(CellUtil.cloneFamily(cell));
                String column = Bytes.toString(CellUtil.cloneQualifier(cell));
                long timestamp = cell.getTimestamp();
                // "name" stores a string; the other test column, "age", stores an int
                String value = column.equals("name")
                        ? Bytes.toString(CellUtil.cloneValue(cell))
                        : String.valueOf(Bytes.toInt(CellUtil.cloneValue(cell)));
                System.out.println(rk + ":" + family + ":" + column + ":" + value + ":" + timestamp);

                // Check whether the tag stored in this column has expired
                Long ttlDays = cellTTL.get(column);
                if (ttlDays != null && currentTime - timestamp > TimeUnit.DAYS.toMillis(ttlDays)) {
                    Delete delete = new Delete(Bytes.toBytes(rk));
                    // addColumns with the scanned timestamp removes every version up to that
                    // timestamp and leaves a value written after the scan untouched
                    delete.addColumns(Bytes.toBytes(family), Bytes.toBytes(column), timestamp);
                    deleteList.add(delete);
                }
            }
        }
        if (!deleteList.isEmpty()) {
            table.delete(deleteList);
        }
    }
}

A simple version of the expiration policy code is given above.
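
The scan() test collects every delete in one list before sending it, which could grow large when all portrait data is read. One possible refinement, shown here only as a sketch, is to send the deletes in fixed-size batches. `BatchBuffer` is a hypothetical helper, not part of HBase; in a real job the sink would wrap `table.delete(...)` and handle its `IOException`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: buffers items and hands them to a sink in fixed-size batches. */
final class BatchBuffer<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private final List<T> buffer = new ArrayList<>();

    BatchBuffer(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Adds one item and flushes automatically once the batch is full. */
    void add(T item) {
        buffer.add(item);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Sends whatever is buffered; call once more after the scan finishes. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer)); // pass a copy so the sink may mutate it
        buffer.clear();
    }
}
```

The batch size is a tuning choice: larger batches mean fewer round trips, smaller ones bound the memory held by pending deletes.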

Origin blog.csdn.net/weixin_43291055/article/details/130382185