[Real-time Data Warehouse] DWS Layer Visitor Topic Calculation (Continued), Commodity Topic Calculation

I DWS layer - Visitor topic calculation

1 Write to OLAP database

(1) Add ClickhouseUtil

a The four parameters of JdbcSink.sink()

  • Parameter 1: The SQL statement to execute, in the form: insert into xxx values(?,?,?,?)
  • Parameter 2: A JdbcStatementBuilder, which can be written as a lambda expression (preparedStatement, t) -> {...}. Here t is one data object from the stream, and its values must be bound to the placeholders of the prepared statement.
  • Parameter 3: Execution options, such as the number of retries and the batch size.
  • Parameter 4: Connection options, such as the URL (address and port) and the driver name.

b Implement the method that returns the JdbcSink in ClickhouseUtil

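package com.hzy.gmall.realtime.utils;

import org.apache.flink.connector.jdbc.JdbcConnectionOptions;
import org.apache.flink.connector.jdbc.JdbcExecutionOptions;
import org.apache.flink.connector.jdbc.JdbcSink;
import org.apache.flink.connector.jdbc.JdbcStatementBuilder;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;

import java.sql.PreparedStatement;
import java.sql.SQLException;
// The project's own GmallConfig import is omitted here

/**
 * Utility class for working with ClickHouse
 */
public class ClickhouseUtil {
    public static <T> SinkFunction<T> getJdbcSink(String sql) {
        // e.g. "insert into visitor_stats_2022 values(?,?,?,?,?,?,?,?,?,?,?,?)"
        SinkFunction<T> sinkFunction = JdbcSink.<T>sink(
                sql,
                new JdbcStatementBuilder<T>() {
                    // Parameter T obj: one record from the stream
                    @Override
                    public void accept(PreparedStatement ps, T obj) throws SQLException {
                        // Read the property values of obj and bind them to the question-mark placeholders
                        // (implemented in step d)
                    }
                },
                // Builder design pattern
                new JdbcExecutionOptions.Builder()
                        // Write in batches of 5 records.
                        // With a parallelism of 4, each parallel subtask must collect 5 records before it writes to ClickHouse
                        .withBatchSize(5)
//                        // Write to ClickHouse after 2 s even if the batch is not full
//                        .withBatchIntervalMs(2000)
//                        // Retry at most 3 times
//                        .withMaxRetries(3)
                        .build(),
                new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
                        .withDriverName("ru.yandex.clickhouse.ClickHouseDriver")
                        .withUrl(GmallConfig.CLICKHOUSE_URL)
                        .build()
        );
        return sinkFunction;
    }
}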
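
The comment on withBatchSize(5) is easier to follow with a small model of size-based batching. The sketch below is only an illustration in plain Java, not Flink's implementation; BatchBuffer and its methods are hypothetical names. It shows why records can stay in the buffer when fewer than five have arrived on a subtask and no time-based flush is configured.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of size-based batching; not Flink's actual implementation
final class BatchBuffer<T> {
    private final int batchSize;
    private final List<T> buffer = new ArrayList<>();
    // Stands in for the database: every entry is one batch that was written
    private final List<List<T>> flushed = new ArrayList<>();

    BatchBuffer(int batchSize) {
        this.batchSize = batchSize;
    }

    // Buffers one record and flushes as soon as the batch is full
    void add(T record) {
        buffer.add(record);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Writes the buffered records as one batch and clears the buffer
    void flush() {
        if (!buffer.isEmpty()) {
            flushed.add(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    int flushedBatches() {
        return flushed.size();
    }

    int pending() {
        return buffer.size();
    }
}

class BatchBufferDemo {
    public static void main(String[] args) {
        BatchBuffer<String> sink = new BatchBuffer<>(5);
        for (int i = 0; i < 7; i++) {
            sink.add("row-" + i);
        }
        System.out.println("batches written: " + sink.flushedBatches());
        System.out.println("records still buffered: " + sink.pending());
    }
}
```

In this model the remaining records are only written when flush() is called explicitly, which is the role a batch interval plays in the commented-out option above.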

c The builder design pattern

There are usually two ways to set the property values of an object:

First way
Animal animal = new Animal("XX", "XXkg", "XX");
Second way
Animal animal = new Animal();
animal.setAge("XX");
animal.setWeight("XXkg");
animal.setFood("XX");
  • The first way passes the values through the constructor. When there are many properties, it is hard to tell which argument sets which property.
  • The second way makes each property clear from the setter name, but it is verbose.

The builder pattern is a design pattern for classes whose constructor or static factory takes many parameters. When designing such a class, the builder pattern is a good choice.

A builder is a nested class (usually static) declared inside the outer class. Values are assigned to the properties of the nested class, and each assignment method returns the builder itself, so the calls can be chained. The setters in the second way above return nothing and therefore cannot be chained. After all the assignments, the .build() method creates the outer-class object from the builder.

See Java's builder pattern for details.
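
As a minimal sketch of the description above, here is a hand-written builder for the Animal example. The class layout and method names are assumptions made for illustration; libraries such as Lombok generate similar code automatically.

```java
// Hypothetical Animal class, written by hand to show the builder pattern
final class Animal {
    private final String age;
    private final String weight;
    private final String food;

    // Only the builder may create instances
    private Animal(Builder b) {
        this.age = b.age;
        this.weight = b.weight;
        this.food = b.food;
    }

    static Builder builder() {
        return new Builder();
    }

    String getAge() { return age; }
    String getWeight() { return weight; }
    String getFood() { return food; }

    static final class Builder {
        private String age;
        private String weight;
        private String food;

        // Each method returns the builder itself, which makes chaining possible
        Builder age(String age) { this.age = age; return this; }
        Builder weight(String weight) { this.weight = weight; return this; }
        Builder food(String food) { this.food = food; return this; }

        // Creates the outer-class object from the collected values
        Animal build() { return new Animal(this); }
    }
}

class AnimalBuilderDemo {
    public static void main(String[] args) {
        Animal animal = Animal.builder()
                .age("3")
                .weight("12kg")
                .food("fish")
                .build();
        System.out.println(animal.getAge() + ", " + animal.getWeight() + ", " + animal.getFood());
    }
}
```

Each call names the property it sets, as with setters, while the whole object is still created in a single expression.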

d Assign values to the question-mark placeholders and create the TransientSink annotation

This annotation marks fields that do not need to be saved.

The ClickhouseUtil utility class writes all fields of the entity class into the data table in declaration order. However, entity classes sometimes contain temporary fields that are useful during the calculation but do not need to be stored in the table. Such fields can be marked, and the marker can then be checked at write time to skip them.

The usual way to mark a field is to add an annotation to it. Here, a custom annotation @TransientSink is defined to indicate that a field should not be saved to the data table.
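
The article does not show the annotation itself, so the following is a sketch of what such a marker annotation could look like and how reflection can detect it. DemoStats and TransientSinkDemo are hypothetical names used only here; in the project the annotation would normally be a public type in its own file. Runtime retention is the important part, because field.getAnnotation(...) only sees annotations that are retained at run time.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Marker annotation: applies to fields and stays visible to reflection at run time
@Target(ElementType.FIELD)
@Retention(RetentionPolicy.RUNTIME)
@interface TransientSink {
}

// Hypothetical bean used only for this illustration
class DemoStats {
    String stt;
    Long orderCt = 0L;
    @TransientSink
    Set<Long> orderIdSet = new HashSet<>();
}

class TransientSinkDemo {
    // Returns the names of the fields that would be written to the table
    static List<String> sinkFieldNames(Class<?> clazz) {
        List<String> names = new ArrayList<>();
        for (Field field : clazz.getDeclaredFields()) {
            if (field.isSynthetic()) {
                continue; // ignore compiler- or tool-generated fields
            }
            if (field.getAnnotation(TransientSink.class) == null) {
                names.add(field.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(sinkFieldNames(DemoStats.class));
    }
}
```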

new JdbcStatementBuilder<T>() {
    // Parameter T obj: one record from the stream
    // Read the property values of obj and bind them to the question-mark placeholders
    @Override
    public void accept(PreparedStatement ps, T obj) throws SQLException {
        // Get the fields declared by the class of the stream object
        Field[] fields = obj.getClass().getDeclaredFields();
        int skipNum = 0;
        // Iterate over the field array
        for (int i = 0; i < fields.length; i++) {
            // Get the current field
            Field field = fields[i];
            // Check whether the field is annotated with @TransientSink
            TransientSink transientSink = field.getAnnotation(TransientSink.class);
            if (transientSink != null) {
                skipNum++;
                continue;
            }
            // Make private fields accessible
            field.setAccessible(true);
            try {
                // Get the value of the field
                Object fieldValue = field.get(obj);
                // Bind the value to the question-mark placeholder.
                // JDBC parameter and result-set column indexes start at 1
                ps.setObject(i + 1 - skipNum, fieldValue);
            } catch (IllegalAccessException e) {
                e.printStackTrace();
            }
        }
    }
}

e Configure the connection address of ClickHouse in GmallConfig

public static final String CLICKHOUSE_URL = "jdbc:clickhouse://hadoop101:8123/default";

f Add a sink that writes to ClickHouse in the main program

// TODO 9 Write the aggregated statistics to ClickHouse
reduceDS.addSink(
        ClickhouseUtil.getJdbcSink("insert into visitor_stats_2022 values(?,?,?,?,?,?,?,?,?,?,?,?)")
);

(2) Overall test

  • Start ZK, Kafka, logger.sh, ClickHouse, [HDFS]
  • Run BaseLogApp
  • Run UniqueVisitApp
  • Run UserJumpDetailApp
  • Run VisitorStatsApp
  • Run the jar package in the rt_applog directory
  • View the console output
  • View the visitor_stats_2022 table data in ClickHouse

The data in ClickHouse is as follows:

[Image: visitor_stats_2022 table data in ClickHouse]

II DWS layer - Commodity topic calculation

Statistics topic | Demand indicator | Output method             | Calculation source | Source layer
Commodity        | Clicks           | Multidimensional analysis | page_log           | dwd
                 | Exposures        | Multidimensional analysis | page_log           | dwd
                 | Favorites        | Multidimensional analysis | favor_info         | dwd
                 | Add to cart      | Multidimensional analysis | cart_info          | dwd
                 | Orders           | Visual large screen       | order_wide         | dwm
                 | Payments         | Multidimensional analysis | payment_wide       | dwm
                 | Refunds          | Multidimensional analysis | order_refund_info  | dwd
                 | Reviews          | Multidimensional analysis | comment_info       | dwd

As with the wide table of the visitor topic in the DWS layer, the detailed data of multiple fact tables is aggregated and merged into one wide table.

1 Requirement analysis and approach

  • Read the data streams from the Kafka topics
  • Convert each JSON string stream into a stream of a unified data object
  • Merge the streams of the unified data structure into one stream
  • Assign event time and watermarks
  • Group, window, and aggregate
  • Write to ClickHouse
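
The merge, group, window, and aggregate steps can be sketched in plain Java without Flink. This is only a model of the idea: SkuEvent and WindowSketch are hypothetical names, the 10-second window size is an assumption, and watermarks and late data are not modeled at all.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical unified record: one event for one sku carrying partial counters
final class SkuEvent {
    final long skuId;
    final long ts;        // event time in milliseconds
    final long clickCt;
    final long displayCt;

    SkuEvent(long skuId, long ts, long clickCt, long displayCt) {
        this.skuId = skuId;
        this.ts = ts;
        this.clickCt = clickCt;
        this.displayCt = displayCt;
    }
}

final class WindowSketch {
    static final long WINDOW_MS = 10_000L; // assumed tumbling window size

    // Result key is "windowStart|skuId", value is {clickCt, displayCt}
    static Map<String, long[]> aggregate(List<List<SkuEvent>> streams) {
        // Merge: put the records of all unified streams into one list
        List<SkuEvent> merged = new ArrayList<>();
        for (List<SkuEvent> stream : streams) {
            merged.addAll(stream);
        }

        Map<String, long[]> result = new TreeMap<>();
        for (SkuEvent e : merged) {
            // Window: start of the tumbling window that contains the event time
            long windowStart = e.ts - (e.ts % WINDOW_MS);
            // Group: one accumulator per window and sku
            String key = windowStart + "|" + e.skuId;
            long[] acc = result.computeIfAbsent(key, k -> new long[2]);
            // Aggregate: add up the partial counters
            acc[0] += e.clickCt;
            acc[1] += e.displayCt;
        }
        return result;
    }

    public static void main(String[] args) {
        List<SkuEvent> clicks = Arrays.asList(
                new SkuEvent(9L, 1_000L, 1L, 0L),
                new SkuEvent(9L, 12_000L, 1L, 0L));
        List<SkuEvent> displays = Arrays.asList(
                new SkuEvent(9L, 3_000L, 0L, 1L));
        aggregate(Arrays.asList(clicks, displays)).forEach((key, acc) ->
                System.out.println(key + " -> clicks=" + acc[0] + ", displays=" + acc[1]));
    }
}
```

Because every source stream is first converted to the same record type, each record only fills the counters it knows about, and the aggregation simply sums all counters per key.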

The overall process is as follows:

[Image: overall processing flow of the commodity topic calculation]

2 Encapsulate the commodity statistics entity class ProductStats

The entity class is defined using the builder pattern.

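package com.hzy.gmall.realtime.beans;

import lombok.Builder;
import lombok.Data;

import java.math.BigDecimal;
import java.util.HashSet;
import java.util.Set;
// The project's own TransientSink import is omitted here

/**
 * Desc: commodity statistics entity class
 * @Builder
 *      Allows objects to be created and their properties set through a builder
 * @Builder.Default
 *      When properties are set through the builder, the initial values of the fields are lost.
 *      This annotation fixes that problem.
 *      For example, if a field is initialized to 0L and this annotation is missing,
 *      the field of an object created through the builder will be null
 */
@Data
@Builder
public class ProductStats {

    String stt; // window start time
    String edt; // window end time
    Long sku_id; // sku id
    String sku_name; // sku name
    BigDecimal sku_price; // sku unit price
    Long spu_id; // spu id
    String spu_name; // spu name
    Long tm_id; // trademark id
    String tm_name; // trademark name
    Long category3_id; // category id
    String category3_name; // category name

    @Builder.Default
    Long display_ct = 0L; // number of exposures

    @Builder.Default
    Long click_ct = 0L; // number of clicks

    @Builder.Default
    Long favor_ct = 0L; // number of favorites

    @Builder.Default
    Long cart_ct = 0L; // number of add-to-cart actions

    @Builder.Default
    Long order_sku_num = 0L; // number of items ordered

    @Builder.Default
    BigDecimal order_amount = BigDecimal.ZERO; // order amount

    @Builder.Default
    Long order_ct = 0L; // number of orders

    @Builder.Default
    BigDecimal payment_amount = BigDecimal.ZERO; // payment amount

    @Builder.Default
    Long paid_order_ct = 0L; // number of paid orders

    @Builder.Default
    Long refund_order_ct = 0L; // number of refunded orders

    @Builder.Default
    BigDecimal refund_amount = BigDecimal.ZERO; // refund amount

    @Builder.Default
    Long comment_ct = 0L; // number of reviews

    @Builder.Default
    Long good_comment_ct = 0L; // number of positive reviews

    @Builder.Default
    @TransientSink
    Set orderIdSet = new HashSet(); // used to count orders

    @Builder.Default
    @TransientSink
    Set paidOrderIdSet = new HashSet(); // used to count paid orders

    @Builder.Default
    @TransientSink
    Set refundOrderIdSet = new HashSet(); // used to count refunded orders

    Long ts; // statistics timestamp

}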
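
The reason for @Builder.Default can be imitated with a hand-written builder. The sketch below is not Lombok's generated code, and Counter is a hypothetical class. It only shows the general mechanism: the builder keeps its own copy of every value and passes all of them to the constructor, so a field initializer on the outer class is overwritten unless the builder itself knows the default.

```java
// Hand-written imitation, not Lombok output; all names are hypothetical
final class Counter {
    Long clickCt = 0L;   // field initializer on the outer class
    Long displayCt = 0L;

    private Counter(Long clickCt, Long displayCt) {
        // The all-arguments constructor overwrites the initializers above
        this.clickCt = clickCt;
        this.displayCt = displayCt;
    }

    static Builder builder() {
        return new Builder();
    }

    static final class Builder {
        private Long clickCt;         // no default here: stays null if never set
        private Long displayCt = 0L;  // default repeated in the builder, in the spirit of @Builder.Default

        Builder clickCt(Long value) { this.clickCt = value; return this; }
        Builder displayCt(Long value) { this.displayCt = value; return this; }

        Counter build() { return new Counter(clickCt, displayCt); }
    }
}

class CounterDemo {
    public static void main(String[] args) {
        Counter c = Counter.builder().build();
        System.out.println("clickCt = " + c.clickCt);     // default was lost
        System.out.println("displayCt = " + c.displayCt); // default was kept
    }
}
```

A null counter would cause a NullPointerException as soon as two records are added together, which is why every counter in ProductStats carries the annotation.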

3 Create ProductStatsApp and read the data streams from the Kafka topics

package com.hzy.gmall.realtime.app.dws;

import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
// The project's own MyKafkaUtil import is omitted here

/**
 * Commodity topic statistics (DWS)
 */
public class ProductStatsApp {
    public static void main(String[] args) throws Exception {
        // TODO 1 Prepare the basic environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(4);

        // TODO 2 Checkpoint configuration (omitted)

        // TODO 3 Read data from the Kafka topics
        // 3.1 Declare the topics to consume and the consumer group
        String groupId = "product_stats_app";
        String pageViewSourceTopic = "dwd_page_log";
        String orderWideSourceTopic = "dwm_order_wide";
        String paymentWideSourceTopic = "dwm_payment_wide";
        String cartInfoSourceTopic = "dwd_cart_info";
        String favorInfoSourceTopic = "dwd_favor_info";
        String refundInfoSourceTopic = "dwd_order_refund_info";
        String commentInfoSourceTopic = "dwd_comment_info";
        // 3.2 Create the consumer objects
        FlinkKafkaConsumer<String> pageViewSource = MyKafkaUtil.getKafkaSource(pageViewSourceTopic, groupId);
        FlinkKafkaConsumer<String> orderWideSource = MyKafkaUtil.getKafkaSource(orderWideSourceTopic, groupId);
        FlinkKafkaConsumer<String> paymentWideSource = MyKafkaUtil.getKafkaSource(paymentWideSourceTopic, groupId);
        FlinkKafkaConsumer<String> favorInfoSource = MyKafkaUtil.getKafkaSource(favorInfoSourceTopic, groupId);
        FlinkKafkaConsumer<String> cartInfoSource = MyKafkaUtil.getKafkaSource(cartInfoSourceTopic, groupId);
        FlinkKafkaConsumer<String> refundInfoSource = MyKafkaUtil.getKafkaSource(refundInfoSourceTopic, groupId);
        FlinkKafkaConsumer<String> commentInfoSource = MyKafkaUtil.getKafkaSource(commentInfoSourceTopic, groupId);
        // 3.3 Read the data and wrap it as streams
        DataStreamSource<String> pageViewStrDS = env.addSource(pageViewSource);
        DataStreamSource<String> favorInfoStrDS = env.addSource(favorInfoSource);
        DataStreamSource<String> orderWideStrDS = env.addSource(orderWideSource);
        DataStreamSource<String> paymentWideStrDS = env.addSource(paymentWideSource);
        DataStreamSource<String> cartInfoStrDS = env.addSource(cartInfoSource);
        DataStreamSource<String> refundInfoStrDS = env.addSource(refundInfoSource);
        DataStreamSource<String> commentInfoStrDS = env.addSource(commentInfoSource);

        env.execute();
    }
}

4 Convert the JSON string data stream into a data stream of unified data objects

Click stream string format (excerpt)

{
    "page": {
        "page_id": "good_detail",
        "item":"9",
        "during_time":15839,
        "item_type":"sku_id",
        "last_page_id":"home",
        "source_type":"query"
    },

Exposure stream string format (excerpt)

{
    "displays": [
        {
            "display_type": "activity",
            "item": "1",
            "item_type": "activity_id",
            "pos_id": 5,
            "order": 1
        }
    ],

(1) Convert the click and exposure stream data

Clicks and exposures both come from the page log stream. They are handled separately inside one ProcessFunction.

// TODO 4 Convert the type of the data in the streams: jsonStr -> ProductStats
// 4.1 Convert the click and exposure stream data
SingleOutputStreamOperator<ProductStats> clickAndDisplayStatsDS = pageViewStrDS.process(
        new ProcessFunction<String, ProductStats>() {
            @Override
            public void processElement(String jsonStr, Context ctx, Collector<ProductStats> out) throws Exception {
                JSONObject jsonObj = JSON.parseObject(jsonStr);
                Long ts = jsonObj.getLong("ts");
                // Check whether this is a click
                JSONObject pageJsonObj = jsonObj.getJSONObject("page");
                String pageId = pageJsonObj.getString("page_id");
                // Constant-first equals avoids a NullPointerException when page_id is missing
                if ("good_detail".equals(pageId)) {
                    // If the logged page is a product detail page, the record counts as a click
                    Long skuId = pageJsonObj.getLong("item");
                    // The code below is equivalent to new OuterClass.InnerClass().build();
                    ProductStats productStats = ProductStats.builder()
                            .sku_id(skuId)
                            .click_ct(1L)
                            .ts(ts)
                            .build();
                    out.collect(productStats);
                }

                // Check whether there are exposures
                JSONArray displays = jsonObj.getJSONArray("displays");
                if (displays != null && displays.size() > 0) {
                    // A non-empty displays array means the page has exposures; iterate over all of them
                    for (int i = 0; i < displays.size(); i++) {
                        JSONObject displaysJsonObj = displays.getJSONObject(i);
                        // Check whether the exposed item is a product
                        if ("sku_id".equals(displaysJsonObj.getString("item_type"))) {
                            Long skuId = displaysJsonObj.getLong("item");
                            ProductStats productStats1 = ProductStats.builder()
                                    .sku_id(skuId)
                                    .display_ct(1L)
                                    .ts(ts)
                                    .build();
                            out.collect(productStats1);
                        }
                    }
                }
            }
        }
);

(2) Convert the favorites stream data

// 4.2 Convert the favorites stream data
SingleOutputStreamOperator<ProductStats> favorStatsDS = favorInfoStrDS.map(
        new MapFunction<String, ProductStats>() {
            @Override
            public ProductStats map(String jsonStr) throws Exception {
                JSONObject jsonObj = JSON.parseObject(jsonStr);
                ProductStats productStats = ProductStats.builder()
                        .sku_id(jsonObj.getLong("sku_id"))
                        .favor_ct(1L)
                        .ts(DateTimeUtil.toTs(jsonObj.getString("create_time")))
                        .build();
                return productStats;
            }
        }
);
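
The code above relies on DateTimeUtil.toTs, which is not shown in this article. As a rough sketch of what such a helper might do, the class below converts a date-time string to epoch milliseconds with java.time. The class name DateTimeUtilSketch, the "yyyy-MM-dd HH:mm:ss" pattern, and the use of the JVM default time zone are all assumptions.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Hypothetical stand-in for the project's DateTimeUtil, which the article does not show
final class DateTimeUtilSketch {
    // DateTimeFormatter is immutable and thread-safe, so one shared instance is enough
    private static final DateTimeFormatter FORMATTER =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Parses the string as local date-time in the given zone and returns epoch milliseconds
    static long toTs(String dateTime, ZoneId zone) {
        LocalDateTime localDateTime = LocalDateTime.parse(dateTime, FORMATTER);
        return localDateTime.atZone(zone).toInstant().toEpochMilli();
    }

    // Convenience overload that uses the JVM default time zone
    static long toTs(String dateTime) {
        return toTs(dateTime, ZoneId.systemDefault());
    }

    public static void main(String[] args) {
        System.out.println(toTs("2022-12-14 10:00:00", ZoneId.of("Asia/Shanghai")));
    }
}
```

A java.time formatter is used here because, unlike SimpleDateFormat, it can be shared safely between the parallel subtasks of a job.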

(3) Convert the add-to-cart stream data

// 4.3 Convert the add-to-cart stream data
SingleOutputStreamOperator<ProductStats> cartStatsDS = cartInfoStrDS.map(
        new MapFunction<String, ProductStats>() {
            @Override
            public ProductStats map(String jsonStr) throws Exception {
                JSONObject jsonObj = JSON.parseObject(jsonStr);
                ProductStats productStats = ProductStats.builder()
                        .sku_id(jsonObj.getLong("sku_id"))
                        .cart_ct(1L)
                        .ts(DateTimeUtil.toTs(jsonObj.getString("create_time")))
                        .build();
                return productStats;
            }
        }
);

(4) Convert the refund stream data

One order may contain several refunded items, so the order IDs must be deduplicated when counting refunded orders. A HashSet is used for this, and the field is defined in the entity class as follows.

@Builder.Default
@TransientSink
Set refundOrderIdSet = new HashSet(); // used to count refunded orders

// 4.4 Convert the refund stream data
SingleOutputStreamOperator<ProductStats> refundStatsDS = refundInfoStrDS.map(
        new MapFunction<String, ProductStats>() {
            @Override
            public ProductStats map(String jsonStr) throws Exception {
                JSONObject jsonObj = JSON.parseObject(jsonStr);

                ProductStats productStats = ProductStats.builder()
                        .sku_id(jsonObj.getLong("sku_id"))
                        // Keep the order id in a set so that it can be deduplicated during aggregation
                        .refundOrderIdSet(
                                new HashSet(Collections.singleton(jsonObj.getLong("order_id")))
                        )
                        .refund_amount(jsonObj.getBigDecimal("refund_amount"))
                        .ts(DateTimeUtil.toTs(jsonObj.getString("create_time")))
                        .build();
                return productStats;
            }
        }
);
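
The aggregation step is not part of this section, so the following sketch only illustrates how the set could be used later: when two records are merged, their order-id sets are united, and the number of refunded orders is the size of the resulting set. RefundStats and merge are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, reduced version of the refund-related part of ProductStats
final class RefundStats {
    final Set<Long> refundOrderIdSet = new HashSet<>();
    long refundOrderCt;

    RefundStats(Long... orderIds) {
        refundOrderIdSet.addAll(Arrays.asList(orderIds));
        refundOrderCt = refundOrderIdSet.size();
    }

    // Merges another record into this one, as a reduce step would
    RefundStats merge(RefundStats other) {
        refundOrderIdSet.addAll(other.refundOrderIdSet); // the set drops duplicate order ids
        refundOrderCt = refundOrderIdSet.size();          // count of distinct refunded orders
        return this;
    }
}

class RefundStatsDemo {
    public static void main(String[] args) {
        // Two refunded items belong to order 1001, one to order 1002
        RefundStats total = new RefundStats(1001L)
                .merge(new RefundStats(1001L))
                .merge(new RefundStats(1002L));
        System.out.println("refunded orders: " + total.refundOrderCt);
    }
}
```

Simply adding 1 per refund record would count order 1001 twice, which is the error the set avoids.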

(5) Create an e-commerce business constant class GmallConstant

This class is needed for converting the review stream data in the next step.

package com.hzy.gmall.realtime.beans;

/**
 * Desc: e-commerce business constants
 */
public class GmallConstant {
    // 10 Order status
    public static final String ORDER_STATUS_UNPAID="1001"; // unpaid
    public static final String ORDER_STATUS_PAID="1002"; // paid
    public static final String ORDER_STATUS_CANCEL="1003"; // cancelled
    public static final String ORDER_STATUS_FINISH="1004"; // finished
    public static final String ORDER_STATUS_REFUND="1005"; // refund in progress
    public static final String ORDER_STATUS_REFUND_DONE="1006"; // refund completed

    // 11 Payment type
    public static final String PAYMENT_TYPE_ALIPAY="1101"; // Alipay
    public static final String PAYMENT_TYPE_WECHAT="1102"; // WeChat Pay
    public static final String PAYMENT_TYPE_UNION="1103"; // UnionPay

    // 12 Review
    public static final String APPRAISE_GOOD="1201"; // positive review
    public static final String APPRAISE_SOSO="1202"; // neutral review
    public static final String APPRAISE_BAD="1203"; // negative review
    public static final String APPRAISE_AUTO="1204"; // automatic review

    // 13 Refund reason
    public static final String REFUND_REASON_BAD_GOODS="1301"; // quality problem
    public static final String REFUND_REASON_WRONG_DESC="1302"; // product does not match its description
    public static final String REFUND_REASON_SALE_OUT="1303"; // out of stock
    public static final String REFUND_REASON_SIZE_ISSUE="1304"; // wrong size
    public static final String REFUND_REASON_MISTAKE="1305"; // ordered by mistake
    public static final String REFUND_REASON_NO_REASON="1306"; // no longer wanted
    public static final String REFUND_REASON_OTHER="1307"; // other

    // 14 Coupon status
    public static final String COUPON_STATUS_UNUSED="1401"; // unused
    public static final String COUPON_STATUS_USING="1402"; // in use
    public static final String COUPON_STATUS_USED="1403"; // used

    // 15 Refund type
    public static final String REFUND_TYPE_ONLY_MONEY="1501"; // refund only
    public static final String REFUND_TYPE_WITH_GOODS="1502"; // return and refund

    // 24 Source type
    public static final String SOURCE_TYPE_QUREY="2401"; // user query
    public static final String SOURCE_TYPE_PROMOTION="2402"; // product promotion
    public static final String SOURCE_TYPE_AUTO_RECOMMEND="2403"; // smart recommendation
    public static final String SOURCE_TYPE_ACTIVITY="2404"; // promotional activity

    // Coupon range
    public static final String COUPON_RANGE_TYPE_CATEGORY3="3301";
    public static final String COUPON_RANGE_TYPE_TRADEMARK="3302";
    public static final String COUPON_RANGE_TYPE_SPU="3303";

    // Coupon type
    public static final String COUPON_TYPE_MJ="3201"; // reduction above a spending threshold
    public static final String COUPON_TYPE_DZ="3202"; // discount above a quantity threshold
    public static final String COUPON_TYPE_DJ="3203"; // voucher

    public static final String ACTIVITY_RULE_TYPE_MJ="3101";
    public static final String ACTIVITY_RULE_TYPE_DZ ="3102";
    public static final String ACTIVITY_RULE_TYPE_ZK="3103";

    public static final String KEYWORD_SEARCH="SEARCH";
    public static final String KEYWORD_CLICK="CLICK";
    public static final String KEYWORD_CART="CART";
    public static final String KEYWORD_ORDER="ORDER";
}

(6) Convert the review stream data

Both the total number of reviews and the number of positive reviews must be counted.

// 4.5 Convert the review stream data
SingleOutputStreamOperator<ProductStats> commentStatsDS = commentInfoStrDS.map(
        new MapFunction<String, ProductStats>() {
            @Override
            public ProductStats map(String jsonStr) throws Exception {
                JSONObject jsonObj = JSON.parseObject(jsonStr);
                // Count 1 only when the review is a positive one
                long goodct = GmallConstant.APPRAISE_GOOD.equals(jsonObj.getString("appraise")) ? 1L : 0L;
                ProductStats productStats = ProductStats.builder()
                        .sku_id(jsonObj.getLong("sku_id"))
                        .ts(DateTimeUtil.toTs(jsonObj.getString("create_time")))
                        .comment_ct(1L)
                        .good_comment_ct(goodct)
                        .build();
                return productStats;
            }
        }
);

Origin blog.csdn.net/weixin_43923463/article/details/128322418