[Hive classic metrics, offline data warehouse metrics, ADS layer metric analysis] Number of users who placed orders on 3 consecutive days within the last 7 days

1. Create table statement

DROP TABLE IF EXISTS ads_order_continuously_user_count;
CREATE EXTERNAL TABLE ads_order_continuously_user_count
(
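    `dt`                            STRING COMMENT 'Statistics date',
    `recent_days`                   BIGINT COMMENT 'Recent days, 7: last 7 days',
    `order_continuously_user_count` BIGINT COMMENT 'Number of users who ordered on 3 consecutive days'
) COMMENT 'Users who ordered on 3 consecutive days within the last 7 days'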
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/warehouse/gmall/ads/ads_order_continuously_user_count/';

2. Data loading

2.1 Thought Analysis

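Orders placed on three consecutive days have a recognizable signature. Sort one user's order dates in ascending order: if the table holds at most one row per user per day (as a daily summary table such as dws_trade_user_order_1d is expected to), the date two rows ahead is always at least 2 days later, and it is exactly 2 days later only when the three rows fall on consecutive days. Filtering on this condition identifies the users who ordered on three consecutive days.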

2.2 Execution steps

We need each user's order dates within the last 7 days. The data comes from dws_trade_user_order_1d; filtering to the partitions of the last 7 days is enough. Open a window partitioned by user_id and ordered by dt (the order date) ascending, call lead() to fetch the dt value two rows ahead, then use datediff to compute the number of days between that value and the current row's dt, and name the result diff. As noted above, a user who ordered on three consecutive days must have at least one row with diff = 2, so filtering on diff = 2 yields all qualifying users. When a user orders on 4 or more consecutive days, more than one row qualifies for that user and the result must be deduplicated, so count(distinct user_id) completes the statistic.
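To make the steps concrete, here is a small sketch that runs the same window expression on hand-made rows instead of dws_trade_user_order_1d. The CTE name sample_orders, the users u1 and u2, and their dates are hypothetical values chosen for illustration; the query assumes a Hive version that supports CTEs and SELECT without FROM.

```sql
-- Hypothetical sample rows (not from the warehouse): one row per user per order day.
with sample_orders as (
    select 'u1' as user_id, '2022-06-02' as dt
    union all select 'u1', '2022-06-03'
    union all select 'u1', '2022-06-04'
    union all select 'u1', '2022-06-05'
    union all select 'u2', '2022-06-02'
    union all select 'u2', '2022-06-04'
    union all select 'u2', '2022-06-05'
)
select
    user_id,
    dt,
    -- dt of the row two positions ahead for the same user, or the default if there is none
    lead(dt, 2, '9999-12-31') over (partition by user_id order by dt) as dt_two_rows_ahead,
    -- days between that value and the current dt
    datediff(lead(dt, 2, '9999-12-31') over (partition by user_id order by dt), dt) as diff
from sample_orders;
```

With these rows, u1 (4 consecutive days) gets diff = 2 on its first two rows, which is why the final count needs distinct. u2 has a gap on 2022-06-03, so its first row gets diff = 3 and never matches. The last two rows of each user fall back to the default '9999-12-31' and produce a very large diff.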

2.3 Diagram

Omitted.

2.4 Code implementation

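-- Keep the existing rows and append the result for the current statistics date;
-- union (not union all) removes duplicates, so rerunning the job does not add a second copy.
insert overwrite table ads_order_continuously_user_count
select * from ads_order_continuously_user_count
union
select
    '2022-06-08',
    7,
    count(distinct user_id)  -- a run of 4+ days yields several diff=2 rows per user
from
(
    select
        user_id,
        -- days between the current dt and the dt two rows ahead for the same user;
        -- the far-future default keeps the last two rows of each user from matching
        datediff(lead(dt,2,'9999-12-31') over(partition by user_id order by dt),dt) diff
    from dws_trade_user_order_1d
    where dt>=date_add('2022-06-08',-6)
      and dt<='2022-06-08'  -- upper bound so that later partitions, if present, are excluded
)t1
where diff=2;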

2.5 Questions for further thought

Are there other approaches?

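  •  Idea 1:

(1) Thought analysis

Within one user's rows sorted by date, and assuming at most one row per user per day, subtracting each row's rank from its date gives the same value for every row of a consecutive run, because both the date and the rank grow by 1 from one row to the next. A gap between dates changes that value. A user with 3 or more rows sharing the same value therefore ordered on at least 3 consecutive days.

(2) Execution steps

Filter dws_trade_user_order_1d to the last 7 days, compute date_sub(dt, rank()) per user as diff, group by user_id and diff, keep the groups with count(*) >= 3, and count the distinct users.

(3) Diagram

Omitted.

(4) Code implementation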

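select
    count(distinct user_id)
from
(
    select
        user_id
    from
    (
        select
            user_id,
            -- dt minus its rank within the user: identical for every row of a consecutive run
            date_sub(dt,rank() over(partition by user_id order by dt)) diff
        from dws_trade_user_order_1d
        where dt>=date_add('2022-06-08',-6)
          and dt<='2022-06-08'  -- upper bound so that later partitions, if present, are excluded
    )t1
    group by user_id,diff
    having count(*)>=3  -- a run of at least 3 consecutive days
)t2;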
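The grouping key used in this approach is easier to follow when inspected on a few hand-made rows. The sketch below is illustrative only: the CTE sample_orders, the users u1 and u2, and their dates are hypothetical, and it assumes a Hive version that supports CTEs and SELECT without FROM.

```sql
-- Hypothetical sample rows (not from the warehouse): one row per user per order day.
with sample_orders as (
    select 'u1' as user_id, '2022-06-02' as dt
    union all select 'u1', '2022-06-03'
    union all select 'u1', '2022-06-04'
    union all select 'u1', '2022-06-05'
    union all select 'u2', '2022-06-02'
    union all select 'u2', '2022-06-04'
    union all select 'u2', '2022-06-05'
)
select
    user_id,
    dt,
    -- position of the row among the user's dates, starting at 1
    rank() over (partition by user_id order by dt) as rk,
    -- dt shifted back by its rank; constant across a run of consecutive days
    date_sub(dt, rank() over (partition by user_id order by dt)) as diff
from sample_orders;
```

All four rows of u1 get diff = 2022-06-01, so the group (u1, 2022-06-01) has 4 rows and passes having count(*) >= 3. For u2 the gap on 2022-06-03 splits the rows into a group of 1 (diff = 2022-06-01) and a group of 2 (diff = 2022-06-02), so u2 is not counted.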

3. Data link

 

Origin blog.csdn.net/qq_40382400/article/details/132091181