1. Create table statement
DROP TABLE IF EXISTS ads_order_continuously_user_count;
CREATE EXTERNAL TABLE ads_order_continuously_user_count
(
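    `dt`                            STRING COMMENT 'Statistics date',
    `recent_days`                   BIGINT COMMENT 'Recent days, 7: last 7 days',
    `order_continuously_user_count` BIGINT COMMENT 'Number of users who ordered on 3 consecutive days'
) COMMENT 'Number of users who ordered on 3 consecutive days within the last 7 days'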
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/warehouse/gmall/ads/ads_order_continuously_user_count/';
2. Data loading
2.1 Analysis
When a user's order records are sorted by date in ascending order, three consecutive order days have a distinctive property: there must be a row whose date is exactly two days earlier than the date two rows after it. Users who placed orders on three consecutive days can be identified by filtering on this property.
2.2 Execution steps
We need each user's order dates within the last 7 days. The data comes from dws_trade_user_order_1d; simply filter the partitions for the last 7 days. Apply a window partitioned by user and ordered by dt (the order date) ascending, call lead() to fetch the dt value two rows ahead, then use datediff to compute the number of days between that value and the current row's dt, and record the result as diff. As noted above, a user who ordered on three consecutive days must have at least one row with diff equal to 2. Filtering on diff=2 yields all qualifying users. If a user ordered on 4 or more consecutive days, that user has more than one qualifying row and must be deduplicated, so count(distinct user_id) completes the statistic.
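To make the diff column concrete, here is a minimal sketch that runs the lead()/datediff step on a few inline rows. The CTE name sample_orders and its rows are hypothetical illustration data, not part of the warehouse, and the sketch assumes one row per user per order date.

```sql
-- Hypothetical sample data: u1 orders on 3 consecutive days, u2 skips a day.
with sample_orders as (
    select 'u1' as user_id, '2022-06-02' as dt
    union all select 'u1', '2022-06-03'
    union all select 'u1', '2022-06-04'
    union all select 'u2', '2022-06-02'
    union all select 'u2', '2022-06-04'
    union all select 'u2', '2022-06-05'
)
select
    user_id,
    dt,
    -- dt two rows ahead for the same user; '9999-12-31' when no such row exists
    lead(dt, 2, '9999-12-31') over (partition by user_id order by dt) as dt_two_rows_ahead,
    -- number of days between that later date and the current row's dt
    datediff(lead(dt, 2, '9999-12-31') over (partition by user_id order by dt), dt) as diff
from sample_orders;
```

Only the u1 row for 2022-06-02 gets diff = 2 (its value two rows ahead is 2022-06-04). The u2 row for 2022-06-02 gets diff = 3, and every row that falls back to the default '9999-12-31' gets a very large diff, so filtering on diff = 2 keeps u1 alone.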
2.3 Diagram
Omitted.
2.4 Code implementation
insert overwrite table ads_order_continuously_user_count
select * from ads_order_continuously_user_count
union
select
    '2022-06-08',
    7,
    count(distinct(user_id))
from
(
    select
        user_id,
        -- days between this row's dt and the dt two rows ahead for the same user
        datediff(lead(dt,2,'9999-12-31') over(partition by user_id order by dt),dt) diff
    from dws_trade_user_order_1d
    where dt>=date_add('2022-06-08',-6)
      and dt<='2022-06-08' -- upper bound so that reruns ignore later partitions
)t1
where diff=2;
2.5 Questions for thought
Are there any other approaches?
- Idea 1:
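(1) Analysis
Sort each user's order dates in ascending order and subtract each row's rank from its date. Rows that belong to the same run of consecutive dates all produce the same result, so a user who ordered on three consecutive days has at least three rows that share one such value.
(2) Execution steps
Filter the partitions of the last 7 days from dws_trade_user_order_1d, compute date_sub(dt, rank()) within each user's window and record it as diff, group by user_id and diff, keep the groups with at least 3 rows, and finally call count(distinct user_id).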
(3) Diagram
(4) Code implementation
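select
    count(distinct(user_id))
from
(
    select
        user_id
    from
    (
        select
            user_id,
            -- consecutive dates minus their rank collapse to the same value
            date_sub(dt,rank() over(partition by user_id order by dt)) diff
        from dws_trade_user_order_1d
        where dt>=date_add('2022-06-08',-6)
          and dt<='2022-06-08' -- upper bound so that reruns ignore later partitions
    )t1
    group by user_id,diff
    having count(*)>=3
)t2;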
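The date-minus-rank step in the query above can be checked on a few inline rows. This is a minimal sketch: the CTE name sample_orders and its rows are hypothetical illustration data, and it assumes one row per user per order date, so rank() never produces ties.

```sql
-- Hypothetical sample data: u1 orders on June 2, 3, 4 and then again on June 6.
with sample_orders as (
    select 'u1' as user_id, '2022-06-02' as dt
    union all select 'u1', '2022-06-03'
    union all select 'u1', '2022-06-04'
    union all select 'u1', '2022-06-06'
)
select
    user_id,
    dt,
    rank() over (partition by user_id order by dt) as rk,
    -- rows in the same consecutive run share one diff value
    date_sub(dt, rank() over (partition by user_id order by dt)) as diff
from sample_orders;
```

The first three rows (rk 1, 2, 3) all yield diff = 2022-06-01, while the row for 2022-06-06 (rk 4) yields 2022-06-02. Grouping by user_id and diff therefore gives one group of 3 rows, which passes having count(*)>=3, and one group of 1 row, which does not.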
3. Data link