SQL comprehensive case: e-commerce funnel conversion analysis (PV, UV and conversion rates)

Funnel model examples:

Different business scenarios follow different business paths. The events in a path occur in a fixed order, and each event can occur multiple times.

Registration conversion funnel: Launch app --> App registration page --> Registration result --> Submit order --> Payment successful

Search-to-purchase conversion funnel: Search for products --> Click a product --> Add to shopping cart --> Submit order --> Payment successful

Flash sale purchase conversion funnel: Click the flash sale activity --> Join the activity --> Take part in the flash sale --> Flash sale succeeds --> Payment successful

[Figure: e-commerce purchase conversion funnel model diagram]

Processing steps:

Define the funnel name: Purchase Conversion Funnel

Starting event: viewing the product details page

Target event: payment

Event chain of the business process: details page --> shopping cart --> order page --> payment

[Is there a time-interval requirement between events? May other events occur between two adjacent events in the chain?]
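The settings above, including the answers to the two questions in brackets, can be captured in one small config object. The following is a minimal Python sketch, not part of the original article: `FunnelDefinition`, its field names and `patterns()` are hypothetical, and the strict-adjacency separator `", "` is an assumption based on the `group_concat` output format shown later in this article.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class FunnelDefinition:
    """Hypothetical container for the funnel settings listed above."""
    name: str
    steps: Tuple[str, ...]                 # ordered event IDs: start event first, target event last
    max_gap_seconds: Optional[int] = None  # None = no time-interval requirement between steps
    allow_other_events: bool = True        # True = other events may occur between adjacent steps

    def patterns(self) -> List[str]:
        """One regex per depth, deepest first (the order the CASE branches need)."""
        if self.max_gap_seconds is not None:
            # A regex over event IDs carries no timestamps, so it cannot enforce a gap.
            raise NotImplementedError("time-gap limits need timestamps, not just a regex")
        # ".*" lets any events sit between two steps; ", " demands direct adjacency
        # (assumes the ", " separator of the concatenated sequence).
        sep = ".*" if self.allow_other_events else ", "
        return [sep.join(self.steps[:depth]) for depth in range(len(self.steps), 0, -1)]


PURCHASE_FUNNEL = FunnelDefinition(
    name="Purchase Conversion Funnel",
    steps=("e1", "e2", "e4", "e5"),  # details page -> cart -> order page -> payment
)

print(PURCHASE_FUNNEL.patterns())  # ['e1.*e2.*e4.*e5', 'e1.*e2.*e4', 'e1.*e2', 'e1']
```

The generated patterns are the same ones hard-coded in the CASE expressions of both methods; a time-gap limit would need the timestamps, which is why the sketch refuses it instead of silently ignoring it.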

 

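Requirement: compute the conversion rates of the purchase conversion funnel (there is no time-interval requirement between events, and the user may do other things between two adjacent events):
1. The UV of each step
2. The relative conversion rate (UV of the next step / UV of the previous step) and the absolute conversion rate (UV of the current step / UV of the first step)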
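Both rate definitions can be written out directly. A minimal Python sketch, where `conversion_rates` is a hypothetical helper; the input `[9, 9, 6, 1]` is the per-step UV that the final query of method one reports for the sample data.

```python
from typing import List, Tuple


def conversion_rates(step_uv: List[int], ndigits: int = 2) -> Tuple[List[float], List[float]]:
    """Return (relative, absolute) conversion rates for per-step UVs.

    relative[i] = step_uv[i + 1] / step_uv[i]  (next step UV / previous step UV)
    absolute[i] = step_uv[i] / step_uv[0]      (current step UV / first step UV)
    """
    if not step_uv or step_uv[0] == 0:
        raise ValueError("the first step needs a non-zero UV")
    relative = [
        # A step nobody reached cannot convert further; report 0.0 instead of dividing by zero.
        round(nxt / prev, ndigits) if prev else 0.0
        for prev, nxt in zip(step_uv, step_uv[1:])
    ]
    absolute = [round(uv / step_uv[0], ndigits) for uv in step_uv]
    return relative, absolute


relative, absolute = conversion_rates([9, 9, 6, 1])
print(relative)  # [1.0, 0.67, 0.17]
print(absolute)  # [1.0, 1.0, 0.67, 0.11]
```

The relative rates match the three `_ratio` columns of the SQL result; the absolute rates are the part of the requirement that the SQL in this article does not compute.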

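Events of interest: e1 (view_detail_page), e2 (add_bag_page), e4 (order_detail_page), e5 (pay_detail_page) ==> they must occur in this order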
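Because the order is fixed but other events may sit between two steps, "reaching step k" means that the first k funnel events appear as an ordered subsequence of the user's time-sorted events. The article implements this with regular expressions; the Python sketch below uses a greedy subsequence scan instead (a swapped-in technique, not the author's method), which should yield the same step as the regex approach. `deepest_step` is a hypothetical name.

```python
from typing import Sequence


def deepest_step(events: Sequence[str], funnel: Sequence[str]) -> int:
    """How many funnel steps a user completed in order, ignoring unrelated events.

    Greedy scan over the time-ordered events: move on to the next funnel step
    the first time it appears after the previous one. Matching each step as
    early as possible never hurts later steps, so the result is the length of
    the longest funnel prefix that occurs as a subsequence.
    """
    reached = 0
    for event in events:
        if reached < len(funnel) and event == funnel[reached]:
            reached += 1
    return reached


FUNNEL = ["e1", "e2", "e4", "e5"]
print(deepest_step(["e1", "e4", "e5", "e2"], FUNNEL))  # 2
```

The scan is linear in the number of events and needs no string building, which can matter when the same check runs outside SQL.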

-- Prepare the data
-- Columns: user_id, event_id, event_action, event_time
u001,e1,view_detail_page,2022-11-01 01:10:21
u001,e2,add_bag_page,2022-11-01 01:11:13
u001,e3,collect_goods_page,2022-11-01 02:07:11
u002,e3,collect_goods_page,2022-11-01 01:10:21
u002,e4,order_detail_page,2022-11-01 01:11:13
u002,e5,pay_detail_page,2022-11-01 02:07:11
u002,e6,click_adver_page,2022-11-01 13:07:23
u002,e7,home_page,2022-11-01 08:18:12
u002,e8,list_detail_page,2022-11-01 23:34:29
u002,e1,view_detail_page,2022-11-01 11:25:32
u002,e2,add_bag_page,2022-11-01 12:41:21
u002,e3,collect_goods_page,2022-11-01 16:21:15
u002,e4,order_detail_page,2022-11-01 21:41:12
u003,e5,pay_detail_page,2022-11-01 01:10:21
u003,e6,click_adver_page,2022-11-01 01:11:13
u003,e7,home_page,2022-11-01 02:07:11
u001,e4,order_detail_page,2022-11-01 13:07:23
u001,e5,pay_detail_page,2022-11-01 08:18:12
u001,e6,click_adver_page,2022-11-01 23:34:29
u001,e7,home_page,2022-11-01 11:25:32
u001,e8,list_detail_page,2022-11-01 12:41:21
u001,e1,view_detail_page,2022-11-01 16:21:15
u001,e2,add_bag_page,2022-11-01 21:41:12
u003,e8,list_detail_page,2022-11-01 13:07:23
u003,e1,view_detail_page,2022-11-01 08:18:12
u003,e2,add_bag_page,2022-11-01 23:34:29
u003,e3,collect_goods_page,2022-11-01 11:25:32
u003,e4,order_detail_page,2022-11-01 12:41:21
u003,e5,pay_detail_page,2022-11-01 16:21:15
u003,e6,click_adver_page,2022-11-01 21:41:12
u004,e7,home_page,2022-11-01 01:10:21
u004,e8,list_detail_page,2022-11-01 01:11:13
u004,e1,view_detail_page,2022-11-01 02:07:11
u004,e2,add_bag_page,2022-11-01 13:07:23
u004,e3,collect_goods_page,2022-11-01 08:18:12
u004,e4,order_detail_page,2022-11-01 23:34:29
u004,e5,pay_detail_page,2022-11-01 11:25:32
u004,e6,click_adver_page,2022-11-01 12:41:21
u004,e7,home_page,2022-11-01 16:21:15
u004,e8,list_detail_page,2022-11-01 21:41:12
u005,e1,view_detail_page,2022-11-01 01:10:21
u005,e2,add_bag_page,2022-11-01 01:11:13
u005,e3,collect_goods_page,2022-11-01 02:07:11
u005,e4,order_detail_page,2022-11-01 13:07:23
u005,e5,pay_detail_page,2022-11-01 08:18:12
u005,e6,click_adver_page,2022-11-01 23:34:29
u005,e7,home_page,2022-11-01 11:25:32
u005,e8,list_detail_page,2022-11-01 12:41:21
u005,e1,view_detail_page,2022-11-01 16:21:15
u005,e2,add_bag_page,2022-11-01 21:41:12
u005,e3,collect_goods_page,2022-11-01 01:10:21
u006,e4,order_detail_page,2022-11-01 01:11:13
u006,e5,pay_detail_page,2022-11-01 02:07:11
u006,e6,click_adver_page,2022-11-01 13:07:23
u006,e7,home_page,2022-11-01 08:18:12
u006,e8,list_detail_page,2022-11-01 23:34:29
u006,e1,view_detail_page,2022-11-01 11:25:32
u006,e2,add_bag_page,2022-11-01 12:41:21
u006,e3,collect_goods_page,2022-11-01 16:21:15
u006,e4,order_detail_page,2022-11-01 21:41:12
u006,e5,pay_detail_page,2022-11-01 23:10:21
u006,e6,click_adver_page,2022-11-01 01:11:13
u007,e7,home_page,2022-11-01 02:07:11
u007,e8,list_detail_page,2022-11-01 13:07:23
u007,e1,view_detail_page,2022-11-01 08:18:12
u007,e2,add_bag_page,2022-11-01 23:34:29
u007,e3,collect_goods_page,2022-11-01 11:25:32
u007,e4,order_detail_page,2022-11-01 12:41:21
u007,e5,pay_detail_page,2022-11-01 16:21:15
u007,e6,click_adver_page,2022-11-01 21:41:12
u007,e7,home_page,2022-11-01 01:10:21
u008,e8,list_detail_page,2022-11-01 01:11:13
u008,e1,view_detail_page,2022-11-01 02:07:11
u008,e2,add_bag_page,2022-11-01 13:07:23
u008,e3,collect_goods_page,2022-11-01 08:18:12
u008,e4,order_detail_page,2022-11-01 23:34:29
u008,e5,pay_detail_page,2022-11-01 11:25:32
u008,e6,click_adver_page,2022-11-01 12:41:21
u008,e7,home_page,2022-11-01 16:21:15
u008,e8,list_detail_page,2022-11-01 21:41:12
u008,e1,view_detail_page,2022-11-01 01:10:21
u009,e2,add_bag_page,2022-11-01 01:11:13
u009,e3,collect_goods_page,2022-11-01 02:07:11
u009,e4,order_detail_page,2022-11-01 13:07:23
u009,e5,pay_detail_page,2022-11-01 08:18:12
u009,e6,click_adver_page,2022-11-01 23:34:29
u009,e7,home_page,2022-11-01 11:25:32
u009,e8,list_detail_page,2022-11-01 12:41:21
u009,e1,view_detail_page,2022-11-01 16:21:15
u009,e2,add_bag_page,2022-11-01 21:41:12
u009,e3,collect_goods_page,2022-11-01 01:10:21
u010,e4,order_detail_page,2022-11-01 01:11:13
u010,e5,pay_detail_page,2022-11-01 02:07:11
u010,e6,click_adver_page,2022-11-01 13:07:23
u010,e7,home_page,2022-11-01 08:18:12
u010,e8,list_detail_page,2022-11-01 23:34:29
u010,e5,pay_detail_page,2022-11-01 11:25:32
u010,e6,click_adver_page,2022-11-01 12:41:21
u010,e7,home_page,2022-11-01 16:21:15
u010,e8,list_detail_page,2022-11-01 21:41:12


-- Create the table
drop table if exists event_info_log;
create table event_info_log
(
user_id varchar(20),
event_id varchar(20),
event_action varchar(20),
event_time datetime
)
DUPLICATE KEY(user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS 1;

-- Load the data from a local file via Stream Load
curl \
 -u root: \
 -H "label:event_info_log" \
 -H "column_separator:," \
 -T /root/data/event_log.txt \
 http://linux01:8040/api/test/event_info_log/_stream_load

Logical analysis:

1. Filter each user's event sequence by the conditions defined in the funnel model, keeping only the qualifying events.

2. Collect the same user's qualifying event IDs into an array, sort them by time, and concatenate them into a string.

3. Match the concatenated string against the regular expressions derived from the funnel model.
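The three steps can be sketched in plain Python to make the data flow concrete. This is a minimal illustration, not the article's SQL: `user_sequences` and `matched_step` are hypothetical names, rows are assumed to be `(user_id, event_id, event_time)` tuples with `event_time` formatted as `YYYY-MM-DD HH:MM:SS`, and events are joined with `,`.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

FUNNEL = ("e1", "e2", "e4", "e5")


def user_sequences(rows: Iterable[Tuple[str, str, str]], day: str) -> Dict[str, str]:
    """Steps 1 and 2: filter, sort by time, concatenate per user."""
    per_user: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for user_id, event_id, event_time in rows:
        # Step 1: keep only funnel events on the requested day.
        if event_id in FUNNEL and event_time.startswith(day):
            per_user[user_id].append((event_time, event_id))
    # Step 2: 'YYYY-MM-DD HH:MM:SS' strings sort chronologically, so a plain sort orders by time.
    return {user: ",".join(e for _, e in sorted(events)) for user, events in per_user.items()}


def matched_step(seq: str) -> int:
    """Step 3: try the deepest pattern first, like the CASE expression in the SQL."""
    for depth in range(len(FUNNEL), 0, -1):
        if re.search(".*".join(FUNNEL[:depth]), seq):
            return depth
    return 0


# The u007 rows from the sample data (event_action omitted).
ROWS = [
    ("u007", "e7", "2022-11-01 02:07:11"),
    ("u007", "e8", "2022-11-01 13:07:23"),
    ("u007", "e1", "2022-11-01 08:18:12"),
    ("u007", "e2", "2022-11-01 23:34:29"),
    ("u007", "e3", "2022-11-01 11:25:32"),
    ("u007", "e4", "2022-11-01 12:41:21"),
    ("u007", "e5", "2022-11-01 16:21:15"),
    ("u007", "e6", "2022-11-01 21:41:12"),
    ("u007", "e7", "2022-11-01 01:10:21"),
]
seqs = user_sequences(ROWS, "2022-11-01")
print(seqs, matched_step(seqs["u007"]))  # {'u007': 'e1,e4,e5,e2'} 2
```

Users with no funnel events on that day get no entry at all, which mirrors what the SQL `WHERE` clause does.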

Method one:


1. Filter by the time condition and build each user's event sequence
select
  user_id,
  -- each running value is a prefix of the next one, so max() returns the complete sequence
  max(event_ll) as event_seq
from
(
  select
    user_id,
    -- running concatenation of event IDs in ascending time order
    group_concat(event_id) over (partition by user_id order by event_time) as event_ll
  from
  (
    select
      user_id, event_id, event_time
    from event_info_log
    where event_id in ('e1','e2','e4','e5')
      and to_date(event_time) = '2022-11-01'
  ) as temp
) as temp2
group by user_id;

+---------+------------------------+
| user_id | event_seq              |
+---------+------------------------+
| u006    | e4, e5, e1, e2, e4, e5 |
| u007    | e1, e4, e5, e2         |
| u005    | e1, e2, e5, e4, e1, e2 |
| u004    | e1, e5, e2, e4         |
| u010    | e4, e5, e5             |
| u001    | e1, e2, e5, e4, e1, e2 |
| u003    | e5, e1, e4, e5, e2     |
| u002    | e4, e5, e1, e2, e4     |
| u008    | e1, e1, e5, e2, e4     |
| u009    | e2, e5, e4, e1, e2     |
+---------+------------------------+
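The running-concatenation query can be tried locally. A minimal sketch using Python's built-in `sqlite3` module, not Doris: SQLite's `group_concat` uses `,` as its default separator (Doris printed `, `), `date()` stands in for Doris's `to_date()`, and aggregate window functions assume SQLite 3.25 or later. Only the u001 and u010 rows of the sample data are loaded.

```python
import sqlite3
from typing import Dict, List, Tuple

# u001 and u010 rows from the sample data (event_action omitted).
ROWS: List[Tuple[str, str, str]] = [
    ("u001", "e1", "2022-11-01 01:10:21"),
    ("u001", "e2", "2022-11-01 01:11:13"),
    ("u001", "e3", "2022-11-01 02:07:11"),
    ("u001", "e4", "2022-11-01 13:07:23"),
    ("u001", "e5", "2022-11-01 08:18:12"),
    ("u001", "e1", "2022-11-01 16:21:15"),
    ("u001", "e2", "2022-11-01 21:41:12"),
    ("u010", "e4", "2022-11-01 01:11:13"),
    ("u010", "e5", "2022-11-01 02:07:11"),
    ("u010", "e6", "2022-11-01 13:07:23"),
    ("u010", "e5", "2022-11-01 11:25:32"),
]

QUERY = """
SELECT user_id, MAX(event_ll) AS event_seq
FROM (
    SELECT user_id,
           -- running concatenation: each row sees all earlier rows of its user
           group_concat(event_id) OVER (PARTITION BY user_id ORDER BY event_time) AS event_ll
    FROM event_info_log
    WHERE event_id IN ('e1', 'e2', 'e4', 'e5')
      AND date(event_time) = '2022-11-01'
)
GROUP BY user_id
ORDER BY user_id
"""


def build_sequences(rows: List[Tuple[str, str, str]]) -> Dict[str, str]:
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE event_info_log (user_id TEXT, event_id TEXT, event_time TEXT)")
        con.executemany("INSERT INTO event_info_log VALUES (?, ?, ?)", rows)
        return dict(con.execute(QUERY).fetchall())
    finally:
        con.close()


print(build_sequences(ROWS))  # {'u001': 'e1,e2,e5,e4,e1,e2', 'u010': 'e4,e5,e5'}
```

Apart from the separator, the u001 and u010 sequences agree with the Doris output above; the e3 and e6 rows are dropped by the `WHERE` clause.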

2. Define the matching rules
select
   user_id,
   '购买转化漏斗' as funnel_name,  -- 'Purchase Conversion Funnel'
   -- CASE returns the first matching branch, so the deepest pattern is tested first
   case
   -- regex: e1 occurred, then e2, then e4, then e5
   when event_seq rlike('e1.*e2.*e4.*e5') then 4
   -- regex: e1, then e2, then e4
   when event_seq rlike('e1.*e2.*e4') then 3
   -- regex: e1, then e2
   when event_seq rlike('e1.*e2') then 2
   -- regex: only e1 (the first step) was reached
   when event_seq rlike('e1') then 1
   else 0 end as step
from
(
  select
    user_id,
    max(event_ll) as event_seq
  from
  (
    select
      user_id,
      group_concat(event_id) over (partition by user_id order by event_time) as event_ll
    from
    (
      select
        user_id, event_id, event_time
      from event_info_log
      where event_id in ('e1','e2','e4','e5')
        and to_date(event_time) = '2022-11-01'
    ) as temp
  ) as temp2
  group by user_id
) as tmp3;

+---------+--------------------+------+
| user_id | funnel_name        | step |
+---------+--------------------+------+
| u006    | 购买转化漏斗       |    4 |
| u007    | 购买转化漏斗       |    2 |
| u005    | 购买转化漏斗       |    3 |
| u004    | 购买转化漏斗       |    3 |
| u010    | 购买转化漏斗       |    0 |
| u001    | 购买转化漏斗       |    3 |
| u003    | 购买转化漏斗       |    2 |
| u002    | 购买转化漏斗       |    3 |
| u008    | 购买转化漏斗       |    3 |
| u009    | 购买转化漏斗       |    2 |
+---------+--------------------+------+
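A CASE expression returns the first branch that matches, which is why the deepest pattern is listed first. The Python sketch below emulates that first-match rule and checks it against the sequences and steps printed in the two tables above. `case_step` is a hypothetical helper, and Python's unanchored `re.search` is assumed to behave like `rlike` for these simple patterns.

```python
import re
from typing import List, Tuple

# (step, pattern) pairs in the same order as the CASE branches: deepest first.
CASE_BRANCHES: List[Tuple[int, str]] = [
    (4, r"e1.*e2.*e4.*e5"),
    (3, r"e1.*e2.*e4"),
    (2, r"e1.*e2"),
    (1, r"e1"),
]


def case_step(seq: str, branches: List[Tuple[int, str]] = CASE_BRANCHES) -> int:
    """Emulate CASE WHEN ... THEN ... ELSE 0: the first matching branch wins."""
    for step, pattern in branches:
        if re.search(pattern, seq):  # unanchored search, assumed to match rlike's behaviour
            return step
    return 0


# Sequences and steps copied from the two result tables above.
EXPECTED = {
    "e4, e5, e1, e2, e4, e5": 4,
    "e1, e4, e5, e2": 2,
    "e1, e2, e5, e4, e1, e2": 3,
    "e1, e5, e2, e4": 3,
    "e4, e5, e5": 0,
    "e5, e1, e4, e5, e2": 2,
    "e4, e5, e1, e2, e4": 3,
    "e1, e1, e5, e2, e4": 3,
    "e2, e5, e4, e1, e2": 2,
}
assert all(case_step(seq) == step for seq, step in EXPECTED.items())

# Testing the shallowest pattern first would label every user who saw e1 as step 1.
print(case_step("e4, e5, e1, e2, e4, e5", CASE_BRANCHES[::-1]))  # 1
```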

-- Finally, compute the conversion rates
select
  funnel_name,
  sum(if(step >= 1, 1, 0)) as step1,
  sum(if(step >= 2, 1, 0)) as step2,
  sum(if(step >= 3, 1, 0)) as step3,
  sum(if(step >= 4, 1, 0)) as step4,
  round(sum(if(step >= 2, 1, 0)) / sum(if(step >= 1, 1, 0)), 2) as 'step1->step2_ratio',
  round(sum(if(step >= 3, 1, 0)) / sum(if(step >= 2, 1, 0)), 2) as 'step2->step3_ratio',
  round(sum(if(step >= 4, 1, 0)) / sum(if(step >= 3, 1, 0)), 2) as 'step3->step4_ratio'
from
(
     select
        '购买转化漏斗' as funnel_name,  -- 'Purchase Conversion Funnel'
        case
        -- regex: e1 occurred, then e2, then e4, then e5
        when event_seq regexp('e1.*e2.*e4.*e5') then 4
        -- regex: e1, then e2, then e4
        when event_seq regexp('e1.*e2.*e4') then 3
        -- regex: e1, then e2
        when event_seq regexp('e1.*e2') then 2
        -- regex: only e1 was reached
        when event_seq regexp('e1') then 1
        else 0 end as step
     from
     (
        select
        user_id,
        max(event_seq) as event_seq
        from
        -- Doris 1.1 does not support arrays yet, so the events cannot be sorted while concatenating them directly
        (
        select
        user_id,
        -- sort with a window function, then concatenate "time_event" pairs in ascending time order
        group_concat(concat(event_time, '_', event_id), '|') over (partition by user_id order by event_time) as event_seq
        from event_info_log
        where to_date(event_time) = '2022-11-01'
        and event_id in ('e1','e2','e4','e5')
        ) as tmp
        group by user_id
     ) as t1
) as t2
group by funnel_name;

+--------------------+-------+-------+-------+-------+--------------------+--------------------+--------------------+
| funnel_name        | step1 | step2 | step3 | step4 | step1->step2_ratio | step2->step3_ratio | step3->step4_ratio |
+--------------------+-------+-------+-------+-------+--------------------+--------------------+--------------------+
| 购买转化漏斗       |     9 |     9 |     6 |     1 |                  1 |               0.67 |               0.17 |
+--------------------+-------+-------+-------+-------+--------------------+--------------------+--------------------+
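The `>=` in `sum(if(step >= n, 1, 0))` matters: a user whose deepest step is 3 also passed steps 1 and 2, so they count toward every earlier step. A small Python check (with a hypothetical `step_uvs` helper) reproduces the totals and ratios above from the per-user steps listed earlier.

```python
from typing import Dict, List

# Per-user deepest step, copied from the step table of method one.
STEPS: Dict[str, int] = {
    "u006": 4, "u007": 2, "u005": 3, "u004": 3, "u010": 0,
    "u001": 3, "u003": 2, "u002": 3, "u008": 3, "u009": 2,
}


def step_uvs(steps: Dict[str, int], depth: int = 4) -> List[int]:
    """UV of step n = users whose deepest step is >= n, i.e. sum(if(step >= n, 1, 0))."""
    return [sum(1 for s in steps.values() if s >= n) for n in range(1, depth + 1)]


uv = step_uvs(STEPS)
ratios = [round(nxt / prev, 2) for prev, nxt in zip(uv, uv[1:])]
print(uv, ratios)  # [9, 9, 6, 1] [1.0, 0.67, 0.17]
```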

Method two:

1. Sort by time and concatenate all events of each user (without filtering) into one string

select
  user_id,
  max(sz) as eventhing
from (
  select
    user_id,
    group_concat(event_id) over (partition by user_id order by event_time asc) as sz
  from event_info_log
) t1
group by user_id;
 
+---------+--------------------------------------------+
| user_id | eventhing                                  |
+---------+--------------------------------------------+
| u006    | e6, e4, e5, e7, e1, e2, e6, e3, e4, e5, e8 |
| u007    | e7, e7, e1, e3, e4, e8, e5, e6, e2         |
| u005    | e1, e3, e2, e3, e5, e7, e8, e4, e1, e2, e6 |
| u004    | e7, e8, e1, e3, e5, e6, e2, e7, e8, e4     |
| u010    | e4, e5, e7, e5, e6, e6, e7, e8, e8         |
| u001    | e1, e2, e3, e5, e7, e8, e4, e1, e2, e6     |
| u003    | e5, e6, e7, e1, e3, e4, e8, e5, e6, e2     |
| u002    | e3, e4, e5, e7, e1, e2, e6, e3, e4, e8     |
| u008    | e1, e8, e1, e3, e5, e6, e2, e7, e8, e4     |
| u009    | e3, e2, e3, e5, e7, e8, e4, e1, e2, e6     |
+---------+--------------------------------------------+
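Method two skips the `WHERE` filter. Because `.*` already tolerates any events between two steps, the unrelated events should not change a user's step; note, though, that without the date condition every day in the table ends up in one sequence. The Python check below uses hypothetical helpers `step_of` and `keep_funnel_events`: it filters two sequences from the table above and compares the steps, and the filtered strings equal the method-one sequences for u006 and u007.

```python
import re

FUNNEL = ("e1", "e2", "e4", "e5")


def step_of(seq: str) -> int:
    """Deepest funnel step whose '.*'-joined pattern matches (deepest first)."""
    for depth in range(len(FUNNEL), 0, -1):
        if re.search(".*".join(FUNNEL[:depth]), seq):
            return depth
    return 0


def keep_funnel_events(seq: str) -> str:
    """Drop non-funnel events but keep the order, like method one's WHERE clause."""
    return ", ".join(e for e in seq.split(", ") if e in FUNNEL)


# u006 and u007 sequences from the table above.
for full in ("e6, e4, e5, e7, e1, e2, e6, e3, e4, e5, e8",
             "e7, e7, e1, e3, e4, e8, e5, e6, e2"):
    filtered = keep_funnel_events(full)
    print(filtered, step_of(full), step_of(filtered))
# e4, e5, e1, e2, e4, e5 4 4
# e1, e4, e5, e2 2 2
```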
 
 
2. Regex matching and the UV of each step
select
  '电商的漏斗模型' as funnel_name,  -- 'E-commerce funnel model'
  sum(if(step >= 1, 1, 0)) as step1_uv,
  sum(if(step >= 2, 1, 0)) as step2_uv,
  sum(if(step >= 3, 1, 0)) as step3_uv,
  sum(if(step >= 4, 1, 0)) as step4_uv
from
(
  select
    user_id,
    -- deepest pattern first: CASE returns the first matching branch
    case
      when eventhing rlike('e1.*e2.*e4.*e5') then 4
      when eventhing rlike('e1.*e2.*e4') then 3
      when eventhing rlike('e1.*e2') then 2
      when eventhing rlike('e1') then 1
      else 0 end as step
  from
  (
    select
      user_id,
      max(sz) as eventhing
    from (
      select
        user_id,
        group_concat(event_id) over (partition by user_id order by event_time asc) as sz
      from event_info_log
    ) t1
    group by user_id
  ) t2
) t3;
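The patterns in both methods rely on plain substring matching, so they work only because the sample event IDs stop at e8: with an ID such as e10, `e1` would also match inside `e10`. One possible guard, sketched in Python below, wraps each ID in word boundaries (`\b`). `funnel_pattern` is a hypothetical helper, the sequence `"e10, e20"` is made up for illustration, and whether the regex engine behind Doris's `rlike`/`regexp` accepts `\b` should be checked before porting the idea.

```python
import re
from typing import Sequence


def funnel_pattern(steps: Sequence[str]) -> str:
    r"""Join steps with '.*', wrapping each ID in \b so that 'e1' cannot match inside 'e10'."""
    return ".*".join(r"\b" + re.escape(step) + r"\b" for step in steps)


naive = "e1.*e2"
safe = funnel_pattern(["e1", "e2"])

# Hypothetical sequence: the user never triggered e1 or e2, only e10 and e20.
seq = "e10, e20"
print(bool(re.search(naive, seq)))          # True  (false positive)
print(bool(re.search(safe, seq)))           # False
print(bool(re.search(safe, "e1, e3, e2")))  # True
```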

 


Source: blog.csdn.net/m0_53400772/article/details/130956949