1 Description
Based on the business data of "Local Data Warehouse Project (1) - Detailed Process of Local Data Warehouse Construction" , this paper builds a system business data warehouse locally.
Generate business data according to the simulated sql script, and execute in sequence to generate business data.
The sql script is provided as follows
链接:https://pan.baidu.com/s/1AhLIuTNIyJ_GBD7M0b2RoA
提取码:1lm8
The generated data is as follows:
2 Import business data into data warehouse
The overall framework of the data warehouse is as follows. In the previous "Local Data Warehouse Project (1) - Detailed Process of Building a Local Data Warehouse", the overall process of data collection and analysis has been completed. The business data warehouse data here needs to use sqoop to import data from mysql to HDFS.
2.1 Install sqoop
2.1.1 Unzip and rename
tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
mv sqoop-1.4.6.bin__hadoop-2.0.4-alpha sqoop-1.4.6
2.1.2 Configure the SQOOP_HOME environment variable
SQOOP_HOME=/root/soft/sqoop-1.4.6
PATH=$PATH:$JAVA_HOME/bin:$SHELL_HOME:$FLUME_HOME/bin:$HIVE_HOME/bin:$KAFKA_HOME/bin:$ZOOKEEPER_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SQOOP_HOME/bin
2.1.3 Configure sqoop-env.sh
mv sqoop-env-template.sh sqoop-env.sh
export HADOOP_COMMON_HOME=/root/soft/hadoop-2.7.2
export HADOOP_MAPRED_HOME=/root/soft/hadoop-2.7.2
export HIVE_HOME=/root/soft/hive
export ZOOKEEPER_HOME=/root/soft/zookeeper-3.4.10
export ZOOCFGDIR=/root/soft/zookeeper-3.4.10
2.1.4 Copy the jdbc driver of mysql to the lib directory of sqoop
2.1.5 Test link
bin/sqoop list-databases --connect jdbc:mysql://192.168.2.100:3306/ --username root --password 123456
The following page appears to indicate that the sqoop installation is successful
2.2 sqoop import data to HDFS
The following sqoop script can automatically import data to HDFS at regular intervals
#!/bin/bash
db_date=$2
echo $db_date
db_name=gmall
import_data() {
/root/soft/sqoop-1.4.6/bin/sqoop import \
--connect jdbc:mysql://192.168.2.100:3306/$db_name \
--username root \
--password 123456 \
--target-dir /origin_data/$db_name/db/$1/$db_date \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t" \
--query "$2"' and $CONDITIONS;' \
--null-string '\\N' \
--null-non-string '\\N'
}
import_sku_info(){
import_data "sku_info" "select
id, spu_id, price, sku_name, sku_desc, weight, tm_id,
category3_id, create_time
from sku_info where 1=1"
}
import_user_info(){
import_data "user_info" "select
id, name, birthday, gender, email, user_level,
create_time
from user_info where 1=1"
}
import_base_category1(){
import_data "base_category1" "select
id, name from base_category1 where 1=1"
}
import_base_category2(){
import_data "base_category2" "select
id, name, category1_id from base_category2 where 1=1"
}
import_base_category3(){
import_data "base_category3" "select id, name, category2_id from base_category3 where 1=1"
}
import_order_detail(){
import_data "order_detail" "select
od.id,
order_id,
user_id,
sku_id,
sku_name,
order_price,
sku_num,
o.create_time
from order_info o, order_detail od
where o.id=od.order_id
and DATE_FORMAT(create_time,'%Y-%m-%d')='$db_date'"
}
import_payment_info(){
import_data "payment_info" "select
id,
out_trade_no,
order_id,
user_id,
alipay_trade_no,
total_amount,
subject,
payment_type,
payment_time
from payment_info
where DATE_FORMAT(payment_time,'%Y-%m-%d')='$db_date'"
}
import_order_info(){
import_data "order_info" "select
id,
total_amount,
order_status,
user_id,
payment_way,
out_trade_no,
create_time,
operate_time
from order_info
where (DATE_FORMAT(create_time,'%Y-%m-%d')='$db_date' or DATE_FORMAT(operate_time,'%Y-%m-%d')='$db_date')"
}
case $1 in
"base_category1")
import_base_category1
;;
"base_category2")
import_base_category2
;;
"base_category3")
import_base_category3
;;
"order_info")
import_order_info
;;
"order_detail")
import_order_detail
;;
"sku_info")
import_sku_info
;;
"user_info")
import_user_info
;;
"payment_info")
import_payment_info
;;
"all")
import_base_category1
import_base_category2
import_base_category3
import_order_info
import_order_detail
import_sku_info
import_user_info
import_payment_info
;;
esac
Note:
①When sqoop imports data by default, convert the Null type of Mysql to 'null'
②Use \N in hive to represent the NULL type
③If you want to convert the Null type of Mysql to the type you expect when importing,
Need to use --null-string and --null-non-string
–null-string: When the string type column of mysql is null, what to use instead when importing to hive!
–null-string a: If in mysql, the current column is a string type (varchar, char), if the value of this column is NULL, when importing to hive, use a instead!
–null-non-string: When mysql’s non-string type column is null, what to use instead when importing to hive!
–null-non-string b: If the current column in mysql is not a string type (varchar, char), if the value of this column is NULL, when importing to hive, use b instead!
④ If you want to export the specified parameters as the NULL type of mysql when exporting, you need to use
--input-null-string and --input-null-non-string --input-null-string a: When hive is exported to mysql, if the value of the string type column in hive is a, export it to mysql and use NULL instead!
–input-null-non-string b:
When hive is exported to mysql, if the value of a non-string column in hive is b, export to mysql and use NULL instead!
Execute the script and import the data
3 ODS layers
3.1 Create ods table
3.1.1 Create order table
drop table if exists ods_order_info;
create external table ods_order_info (
`id` string COMMENT '订单编号',
`total_amount` decimal(10,2) COMMENT '订单金额',
`order_status` string COMMENT '订单状态',
`user_id` string COMMENT '用户id',
`payment_way` string COMMENT '支付方式',
`out_trade_no` string COMMENT '支付流水号',
`create_time` string COMMENT '创建时间',
`operate_time` string COMMENT '操作时间'
) COMMENT '订单表'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by '\t'
location '/wavehouse/gmall/ods/ods_order_info/';
3.1.2 Create order details
drop table if exists ods_order_detail;
create external table ods_order_detail(
`id` string COMMENT '订单详情编号',
`order_id` string COMMENT '订单号',
`user_id` string COMMENT '用户id',
`sku_id` string COMMENT '商品id',
`sku_name` string COMMENT '商品名称',
`order_price` string COMMENT '商品单价',
`sku_num` string COMMENT '商品数量',
`create_time` string COMMENT '创建时间'
) COMMENT '订单明细表'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by '\t'
location '/wavehouse/gmall/ods/ods_order_detail/';
3.1.3 Create commodity information table
drop table if exists ods_sku_info;
create external table ods_sku_info(
`id` string COMMENT 'skuId',
`spu_id` string COMMENT 'spuid',
`price` decimal(10,2) COMMENT '价格',
`sku_name` string COMMENT '商品名称',
`sku_desc` string COMMENT '商品描述',
`weight` string COMMENT '重量',
`tm_id` string COMMENT '品牌id',
`category3_id` string COMMENT '品类id',
`create_time` string COMMENT '创建时间'
) COMMENT '商品表'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by '\t'
location '/wavehouse/gmall/ods/ods_sku_info/';
3.1.4 Create user table
drop table if exists ods_user_info;
create external table ods_user_info(
`id` string COMMENT '用户id',
`name` string COMMENT '姓名',
`birthday` string COMMENT '生日',
`gender` string COMMENT '性别',
`email` string COMMENT '邮箱',
`user_level` string COMMENT '用户等级',
`create_time` string COMMENT '创建时间'
) COMMENT '用户信息'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by '\t'
location '/wavehouse/gmall/ods/ods_user_info/';
3.1.5 Create a commodity classification table
drop table if exists ods_base_category1;
create external table ods_base_category1(
`id` string COMMENT 'id',
`name` string COMMENT '名称'
) COMMENT '商品一级分类'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by '\t'
location '/wavehouse/gmall/ods/ods_base_category1/';
3.1.6 Create a secondary classification table for commodities
drop table if exists ods_base_category2;
create external table ods_base_category2(
`id` string COMMENT ' id',
`name` string COMMENT '名称',
category1_id string COMMENT '一级品类id'
) COMMENT '商品二级分类'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by '\t'
location '/wavehouse/gmall/ods/ods_base_category2/';
3.1.7 Create a three-level table of commodities
drop table if exists ods_base_category3;
create external table ods_base_category3(
`id` string COMMENT ' id',
`name` string COMMENT '名称',
category2_id string COMMENT '二级品类id'
) COMMENT '商品三级分类'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by '\t'
location '/wavehouse/gmall/ods/ods_base_category3/';
3.1.8 Create payment flow table
drop table if exists ods_payment_info;
create external table ods_payment_info(
`id` bigint COMMENT '编号',
`out_trade_no` string COMMENT '对外业务编号',
`order_id` string COMMENT '订单编号',
`user_id` string COMMENT '用户编号',
`alipay_trade_no` string COMMENT '支付宝交易流水编号',
`total_amount` decimal(16,2) COMMENT '支付金额',
`subject` string COMMENT '交易内容',
`payment_type` string COMMENT '支付类型',
`payment_time` string COMMENT '支付时间'
) COMMENT '支付流水表'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by '\t'
location '/wavehouse/gmall/ods/ods_payment_info/';
3.2 Import data
load data inpath '/origin_data/gmall/db/order_info/2023-01-04' OVERWRITE into table gmall.ods_order_info partition(dt='2023-01-04');
load data inpath '/origin_data/gmall/db/order_info/2023-01-05' OVERWRITE into table gmall.ods_order_info partition(dt='2023-01-05');
load data inpath '/origin_data/gmall/db/order_detail/2023-01-04' OVERWRITE into table gmall.ods_order_detail partition(dt='2023-01-04');
load data inpath '/origin_data/gmall/db/order_detail/2023-01-05' OVERWRITE into table gmall.ods_order_detail partition(dt='2023-01-05');
load data inpath '/origin_data/gmall/db/sku_info/2023-01-04' OVERWRITE into table gmall.ods_sku_info partition(dt='2023-01-04');
load data inpath '/origin_data/gmall/db/sku_info/2023-01-05' OVERWRITE into table gmall.ods_sku_info partition(dt='2023-01-05');
load data inpath '/origin_data/gmall/db/user_info/2023-01-04' OVERWRITE into table gmall.ods_user_info partition(dt='2023-01-04');
load data inpath '/origin_data/gmall/db/user_info/2023-01-05' OVERWRITE into table gmall.ods_user_info partition(dt='2023-01-05');
load data inpath '/origin_data/gmall/db/payment_info/2023-01-04' OVERWRITE into table gmall.ods_payment_info partition(dt='2023-01-04');
load data inpath '/origin_data/gmall/db/payment_info/2023-01-05' OVERWRITE into table gmall.ods_payment_info partition(dt='2023-01-05');
load data inpath '/origin_data/gmall/db/base_category1/2023-01-04' OVERWRITE into table gmall.ods_base_category1 partition(dt='2023-01-04');
load data inpath '/origin_data/gmall/db/base_category1/2023-01-05' OVERWRITE into table gmall.ods_base_category1 partition(dt='2023-01-05');
load data inpath '/origin_data/gmall/db/base_category2/2023-01-04' OVERWRITE into table gmall.ods_base_category2 partition(dt='2023-01-04');
load data inpath '/origin_data/gmall/db/base_category2/2023-01-05' OVERWRITE into table gmall.ods_base_category2 partition(dt='2023-01-05');
load data inpath '/origin_data/gmall/db/base_category3/2023-01-04' OVERWRITE into table gmall.ods_base_category3 partition(dt='2023-01-04');
load data inpath '/origin_data/gmall/db/base_category3/2023-01-05' OVERWRITE into table gmall.ods_base_category3 partition(dt='2023-01-05');
The above can be written as a script, with the date as the parameter, and it can be executed regularly every day.
4 DWD layers
4.1 Create dwd schedule
4.1.1 Create order table
drop table if exists dwd_order_info;
create external table dwd_order_info (
`id` string COMMENT '',
`total_amount` decimal(10,2) COMMENT '',
`order_status` string COMMENT ' 1 2 3 4 5',
`user_id` string COMMENT 'id',
`payment_way` string COMMENT '',
`out_trade_no` string COMMENT '',
`create_time` string COMMENT '',
`operate_time` string COMMENT ''
)
PARTITIONED BY (`dt` string)
stored as parquet
location '/wavehouse/gmall/dwd/dwd_order_info/'
tblproperties ("parquet.compression"="snappy");
4.1.2 Create order details table
drop table if exists dwd_order_detail;
create external table dwd_order_detail(
`id` string COMMENT '',
`order_id` decimal(10,2) COMMENT '',
`user_id` string COMMENT 'id',
`sku_id` string COMMENT 'id',
`sku_name` string COMMENT '',
`order_price` string COMMENT '',
`sku_num` string COMMENT '',
`create_time` string COMMENT ''
)
PARTITIONED BY (`dt` string)
stored as parquet
location '/wavehouse/gmall/dwd/dwd_order_detail/'
tblproperties ("parquet.compression"="snappy");
4.1.3 Create user table
drop table if exists dwd_user_info;
create external table dwd_user_info(
`id` string COMMENT 'id',
`name` string COMMENT '',
`birthday` string COMMENT '',
`gender` string COMMENT '',
`email` string COMMENT '',
`user_level` string COMMENT '',
`create_time` string COMMENT ''
)
PARTITIONED BY (`dt` string)
stored as parquet
location '/wavehouse/gmall/dwd/dwd_user_info/'
tblproperties ("parquet.compression"="snappy");
4.1.4 Create payment flow table
drop table if exists dwd_payment_info;
create external table dwd_payment_info(
`id` bigint COMMENT '',
`out_trade_no` string COMMENT '',
`order_id` string COMMENT '',
`user_id` string COMMENT '',
`alipay_trade_no` string COMMENT '',
`total_amount` decimal(16,2) COMMENT '',
`subject` string COMMENT '',
`payment_tpe` string COMMENT '',
`payment_time` string COMMENT ''
)
PARTITIONED BY (`dt` string)
stored as parquet
location '/wavehouse/gmall/dwd/dwd_payment_info/'
tblproperties ("parquet.compression"="snappy");
4.1.5 Create a commodity classification table
drop table if exists dwd_sku_info;
create external table dwd_sku_info(
`id` string COMMENT 'skuId',
`spu_id` string COMMENT 'spuid',
`price` decimal(10,2) COMMENT '',
`sku_name` string COMMENT '',
`sku_desc` string COMMENT '',
`weight` string COMMENT '',
`tm_id` string COMMENT 'id',
`category3_id` string COMMENT '1id',
`category2_id` string COMMENT '2id',
`category1_id` string COMMENT '3id',
`category3_name` string COMMENT '3',
`category2_name` string COMMENT '2',
`category1_name` string COMMENT '1',
`create_time` string COMMENT ''
)
PARTITIONED BY (`dt` string)
stored as parquet
location '/wavehouse/gmall/dwd/dwd_sku_info/'
tblproperties ("parquet.compression"="snappy");
4.2 Import data
#!/bin/bash
# 定义变量方便修改
APP=gmall
hive=/root/soft/hive/bin/hive
# 如果是输入的日期按照取输入日期;如果没输入日期取当前时间的前一天
if [ -n "$1" ] ;then
do_date=$1
else
do_date=`date -d "-1 day" +%F`
fi
sql="
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table "$APP".dwd_order_info partition(dt)
select * from "$APP".ods_order_info
where dt='$do_date' and id is not null;
insert overwrite table "$APP".dwd_order_detail partition(dt)
select * from "$APP".ods_order_detail
where dt='$do_date' and id is not null;
insert overwrite table "$APP".dwd_user_info partition(dt)
select * from "$APP".ods_user_info
where dt='$do_date' and id is not null;
insert overwrite table "$APP".dwd_payment_info partition(dt)
select * from "$APP".ods_payment_info
where dt='$do_date' and id is not null;
insert overwrite table "$APP".dwd_sku_info partition(dt)
select
sku.id,
sku.spu_id,
sku.price,
sku.sku_name,
sku.sku_desc,
sku.weight,
sku.tm_id,
sku.category3_id,
c2.id category2_id,
c1.id category1_id,
c3.name category3_name,
c2.name category2_name,
c1.name category1_name,
sku.create_time,
sku.dt
from
"$APP".ods_sku_info sku
join "$APP".ods_base_category3 c3 on sku.category3_id=c3.id
join "$APP".ods_base_category2 c2 on c3.category2_id=c2.id
join "$APP".ods_base_category1 c1 on c2.category1_id=c1.id
where sku.dt='$do_date' and c2.dt='$do_date'
and c3.dt='$do_date' and c1.dt='$do_date'
and sku.id is not null;
"
$hive -e "$sql"
5 dws layers
5.1 User Behavior Wide Table
The demand goal is to aggregate the behavior of each user in a single day to form a multi-column wide table, so that the statistical analysis from different angles can be performed after correlating user dimension information.
5.1.1 Create a user behavior wide table
drop table if exists dws_user_action;
create external table dws_user_action
(
user_id string comment '用户 id',
order_count bigint comment '下单次数 ',
order_amount decimal(16,2) comment '下单金额 ',
payment_count bigint comment '支付次数',
payment_amount decimal(16,2) comment '支付金额 ',
comment_count bigint comment '评论次数'
) COMMENT '每日用户行为宽表'
PARTITIONED BY (`dt` string)
stored as parquet
location '/wavehouse/gmall/dws/dws_user_action/';
5.1.2 Import data
with
tmp_order as
(
select
user_id,
count(*) order_count,
sum(oi.total_amount) order_amount
from dwd_order_info oi
where date_format(oi.create_time,'yyyy-MM-dd')='2023-01-04'
group by user_id
) ,
tmp_payment as
(
select
user_id,
sum(pi.total_amount) payment_amount,
count(*) payment_count
from dwd_payment_info pi
where date_format(pi.payment_time,'yyyy-MM-dd')='2023-01-04'
group by user_id
),
tmp_comment as
(
select
user_id,
count(*) comment_count
from dwd_comment_log c
where date_format(c.dt,'yyyy-MM-dd')='2023-01-04'
group by user_id
)
insert overwrite table dws_user_action partition(dt='2023-01-04')
select
user_actions.user_id,
sum(user_actions.order_count),
sum(user_actions.order_amount),
sum(user_actions.payment_count),
sum(user_actions.payment_amount),
sum(user_actions.comment_count)
from
(
select
user_id,
order_count,
order_amount,
0 payment_count,
0 payment_amount,
0 comment_count
from tmp_order
union all
select
user_id,
0,
0,
payment_count,
payment_amount,
0
from tmp_payment
union all
select
user_id,
0,
0,
0,
0,
comment_count
from tmp_comment
) user_actions
group by user_id;
6 needs
6.1 Requirement 1
Find the total turnover of GMV.
GMV refers to the total turnover (such as one day, one week, one month) within a certain period of time.
drop table if exists ads_gmv_sum_day;
create external table ads_gmv_sum_day(
`dt` string COMMENT '统计日期',
`gmv_count` bigint COMMENT '当日gmv订单个数',
`gmv_amount` decimal(16,2) COMMENT '当日gmv订单总金额',
`gmv_payment` decimal(16,2) COMMENT '当日支付金额'
) COMMENT 'GMV'
row format delimited fields terminated by '\t'
location '/wavehouse/gmall/ads/ads_gmv_sum_day/';
insert data
INSERT INTO TABLE ads_gmv_sum_day
SELECT
'2023-01-04' dt,
sum(order_count) gmv_count,
sum(order_amount) gmv_amount,
sum(payment_amount) gmv_payment
FROM
dws_user_action
WHERE dt='2023-01-04'
GROUP BY dt;
6.2 Requirement 2
User freshness and funnel analysis for conversion rate
6.2.1 Ratio of new users in ADS layer to daily active users (user freshness)
build table
drop table if exists ads_user_convert_day;
create external table ads_user_convert_day(
`dt` string COMMENT '统计日期',
`uv_m_count` bigint COMMENT '当日活跃设备',
`new_m_count` bigint COMMENT '当日新增设备',
`new_m_ratio` decimal(10,2) COMMENT '当日新增占日活的比率'
) COMMENT '转化率'
row format delimited fields terminated by '\t'
location '/wavehouse/gmall/ads/ads_user_convert_day/';
data import
insert into table ads_user_convert_day
select
'2023-01-04' dt,
sum(uc.dc) sum_dc,
sum(uc.nmc) sum_nmc,
cast(sum( uc.nmc)/sum( uc.dc)*100 as decimal(10,2)) new_m_ratio
from
(
select
day_count dc,
0 nmc
from ads_uv_count
where dt='2023-01-04'
union all
select
0 dc,
new_mid_count nmc
from ads_new_mid_count
where create_date='2023-01-04'
)uc;
6.2.2 User behavior funnel analysis of ADS layer
create table
drop table if exists ads_user_action_convert_day;
create external table ads_user_action_convert_day(
`dt` string COMMENT '统计日期',
`total_visitor_m_count` bigint COMMENT '总访问人数',
`order_u_count` bigint COMMENT '下单人数',
`visitor2order_convert_ratio` decimal(10,2) COMMENT '访问到下单转化率',
`payment_u_count` bigint COMMENT '支付人数',
`order2payment_convert_ratio` decimal(10,2) COMMENT '下单到支付的转化率'
) COMMENT '用户行为漏斗分析'
row format delimited fields terminated by '\t'
location '/wavehouse/gmall/ads/ads_user_action_convert_day/';
insert data
insert into table ads_user_action_convert_day
select
'2023-01-04',
uv.day_count,
ua.order_count,
cast(ua.order_count/uv.day_count as decimal(10,2)) visitor2order_convert_ratio,
ua.payment_count,
cast(ua.payment_count/ua.order_count as decimal(10,2)) order2payment_convert_ratio
from
(
select
dt,
sum(if(order_count>0,1,0)) order_count,
sum(if(payment_count>0,1,0)) payment_count
from dws_user_action
where dt='2023-01-04'
group by dt
)ua join ads_uv_count uv on uv.dt=ua.dt;
6.3 Requirement 3
Brand repurchase rate
Demand: Statistics in units of months, users who purchased more than 2 times
6.3.1 Create table in DWS layer
drop table if exists dws_sale_detail_daycount;
create external table dws_sale_detail_daycount
(
user_id string comment '用户 id',
sku_id string comment '商品 Id',
user_gender string comment '用户性别',
user_age string comment '用户年龄',
user_level string comment '用户等级',
order_price decimal(10,2) comment '商品价格',
sku_name string comment '商品名称',
sku_tm_id string comment '品牌id',
sku_category3_id string comment '商品三级品类id',
sku_category2_id string comment '商品二级品类id',
sku_category1_id string comment '商品一级品类id',
sku_category3_name string comment '商品三级品类名称',
sku_category2_name string comment '商品二级品类名称',
sku_category1_name string comment '商品一级品类名称',
spu_id string comment '商品 spu',
sku_num int comment '购买个数',
order_count string comment '当日下单单数',
order_amount string comment '当日下单金额'
) COMMENT '用户购买商品明细表'
PARTITIONED BY (`dt` string)
stored as parquet
location '/wavehouse/gmall/dws/dws_user_sale_detail_daycount/'
tblproperties ("parquet.compression"="snappy");
data import
with
tmp_detail as
(
select
user_id,
sku_id,
sum(sku_num) sku_num,
count(*) order_count,
sum(od.order_price*sku_num) order_amount
from dwd_order_detail od
where od.dt='2023-01-05'
group by user_id, sku_id
)
insert overwrite table dws_sale_detail_daycount partition(dt='2023-01-05')
select
tmp_detail.user_id,
tmp_detail.sku_id,
u.gender,
months_between('2023-01-05', u.birthday)/12 age,
u.user_level,
price,
sku_name,
tm_id,
category3_id,
category2_id,
category1_id,
category3_name,
category2_name,
category1_name,
spu_id,
tmp_detail.sku_num,
tmp_detail.order_count,
tmp_detail.order_amount
from tmp_detail
left join dwd_user_info u on tmp_detail.user_id =u.id and u.dt='2023-01-05'
left join dwd_sku_info s on tmp_detail.sku_id =s.id and s.dt='2023-01-05';
6.3.2 ods layer
build table
drop table ads_sale_tm_category1_stat_mn;
create external table ads_sale_tm_category1_stat_mn
(
tm_id string comment '品牌id',
category1_id string comment '1级品类id ',
category1_name string comment '1级品类名称 ',
buycount bigint comment '购买人数',
buy_twice_last bigint comment '两次以上购买人数',
buy_twice_last_ratio decimal(10,2) comment '单次复购率',
buy_3times_last bigint comment '三次以上购买人数',
buy_3times_last_ratio decimal(10,2) comment '多次复购率',
stat_mn string comment '统计月份',
stat_date string comment '统计日期'
) COMMENT '复购率统计'
row format delimited fields terminated by '\t'
location '/wavehouse/gmall/ads/ads_sale_tm_category1_stat_mn/';
insert data
insert into table ads_sale_tm_category1_stat_mn
select
mn.sku_tm_id,
mn.sku_category1_id,
mn.sku_category1_name,
sum(if(mn.order_count>=1,1,0)) buycount,
sum(if(mn.order_count>=2,1,0)) buyTwiceLast,
sum(if(mn.order_count>=2,1,0))/sum( if(mn.order_count>=1,1,0)) buyTwiceLastRatio,
sum(if(mn.order_count>=3,1,0)) buy3timeLast ,
sum(if(mn.order_count>=3,1,0))/sum( if(mn.order_count>=1,1,0)) buy3timeLastRatio ,
date_format('2023-01-04' ,'yyyy-MM') stat_mn,
'2023-01-04' stat_date
from
(
select
user_id,
sd.sku_tm_id,
sd.sku_category1_id,
sd.sku_category1_name,
sum(order_count) order_count
from dws_sale_detail_daycount sd
where date_format(dt,'yyyy-MM')=date_format('2023-01-04' ,'yyyy-MM')
group by user_id, sd.sku_tm_id, sd.sku_category1_id, sd.sku_category1_name
) mn
group by mn.sku_tm_id, mn.sku_category1_id, mn.sku_category1_name;
6.4 Requirement 4
Ranking of the top ten products with repurchase rate corresponding to each user level
Create a table
drop table ads_ul_rep_ratio;
create table ads_ul_rep_ratio(
user_level string comment '用户等级' ,
sku_id string comment '商品id',
buy_count bigint comment '购买总人数',
buy_twice_count bigint comment '两次购买总数',
buy_twice_rate decimal(10,2) comment '二次复购率',
rank string comment '排名' ,
state_date string comment '统计日期'
) COMMENT '复购率统计'
row format delimited fields terminated by '\t'
location '/wavehouse/gmall/ads/ads_ul_rep_ratio/';
insert data
with
tmp_count as(
select -- 每个等级内每个用户对每个产品的下单次数
user_level,
user_id,
sku_id,
sum(order_count) order_count
from dws_sale_detail_daycount
where dt<='2023-01-04'
group by user_level, user_id, sku_id
)
insert overwrite table ads_ul_rep_ratio
select
*
from(
select
user_level,
sku_id,
sum(if(order_count >=1, 1, 0)) buy_count,
sum(if(order_count >=2, 1, 0)) buy_twice_count,
sum(if(order_count >=2, 1, 0)) / sum(if(order_count >=1, 1, 0)) * 100 buy_twice_rate,
row_number() over(partition by user_level order by sum(if(order_count >=2, 1, 0)) / sum(if(order_count >=1, 1, 0)) desc) rn,
'2023-01-04'
from tmp_count
group by user_level, sku_id
) t1
where rn<=10
Next is the data visualization and task scheduling of the local data warehouse project. For details, see "Local Data Warehouse Project (3) - Data Visualization and Task Scheduling"