SQL product recommendation problem

Table of contents

0 Requirements

1 Building the table

2 Data analysis

3 Summary


0 Requirements

Given records of users purchasing products, return for each user the products that user might want to purchase. The rule: if another user has purchased at least two of the same products as this user, then every product the other user purchased and this user has not is a product this user might want to purchase.

The data is as follows:

user_id product_id
A 1
A 2
A 1
A 3
B 2
B 3
B 4
B 5
B 2
C 1
C 2
C 1
D 1
D 3
D 6
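
Before turning to SQL, the rule can be sketched directly with sets to get a reference answer for the sample data. This is only an illustration, not part of the original solution: the names `records`, `recommend` and `min_common` are made up for this sketch.

```python
from collections import defaultdict

# Sample purchase records from the listing above (repeat purchases included).
records = [
    ("A", "1"), ("A", "2"), ("A", "1"), ("A", "3"),
    ("B", "2"), ("B", "3"), ("B", "4"), ("B", "5"), ("B", "2"),
    ("C", "1"), ("C", "2"), ("C", "1"),
    ("D", "1"), ("D", "3"), ("D", "6"),
]


def recommend(records, min_common=2):
    """Return {user: set of recommended products} under the stated rule."""
    bought = defaultdict(set)
    for user, product in records:
        bought[user].add(product)  # a set drops repeat purchases

    result = {}
    for user, mine in bought.items():
        candidates = set()
        for other, theirs in bought.items():
            # "Similar" means at least min_common products in common.
            if other != user and len(mine & theirs) >= min_common:
                candidates |= theirs
        # Keep only products this user has not purchased yet.
        result[user] = candidates - mine
    return result


expected = recommend(records)
for user in sorted(expected):
    print(user, sorted(expected[user]))
```

For the sample data this gives 4, 5 and 6 for A, 1 for B, 3 for C and 2 for D, which is the result the SQL steps below should reproduce.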

1 Building the table

create table product as 
select 'A' as user_id,'1' product_id
UNION ALL
select 'A' as user_id,'2' product_id
UNION ALL
select 'A' as user_id,'1' product_id
UNION ALL
select 'A' as user_id,'3' product_id
UNION ALL
select 'B' as user_id,'2' product_id
UNION ALL
select 'B' as user_id,'3' product_id
UNION ALL
select 'B' as user_id,'4' product_id
UNION ALL
select 'B' as user_id,'5' product_id
UNION ALL
select 'B' as user_id,'2' product_id
UNION ALL
select 'C' as user_id,'1' product_id
UNION ALL
select 'C' as user_id,'2' product_id
UNION ALL
select 'C' as user_id,'1' product_id
UNION ALL
select 'D' as user_id,'1' product_id
UNION ALL
select 'D' as user_id,'3' product_id
UNION ALL
select 'D' as user_id,'6' product_id

2 Data analysis

(1) Deduplicate the data in the table by user and product

with t1 as(
select user_id,product_id
from product
group by user_id,product_id
)
select user_id,product_id
from t1
user_id product_id
A       1
A       2
A       3
B       5
B       4
B       3
B       2
C       2
C       1
D       6
D       1
D       3
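
To try this step without a Hive cluster, here is a minimal sketch that loads the sample rows into an in-memory SQLite database through Python's built-in sqlite3 module. SQLite is only a stand-in for Hive here (an assumption of this sketch), and `ROWS`, `conn`, `by_group` and `by_distinct` are illustrative names. It also checks that `select distinct` returns the same rows as the `group by` form.

```python
import sqlite3

# Sample rows from the article, repeat purchases included.
ROWS = [
    ("A", "1"), ("A", "2"), ("A", "1"), ("A", "3"),
    ("B", "2"), ("B", "3"), ("B", "4"), ("B", "5"), ("B", "2"),
    ("C", "1"), ("C", "2"), ("C", "1"),
    ("D", "1"), ("D", "3"), ("D", "6"),
]

# Assumption: in-memory SQLite stands in for Hive.
conn = sqlite3.connect(":memory:")
conn.execute("create table product (user_id text, product_id text)")
conn.executemany("insert into product values (?, ?)", ROWS)

# Deduplicate with group by, as in the article.
by_group = conn.execute(
    "select user_id, product_id from product "
    "group by user_id, product_id order by user_id, product_id"
).fetchall()

# Deduplicate with select distinct, for comparison.
by_distinct = conn.execute(
    "select distinct user_id, product_id from product "
    "order by user_id, product_id"
).fetchall()

print(len(ROWS), "raw rows ->", len(by_group), "distinct pairs")
print("same result:", by_group == by_distinct)
```

The 15 raw rows collapse to the 12 user and product pairs listed above, and both forms agree.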

(2) Find which other users have purchased the same product as a given user. Relationships like this between rows of the same table are usually found with a self-join

with t1 as(
select user_id,product_id
from product
group by user_id,product_id
)
select a.user_id as user_id1, b.user_id as user_id2, a.product_id
from t1 a
join t1 b
on a.product_id = b.product_id
where a.user_id!=b.user_id
user_id1        user_id2        a.product_id
A       C       1
A       D       1
A       B       2
A       C       2
A       B       3
A       D       3
B       A       2
B       C       2
B       A       3
B       D       3
C       A       1
C       D       1
C       A       2
C       B       2
D       A       1
D       C       1
D       A       3
D       B       3

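(3) Step 2 finds every pair of users who purchased the same product. Grouping those rows by user pair and keeping the pairs with a count of at least 2 gives the users who have at least 2 products in common. Because t1 is deduplicated, each row of t2 is one distinct shared product, so count(1) counts shared products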

with t1 as(
select user_id,product_id
from product
group by user_id,product_id
)
,t2 as
(
select a.user_id as user_id1, b.user_id as user_id2, a.product_id
from t1 a
join t1 b
on a.product_id = b.product_id
where a.user_id!=b.user_id
)
select user_id1,user_id2
from t2
group by user_id1,user_id2
having count(1) >=2
user_id1        user_id2
A       B
A       C
A       D
B       A
C       A
D       A

Step 3 yields the relationship table of users with similar purchasing tendencies, that is, pairs of users who have at least 2 purchased products in common
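
The count in step 3 is only correct because the self-join runs on the deduplicated t1. The following sketch illustrates this by running the same pair query once on the deduplicated rows and once on the raw table. As before, in-memory SQLite via Python's sqlite3 stands in for Hive, and `PAIR_SQL`, `DEDUP`, `pairs_dedup` and `pairs_raw` are names invented for this sketch.

```python
import sqlite3

# Sample rows from the article, repeat purchases included.
ROWS = [
    ("A", "1"), ("A", "2"), ("A", "1"), ("A", "3"),
    ("B", "2"), ("B", "3"), ("B", "4"), ("B", "5"), ("B", "2"),
    ("C", "1"), ("C", "2"), ("C", "1"),
    ("D", "1"), ("D", "3"), ("D", "6"),
]

# Assumption: in-memory SQLite stands in for Hive.
conn = sqlite3.connect(":memory:")
conn.execute("create table product (user_id text, product_id text)")
conn.executemany("insert into product values (?, ?)", ROWS)

# Steps 2 and 3 combined; {src} is the row source to self-join.
PAIR_SQL = """
select a.user_id as user_id1, b.user_id as user_id2
from {src} a
join {src} b
  on a.product_id = b.product_id
where a.user_id != b.user_id
group by a.user_id, b.user_id
having count(1) >= 2
order by 1, 2
"""

# Same deduplication as t1 in the article, written as a subquery.
DEDUP = "(select user_id, product_id from product group by user_id, product_id)"

pairs_dedup = conn.execute(PAIR_SQL.format(src=DEDUP)).fetchall()
pairs_raw = conn.execute(PAIR_SQL.format(src="product")).fetchall()

print("deduplicated:", pairs_dedup)
print("raw table:   ", pairs_raw)
print("false pairs: ", sorted(set(pairs_raw) - set(pairs_dedup)))
```

On the raw table, B and C appear as a similar pair although they share only product 2, because B's two purchases of product 2 each match C's single row. C and D are wrongly paired in the same way through product 1.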

(4) Using the relationship table, get the products purchased by each user's similar users

with t1 as(
select user_id,product_id
from product
group by user_id,product_id
)
,t2 as
(
select a.user_id as user_id1, b.user_id as user_id2, a.product_id
from t1 a
join t1 b
on a.product_id = b.product_id
where a.user_id!=b.user_id
)
,t3 as
(select user_id1,user_id2
from t2
group by user_id1,user_id2
having count(1) >=2
)
select t3.user_id1,t3.user_id2,a.product_id product_id_2
from t3
left join t1 a
on t3.user_id2 = a.user_id
t3.user_id1     t3.user_id2     product_id_2
A       B       2
A       B       3
A       B       4
A       B       5
A       C       1
A       C       2
A       D       1
A       D       3
A       D       6
B       A       1
B       A       2
B       A       3
C       A       1
C       A       2
C       A       3
D       A       1
D       A       2
D       A       3

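Next, list the candidate products to recommend to each user. The same product can reach a user through several similar users, so the rows are deduplicated with group by. The list still contains products the user has already purchased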

with t1 as(
select user_id,product_id
from product
group by user_id,product_id
)
,t2 as
(
select a.user_id as user_id1, b.user_id as user_id2, a.product_id
from t1 a
join t1 b
on a.product_id = b.product_id
where a.user_id!=b.user_id
)
,t3 as
(select user_id1,user_id2
from t2
group by user_id1,user_id2
having count(1) >=2
)
select user_id1,product_id_2
from
(select t3.user_id1,t3.user_id2,a.product_id product_id_2
from t3
left join t1 a
on t3.user_id2 = a.user_id
) t
group by user_id1,product_id_2
user_id1        product_id_2
A       1
A       2
A       3
A       4
A       5
A       6
B       1
B       2
B       3
C       1
C       2
C       3
D       1
D       2
D       3

(5) Compute the set difference to get the final recommendations, that is, remove the products each user has already purchased from the candidates. In Hive, the difference is computed with left join plus an is null check

with t1 as(
select user_id,product_id
from product
group by user_id,product_id
)
,t2 as
(
select a.user_id as user_id1, b.user_id as user_id2, a.product_id
from t1 a
join t1 b
on a.product_id = b.product_id
where a.user_id!=b.user_id
)
,t3 as
(select user_id1,user_id2
from t2
group by user_id1,user_id2
having count(1) >=2
)
,t4 as
(select user_id1,product_id_2
from
(select t3.user_id1,t3.user_id2,a.product_id product_id_2
from t3
left join t1 a
on t3.user_id2 = a.user_id
) t
group by user_id1,product_id_2
) 
select t4.user_id1 as user_id,t4.product_id_2 as product_id
from t4
left join t1
on t4.user_id1 = t1.user_id
and t4.product_id_2 = t1.product_id
where t1.product_id is null
user_id product_id
A       4
A       5
A       6
B       1
C       3
D       2
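
As an end-to-end check, the sketch below runs a condensed version of the final query (t2 is folded into t3) and compares the left join + is null form with an equivalent `not exists` form. It assumes in-memory SQLite via Python's sqlite3 as a stand-in for Hive. Whether a given Hive version accepts the `not exists` variant is not covered by the article and should be checked separately. `CTES`, `LEFT_JOIN` and `NOT_EXISTS` are illustrative names.

```python
import sqlite3

# Sample rows from the article, repeat purchases included.
ROWS = [
    ("A", "1"), ("A", "2"), ("A", "1"), ("A", "3"),
    ("B", "2"), ("B", "3"), ("B", "4"), ("B", "5"), ("B", "2"),
    ("C", "1"), ("C", "2"), ("C", "1"),
    ("D", "1"), ("D", "3"), ("D", "6"),
]

# Assumption: in-memory SQLite stands in for Hive.
conn = sqlite3.connect(":memory:")
conn.execute("create table product (user_id text, product_id text)")
conn.executemany("insert into product values (?, ?)", ROWS)

# Shared CTE chain: t1 = deduplicated purchases, t3 = similar user pairs,
# t4 = candidate products per user (already purchased ones still included).
CTES = """
with t1 as (
  select user_id, product_id from product group by user_id, product_id
),
t3 as (
  select a.user_id as user_id1, b.user_id as user_id2
  from t1 a
  join t1 b on a.product_id = b.product_id
  where a.user_id != b.user_id
  group by a.user_id, b.user_id
  having count(1) >= 2
),
t4 as (
  select t3.user_id1, p.product_id as product_id_2
  from t3
  join t1 p on t3.user_id2 = p.user_id
  group by t3.user_id1, p.product_id
)
"""

# Difference via left join + is null, as in the article.
LEFT_JOIN = CTES + """
select t4.user_id1, t4.product_id_2
from t4
left join t1
  on t4.user_id1 = t1.user_id and t4.product_id_2 = t1.product_id
where t1.product_id is null
order by 1, 2
"""

# Same difference written with not exists.
NOT_EXISTS = CTES + """
select t4.user_id1, t4.product_id_2
from t4
where not exists (
  select 1 from t1
  where t1.user_id = t4.user_id1 and t1.product_id = t4.product_id_2
)
order by 1, 2
"""

via_left_join = conn.execute(LEFT_JOIN).fetchall()
via_not_exists = conn.execute(NOT_EXISTS).fetchall()

for user_id, product_id in via_left_join:
    print(user_id, product_id)
print("variants agree:", via_left_join == via_not_exists)
```

Both variants return the six rows shown above. The left join form keeps every candidate row and then filters out those that found a matching purchase, which is why the is null test goes in the where clause and not in the join condition.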

3 Summary

This problem mainly tests the understanding of joins: the result is obtained through a series of join transformations. There are two takeaways. To find relationships between rows of the same table, use a self-join. To compute a set difference, use left join + is null. Hive, as used here, offers no built-in functions for the intersection, difference, or union of arrays, so the result has to be obtained with joins.

Origin blog.csdn.net/godlovedaniel/article/details/126798308