Data Analyst ---- SQL Strengthening (1)
Foreword
While job hunting recently, I found that most written exams for data analyst positions involve SQL, and that the SQL in those exams is noticeably harder than what we usually meet in our studies. The exam questions are closer to real business scenarios, which makes them quite difficult for fresh graduates (or maybe I'm just not good enough yet).
This SQL column records the more valuable questions I have encountered in interviews or while practicing problems. I hope it helps you, and likes and follows are appreciated.
Problem
Please use a single SQL statement to extract the behavior characteristics of every user for each product. The characteristics fall into four categories: purchased, purchased but not favorited, favorited but not purchased, and favorited and purchased.
Orders table: orders
Favorites table: favorites
Final output:
Analysis of the problem:
From the problem statement it is clear that this is a multi-table join problem: after joining the two tables, each characteristic is determined from the contents of the joined fields.
About the knowledge points involved in multi-table query
Step 1: Join the tables and concatenate the results
The analysis shows that this problem calls for a full outer join of the two tables. A full outer join can be written directly in Oracle, but MySQL does not support it. Instead, we can concatenate the results of two queries with the union all keyword.
select o.user_id, o.item_id,o.pay_time,f.fav_time
from orders o left join favorites f
on o.user_id = f.user_id and o.item_id = f.item_id
UNION ALL
select f.user_id, f.item_id,o.pay_time,f.fav_time
from orders o right join favorites f
on o.user_id = f.user_id and o.item_id = f.item_id
where o.user_id is null
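Note that the code emulating the full join combines a left outer join with a right outer join that is filtered to the favorites rows with no matching order (o.user_id is null). The two queries therefore return no rows in common, so their results can be concatenated directly.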
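The emulated full join can be tried out on a small data set. The following is only a sketch under stated assumptions: the rows are hypothetical sample data, and it runs on SQLite through Python's sqlite3 module rather than MySQL. Because older SQLite versions lack right join, the second branch is rewritten as the equivalent left join with the tables swapped.

```python
import sqlite3

# Hypothetical sample data; the real table contents are not given in the article.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders    (user_id INTEGER, item_id TEXT, pay_time TEXT);
    CREATE TABLE favorites (user_id INTEGER, item_id TEXT, fav_time TEXT);
    INSERT INTO orders VALUES
        (1, 'A', '2021-01-01 10:00:00'),
        (1, 'B', '2021-01-02 10:00:00'),
        (2, 'A', '2021-01-03 10:00:00');
    INSERT INTO favorites VALUES
        (1, 'A', '2020-12-31 09:00:00'),
        (2, 'C', '2021-01-04 09:00:00');
""")

FULL_JOIN_SQL = """
    -- Branch 1: every order, with the favorite time when one exists
    SELECT o.user_id, o.item_id, o.pay_time, f.fav_time
    FROM orders o LEFT JOIN favorites f
      ON o.user_id = f.user_id AND o.item_id = f.item_id
    UNION ALL
    -- Branch 2: favorites that have no matching order
    -- (same rows as the article's right join filtered on o.user_id IS NULL)
    SELECT f.user_id, f.item_id, o.pay_time, f.fav_time
    FROM favorites f LEFT JOIN orders o
      ON o.user_id = f.user_id AND o.item_id = f.item_id
    WHERE o.user_id IS NULL
    ORDER BY 1, 2
"""

full_join_rows = conn.execute(FULL_JOIN_SQL).fetchall()
for row in full_join_rows:
    print(row)
```

With this sample data the result has four rows: three from the orders side and one favorite, (2, 'C'), that was never purchased and therefore has a NULL pay_time.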
From an optimization point of view, union all is more efficient than union, because union also has to remove duplicate rows from the combined result, while union all simply appends one result to the other.
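The extra work that union performs is deduplication, which can be seen from the results themselves. A minimal sketch, assuming hypothetical sample rows and using SQLite through Python's sqlite3 module instead of MySQL; it shows the behavioral difference only and does not measure timing.

```python
import sqlite3

# Hypothetical sample data: the pair (1, 'A') appears in both tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders    (user_id INTEGER, item_id TEXT);
    CREATE TABLE favorites (user_id INTEGER, item_id TEXT);
    INSERT INTO orders    VALUES (1, 'A'), (1, 'B');
    INSERT INTO favorites VALUES (1, 'A'), (2, 'C');
""")

# Same two SELECTs, combined with either UNION ALL or UNION.
PAIR_SQL = """
    SELECT user_id, item_id FROM orders
    {op}
    SELECT user_id, item_id FROM favorites
    ORDER BY 1, 2
"""

union_all_rows = conn.execute(PAIR_SQL.format(op="UNION ALL")).fetchall()
union_rows = conn.execute(PAIR_SQL.format(op="UNION")).fetchall()

print(union_all_rows)  # keeps the duplicate (1, 'A')
print(union_rows)      # duplicate removed
```

In the full-join emulation the two branches cannot overlap, so the deduplication step of union would be wasted work there.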
Step 2: Create new columns and fill in the values
From the result of the query above, we can see that whether a user purchased or favorited an item can be determined from the payment time and the favorite time.
Use case when to tell the cases apart:
select distinct user_id,item_id,
case when pay_time is not null then 1 else 0 end '已购买', -- purchased
case when pay_time is not null and fav_time is null then 1 else 0 end '购买未收藏', -- purchased but not favorited
case when pay_time is null and fav_time is not null then 1 else 0 end '收藏未购买', -- favorited but not purchased
case when pay_time is not null and fav_time is not null then 1 else 0 end '收藏且购买' -- favorited and purchased
from (
select o.user_id, o.item_id,o.pay_time,f.fav_time
from orders o left join favorites f
on o.user_id = f.user_id and o.item_id = f.item_id
UNION ALL
select f.user_id, f.item_id,o.pay_time,f.fav_time
from orders o right join favorites f
on o.user_id = f.user_id and o.item_id = f.item_id
where o.user_id is null
) tmp
order by user_id, item_id;
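To check the four flags end to end, the case when query can be run on a few rows. This is a sketch under assumptions: the sample rows are hypothetical, the English column aliases stand in for the Chinese ones above, the last flag requires both times to be non-null, and it runs on SQLite through Python's sqlite3 module (with the right join written as a swapped left join) rather than MySQL.

```python
import sqlite3

# Hypothetical sample data; the real table contents are not given in the article.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders    (user_id INTEGER, item_id TEXT, pay_time TEXT);
    CREATE TABLE favorites (user_id INTEGER, item_id TEXT, fav_time TEXT);
    INSERT INTO orders VALUES
        (1, 'A', '2021-01-01 10:00:00'),
        (1, 'B', '2021-01-02 10:00:00'),
        (2, 'A', '2021-01-03 10:00:00');
    INSERT INTO favorites VALUES
        (1, 'A', '2020-12-31 09:00:00'),
        (2, 'C', '2021-01-04 09:00:00');
""")

FLAGS_SQL = """
    SELECT DISTINCT user_id, item_id,
        CASE WHEN pay_time IS NOT NULL THEN 1 ELSE 0 END AS purchased,
        CASE WHEN pay_time IS NOT NULL AND fav_time IS NULL
             THEN 1 ELSE 0 END AS purchased_not_favorited,
        CASE WHEN pay_time IS NULL AND fav_time IS NOT NULL
             THEN 1 ELSE 0 END AS favorited_not_purchased,
        CASE WHEN pay_time IS NOT NULL AND fav_time IS NOT NULL
             THEN 1 ELSE 0 END AS favorited_and_purchased
    FROM (
        -- Emulated full join: all orders, plus favorites without an order
        SELECT o.user_id AS user_id, o.item_id AS item_id,
               o.pay_time AS pay_time, f.fav_time AS fav_time
        FROM orders o LEFT JOIN favorites f
          ON o.user_id = f.user_id AND o.item_id = f.item_id
        UNION ALL
        SELECT f.user_id, f.item_id, o.pay_time, f.fav_time
        FROM favorites f LEFT JOIN orders o
          ON o.user_id = f.user_id AND o.item_id = f.item_id
        WHERE o.user_id IS NULL
    ) tmp
    ORDER BY user_id, item_id
"""

flag_rows = conn.execute(FLAGS_SQL).fetchall()
for row in flag_rows:
    print(row)
```

In this sample, user 1 both favorited and purchased item A, items (1, 'B') and (2, 'A') were purchased without being favorited, and (2, 'C') was favorited but never purchased.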
The same judgment can also be made using only if:
select distinct user_id,item_id,
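if(pay_time is not null,1,0) '已购买', -- purchased
if(pay_time is not null and fav_time is null,1,0) '购买未收藏', -- purchased but not favorited
if(pay_time is null and fav_time is not null,1,0) '收藏未购买', -- favorited but not purchased
if(pay_time is not null and fav_time is not null,1,0) '收藏且购买' -- favorited and purchased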
from (
select o.user_id, o.item_id,o.pay_time,f.fav_time
from orders o left join favorites f
on o.user_id = f.user_id and o.item_id = f.item_id
UNION ALL
select f.user_id, f.item_id,o.pay_time,f.fav_time
from orders o right join favorites f
on o.user_id = f.user_id and o.item_id = f.item_id
where o.user_id is null
) tmp
order by user_id, item_id;
Summary
This question mainly examines multi-table queries, full outer joins, union, union all, and case when. The difficulty is that at first you may not know where to start: how to combine the data of the two tables, or how to add columns, especially if you have only ever processed data within a single table.
In fact, when we encounter this kind of problem, we can break it down step by step: first join the two tables, then add columns according to the field conditions.