Illustrated interview questions: a real Didi written-test question that helps you win offers from other companies once you master it


【Problem】

The "Order Information Table" (订单信息表) records how passengers in Brazil use a ride-hailing app, including the time each order was called, answered, cancelled, and completed. (Didi written test question)

[Figure: the Order Information Table]

Notes:

(1) The times in the table are in Beijing time, and Brazil is 11 hours behind China.

(2) A value of "1970" in the answer time column means that no driver answered the order, so the order is invalid.


Questions

1. What are the order answer rate and the order completion rate?

2. What is the average call answer time?

3. Based on this week's data, which hour (local time) has the highest call volume? Which hour (local time) has the lowest?

4. What proportion of call orders are followed by another call the next day?

5. (Optional) If you wanted to classify the passengers, what factors would you consider?

【Problem solving steps】

We first preprocess the data to convert Beijing time to Brazil time, in two steps. First, we put every time in the table into a standard datetime format. Then we convert the formatted values to Brazil time.


(1) Date formatting

Date formatting modifies the date data stored in the table, so we use the update statement. The modification converts values between data types, so we use the cast function.
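As a quick illustration of what cast does, the sketch below converts a string literal to a datetime in MySQL. The literal is made up for the example and is not taken from the order table.

```sql
-- Hypothetical literal, used only to show the conversion.
select cast('2021-05-01 08:30:00' as datetime) as call_time_dt;
-- Returns the datetime 2021-05-01 08:30:00
```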


The times in the table should be in datetime format, accurate to the second (YYYY-MM-DD HH:MM:SS).

[Figure: the time columns after conversion]

The SQL statements are therefore:

-- Cast each time column to datetime
update 订单信息表 set call_time=cast(call_time as datetime);
update 订单信息表 set grab_time=cast(grab_time as datetime);
update 订单信息表 set cancel_time=cast(cancel_time as datetime);
update 订单信息表 set finish_time=cast(finish_time as datetime);

[Figure: the table after date formatting]

(2) Convert to Brazil time

The times in the data are in Beijing time and Brazil is 11 hours behind China, so we subtract 11 hours from each time column with the date_sub function.
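To illustrate date_sub, the sketch below shifts one made-up Beijing timestamp back by 11 hours. The literal is an assumption for the example only.

```sql
-- Hypothetical Beijing time; subtracting 11 hours gives the Brazil time.
select date_sub('2021-05-01 08:00:00', interval 11 hour) as brazil_time;
-- 2021-04-30 21:00:00 (the date rolls back to the previous day)
```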


The SQL statements are:

-- Shift each time column back 11 hours (Beijing time to Brazil time)
update 订单信息表
set call_time=date_sub(call_time, interval 11 hour);

update 订单信息表
set grab_time=date_sub(grab_time, interval 11 hour);

update 订单信息表
set cancel_time=date_sub(cancel_time, interval 11 hour);

update 订单信息表
set finish_time=date_sub(finish_time, interval 11 hour);

[Figure: the table after conversion to Brazil time]

This completes the date preprocessing.

1. What are the order answer rate and the order completion rate?

(1) Answer rate

Answer rate = number of answered orders / number of call orders

Call orders: the number of call orders equals the number of values in the call time (call_time) column, which count(call_time) returns.

Answered orders: the answered orders are the rows of the answer time (grab_time) column whose value is not '1970'. A plain count(grab_time) would also include the '1970' placeholders, so only the rows that differ from '1970' are counted.

[Figure: the grab_time column, with the answered orders boxed in red]

The question requires counting only the rows that meet a condition. As "Monkey Learns SQL from Zero" explains, conditional logic is written with the case when expression, so the SQL for the number of answered orders is:

sum(case when grab_time <> 1970 then 1 else 0 end)

The answer rate (number of answered orders / number of call orders) can now be calculated:

select sum(case when grab_time <> 1970 then 1 else 0 end)/count(call_time) as 应答率
from 订单信息表;

[Figure: query result for the answer rate]
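Whether the comparison grab_time <> 1970 behaves as intended depends on how the "1970" placeholder is stored after preprocessing. As a hedged alternative, the sketch below compares the year instead. It assumes that the placeholder ends up either as a datetime in or before 1970 (for example after the 11-hour shift) or as NULL; this is an assumption about the data, not something the question states.

```sql
-- Assumption: unanswered orders hold a datetime in or before 1970, or NULL.
-- year() of such a value is never greater than 1970, so those rows count as 0.
select sum(case when year(grab_time) > 1970 then 1 else 0 end)
       / count(call_time) as answer_rate
from 订单信息表;
```

The same year-based test could be applied to finish_time for the completion rate.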

(2) Order completion rate

Order completion rate = number of completed orders / number of call orders

Completed orders: the completed orders are the rows of the completion time (finish_time) column whose value is not '1970'.

[Figure: the finish_time column, with the completed orders boxed in red]

So the number of completed orders is:

sum(case when finish_time <> 1970 then 1 else 0 end)

The completion rate (number of completed orders / number of call orders) can now be calculated:

select sum(case when finish_time <> 1970 then 1 else 0 end)/count(*) as 完单率
from 订单信息表;

[Figure: query result for the order completion rate]

2. What is the average call answer time?

According to the metric definition in the question:

Call answer time = total time from call to answer across answered orders / number of answered orders

For an answered order, the time from call to answer = answer time (grab_time) - call time (call_time).

This requires the difference between two datetimes. "Monkey Learns SQL from Zero" mentions that the function for this is timestampdiff.

[Figure: usage of the timestampdiff function]
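As a small illustration of timestampdiff in MySQL, the sketch below uses made-up literals. The function takes a unit, a start value, and an end value, and returns the end minus the start in whole units.

```sql
-- timestampdiff(unit, start, end): hypothetical literals for illustration.
select timestampdiff(minute, '2021-05-01 08:00:00', '2021-05-01 08:30:00') as diff_minutes; -- 30
select timestampdiff(day, '2021-05-01', '2021-05-02') as diff_days;                         -- 1
```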

Returning to the question, we use the timestampdiff function to sum the time from call to answer.

[Figure: summing the call-to-answer time with timestampdiff]

[Figure: analysis of the complete SQL statement]

[Figure: query result for the call answer time]
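The statement for this question survives only as a figure. A minimal sketch of what it might look like follows, reusing the article's test for answered orders. The unit (minutes) and the alias avg_answer_minutes are assumptions, since the text does not state which unit was used.

```sql
-- Average call-to-answer time over answered orders.
-- Assumption: the result is wanted in minutes.
select sum(timestampdiff(minute, call_time, grab_time))
       / count(*) as avg_answer_minutes
from 订单信息表
where grab_time <> 1970;  -- answered orders only, same test as in question 1
```

Filtering in the where clause keeps unanswered orders out of both the numerator and the denominator.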

3. Based on this week's data, which hour (local time) has the highest call volume? Which hour (local time) has the lowest?

(1) Time conversion

Since the question asks "which hour", we first extract the hour from each call time. We add a new column named call_time_hour to hold it.

-- Add the column
alter table 订单信息表 add column call_time_hour varchar(255);

The date_format function displays date data in a given format; here it extracts the hour.

/**
Populate the column.
%k displays the hour in 24-hour format (0 to 23).
*/
update 订单信息表
set call_time_hour=date_format(call_time,'%k');

[Figure: the table with the call_time_hour column filled in]
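To see what the %k specifier produces, the sketch below applies date_format to a made-up literal and contrasts it with %H, which pads the hour to two digits.

```sql
-- Hypothetical literal for illustration.
select date_format('2021-05-01 08:30:00', '%k') as hour_unpadded; -- 8
select date_format('2021-05-01 08:30:00', '%H') as hour_padded;   -- 08
```

Because call_time_hour is a varchar column, the values are stored as text, so sorting by that column orders them as strings rather than as numbers.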

(2) Which hour has the highest call volume?

Call orders are identified by the order_id column. Group the rows by hour (group by call_time_hour), count the call orders in each hour (count(order_id)), and sort the counts to see which hour has the most orders.

[Figure: analysis of the SQL statement]

[Figure: query result with the call count for each hour]

The question asks for the maximum (the hour with the highest call volume), so after sorting in descending order we use the limit clause to keep only the first row.

The SQL statement is as follows:

select call_time_hour,count(order_id) as 最大次数
from 订单信息表
group by call_time_hour
order by 最大次数 desc 
limit 1;

(3) Which hour has the lowest call volume?

In the sorted results above, three hours share the minimum count, so we sort in ascending order and keep them with limit 3.

select call_time_hour,count(order_id) as 最小次数
from 订单信息表
group by call_time_hour
order by 最小次数 asc 
limit 3;
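Using limit 3 relies on having inspected the results first. As an alternative sketch (not the article's query), the statement below returns every hour tied for the minimum without hard-coding the number of ties. The aliases call_count, hour_count, and per_hour are names chosen for this example.

```sql
-- Return all hours whose call count equals the smallest hourly count.
select call_time_hour, count(order_id) as call_count
from 订单信息表
group by call_time_hour
having count(order_id) = (
    select min(hour_count)
    from (
        select count(order_id) as hour_count
        from 订单信息表
        group by call_time_hour
    ) as per_hour
);
```

Swapping min for max gives the same treatment for the busiest hour if several hours tie for first place.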

4. What proportion of call orders are followed by another call the next day?

Proportion of call orders followed by a call the next day = number of users who call again the next day / total number of call orders.

The approach to counting the users who call again the next day is as follows:

[Figure: outline of the approach]

Let's go through each part in detail.

(1) Use a self-join to obtain the interval between calls. Because the interval is measured in days, we use the date_format function to extract the year-month-day part of the call time.

The SQL statement is as follows:

-- Add a column holding the year-month-day part of the time
alter table 订单信息表 add column call_time_day varchar(255);
update 订单信息表
set call_time_day=date_format(call_time,'%Y-%m-%d');

[Figure: the table with the call_time_day column filled in]

We then join the table to itself to calculate the number of days between calls. Because this is a difference in days, we use the timestampdiff function mentioned above with day as the unit.

[Figure: SQL for the self-join]

[Figure: query result of the self-join]

Keep only the rows where the time difference is 1 day, that is, where the interval equals 1.

[Figure: filtering the rows with an interval of 1]

Using a nested subquery, treat the query result above as a new table, filter it, and sum.

[Figure: analysis of the SQL statement]

[Figure: query result with the number of users who call again the next day]
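The self-join statement survives only as a figure. A minimal sketch of such a join follows. The column passenger_id is a hypothetical name for the passenger identifier, which the text never names, and day_gap is an alias chosen for this example.

```sql
-- passenger_id is a hypothetical column name; the real one is not given in the text.
select a.passenger_id,
       a.call_time_day as first_day,
       b.call_time_day as later_day,
       timestampdiff(day, a.call_time_day, b.call_time_day) as day_gap
from 订单信息表 as a
join 订单信息表 as b
  on a.passenger_id = b.passenger_id;
```

Each row pairs two calls by the same passenger, so a day_gap of 1 marks a call that was followed by another call on the next calendar day.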

Finally, we calculate the proportion of orders followed by a call the next day.

[Figure: SQL for the final proportion]

[Figure: query result for the proportion]
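The final statement is also only available as a figure. The sketch below shows one way the ratio might be computed in a single query. It assumes a hypothetical passenger_id column and uses the number of distinct passengers as the denominator, which is an interpretation; replacing the denominator with count(call_time) divides by call orders instead, as the formula in this section is worded.

```sql
-- passenger_id is a hypothetical column name.
-- Numerator: passengers with two calls exactly one calendar day apart.
-- Denominator (assumption): all distinct passengers who called.
select (
         select count(distinct a.passenger_id)
         from 订单信息表 as a
         join 订单信息表 as b
           on a.passenger_id = b.passenger_id
          and timestampdiff(day, a.call_time_day, b.call_time_day) = 1
       ) / count(distinct passenger_id) as next_day_call_ratio
from 订单信息表;
```

Counting distinct passengers in the numerator avoids counting the same person several times when they place multiple orders on both days.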

5. (Optional) If you wanted to classify the passengers in the table, what factors would you consider?

We can classify users from the following two perspectives.

User Behavior Classification

1) From the completion time and the answer time, we can roughly calculate how long each ride lasted and use that duration to label the trip as long, medium, or short distance, which reveals passengers' riding habits.

2) From the call time, we can see when the passenger placed the order and infer how the demand arose and in which scenarios the user needs to travel, such as going to work, going home from work, dining out, travelling, or ad hoc trips.

User Value Classification

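Use the RFM analysis method learned earlier to classify users by value.

For this question, RFM can be defined as follows:

R: the time of the passenger's most recent completed order.
F: how often the passenger hails a ride.
M: the amount the passenger spends on rides. Here it can be approximated by, for example, the time spent riding.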
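The three RFM values defined above could be computed per passenger along the following lines. This is a sketch under assumptions: passenger_id is a hypothetical column name, ride duration in minutes stands in for the amount spent, and completed orders are selected with the same finish_time test used in question 1.

```sql
-- passenger_id is a hypothetical column name.
select passenger_id,
       max(finish_time) as last_finish_time,                               -- R: most recent completed order
       count(order_id) as completed_orders,                                -- F: number of completed rides
       sum(timestampdiff(minute, grab_time, finish_time)) as ride_minutes  -- M: duration as a proxy for spend
from 订单信息表
where finish_time <> 1970   -- completed orders only
group by passenger_id;
```

Each of the three columns can then be split into high and low bands to place passengers into RFM segments.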

【Test points for this question】

1. Handling date data: master the common date functions used in this question.

2. Analytical thinking: apply the data analysis framework you have learned to solve the problem.


Origin blog.csdn.net/zhongyangzhong/article/details/129630899