Practical case: Using Python machine learning to predict takeaway delivery time

The weather is getting hotter every day. On weekends, many people stay home and order takeaway rather than go out in the heat.

When you are hungry and waiting on an order, you probably keep checking where the courier is and how much longer the delivery will take.

Apps such as Meituan and Ele.me display this information. So how is the delivery time predicted?

One approach is to train a machine learning model on couriers' past deliveries and use it to predict the delivery time.

Today I will show how to predict food delivery time with machine learning in Python.

To predict the delivery time in real time, we first need the distance between where the food is prepared (the restaurant) and where it is consumed (the delivery location).

With that distance in hand, we then look at how long couriers have taken in the past to deliver food over similar distances.

Here I use a dataset from Kaggle that records how long couriers took to deliver food from a restaurant to a delivery location.

It contains all the features needed for this task. You can download it from the link below.

https://www.kaggle.com/datasets/gauravmalik26/food-delivery-dataset

Data processing

First, import the required Python libraries and read the dataset.

import pandas as pd
import numpy as np
import plotly.express as px

data = pd.read_csv("deliverytime.txt")
print(data.head())

The data set is as follows.

picture

The fields have the following meanings.

ID: Order ID

Delivery_person_ID: delivery person ID

Delivery_person_Age: The age of the delivery person

Delivery_person_Ratings: delivery person ratings

Restaurant_latitude: restaurant latitude

Restaurant_longitude: restaurant longitude

Delivery_location_latitude: Delivery point latitude

Delivery_location_longitude: longitude of delivery point

Type_of_order: order type

Type_of_vehicle: The vehicle type of the delivery person

Time_taken(min): Time taken by the courier for delivery

Take another look at the information in each column.

print(data.info())

The result is as follows, including the name, data type and other information of each column.

picture

Check to see if the dataset contains null values.

print(data.isnull().sum())

The results are as follows. The dataset has no null values.

picture

The dataset only provides the latitude and longitude of the restaurant and the delivery location, so we need to compute the distance between the two points ourselves.

The Haversine formula gives the great-circle distance between two points on Earth from their latitudes and longitudes.
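For reference, the formula in the form used by the code that follows can be written as below. The symbols are my own notation rather than the article's: \varphi_1, \varphi_2 are the two latitudes and \lambda_1, \lambda_2 the two longitudes, all in radians, R is the Earth's radius and d is the resulting distance.

```latex
\begin{aligned}
a &= \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right)
     + \cos\varphi_1 \,\cos\varphi_2 \,\sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right) \\
c &= 2\,\operatorname{atan2}\!\left(\sqrt{a},\ \sqrt{1 - a}\right) \\
d &= R\,c
\end{aligned}
```

With R = 6371 km, d comes out in kilometres.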

# Radius of the Earth (in kilometres)
R = 6371


# Convert degrees to radians
def deg_to_rad(degrees):
    return degrees * (np.pi / 180)


# Compute the distance between two points using the Haversine formula
def distcalculate(lat1, lon1, lat2, lon2):
    d_lat = deg_to_rad(lat2 - lat1)
    d_lon = deg_to_rad(lon2 - lon1)
    a = np.sin(d_lat / 2) ** 2 + np.cos(deg_to_rad(lat1)) * np.cos(deg_to_rad(lat2)) * np.sin(d_lon / 2) ** 2
    c = 2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a))
    return R * c


# Compute the distance for every row (vectorized over whole columns,
# which gives the same values as a row-by-row loop but runs much faster)
data['distance'] = distcalculate(data['Restaurant_latitude'],
                                 data['Restaurant_longitude'],
                                 data['Delivery_location_latitude'],
                                 data['Delivery_location_longitude'])

The distance between each restaurant and its delivery location is now stored in a new feature, distance.

Take another look at the dataset.

print(data.head())

The result is as follows.

picture
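A quick way to sanity-check the Haversine calculation above is to feed it points whose distance is known in advance. The sketch below is a standalone re-implementation using only the standard math module; the function name haversine_km and the test coordinates are my own choices for illustration, not part of the original code.

```python
import math

R = 6371  # Earth radius in km, the same value used in the article


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    d_lat = math.radians(lat2 - lat1)
    d_lon = math.radians(lon2 - lon1)
    a = (math.sin(d_lat / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(d_lon / 2) ** 2)
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    return R * c


# One degree of latitude along a meridian should equal R * pi / 180 (about 111.19 km).
one_degree = haversine_km(0.0, 0.0, 1.0, 0.0)
print(round(one_degree, 2))

# Identical points (arbitrary coordinates) should be 0 km apart.
same_point = haversine_km(12.9, 77.6, 12.9, 77.6)
print(same_point)

# Swapping the two points should not change the distance.
forward = haversine_km(12.9, 77.6, 13.0, 77.7)
backward = haversine_km(13.0, 77.7, 12.9, 77.6)
```

If the distance column in the dataset contains values far outside what is plausible for a city delivery, checks like these help separate a formula bug from bad coordinates in the data.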

Relationship analysis

The next step is to study the data and find the relationship between the features.

Start with the relationship between distance and time to deliver food.

figure = px.scatter(data_frame=data,
                    x="distance",
                    y="Time_taken(min)",
                    size="Time_taken(min)",
                    trendline="ols",
                    title="Relationship Between Distance and Time Taken")
figure.show()

The result is as follows.

picture

There is a consistent relationship between delivery time and distance.

Most couriers deliver the food within 25-30 minutes, regardless of distance.
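The trend line drawn by trendline="ols" is an ordinary least-squares fit (Plotly Express relies on the statsmodels package for it, so that package needs to be installed). As a minimal sketch of what such a fit computes, the snippet below derives the slope and intercept by hand. The helper name ols_line and the data points are made up for illustration and are not taken from the dataset.

```python
def ols_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel in the ratio).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up (distance in km, time in minutes) pairs, for illustration only.
distances = [1.0, 2.0, 3.0, 4.0, 5.0]
times = [22.0, 24.0, 26.0, 28.0, 30.0]

slope, intercept = ols_line(distances, times)
print(slope, intercept)  # 2.0 20.0 for this exactly linear toy data
```

A slope close to zero on the real data would correspond to the flat trend described above, where delivery time barely changes with distance.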

So what is the relationship between the time of delivery and the age of the delivery staff?

figure = px.scatter(data_frame=data,
                    x="Delivery_person_Age",
                    y="Time_taken(min)",
                    size="Time_taken(min)",
                    color="distance",
                    trendline="ols",
                    title="Relationship Between Time Taken and Age")
figure.show()

The result is as follows.

picture

Delivery time increases roughly linearly with the courier's age.

In other words, younger couriers tend to deliver food faster than older ones.

Next, look at the relationship between delivery time and the courier's rating.

figure = px.scatter(data_frame=data,
                    x="Delivery_person_Ratings",
                    y="Time_taken(min)",
                    size="Time_taken(min)",
                    color="distance",
                    trendline="ols",
                    title="Relationship Between Time Taken and Ratings")
figure.show()

The result is as follows.

picture

There is an inverse linear relationship between the time it takes for a meal to be delivered and the rating of the delivery person.

This means that delivery drivers with high ratings spend less time delivering meals than those with lower ratings.

After all, everyone wants their takeaway to arrive as soon as possible after ordering.

Now let's see whether the type of food ordered by the customer and the type of vehicle used by the courier affect the delivery time.

fig = px.box(data,
             x="Type_of_vehicle",
             y="Time_taken(min)",
             color="Type_of_order")
fig.show()

The result is as follows.

picture

The delivery time does not vary much with the vehicle the courier uses or the type of food being delivered.

Based on this analysis, the following three features have the greatest impact on delivery time.

1. The age of the delivery staff

2. Rating of delivery staff

3. Distance between restaurant and delivery location
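The conclusions above come from reading the plots. One way to back them up with numbers is the Pearson correlation coefficient between each feature and the delivery time. The sketch below implements it by hand; the function name pearson_r and all values are made up for illustration and are not taken from the dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up values for illustration only.
ages = [20, 25, 30, 35, 40]
ratings = [4.9, 4.7, 4.5, 4.2, 4.0]
times = [20, 24, 25, 30, 31]

r_age = pearson_r(ages, times)        # positive: time grows with age
r_rating = pearson_r(ratings, times)  # negative: time falls as rating rises
print(round(r_age, 3), round(r_rating, 3))
```

On the real DataFrame, calling corr() on the selected columns (for example data[["Delivery_person_Age", "Delivery_person_Ratings", "distance", "Time_taken(min)"]].corr()) should give the same kind of coefficients in table form.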

Model prediction

Here an LSTM neural network is trained to predict the food delivery time.

# Split the dataset into training and test sets
from sklearn.model_selection import train_test_split
x = np.array(data[["Delivery_person_Age",
                   "Delivery_person_Ratings",
                   "distance"]])
y = np.array(data[["Time_taken(min)"]])
xtrain, xtest, ytrain, ytest = train_test_split(x, y,
                                                test_size=0.10,
                                                random_state=42)

# Build the LSTM neural network model
from keras.models import Sequential
from keras.layers import Dense, LSTM
model = Sequential()
model.add(LSTM(128, return_sequences=True, input_shape=(xtrain.shape[1], 1)))
model.add(LSTM(64, return_sequences=False))
model.add(Dense(25))
model.add(Dense(1))
model.summary()

The result is as follows.

picture
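The model declares input_shape=(xtrain.shape[1], 1), so each sample is treated as a sequence of three steps holding one value each, while xtrain itself is a 2-D array of shape (samples, 3). The article's code passes the 2-D array as is. Adding the extra axis explicitly avoids relying on any implicit reshaping by Keras. A minimal sketch with toy values (not from the dataset), where x stands in for xtrain:

```python
import numpy as np

# Toy stand-in for xtrain: 4 samples, 3 features (age, rating, distance in km).
x = np.array([
    [29.0, 4.5, 3.2],
    [35.0, 4.8, 10.1],
    [22.0, 4.1, 6.0],
    [31.0, 4.9, 1.5],
])

# LSTM layers expect input shaped (samples, timesteps, values_per_step).
# Here each of the 3 features becomes one "timestep" with a single value,
# which matches input_shape=(3, 1) in the model definition.
x_lstm = x.reshape(x.shape[0], x.shape[1], 1)

print(x.shape)       # (4, 3)
print(x_lstm.shape)  # (4, 3, 1)
```

The same reshape would apply to xtest and to the single-row array built from user input before calling predict.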

Now train the model.

# Train the model
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(xtrain, ytrain, batch_size=1, epochs=9)

The result is as follows.

picture
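The split earlier sets aside xtest and ytest, but the walkthrough does not use them. One natural follow-up, sketched here under my own assumptions, is to compare the model's predictions on xtest with ytest using mean absolute error (MAE) and root mean squared error (RMSE). The helper names and the numbers below are made up for illustration; in practice the inputs would be something like ytest.ravel() and model.predict(xtest).ravel().

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: average size of the prediction error, in minutes."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: like MAE, but penalizes large errors more."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Made-up actual and predicted delivery times in minutes.
actual = [24.0, 33.0, 26.0, 21.0]
predicted = [26.0, 30.0, 26.0, 25.0]

print(mae(actual, predicted))   # 2.25
print(rmse(actual, predicted))
```

Both numbers are in minutes, so they can be read directly as "how far off is a typical prediction".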

After the model is trained, you can input information to predict the delivery time.

print("Food Delivery Time Prediction")
a = int(input("Age of Delivery Partner: "))
b = float(input("Ratings of Previous Deliveries: "))
c = float(input("Total Distance: "))  # distance in km; float so fractional distances are not rejected

features = np.array([[a, b, c]])
print("Predicted Delivery Time in Minutes = ", model.predict(features))

Test it out and the results are as follows.

picture

Input: a 29-year-old courier with a rating of 2.9 and a distance of 6 km.

Predicted delivery time: about 42 minutes.

Origin blog.csdn.net/qq_34160248/article/details/132019846