2023 May Cup (Question B): Full Analysis of the Modeling Code for the Express Delivery Demand Analysis Problem

Question B: Express Delivery Demand Analysis

Problem restatement

The background of the question explains how online shopping has increased the demand for express transportation. For express companies, accurately predicting express transportation demand matters because it helps them lay out warehouse sites sensibly, save storage costs, and plan transportation routes. The question provides Attachments 1, 2, and 3, which contain express transportation data between some cities recorded by a domestic express company: the delivery date, the delivering city, the receiving city, and the express delivery quantity (city names are replaced by letters, and the data for June, November, and December have been removed). Based on these attachments, mathematical models are to be built to solve the related problems.
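Because whole months are missing, any time-series step has to cope with gaps in the dates. A minimal sketch for confirming which months are absent, assuming the date column is named `Date Y/M/D` as in the sample code later in this post; `missing_months` and the toy rows are hypothetical, not the real attachment data:

```python
import pandas as pd

def missing_months(df: pd.DataFrame, date_col: str = "Date Y/M/D") -> list:
    """Return the months (as 'YYYY-MM') between the first and last record that contain no rows."""
    dates = pd.to_datetime(df[date_col])
    full_range = pd.period_range(dates.min(), dates.max(), freq="M")
    observed = set(dates.dt.to_period("M"))
    return [str(month) for month in full_range if month not in observed]

# Toy data (hypothetical), shaped like the attachment's date column
toy = pd.DataFrame({"Date Y/M/D": ["2019/5/1", "2019/7/3", "2019/8/9",
                                   "2019/9/2", "2019/10/30", "2020/1/2"]})
print(missing_months(toy))  # ['2019-06', '2019-11', '2019-12']
```

On the real data, the same call shows where the removed months fall, which matters when choosing a smoothing window or fitting a trend.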

Question one

To answer this question, the site cities need to be evaluated from multiple perspectives, including receiving volume, delivering volume, the growth/decrease trend of express volume, and the correlation between cities. A mathematical model can be built in the following steps:

Data preprocessing: First, extract and organize the data from Attachment 1, including the delivery date, delivering city, and receiving city. For each city, calculate its total receiving volume and total delivering volume.
Calculate the growth/decrease trend: Use time series analysis methods (such as exponential smoothing or an ARIMA model) to analyze the growth/decrease trend of each city's express volume.
Calculate the correlation: Based on the shipping and receiving data between cities, compute the correlation between cities, for example with the Pearson correlation coefficient. This reveals how closely connected the cities are.
Assign weights to each indicator: Based on the actual situation, assign weights to the indicators (receiving volume, delivering volume, growth/decrease trend, and correlation) so that the evaluation reflects actual needs. Because the indicators have very different scales, normalize them (for example with min-max normalization) before weighting.
Calculate the comprehensive score: Combine the weighted indicators into a comprehensive score for each city; the higher the score, the more important the city.
Comprehensive ranking: Sort the cities by comprehensive score and list the names of the top 5 most important site cities.
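The trend step can be sketched with simple exponential smoothing followed by a least-squares slope. This is one possible choice, not the method prescribed by the question; `city_trend`, the smoothing factor `alpha=0.3`, and the column names (taken from the sample code later in this post) are assumptions, and the toy rows are made up:

```python
import numpy as np
import pandas as pd

QTY = "Express delivery quantity (PCS)"  # column names assumed from the sample code

def city_trend(df: pd.DataFrame, city: str, alpha: float = 0.3) -> float:
    """Slope, in pieces per calendar day, of the exponentially smoothed daily volume of `city`."""
    rows = df[(df["Delivering city"] == city) | (df["Receiving city"] == city)].copy()
    rows["Date"] = pd.to_datetime(rows["Date Y/M/D"])
    daily = rows.groupby("Date")[QTY].sum().sort_index()  # sent + received per day
    if len(daily) < 2:
        return 0.0
    # Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_{t-1}, s_0 = x_0
    smoothed = daily.ewm(alpha=alpha, adjust=False).mean()
    # Use real day offsets so that the removed months do not distort the slope
    days = (smoothed.index - smoothed.index[0]).days.to_numpy(dtype=float)
    slope, _intercept = np.polyfit(days, smoothed.to_numpy(dtype=float), 1)
    return float(slope)

# Toy data (hypothetical): city A appears in every row
toy = pd.DataFrame({
    "Date Y/M/D": ["2019/1/1", "2019/1/2", "2019/1/3", "2019/1/4"],
    "Delivering city": ["A", "B", "A", "C"],
    "Receiving city": ["B", "A", "C", "A"],
    QTY: [10, 20, 30, 40],
})
print(city_trend(toy, "A"))             # positive: A's daily volume rises
print(city_trend(toy, "A", alpha=1.0))  # alpha=1 disables smoothing, so this is the raw slope (about 10)
```

Calling `city_trend` for every city gives one trend value per city; a larger `alpha` follows recent changes more closely, a smaller one smooths more.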
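Pearson correlation is only meaningful when the two series are aligned on the same dates, so one way to compute it is to pivot the data into one daily shipment column per delivering city and call `DataFrame.corr`. A sketch under that assumption; the function names, the column names (from the sample code later in this post), and the toy numbers are illustrative:

```python
import numpy as np
import pandas as pd

QTY = "Express delivery quantity (PCS)"  # column names assumed from the sample code

def shipment_correlation(df: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation matrix of daily shipment volumes, one column per delivering city."""
    daily = (df.assign(Date=pd.to_datetime(df["Date Y/M/D"]))
               .pivot_table(index="Date", columns="Delivering city",
                            values=QTY, aggfunc="sum", fill_value=0))
    return daily.corr(method="pearson")

def mean_correlation(corr: pd.DataFrame) -> pd.Series:
    """Average correlation of each city with every other city (the diagonal is excluded)."""
    off_diagonal = corr.where(~np.eye(len(corr), dtype=bool))
    return off_diagonal.mean(axis=1)

# Toy data (hypothetical): A and B move together, C moves the opposite way
toy = pd.DataFrame({
    "Date Y/M/D": ["2019/1/1"] * 3 + ["2019/1/2"] * 3 + ["2019/1/3"] * 3,
    "Delivering city": ["A", "B", "C"] * 3,
    "Receiving city": ["B", "C", "A"] * 3,
    QTY: [10, 20, 5, 20, 40, 4, 30, 60, 3],
})
corr = shipment_correlation(toy)
print(corr.round(2))
print(mean_correlation(corr).round(2))
```

`fill_value=0` treats a day on which a city shipped nothing as zero pieces; whether that is appropriate depends on how the attachment records such days.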

Note that the model parameters and weights may need to be adjusted to the actual situation. To obtain more reliable results, methods such as cross-validation can be used to evaluate model performance.
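Since the question offers no ground-truth ranking to validate against, one practical substitute for cross-validation is to check how stable the top cities are when the weights change. The sketch below min-max normalizes the indicators, computes the weighted comprehensive score, and then reports how often each city stays in the top k when every weight is randomly perturbed. The plus or minus 20% perturbation, the 200 trials, the function names, and the toy indicator values are all assumptions for illustration:

```python
import numpy as np
import pandas as pd

def composite_scores(indicators: pd.DataFrame, weights: dict) -> pd.Series:
    """Min-max normalize each indicator column to [0, 1], then take the weighted sum per city."""
    spread = (indicators.max() - indicators.min()).replace(0, 1)  # avoid dividing by zero
    normalized = (indicators - indicators.min()) / spread
    w = pd.Series(weights, dtype=float)
    return normalized[w.index].mul(w, axis=1).sum(axis=1)

def top_k_frequency(indicators, base_weights, k=5, trials=200, noise=0.2, seed=0):
    """Share of trials in which each city is in the top k when each weight is scaled by U(1-noise, 1+noise)."""
    rng = np.random.default_rng(seed)
    base = pd.Series(base_weights, dtype=float)
    counts = pd.Series(0, index=indicators.index)
    for _ in range(trials):
        w = base * rng.uniform(1 - noise, 1 + noise, size=len(base))
        w = w / w.sum()  # keep the weights summing to 1
        top = composite_scores(indicators, w.to_dict()).nlargest(k).index
        counts.loc[top] += 1
    return counts / trials

# Toy indicator table (hypothetical), one row per city
toy = pd.DataFrame({
    "receiving": [900, 500, 100, 50],
    "delivering": [800, 600, 150, 40],
    "trend": [1.5, -0.5, 2.0, 0.1],
    "correlation": [0.6, 0.4, 0.2, 0.1],
}, index=["A", "B", "C", "D"])
weights = {"receiving": 0.25, "delivering": 0.25, "trend": 0.25, "correlation": 0.25}

scores = composite_scores(toy, weights)
print(scores.sort_values(ascending=False))
freq = top_k_frequency(toy, weights, k=2)
print(freq)
```

A city that appears in the top k in nearly every trial is a robust choice; one that drops in and out depends heavily on the chosen weights.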

Sample code

The sample uses Python with the pandas and NumPy libraries for data processing and calculation; please install them before running it. For large datasets, Python is more convenient than MATLAB thanks to its many third-party libraries.

import pandas as pd
import numpy as np

# Assume the data has been saved as a CSV file; read it from there
data = pd.read_csv('attachment1.csv')
qty = 'Express delivery quantity (PCS)'
data['Date'] = pd.to_datetime(data['Date Y/M/D'])

# Total receiving volume and total delivering volume of each city
total_receiving = data.groupby('Receiving city')[qty].sum()
total_delivering = data.groupby('Delivering city')[qty].sum()

# List of all cities that appear as sender or receiver
cities = sorted(set(data['Receiving city']) | set(data['Delivering city']))

# Growth/decrease trend of each city's express volume:
# sum the city's sent + received pieces per day, smooth them with a
# 30-day moving average (30 observed days), then take the average change
trends = {}
for city in cities:
    city_data = data[(data['Receiving city'] == city) | (data['Delivering city'] == city)]
    daily = city_data.groupby('Date')[qty].sum().sort_index()
    smoothed = daily.rolling(window=30).mean().dropna()  # drop the first 29 NaN values
    if len(smoothed) > 1:
        trends[city] = (smoothed.iloc[-1] - smoothed.iloc[0]) / len(smoothed)
    else:
        trends[city] = 0.0

# Correlation between cities: Pearson correlation of the daily shipment
# series of every pair of delivering cities, aligned on date
daily_ship = data.pivot_table(index='Date', columns='Delivering city',
                              values=qty, aggfunc='sum', fill_value=0)
corr_matrix = daily_ship.corr(method='pearson').fillna(0)
correlations = {}
for city in cities:
    if city in corr_matrix.columns and len(corr_matrix) > 1:
        # Average correlation with all other cities (self-correlation excluded)
        correlations[city] = corr_matrix[city].drop(city).mean()
    else:
        correlations[city] = 0.0

# Collect the indicators in one table (one row per city)
indicators = pd.DataFrame({
    'receiving': total_receiving.reindex(cities, fill_value=0),
    'delivering': total_delivering.reindex(cities, fill_value=0),
    'trend': pd.Series(trends),
    'correlation': pd.Series(correlations),
})

# Min-max normalize each indicator to [0, 1]; otherwise the raw piece
# counts would swamp the trend and correlation values
spread = (indicators.max() - indicators.min()).replace(0, 1)
normalized = (indicators - indicators.min()) / spread

# Weight of each indicator
weights = {'receiving': 0.25, 'delivering': 0.25, 'trend': 0.25, 'correlation': 0.25}

# Comprehensive score of each city
scores = sum(normalized[name] * w for name, w in weights.items())

# Sort the cities by score and print the top 5
sorted_cities = scores.sort_values(ascending=False).index.tolist()
print("Top 5 cities: ", sorted_cities[:5])

This sample code covers data processing and the calculation of growth trends, correlations, and comprehensive scores. Adjust the parameters and weights to fit the actual data. If it fails to run, check the path and name of your input file.
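Since a wrong path or column name is the most likely reason the sample fails, a small loader that checks both up front can save time. A sketch, assuming the four column names used in the sample code; `load_attachment` is a hypothetical helper, not part of the original code:

```python
from pathlib import Path

import pandas as pd

REQUIRED = ["Date Y/M/D", "Delivering city", "Receiving city", "Express delivery quantity (PCS)"]

def load_attachment(path, required=REQUIRED) -> pd.DataFrame:
    """Read the CSV and fail early with a readable message if the path or column names are wrong."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"Input file not found: {p.resolve()}")
    df = pd.read_csv(p)
    df.columns = df.columns.str.strip()  # stray spaces in headers are a common cause of KeyError
    missing = [c for c in required if c not in df.columns]
    if missing:
        raise KeyError(f"Missing columns {missing}; found {list(df.columns)}")
    return df

# Usage: data = load_attachment('attachment1.csv')
```

The error messages print the absolute path and the columns actually found, which is usually enough to spot a typo or a header in a different language.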
