The 4th MathorCup College Mathematical Modeling Challenge in 2023 - Problem Solving Ideas for Problem B in the Big Data Competition

The 7-day MathorCup Big Data Challenge has started as scheduled. To help you understand Problem B more deeply, here are some preliminary ideas for solving it.

Track B: E-commerce retail merchant demand forecasting and inventory optimization

Because the MathorCup competition is split into a preliminary round and a semi-final, Problem B in this round contains only prediction questions and no optimization questions. The inventory optimization mentioned in the problem title can be ignored entirely for this round, which greatly reduces the difficulty. The following is a detailed analysis of problem-solving ideas for Problem B.

Data! (Data cleaning + data visualization)

Remember: with data problems, the first step is never to solve the problem but to preprocess the data. A data set this large is bound to contain outliers and even missing values. Given the seven-day competition window, it is reasonable to set aside one or two days specifically for finding outliers.

Regarding the data, I offer two ideas, which are also the two directions emphasized in the course: first, extreme values; second, logical anomalies. Extreme values mainly refer to cases where the demand in the given data is very large or is 0. How should these two kinds of extreme values be handled? My initial idea for the very large values is to discuss them, delete them, and then fill the resulting gaps by linear interpolation.

As for the value 0, a rough look at the data shows that there are many such minimum values, so they only need an accompanying text explanation. It is enough to explain that although such data looks abnormal, it is consistent with the actual situation.
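The treatment of large values described above can be sketched roughly as follows. This is only a minimal illustration: the threshold rule (median plus k times the interquartile range), the function name clean_large_values, and the toy numbers are my own assumptions, not something specified by the problem. Zeros are deliberately left untouched.

```python
import pandas as pd

def clean_large_values(demand: pd.Series, k: float = 3.0) -> pd.Series:
    """Mask unusually large values and fill them by linear interpolation.

    The cutoff (median + k * IQR) is an illustrative assumption.
    Zero values are kept, since they are treated as realistic.
    """
    q1, q3 = demand.quantile(0.25), demand.quantile(0.75)
    upper = demand.median() + k * (q3 - q1)
    # Values above the cutoff become NaN, everything else is kept.
    masked = demand.astype(float).mask(demand > upper)
    # Linear interpolation between the neighbouring valid points.
    return masked.interpolate(method="linear", limit_direction="both")

# Toy daily demand for one merchant + warehouse + product combination.
dates = pd.date_range("2023-05-01", periods=7, freq="D")
demand = pd.Series([10, 12, 0, 500, 14, 0, 16], index=dates)
cleaned = clean_large_values(demand)
print(cleaned)
```

For series that are mostly zeros the interquartile range can collapse to 0, so the cutoff should be checked on real data before it is trusted.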

Logical anomalies are cases such as a merchant in the computers and office category selling pet products. This is clearly abnormal data and needs to be processed. The difficulty is that logical anomalies cannot be seen directly: you have to search carefully, or set constraints in a search (find) function, which is more involved.
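A constrained search of this kind might look like the sketch below. The column names (seller_category, product_category) and the allowed mapping are hypothetical and made up for illustration; in practice the mapping has to be built from the category information in the attachments.

```python
import pandas as pd

# Hypothetical records: each row pairs a merchant category with a product category.
orders = pd.DataFrame({
    "seller_no": ["s1", "s1", "s2"],
    "seller_category": ["Computers & Office", "Computers & Office", "Pet Supplies"],
    "product_category": ["Laptops", "Pet Food", "Pet Food"],
})

# Hypothetical rule table: which product categories each merchant category may sell.
allowed = {
    "Computers & Office": {"Laptops", "Printers"},
    "Pet Supplies": {"Pet Food", "Pet Toys"},
}

# A row is consistent if its product category is allowed for its merchant category.
is_consistent = [
    prod in allowed.get(sel, set())
    for sel, prod in zip(orders["seller_category"], orders["product_category"])
]
orders["logical_anomaly"] = [not ok for ok in is_consistent]

print(orders[orders["logical_anomaly"]])
```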

That roughly covers the data cleaning part of preprocessing. There is also data encoding: taking the merchant code as an example, the codes must be converted to numeric values before later processing, so an encoding method has to be chosen. The usual default is to encode the values sequentially, as shown below. You can use SPSSPRO to generate this quickly. It will also be explained in the video later.
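If you prefer code to SPSSPRO, sequential encoding can be sketched with pandas. The column names and values here are hypothetical; pd.factorize assigns integer codes in order of first appearance.

```python
import pandas as pd

# Hypothetical merchant and warehouse identifiers.
df = pd.DataFrame({
    "seller_no": ["seller_19", "seller_3", "seller_19", "seller_7"],
    "warehouse_no": ["wh_2", "wh_2", "wh_5", "wh_5"],
})

# Encode each identifier column as 0, 1, 2, ... in order of first appearance.
for col in ["seller_no", "warehouse_no"]:
    codes, uniques = pd.factorize(df[col])
    df[col + "_code"] = codes

print(df)
```

Keep the uniques arrays if you need to map the codes back to the original identifiers when filling in the result tables.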

Initial thoughts on the problem

Once data processing is largely complete, the problems can be tackled. The following are preliminary ideas for Questions 1, 2 and 3.

Question 1: Use the data in Attachments 1-4 to predict the demand for each merchant's products in each warehouse from 2023-05-16 to 2023-05-30, and evaluate the prediction performance of your model.

Based on the data analysis and modeling process, how should the time series formed by merchants, warehouses, and commodities be classified so that the demand characteristics within the same category are as similar as possible?

Question 1 can be read as two sub-questions, or solved in one pass: it asks us to make predictions and to classify the time series formed by merchants, warehouses, and commodities. Analyzing the data shows that there is demand for 1996 merchant + warehouse + product combinations every day. It is not practical to build a separate model for each of the 1996 combinations, since that would mean running the prediction model in a for loop 1996 times, and with that approach it would be hard to finish the code even in seven days. Therefore we should classify the series by shared characteristics, group together those whose demand characteristics are most similar, and make predictions per category. This greatly reduces the prediction workload.

I suggest a correlation analysis model, which was explained in the fifth lesson of the course. You can take the free version of the course, the advanced version, or learn it on your own online. My suggestion is to use Pearson correlation analysis directly: classify according to the correlation coefficient with demand, then model each class. (Note: you can also choose a more advanced classification model. The materials given to you include many advanced classification and discrimination methods that can be used instead.)
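One possible way to turn Pearson correlation into categories is sketched below. Note that this is my own illustrative choice, not the only reading of the suggestion above: it computes pairwise Pearson correlations between demand series and feeds the distance 1 - r into hierarchical clustering from SciPy. The series names, the synthetic data, and the choice of two clusters are all assumptions.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Synthetic demand: two series with a weekly pattern, two with a trend.
rng = np.random.default_rng(0)
days = np.arange(60)
weekly = np.sin(2 * np.pi * days / 7)
trend = days / 60.0

wide = pd.DataFrame({
    "s1": 10 + 3 * weekly + rng.normal(0, 0.2, 60),
    "s2": 20 + 6 * weekly + rng.normal(0, 0.2, 60),
    "s3": 5 + 8 * trend + rng.normal(0, 0.2, 60),
    "s4": 15 + 12 * trend + rng.normal(0, 0.2, 60),
})

# Pairwise Pearson correlation between series (columns).
corr = wide.corr(method="pearson")

# Turn correlation into a distance: highly correlated series are close.
dist = 1.0 - corr.to_numpy()
np.fill_diagonal(dist, 0.0)
dist = (dist + dist.T) / 2  # enforce exact symmetry

# Average-linkage hierarchical clustering, cut into 2 groups (assumed number).
Z = linkage(squareform(dist, checks=False), method="average")
labels = fcluster(Z, t=2, criterion="maxclust")
groups = dict(zip(wide.columns, labels))
print(groups)
```

With the real data, wide would be the pivoted table of daily demand with one column per merchant + warehouse + product combination, and the number of groups is something to experiment with.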

After selecting the appropriate indicators, you can work with the merchant code, product code, warehouse code, date, and shipment volume (which can be regarded as demand). Before making predictions, remember to carry out some mechanism analysis. The four indicators used to obtain the results are not independent; there are relationships among them. Correlation analysis can therefore be performed on these four indicators, and once a specific functional expression is obtained, predictions can be made.

For the mechanism analysis, you can draw scatter plots, run correlation analysis, and apply linear or partially linear fitting.

Analyze the mechanism by drawing such graphs, and construct the relationship equations among the predicted quantities.

For the selection of prediction models, you can choose the appropriate prediction model according to your own ability. You can refer to the following table.

You can also choose the weighted prediction model based on the optimization model that I have always recommended.
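One way to read a weighted prediction model based on an optimization model is sketched below: the forecasts of several base models are combined with non-negative weights that sum to 1, and the weights are chosen to minimize the squared error on a validation window. This is only my assumed formulation; the function name fit_weights and the toy numbers are hypothetical, and the solver is SciPy's SLSQP.

```python
import numpy as np
from scipy.optimize import minimize

def fit_weights(forecasts: np.ndarray, actual: np.ndarray) -> np.ndarray:
    """Find weights (>= 0, summing to 1) minimizing the MSE of the combined forecast.

    forecasts has shape (n_models, n_days); actual has shape (n_days,).
    """
    n_models = forecasts.shape[0]

    def mse(w):
        return np.mean((w @ forecasts - actual) ** 2)

    result = minimize(
        mse,
        x0=np.full(n_models, 1.0 / n_models),  # start from equal weights
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n_models,
        constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],
        options={"ftol": 1e-12},
    )
    return result.x

# Toy validation window: forecasts from two hypothetical base models.
f1 = np.array([10.0, 12.0, 9.0, 15.0, 11.0, 14.0])
f2 = np.array([8.0, 15.0, 7.0, 18.0, 12.0, 10.0])
# Toy "actual" demand, constructed so that the best weights are 0.7 and 0.3.
actual = 0.7 * f1 + 0.3 * f2

weights = fit_weights(np.vstack([f1, f2]), actual)
combined = weights @ np.vstack([f1, f2])
print(weights)
```

In practice the weights would be fitted on a hold-out window (for example the last 15 days of history) and then applied to the base models' forecasts for the target dates.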

Predictions are made based on the relationship equations derived from the mechanism analysis.

Question 2: Discuss how the newly emerging prediction dimensions can draw on the historical data in Attachment 1 to find similar sequences, and complete the predicted values of these dimensions from 2023-05-16 to 2023-05-30. Fill in the prediction results in Result Table 2 and upload it to the competition platform.

Use the classification model established in Question 1 and bring in the data from Attachment 5 for reclassification and discrimination. Try to use the same prediction model as in Question 1 to make the predictions.
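As a rough illustration of finding a similar sequence, the sketch below picks the historical series with the highest Pearson correlation to a new series over the dates where the new series has data. It assumes the new dimension has at least a short history, which you should verify against Attachment 5; the function name most_similar_series and the synthetic data are hypothetical.

```python
import numpy as np
import pandas as pd

def most_similar_series(history: pd.DataFrame, new_series: pd.Series) -> str:
    """Return the column of history most correlated with new_series.

    Correlation is computed only over the dates covered by new_series.
    """
    window = history.loc[new_series.index]
    corr = window.corrwith(new_series, method="pearson")
    return corr.idxmax()

# Synthetic history: one weekly-pattern series and one trending series.
dates = pd.date_range("2023-04-16", periods=30, freq="D")
t = np.arange(30)
history = pd.DataFrame({
    "series_A": 10 + 3 * np.sin(2 * np.pi * t / 7),
    "series_B": 5 + 0.5 * t,
}, index=dates)

# A new dimension with only the last 10 days of data, following the weekly pattern.
new_series = pd.Series(4 + 1.5 * np.sin(2 * np.pi * t[-10:] / 7), index=dates[-10:])

best = most_similar_series(history, new_series)
print(best)
```

The new dimension can then be assigned to the category of its most similar historical series and predicted with that category's model.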

Question 3: There are regular large-scale promotions in June every year, which bring great challenges to accurate demand prediction and contract fulfillment. Attachment 6 gives the demand data for the merchant + warehouse + product dimensions corresponding to Attachment 1 during last year's Double Eleven. Refer to these data to give the forecast values from 2023-06-01 to 2023-06-20. Fill in the prediction results in Result Table 3 and upload it to the competition platform.

Bringing in the merchant + warehouse + product data under large-scale promotions is similar to Question 2. Based on the introduced data, apply the classification model to obtain new classification results, and then use the same prediction model as in Question 1 on these new results.

Origin blog.csdn.net/qq_33690821/article/details/134087046