2023 Huazhong Cup Problem C: Step-by-Step Tutorial and Code for Air Quality Prediction

Huazhong Cup Question C: Air Quality Prediction

Question one:

Using Annex 1 and Annex 2, analyze and process the data, identify the factors related to changes in PM2.5 concentration, and explain how strongly each identified factor influences PM2.5 concentration.

This question is relatively simple; it is mainly a correlation and influence analysis. Here are two methods.

The first method is straightforward: use conventional statistical tools such as correlation analysis or difference analysis. Taking difference analysis as an example, we can run a paired-sample t-test between the PM2.5 concentration and each of the other variables to check whether they differ and by how much.

The second method is more involved. First build a machine learning model such as XGBoost that predicts the PM2.5 concentration, with the other related factors as input variables. To get a well-tuned XGBoost model, we can search its hyperparameters with a heuristic algorithm such as PSO (particle swarm optimization) or a genetic algorithm. Then we compute each feature's contribution to the model with SHAP and visualize the feature importance.

The two methods can also be combined: use the second to rank the importance of each variable, then use the first to study the differences between variables.
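As a rough illustration of the first method, the sketch below ranks candidate factors by their Pearson correlation with PM2.5 and then runs a paired-sample t-test. The data is synthetic, and the column names (PM10, wind_speed, temperature), the coefficients, and the helper rank_by_correlation are assumptions made for this example, not the contents of Annex 1 or Annex 2. Because a paired t-test compares means, it is applied here only to PM10, which shares units with PM2.5.

```python
import numpy as np
from scipy import stats


def rank_by_correlation(target, factors):
    """Return (name, Pearson r) pairs sorted by |r|, strongest first."""
    scores = [
        (name, float(np.corrcoef(values, target)[0, 1]))
        for name, values in factors.items()
    ]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


# Synthetic stand-in for the annex data (hypothetical columns and coefficients).
rng = np.random.default_rng(0)
n = 200
pm10 = rng.normal(80, 20, n)
wind_speed = rng.normal(3, 1, n)
temperature = rng.normal(15, 5, n)
pm25 = 0.6 * pm10 - 5.0 * wind_speed + rng.normal(0, 5, n)

factors = {"PM10": pm10, "wind_speed": wind_speed, "temperature": temperature}

# Correlation analysis: which factors move together with PM2.5?
ranking = rank_by_correlation(pm25, factors)
for name, r in ranking:
    print(f"{name:12s} r = {r:+.3f}")

# Difference analysis: paired-sample t-test between PM2.5 and PM10,
# which are measured at the same time points and in the same units.
t_stat, p_value = stats.ttest_rel(pm25, pm10)
print(f"paired t-test PM2.5 vs PM10: t = {t_stat:.2f}, p = {p_value:.3g}")
```

With the real data, the synthetic arrays would be replaced by the matching columns from the annexes, aligned on the same timestamps.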


Question two:

Divide the data into a training set and a test set yourself. Based on Question 1 and using Annex 1 and Annex 2, build a multi-step prediction model for PM2.5 concentration, and use the root mean square error (RMSE) to evaluate the 3-step, 5-step, 7-step, and 12-step prediction performance. Give the results in the main text in the format of Table 1, and visualize the test set and its prediction results. Also use this model to predict the PM2.5 concentration at the times given in Annex 3, and give the results in the main text in the format of Table 2.

For the second question, we can use time series analysis. The "multi-step" here refers to the number of steps in the sliding forecast window. Generally speaking, there are two ways to do time series forecasting. One is a traditional model such as ARIMA or the grey forecasting model; these are classic single-series forecasting methods. The other, which is more popular in industry, is regression-based prediction, which applies a sliding time window to the data. For example, if the step is set to 1, the first day's data is used to predict the second day, the second day's data predicts the third day, the third day's data predicts the fourth day, and so on. This gives us x and y, which we then pass to a machine learning regressor for training and prediction. So the 3-step case mentioned in the question means using the data of days 1, 2, and 3 to predict day 4, and the data of days 2, 3, and 4 to predict day 5. For the model itself, we can use a deep learning model such as LSTM or a machine learning model such as XGBoost.
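The sliding-window idea above can be sketched in a few lines of Python. This is a minimal example under stated assumptions: the series is synthetic, the helper names (make_windows, fit_linear, predict_linear, rmse) are made up for illustration, and an ordinary least-squares linear regressor stands in for XGBoost or LSTM so that the example only needs NumPy.

```python
import numpy as np


def make_windows(series, n_steps):
    """Build (X, y): each row of X holds n_steps consecutive values,
    and y is the value that immediately follows that window."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[i:i + n_steps] for i in range(len(series) - n_steps)])
    y = series[n_steps:]
    return X, y


def rmse(y_true, y_pred):
    """Root mean square error between two equally long sequences."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def fit_linear(X, y):
    """Least-squares linear regressor (a stand-in for XGBoost or LSTM)."""
    A = np.column_stack([X, np.ones(len(X))])  # append an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    return np.column_stack([X, np.ones(len(X))]) @ coef


# The example from the text: with 3 steps, days 1-3 predict day 4
# and days 2-4 predict day 5.
X_demo, y_demo = make_windows([1, 2, 3, 4, 5], n_steps=3)
print(X_demo)
print(y_demo)

# Synthetic PM2.5-like series (hypothetical data, not Annex 1 or Annex 2).
rng = np.random.default_rng(0)
t = np.arange(400)
series = 50 + 20 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 2, t.size)

results = {}
for n_steps in (3, 5, 7, 12):
    X, y = make_windows(series, n_steps)
    split = int(0.8 * len(X))  # chronological split, no shuffling
    coef = fit_linear(X[:split], y[:split])
    results[n_steps] = rmse(y[split:], predict_linear(coef, X[split:]))
    print(f"{n_steps:2d}-step window: test RMSE = {results[n_steps]:.3f}")
```

The split is chronological rather than random so that the test set always lies after the training set in time. To try another model, only fit_linear and predict_linear would need to be swapped out.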


Question three:

Build an AQI multi-step prediction model, use the root mean square error (RMSE) to evaluate the modeling effect, and visualize the test set and its prediction results. At the same time, use this model to predict the AQI at the given time in Annex 3, and give the early warning level of air quality every day. Please use the format of Table 3 and Table 4 to give the results in the main text.

The third question can be solved the same way as the second, with AQI as the prediction target instead of PM2.5 concentration, so there is nothing new to add.

The complete walkthrough video is available on Bilibili:

Full Nanny Tutorial 2023 Huazhong Cup Digital Model Competition C Problem Solution + Code + Data_哔哩哔哩_bilibili


Origin blog.csdn.net/weixin_44099072/article/details/130453451