Mathematical Modeling: 2022 Asia-Pacific Competition, Problem C (Approach and Code)

Table of contents

1. Problem and general approach

2. Data preprocessing

3. Prediction model

4. Correlation analysis of global warming

5. Post-competition summary


1. Problem and general approach

       First, the data are nondimensionalized (normalized). The relative uncertainty of each record is computed from the given uncertainty and the measured value, and outliers are removed; normally the relative uncertainty should not exceed 5%. The remaining valid data are then sampled every 10 years for each region. The mean and standard deviation can serve as representative values, or the average temperatures of the four seasons can be treated separately for further processing. Combined with the year-on-year growth rate, this shows the warming trend in each decade and whether the increase leading up to March 2022 is larger than in previous decades.

(This point is open to interpretation: should the increase up to March 2022 be compared only with 2012-2022, or with every earlier decade? The word "any" in the question does not fit neatly with this solution method.)
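A minimal pandas sketch of this preprocessing, assuming a long-format table with the hypothetical columns `Year`, `Region`, `AverageTemperature` and `AverageTemperatureUncertainty`. Only the 5% threshold comes from the text above; the column names, decade bucketing and toy data are illustrative.

```python
import pandas as pd

def decade_summary(df: pd.DataFrame, max_rel_unc: float = 0.05) -> pd.DataFrame:
    """Drop records whose relative uncertainty exceeds max_rel_unc, then give the
    per-region decadal mean, standard deviation and decade-on-decade growth rate."""
    df = df.copy()
    # Relative uncertainty = stated uncertainty / |measured value|
    df["RelUnc"] = df["AverageTemperatureUncertainty"] / df["AverageTemperature"].abs()
    valid = df[df["RelUnc"] <= max_rel_unc].copy()
    # Bucket years into decades, e.g. 1987 -> 1980
    valid["Decade"] = (valid["Year"] // 10) * 10
    out = (valid.groupby(["Region", "Decade"])["AverageTemperature"]
                .agg(["mean", "std"])
                .reset_index())
    # Growth of each decadal mean relative to the previous decade in the same region
    prev = out.groupby("Region")["mean"].shift(1)
    out["GrowthRate"] = out["mean"] / prev - 1
    return out

# Toy data (hypothetical): the 2010 record has 25% relative uncertainty and is dropped
data = pd.DataFrame({
    "Year": [1990, 1995, 2000, 2005, 2010],
    "Region": ["A"] * 5,
    "AverageTemperature": [10.0, 10.0, 11.0, 11.0, 20.0],
    "AverageTemperatureUncertainty": [0.1, 0.1, 0.2, 0.2, 5.0],
})
print(decade_summary(data))
```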

       Next comes the temperature prediction model, for which time series analysis, regression, grey system prediction, deep learning, and similar methods can be used.

For details of these methods, see my earlier article "Mathematical modeling: prediction models" on my CSDN blog (Mr. Patrick Star c).
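Of the methods listed, grey system prediction is the least self-explanatory. Below is a minimal sketch of the textbook GM(1,1) grey model, assuming a short, strictly positive series such as annual mean temperatures in °C. It is an illustration of the standard algorithm, not the code we used in the competition.

```python
import numpy as np

def gm11_forecast(x0, steps):
    """Fit GM(1,1) to a positive series x0; return the fitted values
    followed by `steps` out-of-sample forecasts."""
    x0 = np.asarray(x0, dtype=float)
    n = len(x0)
    x1 = np.cumsum(x0)                        # accumulated generating operation (AGO)
    z1 = 0.5 * (x1[1:] + x1[:-1])             # background values
    B = np.column_stack((-z1, np.ones(n - 1)))
    Y = x0[1:]
    # Least squares for the development coefficient a and grey input b
    (a, b), *_ = np.linalg.lstsq(B, Y, rcond=None)
    k = np.arange(n + steps)
    x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a
    # Inverse AGO: the first value is x0[0], the rest are successive differences
    return np.concatenate(([x0[0]], np.diff(x1_hat)))

# Toy series growing 5% per step (hypothetical data)
x = 10 * 1.05 ** np.arange(6)
print(gm11_forecast(x, steps=2))
```

GM(1,1) works best on short, smooth, monotone series; on long noisy temperature records it usually underperforms ARIMA.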

       The accuracies of the models are then compared to answer the first question.

      The second question asks for a mathematical model of the relationship between global temperature, time, and location. The first tool that comes to mind is factor analysis. Since greenhouse gas emissions are considered the main cause of global warming, we collect data on them and record how they change over time, then apply principal component analysis following its standard steps: reduce the dimensionality, select principal components according to their contribution rates, build a multiple linear regression model on those components, and evaluate the goodness of fit on the existing data.
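The steps above (standardize, reduce dimensionality with PCA, keep components by contribution rate, regress on them) can be sketched with scikit-learn. The greenhouse-gas indicators below are synthetic stand-ins, not real emission data, and the 85% cumulative contribution threshold is a common convention we assume here.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: three correlated greenhouse-gas indicators over 60 years
rng = np.random.default_rng(0)
trend = np.arange(60) / 60.0
X = np.column_stack([
    trend + 0.02 * rng.standard_normal(60),        # e.g. CO2 (hypothetical)
    0.8 * trend + 0.02 * rng.standard_normal(60),  # e.g. CH4 (hypothetical)
    0.5 * trend + 0.02 * rng.standard_normal(60),  # e.g. N2O (hypothetical)
])
y = 8.5 + 1.2 * trend + 0.05 * rng.standard_normal(60)  # annual mean temperature

# Standardize, keep the components explaining >= 85% of the variance,
# then fit a multiple linear regression on the retained components
model = make_pipeline(StandardScaler(),
                      PCA(n_components=0.85, svd_solver="full"),
                      LinearRegression())
model.fit(X, y)
pca = model.named_steps["pca"]
print("components kept:", pca.n_components_)
print("contribution rates:", pca.explained_variance_ratio_)
print("R^2 on the data:", model.score(X, y))
```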

2. Data preprocessing

We used Excel to turn the given dataset into a table of monthly and annual average temperatures to simplify the subsequent calculations.
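The same table can be built in pandas instead of Excel. This is a sketch assuming the raw data have a date column `dt` and a temperature column `AverageTemperature` (hypothetical names; adjust to the actual file). Records sharing a month, e.g. from several cities, are averaged.

```python
import pandas as pd

def monthly_and_annual(df):
    """Pivot raw records into a Year x Month table of mean temperatures,
    plus an 'Annual' column holding the mean over the months."""
    df = df.copy()
    dt = pd.to_datetime(df["dt"])
    df["Year"], df["Month"] = dt.dt.year, dt.dt.month
    table = df.pivot_table(index="Year", columns="Month",
                           values="AverageTemperature", aggfunc="mean")
    table["Annual"] = table.mean(axis=1)
    return table

# Toy data (hypothetical): two years of monthly values 1..12
raw = pd.DataFrame({
    "dt": pd.date_range("2000-01-01", periods=24, freq="MS").strftime("%Y-%m-%d"),
    "AverageTemperature": list(range(1, 13)) * 2,
})
print(monthly_and_annual(raw))
```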

Excel's filter function then gives the March temperature every 10 years.
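A pandas equivalent of that filter, as a sketch: it keeps March records for 2022, 2012, 2002, and so on (aligning the 10-year steps to 2022 is our assumption) and computes the change from one sampled March to the next. Column names are hypothetical, as before.

```python
import pandas as pd

def march_every_ten_years(df, end_year=2022):
    """Keep March records for end_year, end_year - 10, ... and return the
    mean March temperature per sampled year and its change from the previous one."""
    dt = pd.to_datetime(df["dt"])
    mask = (dt.dt.month == 3) & ((end_year - dt.dt.year) % 10 == 0)
    march = (df[mask].assign(Year=dt[mask].dt.year)
                     .groupby("Year")["AverageTemperature"].mean())
    return pd.DataFrame({"MarchTemp": march, "Change": march.diff()})

# Toy data (hypothetical): 2007 and April 2022 are filtered out
raw = pd.DataFrame({
    "dt": ["1992-03-01", "2002-03-01", "2007-03-01",
           "2012-03-01", "2022-03-01", "2022-04-01"],
    "AverageTemperature": [5.0, 5.5, 9.9, 6.1, 6.9, 12.0],
})
print(march_every_ten_years(raw))
```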

After inspecting the plots, we ran an independent-samples t-test in SPSSPRO to check for significant differences, which gives the answer to the question.
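Outside SPSSPRO, the same kind of test can be run with SciPy. In this sketch the two samples are synthetic temperature anomalies for an earlier and a recent period (which groups to compare is our assumption), and we use Welch's variant, which does not assume equal variances.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Synthetic stand-ins for monthly anomalies in two decades (not the competition data)
earlier = rng.normal(loc=0.0, scale=0.3, size=120)
recent = rng.normal(loc=0.5, scale=0.3, size=120)

# Welch's independent-samples t-test
t_stat, p_value = stats.ttest_ind(recent, earlier, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.3g}")
if p_value < 0.05:
    print("The difference between the two periods is significant at the 5% level.")
```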

3. Prediction model

For the prediction model, we chose a time series model and an LSTM neural network.

We use the year and the annual average temperature as the variables.

The time series analysis was carried out in SPSSPRO.

The R² is about 0.8, a reasonably good fit that basically meets the requirements of the question; the model is then used to forecast the annual average temperature for 2050 and 2100.

We then implemented the LSTM prediction in code:

import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from keras.models import Sequential, load_model
from keras.layers import Dense, LSTM
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Assumption (not specified in the original post): the annual table from
# Section 2 is saved as a CSV with the columns 'Year' and 'AverageTemperature'.
# The file name and column names are hypothetical.
df = pd.read_csv('annual_average_temperature.csv')
years = df['Year'].values.astype('float32')

# Convert the temperatures to a float32 column vector of shape (n, 1)
dataset = df[['AverageTemperature']].values.astype('float32')

def mean_absolute_percentage_error(y_true, y_pred):
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

# Hold out the last 8 years as the test set
test_size = -8
# Scale to [0, 1]; the scaler is fitted on the training years only,
# so the test years do not leak into the normalization
scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit(dataset[:test_size])
dataset = scaler.transform(dataset)
trainlist = dataset[:test_size]
testlist = dataset[test_size:]

def create_dataset(dataset, look_back):
    # look_back is the same as the LSTM timestep
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back):
        dataX.append(dataset[i:(i + look_back)])
        dataY.append(dataset[i + look_back])
    return np.array(dataX), np.array(dataY)

# There is too little training data, so look_back cannot be large
look_back = 1
trainX, trainY = create_dataset(trainlist, look_back)
testX, testY = create_dataset(testlist, look_back)
# LSTM input shape: (samples, timesteps, features)
trainX = np.reshape(trainX, (trainX.shape[0], trainX.shape[1], 1))
testX = np.reshape(testX, (testX.shape[0], testX.shape[1], 1))

# Create and fit the LSTM network
model = Sequential()
model.add(LSTM(4, input_shape=(None, 1)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=100, batch_size=1, verbose=2)
# model.save(os.path.join("DATA", "Test" + ".h5"))

#%%

# Model validation
# model = load_model(os.path.join("DATA", "Test" + ".h5"))
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)

# Undo the normalization
trainPredict_ = scaler.inverse_transform(trainPredict)
trainY_ = scaler.inverse_transform(trainY)
testPredict_ = scaler.inverse_transform(testPredict)
testY_ = scaler.inverse_transform(testY)

#%%

def score(y_true, y_pre):
    print("MAPE :", mean_absolute_percentage_error(y_true, y_pre))
    print("RMSE :", np.sqrt(mean_squared_error(y_true, y_pre)))
    print("MAE :", mean_absolute_error(y_true, y_pre))

# Pass the observed values first: MAPE divides by y_true
score(trainY_, trainPredict_)
score(testY_, testPredict_)

#%%

# dataY[i] is the value at position i + look_back, so the matching
# years start look_back steps into each split
train_years = years[look_back:test_size]
test_years = years[test_size + look_back:]
os.makedirs('./Q1', exist_ok=True)

plt.plot(train_years, trainY_, label='observed data')
plt.plot(train_years, trainPredict_, label='LSTM')
plt.xlabel('Year')
plt.ylabel('Average temperature')
plt.title('Average temperature, training set')
plt.legend()
plt.savefig('./Q1/train_average_temperature.jpg')
plt.show()

#%%

plt.plot(test_years, testY_, label='observed data')
plt.plot(test_years, testPredict_, label='LSTM')
plt.xlabel('Year')
plt.ylabel('Average temperature')
plt.title('Average temperature, test set')
plt.legend()
plt.savefig('./Q1/test_average_temperature.jpg')
plt.show()

This produces the fitted plots for the training and test sets.

 

Comparing the MAPE, RMSE, and MAE of the two models, we conclude that the ARIMA model (the time series model) performs better.

(Possibly because of a problem in our data processing, we ended up with only a small amount of data; with so little data, the deep learning method did not even match conventional prediction methods such as linear regression.)
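To check that remark, a plain linear trend can be scored with the same three metrics on the same held-out last 8 years used for the LSTM. This is a hypothetical baseline sketch using `np.polyfit`, not code from our submission; the toy series is exactly linear, so all errors come out near zero.

```python
import numpy as np

def metrics(y_true, y_pred):
    """MAPE (%), RMSE and MAE, with the observed values as y_true."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    err = y_true - y_pred
    return {"MAPE": np.mean(np.abs(err / y_true)) * 100,
            "RMSE": np.sqrt(np.mean(err ** 2)),
            "MAE": np.mean(np.abs(err))}

def linear_trend_baseline(years, temps, test_size=8):
    """Fit temp = slope * year + intercept on all but the last test_size
    points, then score the held-out points."""
    slope, intercept = np.polyfit(years[:-test_size], temps[:-test_size], deg=1)
    pred = slope * years[-test_size:] + intercept
    return metrics(temps[-test_size:], pred)

# Toy data (hypothetical): an exactly linear warming trend
yrs = np.arange(1980, 2020, dtype=float)
tmp = 8.0 + 0.02 * (yrs - 1980)
print(linear_trend_baseline(yrs, tmp))
```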

4. Correlation analysis of global warming

Following most of the approaches circulating online, we grouped the data by country, by continent, by northern and southern hemisphere, and so on.

A Kendall concordance (consistency) test was then carried out on each grouping.

 

Whichever grouping is used, the test indicates a significant association between global warming, region, and time.
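We take the "Kendall consistency test" to be Kendall's coefficient of concordance W. Below is a minimal sketch without tie correction, assuming regions act as the "raters" and years as the ranked "objects" (this layout is our assumption). W close to 1 means the regions agree on how the years rank by temperature.

```python
import numpy as np
from scipy.stats import rankdata, chi2

def kendalls_w(scores):
    """Kendall's W (no tie correction) for an (m, n) array:
    m raters (e.g. regions) each scoring the same n objects (e.g. years)."""
    scores = np.asarray(scores, dtype=float)
    m, n = scores.shape
    ranks = np.apply_along_axis(rankdata, 1, scores)  # rank the n objects within each rater
    R = ranks.sum(axis=0)                             # rank sum of each object
    S = np.sum((R - R.mean()) ** 2)
    W = 12 * S / (m ** 2 * (n ** 3 - n))
    chi_sq = m * (n - 1) * W                          # large-sample chi-square statistic
    p_value = chi2.sf(chi_sq, df=n - 1)
    return W, chi_sq, p_value

# Toy data (hypothetical): annual mean temperatures of 3 regions over 5 years
temps = [[8.1, 8.3, 8.4, 8.7, 9.0],
         [14.0, 14.2, 14.1, 14.5, 14.8],
         [2.0, 2.3, 2.5, 2.6, 3.1]]
W, chi_sq, p = kendalls_w(temps)
print(f"W = {W:.3f}, chi2 = {chi_sq:.2f}, p = {p:.3f}")
```

The chi-square approximation is rough for small n; for a handful of years, exact critical values of W are more reliable.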

Later, a fiber bundle model was used to analyze which natural-disaster factor has the greatest influence on global warming. (If suitable data are available, principal component analysis can also be used here; it can be regarded as a simple evaluation model.)

5. Post-competition summary

Overall, Problem C of this year's APMCM Asia-Pacific competition was not very difficult. In less than a morning after receiving the problem we had plenty of ideas, and we met no major obstacles while solving it; with reasonably good data processing, the questions can be answered without much trouble. However, we ran into some problems with Excel and with the LSTM model. Although they did not prevent us from finishing, they served as a warning. To earn a good score, one still needs to study more mature papers and polish the writing.

Future work will focus on strengthening our data processing skills and our paper writing and formatting.


Origin blog.csdn.net/m0_58585940/article/details/128133179