2023 Tianfu Cup Mathematical Modeling Competition, Problem A: Seismic Source Attribute Identification Model Construction and Magnitude Prediction (Detailed Solution Ideas, Code, and Answers)

Question 1: Using the seismic wave data in Annexes 1-8, identify a suitable set of indicators and criteria, build a seismic source attribute identification model, and accurately distinguish natural seismic events (Annexes 1-7) from non-natural seismic events (Annex 8).

Idea: First, preprocess the data: organize the waveform data of each group and label natural seismic events as 0 and non-natural seismic events as 1. Then, for each waveform record, construct and extract features such as kurtosis, skewness, amplitude, mean, standard deviation, minimum, and maximum, and apply suitable feature screening (for example, a chi-square test or a correlation-coefficient criterion). A binary classification machine learning model can then be built on the screened features. Support vector machines, logistic regression, random forests, and other classifiers are all options. Anomaly detection algorithms such as LOF, LOCI, or ABOD can also be used to better handle the imbalance between normal and anomalous samples.
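As a rough illustration of the anomaly-detection option mentioned above, the sketch below uses scikit-learn's LocalOutlierFactor in novelty mode: it is fitted on natural-event features only, and records that look unlike them are flagged as outliers. The feature values are synthetic stand-ins rather than the competition data, and the neighbor count of 20 is an assumption, not a tuned setting.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-ins for per-record features (e.g. kurtosis, skewness,
# amplitude, standard deviation). These are NOT the competition data.
natural_train = rng.normal(loc=0.0, scale=1.0, size=(140, 4))
natural_test = rng.normal(loc=0.0, scale=1.0, size=(20, 4))
unnatural_test = rng.normal(loc=8.0, scale=1.0, size=(30, 4))

# Scale with statistics of the natural class only, then fit LOF on it.
scaler = StandardScaler().fit(natural_train)
lof = LocalOutlierFactor(n_neighbors=20, novelty=True)  # n_neighbors is an assumption
lof.fit(scaler.transform(natural_train))

# predict() returns 1 for inliers (natural-like) and -1 for outliers.
pred_natural = lof.predict(scaler.transform(natural_test))
pred_unnatural = lof.predict(scaler.transform(unnatural_test))

flagged_natural = int((pred_natural == -1).sum())
flagged_unnatural = int((pred_unnatural == -1).sum())
print("natural records flagged:", flagged_natural, "of", len(pred_natural))
print("non-natural records flagged:", flagged_unnatural, "of", len(pred_unnatural))
```

On real data the two classes will overlap far more than in this toy setup, so the flagged fraction should be checked against held-out records before trusting the detector.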

# Data processing
import pandas as pd
import numpy as np
# Column names are kept in Chinese to match the output file. In order: event, station,
# mean, amplitude, standard deviation, minimum, maximum, kurtosis, skewness, label.
# Label convention (from the idea above): 0 = natural event, 1 = non-natural event.
df_feature=pd.DataFrame([],columns=['事件','观测站','均值','振幅','标准差','最小值','最大值','峰度','偏度','是否天然'])
def get_features(path,event,station):  # natural events, Annexes 1-7
    # Values in these files are separated by two spaces
    with open(path, 'r') as fr:
        all_lines = fr.readlines()
    dataset = []
    for line in all_lines:
        line = line.strip().split('  ')
        dataset.append(line)
    # Convert to a DataFrame; after transposing, column 0 holds the values of the first line
    df = pd.DataFrame(dataset).T
    df[0] = pd.to_numeric(df[0], errors='coerce')
    amplitude=df[0].max()-df[0].min()  # peak-to-peak amplitude
    biaoqian=0  # label for Annexes 1-7
    features_list=[event,station,df[0].mean(),amplitude,df[0].std(),df[0].min(),
                   df[0].max(),df[0].kurt(),df[0].skew(),biaoqian]
    return features_list
def get_features_8(path,event,station):  # non-natural events, Annex 8
    # These files hold one value per line
    with open(path, 'r') as fr:
        all_lines = fr.readlines()
    dataset = []
    for line in all_lines:
        line = line.strip().split('\n')
        dataset.append(line)
    # Convert to a DataFrame; column 0 holds the values
    df = pd.DataFrame(dataset)
    df[0] = pd.to_numeric(df[0], errors='coerce')
    amplitude=df[0].max()-df[0].min()  # peak-to-peak amplitude
    biaoqian=1  # label for Annex 8
    features_list=[event,station,df[0].mean(),amplitude,df[0].std(),df[0].min(),
                   df[0].max(),df[0].kurt(),df[0].skew(),biaoqian]
    return features_list
# Annexes 1-7: 20 station files per event
for i in range(1,8):
    for j in range(1,21):
        path = 'A/附件'+str(i)+'/'+str(j)+'.txt'
        df_feature.loc[len(df_feature),:]=get_features(path,i,j)
# Annex 8: 30 station files
for i in range(8,9):
    for j in range(1,31):
        path = 'A/附件'+str(i)+'/'+str(j)+'.txt'
        df_feature.loc[len(df_feature),:]=get_features_8(path,i,j)
print(df_feature)
df_feature.to_csv('特征构建.csv',index=False)
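
Once a feature table like the one above exists, the screening and classification steps from the idea could be sketched as follows. This is a minimal illustration under stated assumptions: the table is synthetic (in practice it would be read from 特征构建.csv), the English column names are hypothetical, the 0.3 correlation threshold is arbitrary, and a random forest with balanced class weights is only one of the classifiers the idea lists.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(42)
n_natural, n_unnatural = 140, 30  # 7 events x 20 files, 1 event x 30 files

# Label convention from the text: 0 = natural, 1 = non-natural.
label = np.r_[np.zeros(n_natural, dtype=int), np.ones(n_unnatural, dtype=int)]

# Synthetic feature table with hypothetical column names and values.
features = pd.DataFrame({
    "kurtosis": rng.normal(3.0, 1.0, label.size) + 4.0 * label,
    "skewness": rng.normal(0.0, 0.5, label.size) + 1.5 * label,
    "std": rng.normal(10.0, 2.0, label.size),    # uninformative on purpose
    "noise": rng.normal(0.0, 1.0, label.size),   # uninformative on purpose
})

# Feature screening: keep features whose absolute Pearson correlation
# with the label exceeds a threshold (0.3 is an arbitrary assumption).
corr = features.corrwith(pd.Series(label)).abs()
selected = corr[corr > 0.3].index.tolist()

# Binary classifier; class_weight="balanced" compensates for the 140:30 imbalance.
clf = RandomForestClassifier(n_estimators=200, class_weight="balanced", random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(clf, features[selected], label, cv=cv, scoring="f1")

print("selected features:", selected)
print("mean cross-validated F1:", round(float(scores.mean()), 3))
```

A chi-square test could replace the correlation filter, but scikit-learn's chi2 scorer expects non-negative inputs, so features such as skewness would need rescaling first.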

Question 2: The amplitude and waveform characteristics of seismic waves are significantly correlated with magnitude. Using the data in Annexes 1-7, whose magnitudes are known (4.2, 5.0, 6.0, 6.4, 7.0, 7.4, and 8.0, respectively), select events and samples appropriately, build a magnitude prediction model, and try to give the exact magnitude (to one decimal place) of the seismic event in Annex 9.

Idea: The features from the previous question can be reused here, with the label changed to the corresponding magnitude. For each annex, use a box plot to remove outliers, then build a prediction model from the remaining samples and their magnitude labels. Methods such as decision tree regression and support vector regression can be used.
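
The box plot filtering and regression steps described above can be sketched as below. The data are synthetic, the single feature name log_amplitude is hypothetical, and aggregating per-station predictions with the median is an assumption rather than something the original post specifies. Only the seven magnitudes come from the problem statement.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(7)
magnitudes = [4.2, 5.0, 6.0, 6.4, 7.0, 7.4, 8.0]  # Annexes 1-7, from the problem statement

# Synthetic table: 20 station records per event and one hypothetical feature
# ("log_amplitude") that grows with magnitude. NOT the competition data.
rows = []
for event, mag in enumerate(magnitudes, start=1):
    values = mag + rng.normal(0.0, 0.1, 20)
    values[0] += 5.0  # inject one obvious outlier per event
    rows += [{"event": event, "magnitude": mag, "log_amplitude": v} for v in values]
df = pd.DataFrame(rows)

def drop_iqr_outliers(group, column, k=1.5):
    """Keep rows inside the box plot whiskers [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = group[column].quantile([0.25, 0.75])
    iqr = q3 - q1
    return group[group[column].between(q1 - k * iqr, q3 + k * iqr)]

# Filter each event (annex) separately, as the idea suggests.
clean = pd.concat([drop_iqr_outliers(g, "log_amplitude") for _, g in df.groupby("event")])

# Regress magnitude on the remaining samples.
model = DecisionTreeRegressor(max_depth=4, random_state=0)
model.fit(clean[["log_amplitude"]], clean["magnitude"])

# Hypothetical unseen event: predict per station, then take the median
# and round to one decimal place as the question requires.
new_event = pd.DataFrame({"log_amplitude": 6.0 + rng.normal(0.0, 0.1, 20)})
predicted = round(float(np.median(model.predict(new_event))), 1)
print("rows before/after filtering:", len(df), len(clean))
print("predicted magnitude:", predicted)
```

Note that a decision tree can only output values it saw during training (or averages of them), so a support vector regressor or another smooth model may be preferable when the target magnitude lies between the seven known ones.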

[Figure: box plot]

[Figure: decision tree prediction]

Question 3: Reservoir depth, storage capacity, fault type, tectonic activity/basic intensity, lithology, and similar attributes are important factors affecting the magnitude of reservoir-induced earthquakes. Based on the 102 reservoir earthquake samples in Annex 10, try to build a model relating the basic attribute data of a reservoir to magnitude, and justify the model.

Idea: Inspecting the given data shows that it contains many string-valued (categorical) variables, which must be converted to numerical variables, either by direct mapping to numeric codes or by one-hot encoding. A correlation heat map gives a quick first look at how each feature correlates with magnitude. Then treat magnitude as the label and use a decision tree, which is relatively interpretable, to model the specific relationship between each feature and magnitude.
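
A minimal sketch of that workflow, assuming a table with two numeric and two categorical attributes, is shown below. All column names, category values, and the synthetic relationship between depth and magnitude are hypothetical. Only the sample count of 102 matches Annex 10.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(1)
n = 102  # same sample count as Annex 10; every value below is synthetic

reservoirs = pd.DataFrame({
    "depth_m": rng.uniform(20, 200, n),
    "capacity": rng.uniform(0.1, 100, n),
    "fault_type": rng.choice(["normal", "reverse", "strike_slip"], n),
    "lithology": rng.choice(["limestone", "granite", "sandstone"], n),
})
# Synthetic target for demonstration only: magnitude grows with depth.
reservoirs["magnitude"] = (
    2.0
    + 0.02 * reservoirs["depth_m"]
    + 0.5 * (reservoirs["fault_type"] == "normal")
    + rng.normal(0, 0.2, n)
)

# One-hot encode the string-valued columns.
encoded = pd.get_dummies(reservoirs, columns=["fault_type", "lithology"], dtype=float)

# Correlation of every feature with magnitude. The full encoded.corr()
# matrix is what a heat map would display.
corr_with_mag = (
    encoded.corr()["magnitude"].drop("magnitude").sort_values(key=abs, ascending=False)
)

# A shallow tree keeps the learned rules readable.
X = encoded.drop(columns="magnitude")
y = encoded["magnitude"]
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
rules = export_text(tree, feature_names=list(X.columns))
importance = pd.Series(tree.feature_importances_, index=X.columns).sort_values(ascending=False)

print(corr_with_mag.round(2))
print(rules)
print(importance.round(3))
```

The printed rules and feature importances are the kind of evidence that can serve as the "reasonable basis" the question asks for, since they show which attributes the tree actually splits on.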

[Figure: prediction results]

[Figure: model evaluation]

Source: blog.csdn.net/lichensun/article/details/131013443