2023 National Mathematical Modeling Competition, Question E: Yellow River Water and Sediment Monitoring Data Analysis (Complete Code Walkthrough + Results + Solution Notes)

Complete code walkthrough, processing results, and solution notes (30+ pages) for Question E of the national competition, on the analysis of Yellow River water and sediment monitoring data. It covers data preprocessing, data visualization (grouped data distribution plots, correlation coefficient heat maps, and scatter plots), and regression models (decision tree regression, random forest regression, GBDT regression, support vector machine regression, and a fully connected neural network). It will continue to be updated.

See the end of the article for the download address of the complete code + result + idea document

Question E: Analysis of water and sediment monitoring data in the Yellow River

Question 1 Analysis and Research

Goal 1: Relationship between sediment concentration and time, water level, and water flow

Goal 2: Estimate the total annual water flow and total annual sediment discharge of the hydrological station in the past 6 years

Step 1: Combined with the goal of question 1, process the data

Step 2: Data visualization analysis to see the relationship between data

Grouped data distribution chart visualization

Correlation coefficient heat map visualization

Scatter plot visualization

Goal 1 solution: build a regression model, analyze the relationship between the variables, and predict the sediment concentration

Model 1: Decision tree regression model

Model 2: Random forest regression

Model 3: GBDT regression

Model 4: Support vector machine regression

Model 5: Fully connected neural network

Goal 2 solution: estimate the total annual water flow and total annual sediment discharge of the hydrological station in the past 6 years

The mathematical modeling topic of Question E in the National Competition is as follows: The Yellow River is the mother river of the Chinese nation. Studying the changing laws of water and sediment flux in the Yellow River provides important theoretical guidance for environmental governance, climate change, and people's lives along the Yellow River Basin, as well as for optimizing water resource allocation in the basin, coordinating the relationship between people and land, regulating water and sediment, and preventing floods and disasters.

Appendix 1 gives the actual monitoring data of water level, water flow and sediment content of a hydrological station on the Yellow River downstream of Xiaolangdi Reservoir in the past six years. Appendix 2 gives the measurement data of the Yellow River section of the hydrological station in the past six years. Appendix 3 gives relevant data from some monitoring points of the hydrological station. Please establish a mathematical model to study the following issues:

Question 1: Study the relationship between the sediment content of the Yellow River water at the hydrological station and time, water level, and water flow, and estimate the total annual water flow and total annual sediment discharge at the hydrological station in the past 6 years.

Question 2: Analyze the characteristics of mutation, seasonality and periodicity of water and sediment flux at this hydrological station in the past six years, and study the changing rules of water and sediment flux.

Question 3: Based on the change pattern of water and sediment flux of the hydrological station, predict and analyze the changing trend of water and sediment flux of the hydrological station in the next two years, and formulate the optimal sampling and monitoring plan for the hydrological station in the next two years (number of sampling sessions, specific times, etc.), so that it can not only capture the dynamic changes of water and sediment flux in time, but also minimize the cost of monitoring resources.

Question 4: Based on the changes in water and sediment flux and river bottom elevation at the hydrological station, analyze the actual effect of the "water and sediment regulation" carried out by Xiaolangdi Reservoir in June and July every year. If "water and sediment regulation" is not carried out, what will happen to the river bottom elevation of the hydrological station in 10 years?

Problem analysis

Question 1: Study the relationship between the sediment content of the Yellow River water at the hydrological station and time, water level, and water flow, and estimate the total annual water flow and total annual sediment discharge at the hydrological station in the past 6 years. (For complete documentation and code, see the address at the end of the article)

First import the relevant libraries:

## Configure figure display
%config InlineBackend.figure_format = "retina"
%matplotlib inline
import seaborn as sns
## Set the font so that Chinese characters display correctly
sns.set(font="SimSun", style="whitegrid", font_scale=1.4)
import matplotlib
## Fix the display of minus signs on the axes
matplotlib.rcParams['axes.unicode_minus'] = False
## Import the required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import missingno as msno 
from scipy import stats
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.stats.multicomp import pairwise_tukeyhsd
import plotly.express as px

## Suppress warnings
import warnings
warnings.filterwarnings("ignore")

Question 1 Analysis and Research

Question 1 of Question E in the 2023 National Mathematical Modeling Competition is analyzed as follows.

The first is goal 1: the relationship between sediment concentration and time, water level, and water flow.

Sub-questions: the relationship between sediment concentration and time, between sediment concentration and water level, and between sediment concentration and water flow. (Note that each pairwise relationship can be modeled separately, or the relationship with several variables can be modeled jointly.)

The analysis methods and steps can be: (1) clean and organize the data to obtain the data of interest; (2) use visualization to help analyze the relationships among the variables; (3) use models such as correlation analysis and regression analysis to establish quantitative relationships between them. (For complete documentation and code, see the address at the end of the article)

Next is goal 2: estimate the total annual water flow and total annual sediment discharge of the hydrological station in the past 6 years.

Sub-question: The total sediment discharge can theoretically be calculated from water flow and sediment concentration. Therefore, the focus is still on analyzing the relationship between annual total water flow and sediment concentration.

The analysis methods and steps can be: (1) clean and organize the data to obtain the data of interest; (2) use visualization to help analyze the relationship; (3) obtain the target quantities through the corresponding calculations.
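As a rough illustration of the "corresponding calculations", the annual totals can be approximated by summing daily contributions: water volume is flow (m^3/s) times the seconds in a day, and sediment mass is flow times sediment concentration (kg/m^3) times the seconds in a day. The sketch below runs on made-up numbers. The helper name `annual_totals` and the output column names are hypothetical, and it assumes one daily-mean row per day with the `年`, `流量`, and `含沙量` columns used later in this article; it is not necessarily the calculation used in the full document.

```python
import pandas as pd

SECONDS_PER_DAY = 86_400


def annual_totals(df: pd.DataFrame) -> pd.DataFrame:
    """Sum daily water volume (m^3) and sediment mass (kg) per year.

    Hypothetical helper: assumes one row per day with daily-mean values.
    """
    daily = pd.DataFrame({
        "年": df["年"],
        # daily water volume = mean flow (m^3/s) * seconds in a day
        "water_volume_m3": df["流量"] * SECONDS_PER_DAY,
        # daily sediment mass = flow (m^3/s) * concentration (kg/m^3) * seconds
        "sediment_mass_kg": df["流量"] * df["含沙量"] * SECONDS_PER_DAY,
    })
    return daily.groupby("年").sum()


# Made-up data: two days in one year with constant flow and concentration.
demo = pd.DataFrame({
    "年": [2016, 2016],
    "流量": [100.0, 100.0],   # m^3/s
    "含沙量": [2.0, 2.0],     # kg/m^3
})
totals = annual_totals(demo)
print(totals)
```

If some days have no measurement, the missing daily values would need to be interpolated or otherwise filled before summing, which this sketch does not attempt.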

Step 1: Combined with the goal of question 1, perform data processing operations

Given the characteristics of the data in Appendix 1, we reduce the time resolution of the measurements to one value per day.
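The preprocessing code itself is not shown in this article, so the following is only a minimal sketch of one way to reduce readings to daily resolution. It uses synthetic data; the raw timestamp column `时间`, the helper name `to_daily`, and the choice of a daily mean are all assumptions. Only the output column names follow the `dfq1` columns used in the rest of the article.

```python
import numpy as np
import pandas as pd


def to_daily(raw: pd.DataFrame) -> pd.DataFrame:
    """Average sub-daily readings to one row per calendar day (hypothetical helper)."""
    daily = (
        raw.set_index("时间")[["水位", "流量", "含沙量"]]
        .resample("D")
        .mean()
        .dropna(how="all")          # drop days without any reading
        .reset_index()
        .rename(columns={"时间": "日期"})
    )
    # Split the date into year / month / day columns for grouping later.
    daily["年"] = daily["日期"].dt.year
    daily["月"] = daily["日期"].dt.month
    daily["日"] = daily["日期"].dt.day
    return daily[["年", "月", "日", "水位", "流量", "含沙量", "日期"]]


# Synthetic stand-in for Appendix 1: one reading every 6 hours for 3 days.
rng = np.random.default_rng(0)
times = pd.Timestamp("2016-01-01") + pd.to_timedelta(np.arange(12) * 6, unit="h")
raw = pd.DataFrame({
    "时间": times,
    "水位": 42 + rng.normal(0, 0.1, 12),
    "流量": 400 + rng.normal(0, 20, 12),
    "含沙量": np.abs(rng.normal(1.0, 0.3, 12)),
})
dfq1_demo = to_daily(raw)
print(dfq1_demo)
```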

Step 2: Data visualization analysis to see the relationship between data

## Visualize how the data change over time (line plots)

## Changes in water level
plt.figure(figsize=(12,3))
p = sns.lineplot(data=dfq1, x="日期", y="水位",lw = 2)
plt.xlabel("时间")
plt.ylabel("水位(m)")
plt.title("")
plt.savefig('figs/水位的变化情况.png', dpi=300, bbox_inches='tight')
plt.show()

## Changes in flow
plt.figure(figsize=(12,3))
p = sns.lineplot(data=dfq1, x="日期", y="流量",lw = 2)
plt.xlabel("时间")
plt.ylabel("流量("+"$m^3$"+"/s)")
plt.title("")
plt.savefig('figs/流量的变化情况.png', dpi=300, bbox_inches='tight')
plt.show()

## Changes in sediment concentration
plt.figure(figsize=(12,3))
p = sns.lineplot(data=dfq1, x="日期", y="含沙量",lw = 2)
plt.xlabel("时间")
plt.ylabel("含沙量(kg/"+"$m^3$"+")")
plt.title("")
plt.savefig('figs/含沙量的变化情况.png',dpi=300,bbox_inches='tight')
plt.show()


## These plots show how sediment concentration and the other features change over time

Group data distribution chart visualization

For the sediment concentration data, further analyze how it varies from year to year.

sns.swarmplot(data=dfq1, x="年", y="含沙量", hue="年")
plt.xlabel("年")
plt.ylabel("含沙量(kg/"+"$m^3$"+")")
plt.title("")
plt.savefig('figs/含沙量数据随时间年份上的变化趋势.png', dpi=300, bbox_inches='tight')
plt.show()

## Sediment concentration is generally higher in 2018-2021

## For the sediment concentration data, further analyze how it varies by month
plt.figure(figsize=(12,6))
sns.swarmplot(data=dfq1, x="月", y="含沙量", hue="月")
plt.xlabel("月")
plt.ylabel("含沙量(kg/"+"$m^3$"+")")
plt.title("")
plt.savefig('figs/含沙量数据随时间月份上的变化趋势.png', dpi=300, bbox_inches='tight')
plt.show()

The plots show that sediment concentration clearly varies with both year and month, that is, it depends on time (see the address at the end of the article for the complete code)

Correlation coefficient heat map visualization

(See the address at the end of the article for the complete code)
Index(['年', '月', '日', '水位', '流量', '含沙量', '日期'], dtype='object')

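## Compute the correlation coefficients between the features to show how strongly they are related

## Rank correlation coefficients between the features can also be computed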
corrdf = dfq1[["年","月","日","水位","流量","含沙量"]]
corrdfval = corrdf.corr(method = "pearson")
print(corrdfval)
## Visualize the correlation coefficients as a heat map
plt.figure(figsize=(10,8))
ax = sns.heatmap(corrdfval,square=True,annot=True,fmt = ".2f",
                 linewidths=.5,cmap="YlGnBu",
                 cbar_kws={"fraction":0.046, "pad":0.03})
ax.set_title("相关性(pearson)")
plt.savefig('figs/相关系数热力图.png', dpi=300, bbox_inches='tight')
plt.show()

The heat map shows that sediment concentration is essentially uncorrelated with the day, only weakly correlated with the year and the month, and strongly correlated with water level and flow (note that the Pearson coefficient measures only linear relationships).
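Because the Pearson coefficient only measures linear association, a rank-based coefficient is a useful cross-check, as the code comment above suggests. The sketch below uses synthetic data with an assumed monotonic but nonlinear dependence (not the Appendix 1 data) and compares the two through the `method` argument of `DataFrame.corr`.

```python
import numpy as np
import pandas as pd

# Synthetic data: sediment concentration grows with the square of the flow.
# This functional form is an assumption made purely for illustration.
rng = np.random.default_rng(42)
flow = rng.uniform(200, 4000, size=200)
sediment = 1e-6 * flow ** 2 + rng.normal(0, 0.05, size=200)
demo = pd.DataFrame({"流量": flow, "含沙量": sediment})

# Pearson measures linear association; Spearman measures monotonic association.
pearson = demo.corr(method="pearson").loc["流量", "含沙量"]
spearman = demo.corr(method="spearman").loc["流量", "含沙量"]
print(f"pearson={pearson:.3f}, spearman={spearman:.3f}")
```

On the real data, the same comparison amounts to replacing `method = "pearson"` with `method = "spearman"` in the call to `corrdf.corr`.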

Scatter plot visualization

Visualize the scatter plot of water level versus sediment concentration.

(See the address at the end of the article for the complete code)
plt.figure(figsize=(12,6))
sns.scatterplot(data=dfq1,x="水位", y="含沙量",s = 60)
plt.xlabel("水位(m)")
plt.ylabel("含沙量(kg/"+"$m^3$"+")")
plt.title("")
plt.savefig('figs/水位与含沙量之间的散点图1.png', dpi=300, bbox_inches='tight')
plt.show()

## Visualize the scatter plot of water level versus sediment concentration with a fitted regression line
# plt.figure(figsize=(12,6))
sns.lmplot(data=dfq1,x="水位", y="含沙量",
                height=6,aspect=1.5)
plt.xlabel("水位(m)")
plt.ylabel("含沙量(kg/"+"$m^3$"+")")
plt.title("")
plt.savefig('figs/水位与含沙量之间的散点图2.png', dpi=300, bbox_inches='tight')
plt.show()

plt.figure(figsize=(12,6))
sns.scatterplot(data=dfq1,x="水位", y="含沙量", hue="年",
                palette="Set1",s = 60)
plt.xlabel("水位(m)")
plt.ylabel("含沙量(kg/"+"$m^3$"+")")
plt.title("")
plt.savefig('figs/水位与含沙量之间的散点图3.png', dpi=300, bbox_inches='tight')
plt.show()

The relationship between sediment concentration and flow may not be simply linear; it also appears to be affected by other features. Its distribution closely resembles the earlier one for water level, so one of the two variables alone may be enough to describe sediment concentration well.

Goal 1 solution: build a regression model, analyze the relationship between the variables, and predict the sediment concentration

from sklearn.ensemble import RandomForestRegressor,GradientBoostingRegressor
from sklearn.svm import SVR,LinearSVR
from sklearn.tree import *
from sklearn.metrics import *
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import  train_test_split
from sklearn.preprocessing import StandardScaler
import graphviz
import pydotplus
from IPython.display import Image  
from io import StringIO

Model 1: Decision tree regression model

Build a decision tree regression model to predict the data, using default parameters

The model's predictions of the dependent variable follow the trend of the data well.

Analyze the prediction accuracy on the training set and the test set at different tree depths
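The depth analysis can be sketched roughly as follows. This is an illustration on synthetic data, not the article's own code: the choice of `水位` and `流量` as features, the 75/25 split, the depth range, and the use of R^2 as the accuracy measure are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for dfq1 (assumed relationship, for illustration only).
rng = np.random.default_rng(0)
n = 500
level = rng.uniform(42, 45, n)
flow = 300 + 900 * (level - 42) ** 2 + rng.normal(0, 50, n)
sediment = 0.002 * flow + rng.normal(0, 0.5, n)
X = pd.DataFrame({"水位": level, "流量": flow})
y = pd.Series(sediment, name="含沙量")

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1
)

# Fit one tree per depth and record R^2 on the training and test sets.
rows = []
for depth in range(1, 11):
    tree = DecisionTreeRegressor(max_depth=depth, random_state=1)
    tree.fit(X_train, y_train)
    rows.append({
        "depth": depth,
        "train_r2": r2_score(y_train, tree.predict(X_train)),
        "test_r2": r2_score(y_test, tree.predict(X_test)),
    })
scores = pd.DataFrame(rows)
print(scores.round(3))
```

A training score that keeps rising while the test score levels off or falls is the usual sign that the tree has become too deep, which is what this kind of table is meant to expose.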

In addition to model 1: decision tree regression model, there are also model 2: random forest regression, model 3: GBDT regression, model 4: support vector machine regression, and model 5: fully connected neural network.
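For orientation, the five model families can be compared on a common train/test split along the following lines. This is a sketch on synthetic data with assumed hyperparameters, not the configuration used in the full document. The `StandardScaler` step for the support vector machine and the neural network is an assumption, prompted by the import list above.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for dfq1 (assumed relationship, for illustration only).
rng = np.random.default_rng(0)
n = 400
level = rng.uniform(42, 45, n)
flow = 300 + 900 * (level - 42) ** 2 + rng.normal(0, 50, n)
sediment = 0.002 * flow + rng.normal(0, 0.5, n)
X = pd.DataFrame({"水位": level, "流量": flow})
y = pd.Series(sediment, name="含沙量")
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1
)

# Hyperparameters below are illustrative assumptions, not tuned values.
models = {
    "decision tree": DecisionTreeRegressor(max_depth=5, random_state=1),
    "random forest": RandomForestRegressor(n_estimators=100, random_state=1),
    "GBDT": GradientBoostingRegressor(random_state=1),
    # SVR and the MLP are sensitive to feature scale, so standardize first.
    "SVR": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0)),
    "MLP": make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=1),
    ),
}

rows = []
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rows.append({
        "model": name,
        "test_r2": r2_score(y_test, pred),
        "test_mae": mean_absolute_error(y_test, pred),
    })
results = pd.DataFrame(rows).set_index("model")
print(results.round(3))
```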

Complete code + result + idea document download: 2023 Mathematical Modeling National Competition E question complete code and document


Origin blog.csdn.net/qq_45857113/article/details/132760017