(Source code version) 2023 Higher Education Society Cup National College Student Mathematical Modeling Competition - Question E Yellow River Water and Sediment Monitoring Question 1 Detailed Data Analysis + Python Code

I'm so excited that the questions are finally out! I went to the official website at 6 o'clock, but it was jammed, and I only just got the questions. I plan to work through all of questions A to E.

Let me briefly introduce myself. I have focused on mathematical modeling for four years and have taken part in dozens of mathematical modeling events, large and small, so I am familiar with the principles of the common models, the modeling process for each, and a range of problem analysis methods. I have entered more than ten mathematical modeling competitions, winning two M awards and one H award in three US competitions and a second prize in the national competition. If you take part in modeling competitions in the future, feel free to follow me; I share ideas and some source code for free. As long as I have time, I will keep writing free, open-source ideas for upcoming mathematical modeling competitions as early as I can. For each competition I follow, I post the latest ideas and code in this column, including detailed approaches and complete code, all free of charge. I hope readers who need it won't miss this carefully written article.

Just one request: if you find this useful, a like, a favorite, and a follow would make my day! So without further ado, let's start on the questions.

(updated source code version)

Competition question analysis

Question E is clearly a data analysis and data mining problem that involves time series prediction models and the processing of time series data. Let's analyze each question in turn.

Question 1

Question 1: Study the relationship between the sediment concentration of the Yellow River water at the hydrological station and time, water level, and water flow, and estimate the total annual water flow and annual sediment discharge of the hydrological station in the past 6 years.

Ideas

First of all, the quantity we need to study is the sediment content of the Yellow River water. It is the dependent variable, and time, water level, and water flow are the explanatory variables. This question can largely be solved from the 2016-2021 Yellow River water and sediment monitoring data in Appendix 1.
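The two annual quantities asked for in Question 1 can be written as time integrals. The following is a sketch under the assumption that flow $Q(t)$ is in m³/s and sediment content $C(t)$ is in kg/m³; the symbols $W_y$, $S_y$, $Q_i$, $C_i$, and $\Delta t_i$ are introduced here for illustration and do not come from the problem statement. For year $y$, with observations $i$ each covering a time span of $\Delta t_i$ seconds:

```latex
W_y = \int_{t \in y} Q(t)\,dt \;\approx\; \sum_{i \in y} Q_i\,\Delta t_i \quad [\mathrm{m^3}],
\qquad
S_y = \int_{t \in y} Q(t)\,C(t)\,dt \;\approx\; \sum_{i \in y} Q_i\,C_i\,\Delta t_i \quad [\mathrm{kg}]
```

In other words, flow has to be multiplied by elapsed time to give a volume, and the sediment mass is that volume multiplied by the concentration.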

Answer

The raw data and its time dimension are hard to interpret directly, so it helps to process the data a little first.

First, let's look at how water level and flow relate to sediment content:

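import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Optional, Jupyter only: show the value of every expression in a cell
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

plt.rcParams['font.sans-serif'] = ['Microsoft YaHei']  # use the Microsoft YaHei font

# df is assumed to be your DataFrame; if the data provided earlier is not
# a DataFrame yet, convert it first. The code below takes columns 0, 1, 2
# to be water level, flow, and sediment content.

# Scatter plot of sediment content against water level
plt.scatter(df.iloc[:, 0], df.iloc[:, 2])
plt.xlabel('Water level (m)')
plt.ylabel('Sediment content (kg/m3)')
plt.title('Sediment content vs. water level')
plt.show()

# Scatter plot of sediment content against flow
plt.scatter(df.iloc[:, 1], df.iloc[:, 2])
plt.xlabel('Flow (m3/s)')
plt.ylabel('Sediment content (kg/m3)')
plt.title('Sediment content vs. flow')
plt.show()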

 

Correlation matrix:

# Compute the correlation coefficients between all numeric columns
correlation_matrix = df.corr(numeric_only=True)
# Print the correlation matrix
print(correlation_matrix)
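By default `df.corr()` returns Pearson coefficients, which only capture linear association. If the scatter plots look curved, comparing them with Spearman rank correlations may be informative. Below is a minimal sketch on synthetic data; the column names `level`, `flow`, and `sediment` and all the numbers are made up for illustration and are not the Appendix 1 data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the monitoring table; every value here is made up
rng = np.random.default_rng(0)
flow = rng.uniform(200, 4000, size=500)                          # m3/s
level = 40 + 0.001 * flow + rng.normal(0, 0.2, size=500)         # m
sediment = 1e-5 * flow**1.6 * np.exp(rng.normal(0, 0.3, 500))    # kg/m3

demo = pd.DataFrame({"level": level, "flow": flow, "sediment": sediment})

# Pearson measures linear association; Spearman measures monotonic association
pearson = demo.corr(method="pearson")
spearman = demo.corr(method="spearman")

print(pearson.round(3))
print(spearman.round(3))
```

When Spearman is clearly higher than Pearson for a pair of columns, the relationship is probably monotonic but not linear, which suggests trying a transformation before fitting a model.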

 

import pandas as pd
import matplotlib.pyplot as plt

# df is assumed to be your DataFrame with a '日期时间' (date-time) column.
# Use that column as the index and make it a sorted DatetimeIndex.
df.set_index('日期时间', inplace=True)
df.index = pd.to_datetime(df.index)
df.sort_index(inplace=True)

# Keep only the rows that have a sediment content reading
df_with_sediment = df.dropna(subset=['含沙量(kg/m3) '])
sediment = df_with_sediment['含沙量(kg/m3) ']
# Smooth the data with a moving average over a 30-day time window
rolling_mean = sediment.rolling('30D').mean()

# Plot the raw data and the moving average
plt.figure(figsize=(10, 6))
plt.plot(sediment.index, sediment, label='Raw data')
plt.plot(sediment.index, rolling_mean, label='30-day moving average', color='red')
plt.xlabel('Date and time')
plt.ylabel('Sediment content (kg/m3)')
plt.title('Long-term trend of sediment content')
plt.legend()
plt.show()

Both water level and flow are positively correlated with sediment content, and the correlations are fairly high, so sediment content tends to rise as either one increases. The moving-average plot above applies a time series method to bring out the long-term trend of sediment content.
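To go beyond a correlation coefficient, one option that is common in hydrology (and not part of the original analysis here) is a power-law sediment rating curve, C = a * Q^b, fitted as a straight line in log-log space. The sketch below uses synthetic data; `true_a`, `true_b`, and `predict_sediment` are hypothetical names chosen for the demo.

```python
import numpy as np

# Synthetic flow and sediment data; all values are made up for the demo
rng = np.random.default_rng(1)
flow = rng.uniform(200, 4000, size=400)                               # m3/s
true_a, true_b = 1e-5, 1.6                                            # assumed demo parameters
sediment = true_a * flow**true_b * np.exp(rng.normal(0, 0.2, 400))    # kg/m3

# Linear least squares in log space: log C = log a + b * log Q
# np.polyfit returns coefficients from the highest degree down: [slope, intercept]
b, log_a = np.polyfit(np.log(flow), np.log(sediment), deg=1)
a = np.exp(log_a)

def predict_sediment(q):
    """Predicted sediment content (kg/m3) for flow q (m3/s)."""
    return a * np.asarray(q, dtype=float) ** b

print(f"a = {a:.3e}, b = {b:.3f}")
print(predict_sediment([500, 2000]))
```

On real data, both columns must be strictly positive before taking logarithms, and rows with missing sediment readings need to be dropped first.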

 

I'll note these results here for now and tidy them up later! Again, I'd be grateful for a like, a favorite, and a follow. The next step is to estimate the total annual water flow and the annual sediment discharge over the 6 years.

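# Make sure the timestamps form a sorted DatetimeIndex
if '日期时间' in df.columns:
    df = df.set_index('日期时间')
df.index = pd.to_datetime(df.index)
df = df.sort_index()

# 1. Seconds represented by each record (gap to the next record)
dt = df.index.to_series().diff().shift(-1).dt.total_seconds()
dt = dt.fillna(dt.median())  # the last record has no successor

# 2. Water volume (m3) and sediment mass (kg) per record; missing
#    sediment readings are interpolated in time (an assumption)
conc = df['含沙量(kg/m3) '].interpolate(method='time').ffill().bfill()
volume = df['流量(m3/s)'] * dt
sediment_mass = volume * conc

# 3. Annual total water flow and sediment discharge, grouped by year
annual_flow = volume.groupby(df.index.year).sum()
annual_sediment = sediment_mass.groupby(df.index.year).sum()

# 4. Totals over the whole period
total_flow = annual_flow.sum()
total_sediment = annual_sediment.sum()

print(annual_flow)
print(annual_sediment)
print(f'Total water flow over the last 6 years: {total_flow:.3e} m3')
print(f'Total sediment discharge over the last 6 years: {total_sediment:.3e} kg')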
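Because flow is a rate (m3/s) and sediment content is a concentration (kg/m3), annual totals require integrating over time rather than summing the raw readings. The following is a minimal sketch of that idea on synthetic data, where the answer can be checked by hand. The function `annual_totals` and the column names `flow` and `sediment` are hypothetical, and filling missing sediment readings by time interpolation is an assumption, not something the data prescribes.

```python
import numpy as np
import pandas as pd

def annual_totals(data: pd.DataFrame) -> pd.DataFrame:
    """Integrate flow (m3/s) and flow * concentration (kg/s) over time, per year.

    `data` needs a DatetimeIndex and the columns 'flow' and 'sediment'
    (hypothetical names chosen for this sketch).
    """
    data = data.sort_index()
    # Seconds each record stands for: the gap to the next record
    dt = data.index.to_series().diff().shift(-1).dt.total_seconds()
    dt = dt.fillna(dt.median())  # the last record has no successor
    # Fill gaps in the sediment readings by interpolating in time (an assumption)
    conc = data["sediment"].interpolate(method="time").ffill().bfill()
    volume = data["flow"] * dt   # m3 per record
    mass = volume * conc         # kg per record
    years = data.index.year
    return pd.DataFrame({
        "water_volume_m3": volume.groupby(years).sum(),
        "sediment_mass_kg": mass.groupby(years).sum(),
    })

# Made-up example: constant 1000 m3/s and 2 kg/m3, one record every 6 hours in 2020
idx = pd.date_range("2020-01-01", "2020-12-31 18:00", freq=pd.Timedelta(hours=6))
demo = pd.DataFrame({"flow": 1000.0, "sediment": 2.0}, index=idx)
# Blank out every second sediment reading to mimic sparse sampling
demo.iloc[1::2, demo.columns.get_loc("sediment")] = np.nan

result = annual_totals(demo)
print(result)
```

With a constant flow of 1000 m3/s over the 366 days of 2020, the expected volume is 1000 × 366 × 86400 m3, and the sediment mass is twice that number in kg, which gives a quick sanity check on the integration.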


The next update will cover the time series prediction model. If you are not very familiar with this kind of model, I recommend reading my column:

I have written eight articles about time series prediction models. Each model has its own characteristics and best-fit usage scenarios, but generally speaking, most time series data show seasonal variation and cyclic fluctuation. For data like this, a seasonal time series forecasting model is usually a good choice. You can try reading this article of mine:

 https://blog.csdn.net/master_hunter/category_10967944
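As a point of reference for seasonal forecasting, the simplest baseline is a seasonal naive forecast, which repeats the values observed one season earlier. This is only an illustrative baseline on synthetic monthly data, not the model the column describes; `seasonal_naive_forecast` is a hypothetical helper written for this sketch.

```python
import numpy as np
import pandas as pd

def seasonal_naive_forecast(series: pd.Series, season_length: int, horizon: int) -> np.ndarray:
    """Forecast `horizon` steps ahead by repeating the last full season."""
    last_season = series.to_numpy()[-season_length:]
    repeats = int(np.ceil(horizon / season_length))
    return np.tile(last_season, repeats)[:horizon]

# Made-up monthly series with a yearly cycle, 2016-2021
months = pd.date_range("2016-01-01", periods=72, freq="MS")
values = 5 + 3 * np.sin(2 * np.pi * (months.month.to_numpy() - 1) / 12)
monthly = pd.Series(values, index=months)

# Forecast the first 6 months after the end of the series
forecast = seasonal_naive_forecast(monthly, season_length=12, horizon=6)
print(forecast.round(2))
```

Any seasonal model worth using should beat this baseline on held-out data, which makes it a convenient yardstick.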

Then we’ll see you in the next update 


Origin blog.csdn.net/master_hunter/article/details/132747579