2023 Huashu Cup Mathematical Modeling C Question Code + Model + Full Set of Papers

Table of contents


Quantification of non-numeric data

Data normalization code:

Question 1 Analysis:

Here is the code for PCA:

Relevant code for multiple linear regression:




A mother is one of the most important people in a baby's life, providing not only nutrition and physical protection but also emotional support and a sense of security. Problems with the mother's mental health, such as depression, anxiety, and stress, may have negative effects on the baby's cognition, emotions, and social behavior. A stressed mother can also harm the baby's physical and psychological development, for example by disturbing its sleep.
The appendix provides data on 390 infants aged 3 to 12 months and their mothers. The data cover the mothers' physical indicators (age, marital status, education level, duration of pregnancy, mode of delivery), the mothers' psychological indicators, namely CBTS (a questionnaire for post-traumatic stress disorder after childbirth), EPDS (Edinburgh Postnatal Depression Scale), and HADS (Hospital Anxiety and Depression Scale), and the infants' sleep-quality indicators, including whole-night sleep duration, number of awakenings, and sleep-onset pattern.
Background analysis: The background makes clear that this question is mainly about the impact of the mother's physical and mental health on the baby's development, and that the analysis must be based on the data attached to the question; in other words, no outside data may be used beyond the attachment. To analyze the problem properly, let us start with the attachment itself.
From the attached indicators we can see that the baby's behavioral characteristics are non-numeric data, which must first be converted into numeric form. The following methods can be used to quantify non-numeric data:

Quantification of non-numeric data


1. Label encoding: label encoding quantifies non-numeric data by converting each possible value into an integer. For example, for a variable with several categories, we can assign a unique integer to each category, turning it into numeric data.
2. One-hot encoding: one-hot encoding converts a variable with several possible values into a binary array. Each possible value corresponds to an array whose length equals the number of possible values, in which exactly one element is 1 and the rest are 0. For example, a gender variable can be one-hot encoded by mapping "male" to [1, 0] and "female" to [0, 1].
3. Categorical counting: categorical counting is a simple way to turn non-numeric data into numbers. We group the data by some attribute (such as education or occupation) and then count the number or frequency of each category. For example, the answers to a questionnaire item can be grouped into "yes", "no", and "not sure" and the count or frequency of each group recorded.
4. Principal Component Analysis: PCA transforms multidimensional data into a lower-dimensional representation by finding the principal components that best explain the variation in the data. Note that PCA itself operates on numeric data, so it is applied after the categorical variables have been encoded, to compress the resulting numeric representation.
Label encoding or one-hot encoding is recommended here, and label encoding is the better choice, because the categories of this indicator have an ordinal (size) relationship.
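As a minimal sketch of the label encoding recommended above (the column name 'infant_behavior' and the three example categories are placeholders for illustration, not the attachment's actual headers):

# Label-encoding sketch: map the ordered behavior categories to integers
import pandas as pd

# Hypothetical frame; the real column name and values come from the attachment
df = pd.DataFrame({'infant_behavior': ['quiet', 'moderate', 'ambivalent', 'quiet']})

# An explicit mapping preserves the assumed order quiet < moderate < ambivalent
behavior_map = {'quiet': 1, 'moderate': 2, 'ambivalent': 3}
df['infant_behavior_encoded'] = df['infant_behavior'].map(behavior_map)
print(df)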
In addition, the whole-night sleep time needs to be converted into numeric form: for example, 10:30 becomes 10.5, which makes later calculation and analysis easier.
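A tiny sketch of that conversion, assuming the sleep times are stored as strings like "10:30":

# Convert an "HH:MM" string to decimal hours, e.g. "10:30" -> 10.5
def time_to_decimal(t):
    hours, minutes = t.split(':')
    return int(hours) + int(minutes) / 60

print(time_to_decimal('10:30'))  # 10.5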
Another very important preprocessing step is normalization, which prevents the differing scales and units of the indicators from distorting the modeling results, as follows:

Data normalization code:
 

# Normalization
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
# Read the data (the file name is a placeholder for the attachment)
data = pd.read_excel('your_dataset.xlsx')
# Columns to normalize (column names are placeholders for the attachment's headers)
features_to_normalize = ['age', 'education', 'pregnancy_time', 'CBTS', 'EPDS', 'HADS', 'sleep_duration',
                         'awakening_times']
# Min-max scale the selected columns with MinMaxScaler
scaler = MinMaxScaler()
data[features_to_normalize] = scaler.fit_transform(data[features_to_normalize])
# Print the normalized data
print(data)

1. Many studies have shown that the mother's physical and psychological indicators affect the baby's behavioral characteristics and sleep quality. Investigate whether such a pattern exists, based on the data in the attachment.


Question 1 Analysis:

The question asks whether the mother's physical and psychological indicators affect the baby's behavioral characteristics and sleep quality. Note that the question does not state exactly which indicators these are, so we first have to match the indicators in the attachment to the question and build the corresponding models: the mother's physical-indicator model, the mother's psychological-indicator model, the baby's behavioral-characteristic model, and the baby's sleep-quality model, deciding in each case which indicators are directly relevant. Once the indicators are fixed, the usual methods for building these models are:
- Principal Component Analysis (PCA): PCA is a linear dimensionality-reduction method that works by finding the principal components that explain most of the variation in the data set. It projects the original data onto a new orthogonal coordinate system that maximizes the variance along the new axes, thereby preserving the most informative features of the data.
- Linear Discriminant Analysis (LDA): LDA is also a linear dimensionality-reduction method, but unlike PCA it is supervised and mainly used for classification. While reducing the dimensionality, it tries to maximize the between-class distance and minimize the within-class distance, yielding a low-dimensional representation with better discriminative power.
- Locally Linear Embedding (LLE): LLE is a nonlinear dimensionality-reduction method that builds a low-dimensional representation by preserving the linear relationships between samples within local neighborhoods. LLE assumes that the data are locally linear in the high-dimensional space and maps them to the low-dimensional space by reconstructing each sample from its neighbors.
- Non-negative Matrix Factorization (NMF): NMF is a dimensionality-reduction method for non-negative data. It decomposes the original data matrix into the product of two non-negative matrices, giving a latent feature representation. NMF is commonly used in areas such as image processing and text mining.
- t-SNE: t-SNE is a popular nonlinear dimensionality-reduction method for visualizing high-dimensional data. It maps high-dimensional data into two or three dimensions while preserving the local similarity between samples, which makes it good at revealing class structure and clustering patterns.


Here is the code for PCA:
 

# PCA example
from sklearn.decomposition import PCA
import numpy as np
# Suppose we have a data set X where each row is a sample and each column a feature
X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Create the PCA object and specify how many principal components to keep
pca = PCA(n_components=2)
# Reduce the dimensionality of the data
X_pca = pca.fit_transform(X)
# Print the reduced data
print(X_pca)
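On real data the number of components to keep is usually chosen from the explained variance ratio; a brief sketch (the 100x8 matrix below is a random placeholder for the normalized indicator matrix, not the attachment's data):

# Choose the number of principal components from the cumulative explained variance
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(100, 8)  # placeholder for the normalized indicator matrix
pca = PCA().fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)
print(cumulative)  # keep enough components to cover, say, 85-90% of the variance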

2. The Infant Behavior Questionnaire is a scale used to assess the behavioral characteristics of infants; it contains questions about the infant's emotions and reactions. We divide the infants' behavioral characteristics into three types: quiet, moderate, and ambivalent. Establish a model of the relationship between the baby's behavioral characteristics and the mother's physical and psychological indicators. In the last 20 rows of the data table (No. 391-410), the babies' behavioral-characteristic information has been deleted; determine which type each of them belongs to.
Analysis of Question 2: Question 2 clearly requires a predictive model, so we have to define the independent and dependent variables and then use a suitable algorithm to relate the two. Recommended methods include multiple linear regression, random forests, neural networks, and nonlinear fitting. Here the babies' behavioral characteristics must first be quantified as described earlier (encoding them as 1, 2, 3 is recommended, for the ordinal reason given above); after that the modeling is straightforward. All of these methods predict well, and since the task is relatively simple, complex algorithms are not needed: multiple linear regression is enough, as follows.


Relevant code for multiple linear regression:
 

# Multiple linear regression
import statsmodels.api as sm
import pandas as pd
# Suppose we have a data set with several features and a target variable, one sample per row
data = pd.DataFrame({'x1': [1, 2, 3, 4, 5],
                     'x2': [6, 7, 8, 9, 10],
                     'y': [11, 12, 13, 14, 15]})
# Extract the independent variables (features) and the dependent variable (target)
X = data[['x1', 'x2']]
y = data['y']
# Add a constant column as the intercept term; skip this step if no intercept is needed
X = sm.add_constant(X)
# Build the multiple linear regression model and fit the data
model = sm.OLS(y, X)
results = model.fit()
# Print the detailed regression results
print(results.summary())
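To label the 20 babies whose behavior type was deleted (rows 391-410), one simple option under this regression approach is to predict a continuous score and round it to the nearest of the encoded classes 1, 2, 3. A hedged sketch, reusing the toy `results` object from the block above (the real feature matrix would come from the attachment):

# Predict the behavior type for new samples and round to the nearest class (1, 2 or 3)
import numpy as np
X_new = pd.DataFrame({'x1': [6, 7], 'x2': [11, 12]})  # placeholder features for the unlabeled rows
X_new = sm.add_constant(X_new)
pred = results.predict(X_new)
pred_class = np.clip(np.round(pred), 1, 3).astype(int)
print(pred_class)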

# Curve-fitting example
import numpy as np
import matplotlib.pyplot as plt

# Creating sample data
x = [0.05, 0.1, 0.5, 1, 2, 3, 4, 5]
y = [0.636966, 0.704712, 0.905694, 1.451922, 2.204378, 2.956834, 3.409290, 4.161746]

# Fitting a polynomial of degree 1 (a straight line)
p = np.polyfit(x, y, 1)
p1 = np.poly1d(p, variable='x')

print("y = ", p1)

# Plotting the data points and the fitted line
plt.scatter(x, y)
plt.plot(x, np.polyval(p, x))

# Correlation between observed and fitted values (its square is R^2)
r1 = np.corrcoef(y, p1(x))[0, 1]

print("Goodness of fit (linear):", r1)
plt.show()
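Since the analysis above also mentions random forests, here is an equally minimal, hedged classifier sketch; the matrices below are random placeholders, not the attachment's data:

# Random-forest alternative for predicting the behavior type (1 = quiet, 2 = moderate, 3 = ambivalent)
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_train = rng.random((390, 6))          # placeholder for the mothers' indicator matrix (rows 1-390)
y_train = rng.integers(1, 4, size=390)  # placeholder behavior labels
X_unlabeled = rng.random((20, 6))       # placeholder for rows 391-410

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
print(clf.predict(X_unlabeled))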

