2023 9th Dimensional Cup, Question B | Pyrolysis Catalytic Reaction Modeling Analysis: Senior Xiaolu walks the team through the full code and ideas

I am Senior Xiaolu, a student at Shanghai Jiao Tong University. So far I have helped more than 200 people with their modeling and solution ideas.

This time I will walk you through Question B of the 9th Dimensional Cup!

Problem restatement

Question 1: Through mathematical modeling, analyze the impact of the catalyst (desulfurization ash) on the yields of the products (tar, water, coke residue, syngas) during the pyrolysis of cotton stalk and the model compounds (CE and LG), paying special attention to the role of desulfurization ash in catalytic pyrolysis under different mixing ratios.

Question 2: Use the experimental data to study in depth how the mixing ratio in each of the three pyrolysis combinations (desulfurization ash-cotton stalk, desulfurization ash-CE, desulfurization ash-LG) affects the output of pyrolysis gas products (H2, CO, CO2, CH4, etc.), and explain the impact with graphical results.

Question 3: Under the same catalytic ratio of desulfurization ash, explore the similarities and differences in the pyrolysis product yields and pyrolysis gas components of cellulose (CE) and lignin (LG), and provide detailed explanations.

Question 4: Establish a mechanism model of the catalytic reaction of desulfurization ash on model compounds (CE and LG), conduct reaction kinetic analysis, and verify the significance of the model parameters through least squares method and statistical testing.

Question 5: Use machine learning methods (support vector regression, etc.) to build a model based on given data to predict the impact of catalysts on product yields under different conditions to achieve order-of-magnitude predictions of pyrolysis products.

Modeling ideas

Question 1

The modeling idea for question one uses multiple linear regression, in which the mixing ratio is the independent variable and the yield of the product is the dependent variable. Here are the specific steps:

  1. Data preparation:

    • Organize the experimental data into a format suitable for multiple linear regression, ensuring that the mixing ratios and corresponding product yields are included in the data set.
  2. Variable selection:

    • Select the mix ratio as the independent variable, i.e. (X). Product yield serves as the dependent variable, i.e. (Y).
  3. Multiple linear regression model:

    • Build a multiple linear regression model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_n X_n + \varepsilon$
    • Here, $Y$ is the product yield, $X_1, X_2, \ldots, X_n$ are the different components of the mixing ratio, $\beta_0, \beta_1, \ldots, \beta_n$ are the regression coefficients, and $\varepsilon$ is the error term.
  4. Model training:

    • Split the data set into training and test sets.
    • Use the training set to train a multiple linear regression model.
  5. Model evaluation:

    • Use the test set data to evaluate the model and examine the model's predictive performance.
    • Metrics such as mean square error (MSE) can be used to evaluate the fit of the model.
  6. Significance test of regression coefficient:

    • Conduct a significance test on the regression coefficient to determine whether the mixing ratio has a significant impact on product yield.
    • Usually, a p-value below the chosen significance level (such as 0.05) indicates that the regression coefficient is significant.
  7. Model explanation:

    • Explain the physical meaning of the regression coefficient and understand the specific impact of the mixing ratio on product yield. Factors such as interactions may need to be considered.
  8. Model application:

    • Use the trained model to predict new mixing ratios to estimate product yield.
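import pandas as pd
import statsmodels.api as sm
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Assume the data file is 'data.xlsx', containing mixing ratios and product yields
data = pd.read_excel('data.xlsx')

# Select the independent and dependent variables
# (placeholder column names; replace them, and add further columns, to match your data)
X = data[['mix_ratio_1', 'mix_ratio_2', 'other_mix_ratio']]
y = data['product_yield']

# Add the intercept term
X = sm.add_constant(X)

# Split the data set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the multiple linear regression model
model = sm.OLS(y_train, X_train).fit()

# Predict on the test set and report the mean squared error
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f'Mean Squared Error: {mse}')
# See the full code for the remaining steps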
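
To make the significance test in step 6 concrete, here is a minimal sketch that fits an ordinary least squares model and computes the two-sided t-test p-value of each coefficient by hand. The synthetic data, the true coefficients, and the helper name `ols_with_pvalues` are assumptions made for illustration, not the competition data. The `summary()` method of a fitted statsmodels OLS model reports the same quantities.

```python
import numpy as np
from scipy import stats


def ols_with_pvalues(X, y):
    """Fit y = X b + e by least squares; return coefficients, standard errors, p-values.

    X must already contain a column of ones for the intercept.
    """
    n, p = X.shape
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    dof = n - p                                      # residual degrees of freedom
    sigma2 = residuals @ residuals / dof             # unbiased error variance estimate
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)       # covariance of the estimates
    se = np.sqrt(np.diag(cov_beta))
    t_stat = beta / se
    p_values = 2 * stats.t.sf(np.abs(t_stat), dof)   # two-sided t-test
    return beta, se, p_values


# Synthetic example: the yield depends on the first ratio but not on the second
rng = np.random.default_rng(0)
ratio_1 = rng.uniform(0.1, 1.0, size=40)
ratio_2 = rng.uniform(0.1, 1.0, size=40)
product_yield = 20.0 + 5.0 * ratio_1 + rng.normal(0.0, 0.5, size=40)

X = np.column_stack([np.ones(40), ratio_1, ratio_2])
beta, se, p_values = ols_with_pvalues(X, product_yield)

for name, b, p in zip(["const", "ratio_1", "ratio_2"], beta, p_values):
    print(f"{name}: coef={b:.3f}, p={p:.4f}, significant={p < 0.05}")
```

A coefficient whose p-value falls below 0.05 would be read as a significant effect of that mixing-ratio component on the yield.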

Question 2

The modeling idea for question two uses principal component analysis (PCA) to convert multiple pyrolysis gas products into several principal components through dimensionality reduction, and perform regression analysis on these principal components.

Principal Component Analysis (PCA) is a commonly used data dimensionality reduction technique used to discover the main structures in data and reduce the dimensions of the data. The main goal of PCA is to map the original data to a new coordinate system through linear transformation so that the variance of the mapped data in the new coordinate system is maximized.

Specifically, the steps of PCA are as follows:

  1. Data normalization:

    • The original data is standardized so that each feature has a mean of 0 and a variance of 1.
  2. Construct the covariance matrix:

    • Calculate the covariance matrix of the normalized data. The covariance matrix reflects the relationship between different features.
  3. Compute eigenvalues and eigenvectors:

    • Perform eigenvalue decomposition on the covariance matrix to obtain eigenvalues and corresponding eigenvectors. The eigenvector represents the direction of the new coordinate system, and the eigenvalue represents the variance of the data in this direction.
  4. Select principal components:

    • Sort the eigenvalues by size, and select the eigenvectors corresponding to the first k eigenvalues as the principal components, where k is the dimension you want to reduce to.
  5. Construct the projection matrix:

    • Construct the projection matrix by columns from the selected first k eigenvectors.
  6. Data projection:

    • Project the original data onto the subspace composed of the selected first k feature vectors to obtain the dimensionally reduced data.
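
The six steps above can be sketched directly in NumPy. This is a minimal illustration on synthetic data, not the competition data; the function name `pca_eig` and the four "gas-like" features are assumptions.

```python
import numpy as np


def pca_eig(data, k):
    """Reduce data (n_samples x n_features) to k dimensions via eigen-decomposition."""
    # 1. Standardize each feature to zero mean and unit variance
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    # 2. Covariance matrix of the standardized data
    cov = np.cov(z, rowvar=False)
    # 3. Eigen-decomposition (eigh handles symmetric matrices, eigenvalues ascending)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # 4. Sort in descending order and keep the first k eigenvectors
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # 5. Projection matrix with the selected eigenvectors as columns
    W = eigvecs[:, :k]
    # 6. Project the standardized data onto the k-dimensional subspace
    scores = z @ W
    explained_ratio = eigvals[:k] / eigvals.sum()
    return scores, W, explained_ratio


# Synthetic data: four correlated features driven by two latent factors
rng = np.random.default_rng(1)
latent = rng.normal(size=(30, 2))
mixing = np.array([[1.0, 0.8, -0.5, 0.2],
                   [0.1, -0.3, 0.9, 1.0]])
gas = latent @ mixing + 0.05 * rng.normal(size=(30, 4))

scores, W, explained_ratio = pca_eig(gas, k=2)
print(scores.shape)            # dimension-reduced data
print(explained_ratio.sum())   # share of variance kept by the two components
```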

Applications of PCA include but are not limited to:

  • Data dimensionality reduction: Reduce the dimensions of data and remove redundant information.
  • Data visualization: Map high-dimensional data to two-dimensional or three-dimensional space to facilitate visualization.
  • Noise filtering: removing noise from data.
  • Feature extraction: Extract the most important features from high-dimensional data.

Combined with problem 2, the detailed steps to solve using the PCA method are as follows:

  1. Data preparation:

    • Organize the experimental data into a format suitable for principal component analysis to ensure that the data set contains information on pyrolysis gas products.
  2. Principal component analysis (PCA):

    • Use principal component analysis to reduce the dimensionality of the pyrolysis gas products, converting multiple related products into a few principal components that capture the main changes.
    • Choose the number of principal components, usually enough to explain most of the variance.
  3. Regression model:

    • Perform regression analysis using the principal components as new independent variables. The dependent variable here is the mixing ratio.
  4. Model training:

    • Split the data set into training and test sets.
    • Use the training set to train the regression model.
  5. Model evaluation:

    • Use the test set data to evaluate the model and examine the model's predictive performance.
    • Metrics such as mean square error (MSE) can be used to evaluate the fit of the model.
  6. Explanation of principal components:

    • Explain the physical meaning of the principal components and understand their contribution to the original pyrolysis gas products.
  7. Model explanation:

    • Explain the physical meaning of the regression coefficient and understand the influence of the principal components on the mixing ratio.
  8. Model application:

    • Use the trained model to predict new mixing ratios to estimate the changing trend of pyrolysis gas products.
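import pandas as pd
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler

# Read the data
data = pd.read_excel('your_data_file.xlsx')

# Select the independent variables (mixing ratios) and the dependent variable
# (placeholder column names; replace them, and add further columns, to match your data)
X = data[['mix_ratio_1', 'mix_ratio_2', 'other_mix_ratio']]
y = data['pyrolysis_product_yield']

# Standardize the data (important: PCA is sensitive to the scale of the data)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# PCA dimensionality reduction
pca = PCA(n_components=2)  # choose the number of principal components
X_pca = pca.fit_transform(X_scaled)

# Build a new DataFrame containing the principal components and the dependent variable
df_pca = pd.DataFrame(X_pca, columns=['PC1', 'PC2'], index=data.index)
df_pca['pyrolysis_product_yield'] = y

# Split the data set
X_train, X_test, y_train, y_test = train_test_split(df_pca[['PC1', 'PC2']], y, test_size=0.2, random_state=42)

# Linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Evaluate on the test set
mse = mean_squared_error(y_test, model.predict(X_test))
print(f'Mean Squared Error: {mse}')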
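
The number of principal components does not have to be fixed in advance; it can be chosen from the cumulative explained variance. The sketch below follows the steps as listed in the text: PCA on the gas products, then a regression of the mixing ratio on the retained components. The synthetic data, the 95% threshold, and the gas trends are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-in for the experiment: one mixing ratio drives four gas yields
ratio = rng.uniform(0.1, 1.0, size=50)
gas = np.column_stack([
    3.0 * ratio,          # H2-like trend
    -2.0 * ratio + 5.0,   # CO-like trend
    1.5 * ratio + 1.0,    # CO2-like trend
    0.5 * ratio,          # CH4-like trend
]) + 0.1 * rng.normal(size=(50, 4))

# Standardize, then obtain the principal directions from the SVD of the data matrix
z = (gas - gas.mean(axis=0)) / gas.std(axis=0, ddof=1)
U, s, Vt = np.linalg.svd(z, full_matrices=False)
explained = s**2 / np.sum(s**2)
cumulative = np.cumsum(explained)

# Smallest k whose cumulative explained variance reaches the threshold
threshold = 0.95
k = int(np.searchsorted(cumulative, threshold) + 1)
scores = z @ Vt[:k].T

# Regress the mixing ratio on the retained components (with an intercept)
design = np.column_stack([np.ones(len(ratio)), scores])
coef, *_ = np.linalg.lstsq(design, ratio, rcond=None)
pred = design @ coef
r2 = 1.0 - np.sum((ratio - pred) ** 2) / np.sum((ratio - ratio.mean()) ** 2)

print(f"components kept: {k}")
print(f"cumulative explained variance: {cumulative[k - 1]:.3f}")
print(f"R^2 of the regression: {r2:.3f}")
```

The rows of `Vt[:k]` are the loadings, so inspecting them shows which gases contribute most to each retained component.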

Question 3

  1. Data preparation:

    • Compile the experimental data and organize it as a time series, including the pyrolysis product yields of cellulose (CE) and lignin (LG) and the pyrolysis gas components. Make sure timestamps are included in the data set so that a time series model can be built.
  2. Data normalization:

    • Normalize the data so that the values at each time point are on the same scale.
  3. Build RNN model:

    • Build a recurrent neural network model using a deep learning framework such as TensorFlow or PyTorch. For the network structure, consider variants such as LSTM (Long Short-Term Memory network) or GRU (Gated Recurrent Unit).
    • The input sequence is the pyrolysis product yield of cellulose and lignin and the information of pyrolysis gas components, and the output is the corresponding product yield.
  4. Model training:

    • Divide the data set into training set and test set.
    • Use the training set to train the RNN model and adjust the model parameters.
  5. Model evaluation:

    • Use the test set to evaluate the model and examine its predictive performance.
    • Metrics such as mean square error (MSE) can be used to evaluate the fit of the model.
  6. Interpretation of results:

    • Analyze the model output to understand the differences in pyrolysis product yields of cellulose and lignin and pyrolysis gas components under the same catalytic ratio of desulfurization ash.
  7. Visualization:

    • Use visualization tools to display model predictions for cellulose and lignin pyrolysis processes to more visually demonstrate the differences.
  8. Tweaks and optimizations:

    • Depending on the results of the model evaluation, it may be necessary to adjust the model structure, hyperparameters, or use other technical means to optimize the performance of the model.
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
from sklearn.metrics import mean_squared_error

# Read the data
data = pd.read_excel('your_data_file.xlsx')

# Select features and targets (placeholder column names; replace them to match your data;
# the timestamp column must be numeric)
features = data[['timestamp', 'CE_product_yield', 'LG_product_yield', 'CE_gas_components', 'LG_gas_components']]
target = data[['CE_product_yield', 'LG_product_yield', 'CE_gas_components', 'LG_gas_components']]

# Standardize; separate scalers keep the target scaling available for the inverse transform
feature_scaler = StandardScaler()
target_scaler = StandardScaler()
features_scaled = feature_scaler.fit_transform(features)
target_scaled = target_scaler.fit_transform(target)

# Build the time series samples
time_steps = 10  # window length; adjust to your data
X, y = [], []
for i in range(len(features_scaled) - time_steps):
    X.append(features_scaled[i:i + time_steps, :])
    y.append(target_scaled[i + time_steps, :])

X, y = np.array(X), np.array(y)

# Split into training and test sets (shuffle=False keeps the time order)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

# Build the LSTM model
model = Sequential()
model.add(LSTM(units=50, activation='relu', input_shape=(X_train.shape[1], X_train.shape[2])))
model.add(Dense(units=target.shape[1]))  # one output neuron per target column

# Compile the model
model.compile(optimizer='adam', loss='mse')

# Train the model
model.fit(X_train, y_train, epochs=50, batch_size=32, validation_split=0.1)

# Predict
y_pred = model.predict(X_test)

# Undo the standardization of predictions and targets
y_pred_rescaled = target_scaler.inverse_transform(y_pred)
y_test_rescaled = target_scaler.inverse_transform(y_test)

# Evaluate the model
mse = mean_squared_error(y_test_rescaled, y_pred_rescaled)
print(f'Mean Squared Error: {mse}')
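
The windowing step is the part of this pipeline that most often goes wrong, so here is a small NumPy-only sketch of it on toy arrays, together with a chronological train/test split. The helper name `make_windows` and the toy shapes are assumptions for illustration.

```python
import numpy as np


def make_windows(features, targets, time_steps):
    """Turn aligned (n_samples, n_features) and (n_samples, n_targets) arrays
    into LSTM-style inputs of shape (n_windows, time_steps, n_features)."""
    X, y = [], []
    for i in range(len(features) - time_steps):
        X.append(features[i:i + time_steps])   # rows i .. i + time_steps - 1
        y.append(targets[i + time_steps])      # the row that follows the window
    return np.array(X), np.array(y)


# Toy series: 20 time points, 3 features, 2 targets
features = np.arange(60, dtype=float).reshape(20, 3)
targets = np.arange(40, dtype=float).reshape(20, 2)

X, y = make_windows(features, targets, time_steps=5)
print(X.shape, y.shape)

# Chronological split: the last 20% of the windows form the test set
split = int(len(X) * 0.8)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]
print(X_train.shape, X_test.shape)
```

Splitting by position rather than at random keeps overlapping windows from the same stretch of the series out of both sets at once.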

Question 4

For question four, use a simplified apparent kinetics model to describe the catalytic reaction of desulfurization ash on the model compounds (CE and LG). This model assumes that the reaction obeys first-order kinetics and accounts for the effect of temperature through the Arrhenius equation. The specific steps are as follows:

  1. First-order reaction kinetic model:

    • Rate expression: $r = k \cdot [CE]$
    • $r$ is the reaction rate, $[CE]$ is the concentration of CE, and $k$ is the rate constant.
  2. Arrhenius equation:

    • Relate the rate constant $k$ to temperature: $k = A \cdot e^{-\frac{E}{RT}}$
    • A is the pre-exponential factor, E is the activation energy, R is the ideal gas constant, and T is the temperature.
  3. Overall reaction rate equation:

    • Combining the above two equations gives the overall reaction rate equation: $r = A \cdot e^{-\frac{E}{RT}} \cdot [CE]$
  4. Parameter Estimation:

    • Using experimental data, the parameters A and E in the Arrhenius equation and the rate constant in the first-order reaction kinetic model are estimated through methods such as the least squares method.
  5. Model verification:

    • Using the estimated parameters, apply the model to other experimental conditions and compare with experimental data to validate the model's predictive performance.
  6. Sensitivity analysis:

    • Perform parameter sensitivity analysis to understand the sensitivity of the model to parameter changes. This helps determine which parameters have a greater impact on the model's output.
  7. Model interpretation and physical significance:

    • Interpret the physical meaning of parameters in the model and understand the contribution of the model to experimentally observed phenomena. Verify that the model is consistent with the reaction mechanism observed in the laboratory.
  8. Optimization model:

    • Based on the results of validation and interpretation, the model is optimized. It may be necessary to adjust the form of the kinetic model, further refine the reaction mechanism, or consider the influence of other factors.
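import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

R = 8.314  # ideal gas constant, J/(mol·K)

# Assumed first-order kinetics with an Arrhenius rate constant.
# Temperature is a measured input, not a fitted parameter.
def first_order_kinetics(X, A, E):
    C, T = X
    k = A * np.exp(-E / (R * T))
    return k * C

# Experimental data (replace the placeholders with your measurements)
CE_data = np.array([concentration1, concentration2, ...])  # your CE concentration data
T_data = np.array([temperature1, temperature2, ...])  # matching temperatures in K (assumed to be available)
rate_data = np.array([rate1, rate2, ...])  # your reaction rate data

# Initial guesses for the parameters A and E
initial_guess = [1.0, 1.0]

# Estimate the parameters by least squares
xdata = np.vstack([CE_data, T_data])
params, covariance = curve_fit(first_order_kinetics, xdata, rate_data, p0=initial_guess)

# Extract the estimated parameters
A_estimate, E_estimate = params

# Model predictions
predicted_rates = first_order_kinetics(xdata, A_estimate, E_estimate)

# Plot the fit
plt.scatter(CE_data, rate_data, label='Experimental data')
plt.scatter(CE_data, predicted_rates, marker='x', label='Model prediction')
plt.legend()
plt.show()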
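
When rate constants are available at several temperatures, the Arrhenius parameters can also be estimated from the linearized form $\ln k = \ln A - \frac{E}{R} \cdot \frac{1}{T}$, which is a straight line in $1/T$ with slope $-E/R$ and intercept $\ln A$. The sketch below does this with an ordinary linear least squares fit. The values of A and E, the temperatures, and the noise level are invented for illustration and are not taken from the experiment.

```python
import numpy as np

R = 8.314  # ideal gas constant, J/(mol·K)


def fit_arrhenius(T, k):
    """Estimate A and E from ln k = ln A - E / (R T) by linear least squares."""
    slope, intercept = np.polyfit(1.0 / T, np.log(k), deg=1)
    E = -slope * R          # slope of the line is -E / R
    A = np.exp(intercept)   # intercept of the line is ln A
    return A, E


# Synthetic rate constants with a little multiplicative noise (assumed values)
A_true, E_true = 1.0e6, 8.0e4
T = np.array([600.0, 650.0, 700.0, 750.0, 800.0])   # temperatures in K
rng = np.random.default_rng(3)
k_obs = A_true * np.exp(-E_true / (R * T)) * np.exp(rng.normal(0.0, 0.02, size=T.size))

A_est, E_est = fit_arrhenius(T, k_obs)
print(f"A = {A_est:.3e}, E = {E_est:.1f} J/mol")
```

Because the fit is linear in $\ln A$ and $E$, it needs no initial guess, which also makes it a convenient starting point for a nonlinear fit of the full rate equation.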

Question 5

Here, support vector regression (SVR), the regression form of the support vector machine (SVM), is used for question five to predict the yield or quantity of pyrolysis products. The detailed modeling idea is as follows:

  1. Data preparation:

    • Organize experimental data, including the yield or quantity of pyrolysis products under various conditions, and related experimental conditions. Make sure that the data includes characteristics (such as temperature, mixing ratio, etc.) and target variables (yield or amount of pyrolysis products).
  2. Feature selection and normalization:

    • Select the features related to pyrolysis product yield or quantity and normalize them. Putting the features on the same scale improves support vector machine performance.
  3. Data partition:

    • The data set is divided into a training set and a test set, usually using cross-validation.
  4. Support vector machine regression model selection:

    • Choose an appropriate support vector machine regression model. For regression problems, support vector machines whose kernel function is the radial basis function (RBF) are usually used. Choose appropriate hyperparameters based on data characteristics.
  5. Model training:

    • Use the training set to train the support vector machine regression model.
  6. Model evaluation:

    • The model is evaluated using a test set, often using metrics such as mean square error (MSE) to evaluate prediction accuracy.
  7. Tweaks and optimizations:

    • Based on the evaluation results, adjust the hyperparameters of the support vector machine, such as adjusting the parameters of the kernel function, to optimize model performance.
  8. Prediction:

    • Under limited data conditions, the trained support vector machine regression model is used for prediction. Enter the experimental conditions and the model will give the corresponding prediction value of the yield or quantity of pyrolysis products.
  9. Model explanation:

    • Interpret the model's predictions and understand which conditions the model considers to have a greater impact on the yield or amount of pyrolysis products. Although the support vector machine is a black box model, feature importance can still be analyzed with model-agnostic methods such as permutation importance.
  10. Iterate and improve:

    • If new experimental data becomes available, it can be used to further improve and iterate the model to increase the accuracy of predictions.

The following is a simplified code example of support vector regression using Python's sklearn library, assuming that the data set provides the features X and the target variable y:

from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler

# X (features) and y (target) are assumed to be loaded already

# Standardize the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

# Create the support vector regression model
svm_model = SVR(kernel='rbf', C=1.0, epsilon=0.1)

# Train the model
svm_model.fit(X_train, y_train)

# Predict
y_pred = svm_model.predict(X_test)

# Evaluate model performance
mse = mean_squared_error(y_test, y_pred)
print(f'Mean Squared Error: {mse}')
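
The hyperparameter tuning in step 7 can be combined with the cross-validation mentioned in step 3. Here is one possible sketch using scikit-learn's `GridSearchCV` with the scaler and the SVR wrapped in a pipeline, so that the scaling is refit inside every fold. The synthetic data and the grid values are assumptions for illustration, not recommended settings.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic data standing in for (conditions -> yield); not the competition data
rng = np.random.default_rng(4)
X = rng.uniform(0.0, 1.0, size=(120, 2))
y = np.sin(2 * np.pi * X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0.0, 0.05, size=120)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scaling lives inside the pipeline, so each CV fold is scaled on its own training part
pipeline = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
param_grid = {
    "svr__C": [1.0, 10.0, 100.0],
    "svr__gamma": ["scale", 0.1, 1.0],
    "svr__epsilon": [0.01, 0.1],
}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X_train, y_train)

# The search scorer returns negative MSE, so flip the sign for reporting
test_mse = -search.score(X_test, y_test)
test_r2 = search.best_estimator_.score(X_test, y_test)

print("best parameters:", search.best_params_)
print(f"test MSE: {test_mse:.4f}, test R^2: {test_r2:.3f}")
```

With only a handful of experimental points, a smaller number of folds or leave-one-out cross-validation may be the more practical choice.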

For the complete code and ideas, see:
2023 9th Dimensional Cup, Question B | Pyrolysis Catalytic Reaction Modeling Analysis: Senior Xiaolu walks the team through the full code and ideas

Origin blog.csdn.net/Tech_deer/article/details/134437376