Solve Problem E of the 2023 Huawei Cup Graduate Mathematical Modeling Competition based on Matlab - Implementation steps of clinical intelligent diagnosis and treatment modeling of hemorrhagic stroke (source code + data attached)

Background introduction

Hemorrhagic stroke is cerebral hemorrhage caused by the non-traumatic rupture of blood vessels within the brain parenchyma, and it accounts for 10-15% of all strokes. Its causes are complex: factors such as a ruptured cerebral aneurysm or abnormal cerebral arteries let blood flow from the ruptured vessel into the brain tissue, causing mechanical damage and triggering a series of complex physiological and pathological reactions. Hemorrhagic stroke has an acute onset, rapid progression and poor prognosis. Mortality in the acute phase is as high as 45-50%, and about 80% of patients are left with severe neurological dysfunction, which places a heavy health and financial burden on society and on patients' families. It is therefore of great clinical significance to explore the risk of hemorrhagic stroke, to integrate imaging features, patient clinical information and clinical diagnosis and treatment plans, to predict patient prognosis accurately, and to optimize clinical decision-making accordingly.

After hemorrhagic stroke, hematoma expansion is one of the important risk factors for poor prognosis. Shortly after the hemorrhage, the hematoma may gradually expand because of brain tissue damage, inflammatory response and similar factors. This leads to a rapid increase in intracranial pressure, which may further worsen neurological function and even endanger the patient's life. Monitoring and controlling hematoma expansion is therefore one of the key clinical concerns. In addition, edema around the hematoma, a marker of secondary injury after cerebral hemorrhage, has attracted widespread clinical attention in recent years. Perihematomal edema may compress the brain tissue, which affects neuron function and damages the tissue further, aggravating the patient's neurological impairment. In summary, early identification and prediction of two key events after hemorrhagic stroke, namely hematoma expansion and the occurrence and development of perihematomal edema, are of great significance for improving patients' prognosis and quality of life.

The rapid progress of medical imaging technology provides a powerful means for non-invasive, dynamic monitoring of brain tissue damage and its evolution after hemorrhagic stroke. In recent years, artificial intelligence technology has developed rapidly and been widely used in the medical field, bringing new opportunities for in-depth mining and intelligent analysis of massive imaging data. The expectation is that the imaging information provided in this competition, combined with patients' personal information, treatment plans and prognosis data, can be used to build an intelligent diagnosis and treatment model that clarifies the risk factors leading to poor prognosis of hemorrhagic stroke and achieves accurate, personalized efficacy evaluation and prognosis prediction. In the near future, the resulting research findings and scientific evidence should find further application in clinical practice and contribute to improving the prognosis of patients with hemorrhagic stroke.

Figure 1. Left: CT scan of a patient with cerebral hemorrhage. Right: the hematoma is marked in red and the perihematomal edema in yellow.

Preparation: Processing the data

Read, organize and save the data. Processing and integrating the data produces a matrix called Table that is used for the subsequent modeling and analysis:

  1. Read the data table files and store the data in the corresponding variables. Preprocess the data by replacing missing values with empty strings and encoding gender (male is 1, female is 2).

  2. Organize the tabular data into a matrix called Table in a predetermined format. The first row of Table is the header, containing the patient, serial number, time and the name of each indicator.

  3. Loop through each row of the data table (skipping the header) and, for each row, obtain the length of time from onset to first diagnosis, delta_t0, and the date of the first diagnosis, t1.

  4. Use the serial number to find the corresponding row and column index in Appendix Table 1 and obtain the date of each subsequent diagnosis, t2.

  5. Calculate the length of time between each follow-up visit and the onset of disease, which gives the timeline variable time.

  6. Collect the data of each examination, skipping any examination whose data is missing. Based on the dates in Appendix 1 and the data in Table 2, combine the patient number, diagnosis date, length of time, and the data from Table 1, Table 2 and Table 3 into one row and append it to Table.

  7. In view of the differences between samples, missing Annex 3 indicators are filled with the average of that indicator over the Annex 3 data set.

  8. Save the organized Table to a file named data.
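Steps 5 and 7 can be sketched roughly as follows. This is only an illustration under stated assumptions: the sample dates and values are made up, missing values are represented as NaN rather than empty strings, and the matrix name X is hypothetical. The original script may organize this differently.

```matlab
% Hypothetical sample values; not taken from the competition data.
delta_t0 = 2.5;                            % hours from onset to first examination
t1 = datenum(2023, 1, 1, 10, 0, 0);        % first examination (serial date, in days)
t2 = [datenum(2023, 1, 1, 22, 0, 0), ...   % follow-up examinations
      datenum(2023, 1, 2,  4, 0, 0)];

% Step 5: hours from onset for the first and each follow-up examination.
time = [delta_t0, delta_t0 + (t2 - t1) * 24];

% Step 7: fill missing indicator values (NaN here, by assumption) with the
% mean of the observed values in the same column.
X = [1.0 10; NaN 14; 3.0 NaN];             % rows = examinations, columns = indicators
for j = 1:size(X, 2)
    col = X(:, j);
    col(isnan(col)) = mean(col(~isnan(col)));
    X(:, j) = col;
end
```

Serial dates from datenum are measured in days, which is why the difference is multiplied by 24 to get hours.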

Question 1: Exploration and modeling of factors related to the risk of hematoma expansion

a) Question

  • Using "Table 1" (fields: serial number of the first imaging examination after admission, time interval from onset to first imaging examination) and "Table 2" (fields: serial number at each time point and the corresponding HM_volume), determine whether a hematoma expansion event occurred within 48 hours of onset for patients sub001 to sub100.
  • Result filling specifications: 1 for yes, 0 for no; filling position: field C of "Table 4" (whether hematoma expansion occurs).
  • If a hematoma expansion event occurs, also record the time at which it occurs.
  • Result filling specifications: for example, 10.33 hours; filling position: field D of "Table 4" (hematoma expansion time).
  • Hematoma expansion is judged from the change in hematoma volume and is defined as an absolute volume increase of ≥6 mL or a relative volume increase of ≥33% in a subsequent examination compared with the first examination.
  • Note: the imaging examination time point for a serial number can be looked up in "Appendix 1 - Search Form - Serial Number vs Time". Combine the time interval from onset to the first imaging examination with the interval between the first and later examinations to determine whether a given examination falls within 48 hours of onset.
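Read literally, the definition above can be checked directly on the observed examinations of one patient. The sketch below does that for made-up measurements. The variable names (hm, in48, tExpand and so on) are hypothetical, and this direct check is not necessarily how the original code proceeds.

```matlab
% Hypothetical measurements for one patient
% (times in hours from onset, volumes in mL).
t  = [2.0 10.5 30.0 60.0];
hm = [20.0 24.0 27.5 40.0];

in48   = t <= 48;                  % examinations within 48 h of onset
absInc = hm - hm(1);               % absolute increase over the first examination
relInc = absInc / hm(1);           % relative increase over the first examination
hit    = in48 & (absInc >= 6 | relInc >= 0.33);
hit(1) = false;                    % the first examination is the baseline

idx = find(hit, 1);                % first follow-up that meets the definition
expanded = ~isempty(idx);          % value for field C
if expanded
    tExpand = t(idx);              % value for field D, in hours
else
    tExpand = NaN;
end
```

With these numbers the 10.5 h scan shows an increase of 4 mL (20%), which meets neither threshold, while the 30 h scan shows 7.5 mL and is the first to qualify.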

Specific code implementation steps:

  1. First, extract the unique values from the first column of the table "Table" with the unique function and save the result in the variable "Num".

  2. Next, define four empty variables: A, B, Y, and T. They will store the patient indicators, the fitting parameters, the hematoma expansion labels, and the hematoma expansion times.

  3. Use a loop to iterate over each unique value in "Num".

  4. In each iteration, first find the indices of the rows in "Table" that match the current unique value, and save the data in column 4 and beyond of these rows in the variable "A".

  5. Then store the column 3 data of these rows in the variable "t" and the column 23 data in the variable "HM_volume".

  6. Check whether "HM_volume" contains more than one measurement. If it does, there is enough data for linear regression: use the regress function and save the regression coefficients in the variable "b".

  7. If "HM_volume" contains only one measurement, the parameters of the closest other sample are used instead. A is normalized, the distance between the last sample and the other samples is calculated, the closest sample is found, and its corresponding parameters are saved in the variable "B".

  8. Continue the loop and save the variable "b" in the variable "B".

  9. Then grid the time axis with a step of 0.1 and generate a time series "t1" running from the first diagnosis to the 48th hour after onset.

  10. Use the regression parameters "b" to calculate the HM_volume values corresponding to "t1" and save them in the variable "HM_volume_48".

  11. Find the indices where the ratio of HM_volume_48 to HM_volume(1) is greater than 1.33 and save them in the variable "a1".

  12. Find the indices where the difference between HM_volume_48 and HM_volume(1) is greater than 6000 (which corresponds to 6 mL if HM_volume is stored in units of 10^-3 mL) and save them in the variable "a2".

  13. Merge the indices in "a1" and "a2" and save them in the variable "aa".

  14. Check whether the length of "aa" is greater than 0. If it is, hematoma expansion has occurred: set the corresponding entry of "Y" to 1, and save the first time in "t1" that meets the condition in the corresponding entry of "T".

  15. If the length of "aa" is 0, no hematoma expansion has occurred, and the corresponding entry of "Y" is set to 0.

  16. After the loop ends, combine "Num", "Y" and "T" into a result matrix "result1", which holds each patient's identifier, the label of whether hematoma expansion occurred, and the hematoma expansion time.
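For a single patient with several measurements, steps 6 and 9 to 15 might look like the sketch below. The data are made up, and the backslash operator stands in for regress so that the example runs without the Statistics Toolbox; regress(HM_volume, [ones(size(t)) t]) should give the same coefficients. Treat it as an illustration of the idea rather than the original code.

```matlab
% Hypothetical data for one patient
% (times in hours from onset, volumes in 10^-3 mL).
t         = [3; 12; 26; 50];
HM_volume = [30000; 33000; 37500; 45000];

% Step 6: least-squares line HM_volume ~ b(1) + b(2)*t.
b = [ones(size(t)) t] \ HM_volume;

% Step 9: time grid from the first examination to hour 48, step 0.1 h.
t1 = (t(1):0.1:48)';

% Step 10: fitted hematoma volume on the grid.
HM_volume_48 = b(1) + b(2) * t1;

% Steps 11-13: relative and absolute criteria against the first measurement.
a1 = find(HM_volume_48 / HM_volume(1) > 1.33);
a2 = find(HM_volume_48 - HM_volume(1) > 6000);
aa = union(a1, a2);

% Steps 14-15: label and earliest grid time that meets either criterion.
if ~isempty(aa)
    Y = 1;
    T = t1(min(aa));
else
    Y = 0;
    T = NaN;   % no expansion time to report (NaN is an assumption here)
end
```

Because the line is evaluated on a 0.1 h grid, the reported expansion time is an interpolated estimate between examinations, not an observed examination time.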

b) Question

  • Using whether a hematoma expansion event occurs as the target variable, build a model from the personal history, disease history and onset-related information (fields E to W) of the first 100 patients (sub001 to sub100) in "Table 1" and their imaging examination results in "Table 2" (fields C to …) to predict the probability of hematoma expansion.
  • Note: This question can only include the patient’s first imaging examination information.
  • Result filling specifications: record the predicted probability of event occurrence (value range 0-1, retain 4 digits after the decimal point); fill in location: "Table 4" field E (predicted probability of hematoma expansion).

Specific code implementation steps:

  1. First, use the mapminmax function to normalize the variable "A" and save the result in the variable "In". At the same time, the label variable "Y" is transposed and stored in the variable "Out".

  2. Divide the data set into a training set and two test sets: the first 100 samples are used for training, samples 101 to 130 form test set 1, and samples 131 to 160 form test set 2.

  3. Create a BP neural network. The first layer has half as many nodes as there are input features, the hidden layer after it has a quarter as many, and the activation function is tansig.

  4. Set the training parameters of the neural network, including the maximum number of iterations, target error, minimum gradient, learning rate, and maximum number of validation failures.

  5. Use the training data to train the neural network.

  6. Conduct simulation tests on the training set, test set 1 and test set 2, use the trained neural network to predict the input data, and save the prediction results in the variables "t_sim", "t_sim1" and "t_sim2".

  7. The prediction results are rounded to obtain the final prediction results, which are stored in the variables "T_sim", "T_sim1" and "T_sim2".

  8. Draw a confusion matrix to visualize the true labels and prediction results of the training set, test set 1 and test set 2 respectively.

  9. Finally, append the prediction results "t_sim", "t_sim1" and "t_sim2" to the earlier result matrix "result1". The result matrix then contains the patient number, whether hematoma expansion occurred, the hematoma expansion time, and the predicted probability of hematoma expansion.
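The steps above could be sketched as follows with the Deep Learning Toolbox. Everything numeric here is an assumption: the data are synthetic, the training parameter values are placeholders, and clamping the outputs to [0, 1] is an addition that the original text does not mention. The learning rate is left out because the default training function of feedforwardnet (trainlm) does not use one; it applies to gradient-descent training functions such as traingd.

```matlab
% Requires the Deep Learning Toolbox. Synthetic data stand in for the real
% indicator matrix A (samples in rows) and label vector Y.
rng(0);
A = rand(160, 8);
Y = double(A(:, 1) + A(:, 2) > 1);

% mapminmax scales each row, so features go in rows and samples in columns.
[In, ps_in] = mapminmax(A');
Out = Y';

P_train = In(:, 1:100);     T_train = Out(1:100);
P_test1 = In(:, 101:130);
P_test2 = In(:, 131:160);

nIn = size(In, 1);
net = feedforwardnet([ceil(nIn / 2), ceil(nIn / 4)]);  % layer sizes as described
net.layers{1}.transferFcn = 'tansig';
net.layers{2}.transferFcn = 'tansig';
net.trainParam.epochs     = 1000;    % maximum number of iterations (assumed)
net.trainParam.goal       = 1e-6;    % target error (assumed)
net.trainParam.min_grad   = 1e-7;    % minimum gradient (assumed)
net.trainParam.max_fail   = 6;       % maximum validation failures (assumed)
net.trainParam.showWindow = false;   % no training GUI

net = train(net, P_train, T_train);

t_sim  = sim(net, P_train);          % raw outputs, read as probabilities
t_sim1 = sim(net, P_test1);
t_sim2 = sim(net, P_test2);

% Clamp to [0, 1] (added assumption), then round to a 0/1 label.
T_sim = round(min(max(t_sim, 0), 1));
```

The output layer of feedforwardnet is linear, so the raw outputs are not guaranteed to lie in the 0-1 range required for field E; some form of clamping or a different output transfer function would be needed in practice.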

Question 2: Model the occurrence and progression of edema around the hematoma, and explore the relationship between therapeutic intervention and edema progression

a) Question

  • Based on the edema volume (ED_volume) and repeated examination time points of the first 100 patients (sub001 to sub100) in "Table 2", construct a curve of edema volume progression over time for all patients (x-axis: time from onset to imaging examination, y-axis: edema volume, y=f(x)), and calculate the residuals between the true values of the first 100 patients (sub001 to sub100) and the fitted curve.
  • Result filling specifications: record the residuals and fill in the F field (residuals (all)) of "Table 4".

Specific code implementation steps:

  1. First, use the unique function to find the unique values in the first column of the data table, from the second row onward, and save them in the variable Num.

  2. Create empty matrices T, ED and A to record the re-examination times, the edema data and the other diagnostic information respectively.

  3. Loop through each value in Num and, for each value, find the indices of the rows in the data table whose first column equals that value, and save them in the variable a.

  4. Convert the data in column 4 and beyond of the corresponding rows to double and append it to the matrix A.

  5. Convert the data in column 3 of the corresponding rows to double and append it to the matrix T.

  6. Convert the data in column 34 of the corresponding rows to double and append it to the matrix ED.

  7. Draw a scatter plot with T on the abscissa and ED on the ordinate, using asterisks as markers.

  8. Set the abscissa range from 1 to 2000.

  9. Fit the data with a Gaussian model and save the fitting result in the variable gaussModel.

  10. Draw the Gaussian model curve.

  11. Set the horizontal axis title to "Time" and the vertical axis title to "Edema/10^-3ml".

  12. Calculate the fitted values ED_fit.

  13. Create an empty matrix Error to hold the residuals.

  14. Loop through each value in Num and, for each value, find the indices of the rows in the data table whose first column equals that value, subtract 1 from the indices, and save them in the variable a.

  15. Calculate the average of the absolute differences between the fitted values and the original data and save it at the corresponding position in the matrix Error.
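A toolbox-free approximation of steps 9 to 15 is sketched below. It fits a one-term Gaussian by minimizing the sum of squared errors with fminsearch, which is a substitute for the Curve Fitting Toolbox call (fit with the 'gauss1' model) that the steps appear to describe. The data are synthetic and the patient grouping is invented for the example.

```matlab
% Synthetic, noise-free stand-in for the pooled (time, edema volume) points.
T  = (0:20:600)';
ED = 50 * exp(-((T - 200) / 80).^2);

% One-term Gaussian a*exp(-((x-b)/c)^2), the same form as 'gauss1'.
gauss = @(p, x) p(1) * exp(-((x - p(2)) / p(3)).^2);
sse   = @(p) sum((ED - gauss(p, T)).^2);

% Starting guess from the data: peak height, peak location,
% and a quarter of the time span as the width.
[pk, ipk] = max(ED);
p0 = [pk, T(ipk), (max(T) - min(T)) / 4];

opts = optimset('MaxFunEvals', 5000, 'MaxIter', 5000, ...
                'TolX', 1e-8, 'TolFun', 1e-10);
pHat = fminsearch(sse, p0, opts);

% Step 12: fitted values on the observed time points.
ED_fit = gauss(pHat, T);

% Steps 13-15: one residual per patient, the mean absolute difference over
% that patient's rows. The grouping below is hypothetical.
patient = ceil((1:numel(T))' / 8);
Error = accumarray(patient, abs(ED - ED_fit), [], @mean);
```

A Nelder-Mead search such as fminsearch depends on the starting guess, so on real, noisy data it is worth checking the fitted curve visually against the scatter plot from step 7.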

b) Question

  • Explore the individual differences in how patients' edema volume progresses over time, construct edema volume progression curves for different populations (3-5 subgroups), and calculate the residuals between the true values of the first 100 patients (sub001 to sub100) and the curve of their subgroup.
  • Result filling specifications: record the residuals in field G (residuals (subgroup)) of "Table 4", and record the subgroup each patient belongs to in field H (subgroup to which it belongs).

Specific code implementation steps:

  1. Set the number of cluster centers to 5 and save it in the variable cluster_n.

  2. Use the fuzzy C-means (FCM) clustering algorithm to cluster the indicators in columns 4-15 of the data table and store the results in the variables center, U and obj_fcn. Here center holds the cluster centers, U is the membership matrix, and obj_fcn records how the objective function value changes.

  3. Plot the objective function value against the number of iterations (horizontal axis: iterations, vertical axis: objective function value).

  4. Determine the cluster each sample belongs to from its maximum membership degree in the membership matrix U and store the result in the variable u.

  5. Create an empty matrix ED_fit2 to hold the fitted values after clustering.

  6. Loop over the cluster centers and, for each one, find the indices of the samples belonging to that cluster and save them in the variable a.

  7. Find the indices of the samples belonging to this cluster in the original data table and save them in the variable b.

  8. Draw a scatter plot with T(b) on the abscissa and ED(b) on the ordinate, using asterisks as markers.

  9. Set the abscissa range from 1 to 2000.

  10. Fit the samples belonging to this cluster with a Gaussian model and save the fitting result in the variable gaussModel.

  11. Draw the Gaussian model curve.

  12. Set the horizontal axis title to "Time" and the vertical axis title to "Edema/10^-3ml".

  13. Set the figure title to "Subclass i", where i is the number of the cluster center.

  14. Calculate the fitted values ED_fit2.

  15. Create an empty matrix Error2 to hold the residuals after clustering.

  16. Loop through each value in Num and, for each value, find the indices of the rows in the data table whose first column equals that value, subtract 1 from the indices, and save them in the variable a.

  17. Calculate the average of the absolute differences between the clustered fitted values and the original data and save it at the corresponding position in the matrix Error2.

  18. Combine the patient number, overall residuals, clustered residuals, and clustering results into a matrix result2.
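To make steps 2 to 4 concrete, the sketch below hand-rolls a minimal fuzzy C-means loop so that it runs without the Fuzzy Logic Toolbox. The write-up itself appears to call the toolbox function, along the lines of [center, U, obj_fcn] = fcm(data, cluster_n). The toy data, the fuzzifier m = 2, the fixed iteration count and the deterministic initial centers are all assumptions made for illustration.

```matlab
% Toy data: 6 samples, 2 indicators, two obvious groups.
X = [0 0; 0.1 0.2; 0.2 0.1; 5 5; 5.1 4.9; 4.9 5.2];
cluster_n = 2;       % the text uses 5; 2 keeps the toy example readable
m = 2;               % fuzzifier (assumed)
maxIter = 50;        % fixed iteration count (assumed)

n = size(X, 1);
center  = X([1 n], :);           % deterministic initial centers (assumed)
obj_fcn = zeros(maxIter, 1);

for it = 1:maxIter
    % Squared Euclidean distance from every center to every sample.
    D = zeros(cluster_n, n);
    for k = 1:cluster_n
        d = X - center(k, :);    % implicit expansion (R2016b or later)
        D(k, :) = sum(d.^2, 2)';
    end
    D = max(D, eps);             % avoid division by zero at a center

    % Membership update: each column of U sums to 1.
    W = D.^(-1 / (m - 1));
    U = W ./ sum(W, 1);

    % Objective value for the current centers and memberships.
    Um = U.^m;
    obj_fcn(it) = sum(sum(Um .* D));

    % Center update: membership-weighted mean of the samples.
    center = (Um * X) ./ sum(Um, 2);
end

% Step 4: hard assignment by maximum membership.
[~, u] = max(U, [], 1);
```

Taking the maximum membership turns the fuzzy result into one subgroup label per sample, which is what field H of "Table 4" needs.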

c) Question

  • Please analyze the impact of different treatments ("Table 1" fields Q to W) on the pattern of edema volume progression.

Specific code implementation steps:

  1. Create an empty matrix K to hold the rate of change of the edema indicator.

  2. Loop through each value in Num and, for each value, find the indices of the rows in the data table whose first column equals that value, and save them in the variable a.

  3. Check whether the patient has at least five examinations and, if so, calculate the rate of change of the edema indicator.

  4. Save the indices of the entries whose rate of change is less than 0 in the variable kk.

  5. If the length of kk is 0, the edema indicator never decreased, and K(i,1) is set to 0.

  6. Otherwise, calculate the average reduction of the edema indicator and save the result in K(i,1).

  7. If the patient does not have at least five examinations, K(i,1) is set to NaN.

  8. Find the indices of the entries of K that are not NaN and save them in the variable c.

  9. Convert the data in columns 16-22 of the data table to double and save it in the variable G.

  10. Save the contents of columns 16-22 of the first row of the data table (the treatment names) in the variable Z.

  11. Loop through each value in Z and, for each one, perform a one-way ANOVA.

  12. Calculate the critical value fa for the analysis of variance.

  13. Get the F value from the ANOVA results.

  14. Judge the impact of each treatment on the edema progression pattern from the p value and the F value.

  15. Print information about the treatments that have a significant impact.

  16. Print the ranking of the treatments by their effect on the edema progression pattern.
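The one-way ANOVA in steps 11 to 14 boils down to comparing between-group and within-group variance. The sketch below computes the F statistic and its p value by hand for one hypothetical treatment, using only base functions. With the Statistics and Machine Learning Toolbox, anova1(K, g, 'off') and finv(0.95, df1, df2) would normally be used instead. The values of K and g are invented.

```matlab
% Hypothetical values: K is the mean edema reduction rate per patient,
% g marks whether a given treatment was used (1) or not (0).
K = [1 2 3 4 5 6]';
g = [1 1 1 0 0 0]';

levels = unique(g);
k = numel(levels);
N = numel(K);
grandMean = mean(K);

ssb = 0;    % between-group sum of squares
ssw = 0;    % within-group sum of squares
for i = 1:k
    yi  = K(g == levels(i));
    ssb = ssb + numel(yi) * (mean(yi) - grandMean)^2;
    ssw = ssw + sum((yi - mean(yi)).^2);
end

df1 = k - 1;
df2 = N - k;
F = (ssb / df1) / (ssw / df2);

% Upper-tail probability of the F distribution, written with the
% regularized incomplete beta function so that no toolbox is needed.
p = betainc(df2 / (df2 + df1 * F), df2 / 2, df1 / 2);

significant = p < 0.05;   % 0.05 is an assumed significance level
```

A treatment is flagged as significant when p falls below the chosen level, which is equivalent to F exceeding the critical value fa for the same level.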

d) Question

  • Please analyze the relationship between hematoma volume, edema volume and treatment methods (fields Q to W in "Table 1").

Specific code implementation steps:

  1. Create an empty matrix K2 to hold the rate of change of the hematoma indicator.

  2. Loop through each value in Num and, for each value, find the indices of the rows in the data table whose first column equals that value, and save them in the variable a.

  3. Check whether the patient has at least five examinations and, if so, calculate the rate of change of the hematoma indicator.

  4. Save the indices of the entries whose rate of change is less than 0 in the variable kk.

  5. If the length of kk is 0, the hematoma indicator never decreased, and K2(i,1) is set to 0.

  6. Otherwise, calculate the average reduction of the hematoma indicator and save the result in K2(i,1).

  7. If the patient does not have at least five examinations, K2(i,1) is set to NaN.

  8. Find the indices of the entries of K2 that are not NaN and save them in the variable c.

  9. Loop through each value in Z and, for each one, perform a one-way ANOVA.

  10. Calculate the critical value fa for the analysis of variance.

  11. Get the F value from the ANOVA results.

  12. Judge the impact of each treatment on the hematoma progression pattern from the p value and the F value.

  13. Print information about the treatments that have a significant impact.

  14. Print the ranking of the treatments by their effect on the hematoma progression pattern.

  15. Convert the hematoma indicator and the edema indicator to double and save them in the variables x0 and y0.

  16. Calculate the cosine similarity between the hematoma indicator and the edema indicator and save it in the variable theta.

  17. Print the correlation between the hematoma indicator and the edema indicator.
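Steps 3 to 7 reduce each patient's series to a single number. One plausible reading, sketched below, takes the "rate of change" as the difference quotient between consecutive examinations. That interpretation, and the sample values, are assumptions.

```matlab
% Hypothetical series for one patient.
t = [2 10 30 80 150 300];     % hours from onset
v = [30 34 33 31 32 25];      % hematoma volume, mL

if numel(v) >= 5
    % Change per hour between consecutive examinations (assumed definition).
    rate = diff(v) ./ diff(t);
    kk = find(rate < 0);      % intervals in which the volume decreased
    if isempty(kk)
        K2 = 0;               % the volume never decreased
    else
        K2 = mean(rate(kk));  % average rate over the decreasing intervals
    end
else
    K2 = NaN;                 % too few examinations to judge
end
```

Patients marked NaN are then excluded before the ANOVA, which is what the index variable c in step 8 is for.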

Question 3: Prognosis prediction and exploration of key factors for patients with hemorrhagic stroke

a) Question

  • Build a prediction model based on the personal history, disease history and onset-related information (fields E to W in "Table 1") and the first imaging results (related fields in Table 2 and Table 3) of the first 100 patients (sub001 to sub100), and use it to predict the 90-day mRS score of patients sub001 to sub160.
  • Note: This question can only include the patient’s first imaging examination information.
  • Result filling specifications: record the predicted mRS results, 0-6, ordinal grade variables. Fill in the position "Table 4" I field (predicted mRS (based on first imaging)).

Specific code implementation steps:

  1. Use the mapminmax function to normalize the input data A, save the normalized result in the variable In, and save the normalization parameters in ps_in.

  2. Transpose the output data Y and save it as Out.

  3. Split the data into training and test sets. The first 100 columns of In are the training input and the first 100 columns of Out are the training output. Columns 101-130 of In and Out are the input and output of test set 1, and columns 131-160 of In and Out are the input and output of test set 2.

  4. Create a BP neural network with two hidden layers: the first hidden layer has half as many neurons as there are input features, and the second has a quarter as many. The activation function is the hyperbolic tangent.

  5. Set the training parameters, including the maximum number of iterations, target error, minimum gradient, learning rate, and maximum number of validation failures.

  6. Train the network on the training set.

  7. Run the trained network on the training set, test set 1 and test set 2 to obtain predictions. Round the predictions and save them in the variables t_sim, t_sim1 and t_sim2.

  8. Use the plotconfusion function to draw confusion matrices for the training set, test set 1 and test set 2. The confusion matrix is used to evaluate the classification performance of the model.
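Steps 7 and 8 can be illustrated without any toolbox by rounding the raw outputs to mRS grades and counting matches in a 7-by-7 table, which is the information plotconfusion displays graphically. The scores below are invented, and clamping to the 0-6 range is an added assumption that the original steps do not mention.

```matlab
yTrue = [0 1 3 6 4 2];                 % hypothetical true mRS scores
ySim  = [0.2 1.4 2.6 6.8 3.5 -0.3];    % hypothetical raw network outputs

% Rounding follows step 7; clamping to 0..6 is an added assumption, since a
% linear output layer can overshoot the valid score range.
yPred = min(max(round(ySim), 0), 6);

% 7-by-7 confusion matrix: rows = true score, columns = predicted score.
% Scores are shifted by 1 because MATLAB indices start at 1.
C = accumarray([yTrue' + 1, yPred' + 1], 1, [7 7]);

accuracy = trace(C) / sum(C(:));
```

Because mRS is an ordinal scale, the off-diagonal entries close to the diagonal are less serious errors than those far from it, which a plain accuracy figure does not capture.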

b) Question

  • Based on all known clinical and treatment information (fields E to W in Table 1) and the imaging results (first + follow-up) in Table 2 and Table 3 for the first 100 patients (sub001 to sub100), predict the 90-day mRS score of all patients with follow-up imaging examinations (sub001 to sub100, sub131 to sub160).
  • Result filling specifications: record the predicted mRS results, 0-6, ordinal grade variables. Fill in the position "Table 4" J field (predicted mRS).

Specific code implementation steps:

  1. Use the mapminmax function to normalize the input data B with the normalization parameters ps_in obtained when the network was trained earlier, and save the normalized result in the variable Xyuce.

  2. Run the trained network on the normalized data Xyuce and save the predictions in the variable t_sim3.

  3. Round the predictions and save them in the variable T_sim3.

  4. Create an empty string matrix result3 to hold the final predictions.

  5. Loop through each value in Num and, for each value, find the indices of the entries equal to that value and save them in the variable a.

  6. Loop through the indices in a and, for each index, save the corresponding prediction at the corresponding location in result3.

  7. Convert result3 to string type.

  8. Replace the elements with missing values by empty strings.

  9. Create an empty matrix z to hold strings containing the follow-up visit numbers.

  10. Loop over the columns of result3 and, for each column, save the corresponding string in z.

  11. Merge the patient numbers, the first examination results, and the follow-up visit numbers into the matrix result3.
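The key point of step 1 is that new data must be scaled with the minimum and maximum learned on the training data, not with its own. With the Deep Learning Toolbox this is usually written as [In, ps_in] = mapminmax(trainX) followed by Xyuce = mapminmax('apply', newX, ps_in). The hand-written sketch below shows the same idea with base functions. The matrices trainX and newX are made up, and the [-1, 1] target range is the default of mapminmax.

```matlab
% Hypothetical data: rows are indicators, columns are patients.
trainX = [1 3 5; 10 20 40];
newX   = [2 6; 25 10];

% Per-row minimum and maximum, learned on the training data only.
xmin = min(trainX, [], 2);
xmax = max(trainX, [], 2);

% Map each row to [-1, 1] using the training parameters
% (implicit expansion, R2016b or later).
scale = @(X) 2 * (X - xmin) ./ (xmax - xmin) - 1;

In    = scale(trainX);
Xyuce = scale(newX);    % new data scaled with the training parameters
```

Values outside the training range map outside [-1, 1], as the 1.5 in the first row of Xyuce shows. A row that is constant in the training data would divide by zero in this sketch and would need separate handling.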

c) Question

  • Analyze the correlation between the prognosis (90-day mRS) of patients with hemorrhagic stroke and their personal history, disease history, treatment methods and imaging features (including hematoma/edema volume, hematoma/edema location, signal intensity characteristics, and shape characteristics), and make recommendations for clinically relevant decisions.

Specific code implementation steps:

  1. Convert the data from row 2 to the last row and from column 4 to the last column of the data table to double and save it in the variable X.

  2. Create an empty matrix P to hold the correlation matrix.

  3. Use a nested loop over every pair of columns of X, calculate the cosine similarity between the two column vectors, and save the result at the corresponding location of the correlation matrix P.

  4. Draw a heat map with the correlation matrix P as data, make the color bar visible, and adjust the position of the plot, the font size and the cell label format.

  5. Merge the variable Z and the correlation matrix P into a matrix resultp so that the correlation results can be analyzed further.

  6. Print the messages announcing the prediction results and the correlation results.
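The nested loop in step 3 can also be written as one matrix product once each column is scaled to unit length. The sketch below uses a small made-up matrix X. It is an equivalent formulation for illustration, not the original code.

```matlab
% Hypothetical data: rows = samples, columns = indicators.
X = [1 2 0; 2 4 1; 3 6 0];

colNorm = sqrt(sum(X.^2, 1));   % Euclidean length of each column
Xn = X ./ colNorm;              % unit-length columns (implicit expansion)
P  = Xn' * Xn;                  % P(i,j) = cosine similarity of columns i and j
```

Cosine similarity coincides with the Pearson correlation coefficient only when each column has been mean-centered first; on raw, mostly positive indicators it tends to be close to 1 for many pairs. If Pearson correlation is what is wanted, the base function corrcoef(X) computes it directly.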

Complete source code + data download

Complete code file for solving the 2023 Huawei Cup E question based on Matlab (source code + data).rar: https://download.csdn.net/download/m0_62143653/88376174

Origin blog.csdn.net/m0_62143653/article/details/133340019