R data analysis: building and evaluating prediction models for survival data

I wrote about the nomogram for survival analysis before. The nomogram is a visualization tool for a prediction model: using it is really a process of making predictions for new data, and the model behind it is a prediction model trained on existing data. Today's article continues the previous one and covers how to evaluate the performance of prediction models for survival analysis.

The difference between a prediction model for survival data and the prediction models for continuous and categorical outcomes we wrote about before is that we have to account for the censoring and the time factor in survival data. With such a prediction model, our goal is to help clinicians answer the survival probability of a specific patient at a specific time. From this perspective, we have an anchor for the standard of model evaluation.

Thus, survival prediction models differ from traditional prediction models for continuous or binary outcomes by appropriately accommodating the censoring that is present in time-to-event data, to answer questions such as "What is the probability that this patient will be alive in 5 years, given their baseline covariate information?" This predicted probability can then be used by clinicians to make important decisions regarding patient care.

For example, suppose I collected a large, representative data set of patients with a certain cancer and trained a prediction model on it; for a new patient with that cancer, the model can tell me how long this patient is likely to live.

If the new data itself has labels, we can judge the quality of the model by comparing the actual label (the observed survival status at a specific time) with the model's prediction (the predicted survival probability at that time). This is the same idea as for the prediction models for logistic and conventional (categorical and continuous) outcomes.

Revisit survival analysis first

As usual, let's first review the common terms in survival analysis:

Our outcome variable has two levels, one being the event and the other censoring; meanwhile, the outcome also depends on a time variable.

As I just wrote, what we are predicting for survival data is the probability that an event occurs at a particular time. Because of this, the metrics used to evaluate conventional models are not easy to apply here.

Due to the presence of censoring in survival data, standard evaluation metrics for regression, such as root mean squared error and R², are not suitable for measuring performance in survival analysis.

For prediction models of survival data, there are three metrics for evaluating the model: the concordance index (C-index), the Brier score, and the mean absolute error. Today's task is to walk everyone through them one by one, hoping to help you understand why these metrics can evaluate a model against the anchor standard of "the survival probability of an individual at a specific time".

Concordance index (C-index)

First, look at the C-index. This concordance index already came up in prediction models for categorical outcomes, where it is the area under the ROC curve. For prediction models of survival data, however, the index is not built from sensitivity and specificity; instead, it compares actual and predicted values pair by pair, and can be understood by analogy with the rank-sum test.

For a binary outcome, C-index is identical to the area under the ROC curve (AUC).

The concordance index or C-index is a generalization of the area under the ROC curve (AUC) that can take into account censored data. It represents the global assessment of the model discrimination power.

The logic is this: the model assigns each case a risk score. If the model performs well, the case with the higher risk score should experience the event earlier. Following this logic, we use the model's risk scores to form a number of comparable pairs (two cases per pair). If a pair does satisfy "the greater the risk score, the earlier the event occurs", it is a concordant pair; otherwise it is a discordant pair. The proportion of concordant pairs among all comparable pairs is the C-index:

The index is calculated as follows:
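In standard notation, Harrell's C for survival data can be written as

C = Σ I(Ti < Tj) * I(ηi > ηj) * δi / Σ I(Ti < Tj) * δi, summed over all ordered pairs (i, j)

where Ti is the observed time of case i, ηi is the risk score the model assigns to case i, and δi is the event indicator (1 = event observed). Only pairs in which the shorter time corresponds to an observed event are comparable.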

Here the numerator counts the concordant pairs and the denominator counts all comparable pairs, so the larger this value, the better.
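To make the pair logic concrete, here is a toy sketch in R with made-up times, events, and risk scores:

# three subjects: observed time, event indicator (1 = event observed), model risk score
time  <- c(5, 8, 12)
event <- c(1, 1, 0)
risk  <- c(2.1, 1.4, 0.7)
# comparable pairs are those whose earlier time is an observed event:
# (1,2): subject 1 fails first and has the higher risk score -> concordant
# (1,3): subject 1 fails first and has the higher risk score -> concordant
# (2,3): subject 2 fails first and has the higher risk score -> concordant
# C-index = concordant pairs / comparable pairs = 3/3 = 1 for this toy data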

The above is the internal logic of the C-index in evaluating survival prediction models. As applied researchers, it is enough to grasp the logic; feel free to skip past the mathematical expressions.

Brier score

Let's look at the second evaluation metric, called the Brier score. The Brier score is the mean of the squared difference between the observed survival status of a case at time t and the predicted survival probability at time t.

The logic for model evaluation is: if my model really can predict the survival probability at a specific moment well, then for a case whose survival status at that moment is indeed 1 (alive), the model should assign a survival probability close to 1; for a case that has died, it should assign a survival probability close to 0.

Because it involves a specific time, this metric can only be examined at one time point at a time. It is calculated as follows:
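In standard notation, at a time point t the (unweighted) Brier score is

BS(t) = (1/N) * Σ ( I(Ti > t) - S(t | xi) )², summed over i = 1..N

where I(Ti > t) is 1 if case i is still event-free at t and 0 otherwise, and S(t | xi) is the model-predicted survival probability for case i. Proper implementations additionally weight each term with inverse-probability-of-censoring weights to account for censored cases.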

We find that the BS depends on the choice of the time point t; generally, the median of the observation times is selected.

Concretely, it is the squared difference between the observed survival status at time t and the predicted survival probability at time t. For example, if a case is actually observed to have died by time t (status coded 0), then the smaller the survival probability the model predicts at t, the better. Overall, the smaller the Brier score, the better, and a value below 0.25 indicates that the model does better than guessing. But this metric only reflects the prediction accuracy of the model at one point in time.
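As a rough illustration in R (the vectors are made up, and the censoring weights a full implementation would apply are ignored):

# survival status at time t: 1 = still alive, 0 = dead by t
status_t <- c(1, 0, 1, 1)
# model-predicted survival probabilities at time t
pred_t <- c(0.9, 0.2, 0.7, 0.6)
mean((status_t - pred_t)^2)  # 0.075: smaller is better, and < 0.25 beats guessing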

The above is the internal logic of the Brier score in evaluating survival prediction models. Again, as applied researchers, it is enough to grasp the logic; feel free to skip past the mathematical expression.

Mean absolute error

The MAE metric is also available for prediction models of continuous outcomes, where it is the average of the absolute differences between predicted and actual values. In a survival prediction model, it is the average of the absolute differences between the actual survival time and the model-predicted survival time. It is calculated as follows:
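In standard notation, restricted to the n non-censored cases,

MAE = (1/n) * Σ | Ti - T̂i |, summed over the cases with δi = 1

where Ti is the observed survival time and T̂i the model-predicted survival time.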

This metric only considers non-censored cases and is rarely used in practice, so you can largely set it aside.

Model evaluation practice

After explaining the metrics, let's look at how this is done in practice. We again take an article from JAMA Surg. as the reference; the citation is as follows:

Hyder O, Marques H, Pulitano C, et al. A Nomogram to Predict Long-term Survival After Resection for Intrahepatic Cholangiocarcinoma: An Eastern and Western Experience. JAMA Surg. 2014;149(5):432–438. doi:10.1001/jamasurg.2013.5168

The methodology of model evaluation in the article is introduced as follows:

As you can see, the article reports the C index, draws a calibration curve using bootstrap resampling, and validates the model. Let's first look at the C index in practice. The article reports its value and confidence interval:

Predictive accuracy (discrimination) of the final model was measured by calculating the Harrell C index, which was 0.692 (95% CI, 0.624-0.762).

If you fit the model with the coxph function, the C index and its standard error automatically appear in the model's summary output:
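For instance, with the built-in lung data set (a stand-in for the article's data):

library(survival)
# fit a Cox model; summary() prints a "Concordance" line with the C index and its SE
fit <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = lung)
summary(fit)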

For example, if we want this index alone, we can run the following code directly:

cindex(formula, data)
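The cindex() call above presumably comes from an add-on package such as pec; if it is not available in your setup, the concordance() function in the survival package returns the same statistic, for example:

library(survival)
fit <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = lung)
concordance(fit)  # prints the concordance (C index) with its standard error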

If you want the confidence interval of the C index, you can turn to the concordance.index function from the survcomp package; the code is as follows:

library(survcomp)  # Bioconductor package providing concordance.index()
# x: predicted risk scores from the Cox model f; dt: survival times; e: event indicators
concordance.index(x = predict(f), surv.time = dt, surv.event = e, method = "noether")

The output contains the C index, its standard error, and the corresponding lower and upper limits of the confidence interval.

Having covered the C index, let's look at drawing the calibration curve, taking the figure given in the paper as the example.

First of all, let's understand what a calibration curve is. In the paper's figure, the horizontal axis is the survival probability predicted by the model and the vertical axis is the actual survival probability. There is also a gray dotted line marking where the predicted probability equals the actual survival probability; ideally, the calibration curve lies on this diagonal.

Calibration plot is a visual tool to assess the agreement between predictions and observations in different percentiles (mostly deciles) of the predicted values.

It should also be understood that predicted survival probabilities are distributed continuously, yet only 3 points are drawn in the figure. This is because the algorithm bins the data; in the paper's figure, the original data are divided into 3 groups. This binning is controlled by the parameter m of the calibrate function.

For survival models, "predicted" means predicted survival probability at a single time point, and "observed" refers to the corresponding Kaplan-Meier survival estimate, stratifying on intervals of predicted survival.

At the same time, predictions for survival data must be tied to a time point, so we also need to set the parameter u.

For example, to bootstrap 20 times, bin the data at 200 cases per bin, and draw the calibration curve at time point 6, the sample code is as follows:

cal <- calibrate(f, u=6, cmethod='KM', m=200, B=20)
plot(cal)
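For context, a fuller sketch with the rms package (mydata and its variables are placeholders; note that calibrate() requires the model to be fit with cph() using x = TRUE, y = TRUE, surv = TRUE, and a time.inc matching the u passed to calibrate()):

library(rms)
# 'mydata' with columns time, status, age, sex is hypothetical
f <- cph(Surv(time, status) ~ age + sex, data = mydata,
         x = TRUE, y = TRUE, surv = TRUE, time.inc = 6)
cal <- calibrate(f, u = 6, cmethod = "KM", m = 200, B = 20)  # B bootstraps, bins of m cases
plot(cal)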

As for model validation, the paper reports the C index of the training data and the test data under repeated resampling, showing that the model is not overfit. The original text is as follows:

Bootstrap validation of the model with 300 iterations revealed minimal evidence of model overfit. The training data set C statistic was 0.699, and the testing data set C statistic was 0.706, which represented the bias-corrected estimate of model performance in the future.

The implementation code of this part is as follows:

validate(f, B=300) 

The C index for each data set can be calculated from the output, because validate() reports Somers' Dxy rather than C directly:

Dxy = 2 * (C - 0.5), where C is the C-index or concordance probability.
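In code, the conversion might look like this (a sketch; f is the cph model from the calibration step above):

v <- validate(f, B = 300)
# invert the relation above: C = Dxy / 2 + 0.5
v["Dxy", "index.orig"] / 2 + 0.5       # apparent C index on the training data
v["Dxy", "index.corrected"] / 2 + 0.5  # bias-corrected (validated) C index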

Then, by comparing the C index on the training data with that on the test data, we can draw a conclusion about our own model.

Well, that concludes this walkthrough of building and evaluating survival prediction models, following the JAMA Surgery article. There are in fact other evaluation methods for survival prediction models, such as the time-dependent ROC and the decision curve; those are scheduled for the next issue, so please stay tuned.
