Deep Learning for Medical Prognosis - Session 2, Week 3, Section 4-7 - Time-to-Event Data, Recognizing and Handling Censored Data

In this lesson, we will discuss survival data. To model survival, we first need to represent the data in a form we can process.

The main challenge is censored data, which is a special form of missing data. We will examine it below.

Earlier, when we looked at prognostic models, we asked the question: what is the probability of survival past five years? The data had the following format.

For each patient in a group, we used 1 to indicate that the patient had an event within five years and 0 to indicate that the patient did not. This is the data we used.

Note that the key here is that each outcome is a yes-or-no answer (1 or 0).

With survival data, we want to answer a different question: "What is the probability of survival past any time t, not just past five years?" To do this, we need one more piece of information: the time until the event. Let's walk through an example to see how we collect this information for patients.

When does a stroke occur

For example, suppose we observe patients after they undergo treatment and monitor them for stroke events. If a patient has a stroke, we record how much time passed between the treatment and the stroke.


Let's say our first patient was treated in September 2018 and had a stroke a year later, in September 2019. Suppose we track time in months. This corresponds to 12 months from treatment to the stroke event, so we enter 12 in the table for this patient.
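The month count for the first patient can be sketched in a few lines of Python. This is only an illustration: the helper name `months_between` is made up here, and because the text gives only month and year, the sketch assumes the first day of each month.

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end (the day of month is ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


# First patient: treated September 2018, stroke in September 2019.
# Day 1 is an assumption; the text only gives month and year.
treatment = date(2018, 9, 1)
stroke = date(2019, 9, 1)

months_to_event = months_between(treatment, stroke)
print(months_to_event)  # 12
```

Counting whole calendar months matches the way the lesson rounds its examples; a real study would fix an exact convention for partial months.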

Let's look at another patient.

This patient was treated in August 2018, and we followed them for over a year until October 2019, when we decided to end the study. During this period, we did not observe a stroke. So we know this patient was event-free for 14 months.

If they do have a stroke, it must occur more than 14 months after treatment. So we write 14+.

In the third example, the patient was treated in August 2018 and, just three months later, in November 2018, decided to withdraw from the study. This is very common and can happen for many reasons. Let's say this patient moved to another country and therefore had to withdraw from the study.

So we know that between August 2018 and November 2018 they didn't have a stroke, but we don't know what happened after that. All we can say is that the patient's time to event is more than 3 months, so we enter 3+ in the table.

The second and third cases are what we call censored observations. Censoring is an important component of survival data and needs to be taken into account.
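One common way to store such data is as a pair per patient: a duration and a flag saying whether the event was actually observed. The sketch below applies this to the three stroke patients above. The class name `Observation` and its fields are illustrative choices, not something taken from the course.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    months: int     # time from treatment to the stroke, or to the last contact
    observed: bool  # True if the stroke was seen, False if the record is censored

    def label(self) -> str:
        """Render the record the way the lesson writes it: '12' or '14+'."""
        return str(self.months) if self.observed else f"{self.months}+"


stroke_study = [
    Observation(months=12, observed=True),   # stroke after 12 months
    Observation(months=14, observed=False),  # study ended, no stroke seen
    Observation(months=3, observed=False),   # withdrew from the study
]

labels = [obs.label() for obs in stroke_study]
print(labels)  # ['12', '14+', '3+']
```

Keeping the flag separate from the number avoids parsing strings such as "14+" later on.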

Heart attack data

Let's apply this knowledge to an example. In this example, we'll look at patients undergoing surgery and see if they have a heart attack after surgery.

We have three patients. The study started in January 2015 and ended in July 2019. Patients came in for surgery at different times, and we tracked whether and when they had a heart attack.

The first patient had surgery in March 2016 and had a heart attack in March 2017, so we write 12 months for patient 1.

Patient 2 underwent surgery in July 2015, and we did not observe a heart attack before the study ended in July 2019. That is four years of follow-up, or 48 months. Since we saw no event in those 48 months, we add a plus sign: 48+.

The third patient underwent surgery in November 2015 and withdrew from the study in November 2017. We followed them for two years, or 24 months, and observed no event during that time, so we write 24+.

In this way we can represent our survival data as one entry per patient: 12, 48+, and 24+.
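The heart attack example can be reproduced from the raw dates with a small sketch. The function `follow_up` and its parameters are hypothetical names chosen for illustration, and the dates assume the first day of each month, since the text gives only month and year.

```python
from datetime import date
from typing import Optional, Tuple

STUDY_END = date(2019, 7, 1)  # the study ended in July 2019


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end (the day of month is ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def follow_up(surgery: date,
              heart_attack: Optional[date] = None,
              withdrew: Optional[date] = None) -> Tuple[int, bool]:
    """Return (months, observed) for one patient."""
    if heart_attack is not None:
        return months_between(surgery, heart_attack), True
    # No event seen: follow-up stops at withdrawal or at the end of the study.
    last_contact = withdrew if withdrew is not None else STUDY_END
    return months_between(surgery, last_contact), False


patients = {
    1: follow_up(date(2016, 3, 1), heart_attack=date(2017, 3, 1)),
    2: follow_up(date(2015, 7, 1)),
    3: follow_up(date(2015, 11, 1), withdrew=date(2017, 11, 1)),
}

table = {pid: (str(m) if seen else f"{m}+") for pid, (m, seen) in patients.items()}
print(table)  # {1: '12', 2: '48+', 3: '24+'}
```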

To summarize, survival data shifts from representing outcomes as yes or no, as we do in the binary setting, to asking when: we record the time from an origin (such as treatment or surgery) to the event, and censored observations are part of the data. We will look at censoring more closely next.

Right censoring


We mentioned censoring briefly earlier, when we saw a patient who was treated in August 2018 and then withdrew from the study in November 2018, before the study ended. From this we concluded that the patient was event-free for at least three months.

It is possible that this patient had an event, such as a stroke, in January 2019, or at some other time after withdrawing.

But it is also possible that the patient never had an event and simply stayed healthy.

Note that the event, if it occurs at all, always occurs after the last contact. This is called "right censoring". Formally, the time to event is only known to exceed a certain value.

For example, if treatment is in August 2018 and the last contact is in November 2018, then the time to event is more than three months, so we write this data point as 3+.
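In symbols, right censoring is often written as follows. The notation is introduced here for illustration and does not come from the original lesson: T_i is the true time to event for patient i, C_i is the time of last contact, t_i is the recorded time, and δ_i is the event indicator.

```latex
t_i = \min(T_i, C_i), \qquad
\delta_i =
\begin{cases}
  1 & \text{if } T_i \le C_i \quad \text{(event observed)} \\
  0 & \text{if } T_i > C_i \quad \text{(right-censored)}
\end{cases}
```

For the patient who withdrew after three months, t_i = 3 and δ_i = 0, which says only that T_i > 3. This is exactly what the entry 3+ records.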

We have seen two types of right censoring:

  • First, a patient whose event is not observed because the study ends; this is called end-of-study censoring.
  • Second, a patient who drops out before the end of the study; this is called loss-to-follow-up censoring.
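The two types can be told apart by comparing the last contact date with the end of the study. Below is a minimal sketch using the three stroke patients from earlier. The function name `censoring_type` and the returned strings are illustrative, and the dates again assume the first day of each month.

```python
from datetime import date
from typing import Optional

STUDY_END = date(2019, 10, 1)  # the stroke study ended in October 2019


def censoring_type(event: Optional[date], last_contact: date,
                   study_end: date = STUDY_END) -> str:
    """Classify one patient record by how follow-up ended."""
    if event is not None:
        return "event observed"
    if last_contact >= study_end:
        return "end-of-study censoring"
    return "loss-to-follow-up censoring"


records = [
    (date(2019, 9, 1), date(2019, 9, 1)),  # stroke in September 2019
    (None, date(2019, 10, 1)),             # followed until the study ended
    (None, date(2018, 11, 1)),             # withdrew in November 2018
]

kinds = [censoring_type(event, last) for event, last in records]
print(kinds)
```

Both censored records are right-censored; the distinction only describes why follow-up stopped.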

Censoring is a very important concept in survival data, and understanding it is necessary for understanding survival models, as we will see shortly.

In the next chapter, we will learn to use censored data for survival estimation.


I'm Tina, see you in the next blog~


Origin blog.csdn.net/u014264373/article/details/130674966