"Pink Killer" wanted poster: AI reads mammograms as well as doctors do

According to World Health Organization statistics, there were 2.3 million new cases of breast cancer worldwide in 2020, overtaking lung cancer as the most commonly diagnosed cancer.
However, if breast cancer is detected early and treated promptly, before the cancer cells metastasize, its mortality rate can be greatly reduced. At present, the most common method of primary screening for breast cancer is mammography: doctors analyze and review the X-ray images to assess breast health. This review consumes a lot of doctors' time, which can delay care for other patients.
To address this, researchers from the University of Nottingham in the UK compared the ability of a commercial AI system to read mammograms with that of doctors, offering new ideas for the application of AI in clinical medicine.

Author | Xuecai
Editor | Sanyang, Iron Tower
This article was first published on the HyperAI Super Neural WeChat public platform~

According to American Cancer Society estimates, about 930,000 new cancer cases were expected among American women in 2022, including about 290,000 new breast cancer cases (31%). Breast cancer also accounts for 15% of cancer deaths among women, second only to lung cancer.

Figure 1: Number of new cancer cases (top) and cancer deaths (bottom) in the United States in 2022

In China, breast cancer has accounted for the highest proportion of cancers among women in the 21st century, and the number of new patients is increasing every year.

Figure 2: Number of new cancer cases among women in China from 2000 to 2016. Gray indicates breast cancer cases.

Breast cancer is a disease caused by abnormal breast cells growing out of control and forming tumors. If not intervened in time, the tumor will metastasize and spread, eventually becoming life-threatening. However, if local tumors can be detected in the early stages of cancer and treatment is initiated, the five-year cancer survival rate can reach 99%.

Currently, hospitals generally perform primary screening for breast cancer through mammography. However, primary screening can produce false positives, leading to unnecessary follow-up testing for people who do not have cancer. It can also produce false negatives (missed cancers), which delay treatment past the optimal window.

Therefore, many European countries have mammograms reviewed a second time to eliminate as many false positives as possible. This approach is effective: it increases the cancer detection rate by 6%-15% while reducing false positives.

However, reading and evaluating X-rays takes considerable time. In areas with a low doctor-patient ratio, review of X-rays not only takes up doctors’ time, but also affects the early screening of other patients.

AI has partially relieved doctors' workload, but entrusting judgments about life and health to AI raises safety concerns. On this point, Professor Yan Chen from the University of Nottingham in the UK said, "There is a lot of pressure to apply AI to clinical medicine, but we need to do this well to protect women's health."

To this end, Yan Chen's team compared the accuracy of Lunit, a commercial AI system, with that of doctors in reading mammograms. The results showed that Lunit's ability to analyze mammograms was comparable to that of human doctors. The findings have been published in Radiology.

Paper link:

https://pubs.rsna.org/doi/10.1148/radiol.223299#_i13

Experimental procedure

Dataset: PERFORMS data set

The study used two PERFORMS test sets to evaluate the AI. Each set consists of 60 challenging mammography cases, including malignant tumors (about 35%), benign tumors, and normal results. PERFORMS has been used for the entry testing and routine assessment of doctors in the UK's National Health Service Breast Screening Programme (NHSBSP) for the past 30 years.

Evaluation criteria: mark + score

When analyzing the X-rays, doctors mark suspicious locations and then give an overall rating from 1 to 5, corresponding to normal, benign, uncertain, suspicious, and malignant.

The AI rates the suspiciousness of each feature in the X-ray on a scale of 1-100, and the highest feature score is taken as the score for the entire X-ray. If there are no suspicious features, the score is 0.

Figure 3: Analysis results of mammograms by doctors and AI

A: The blue arrow is an unidentified mass with a diameter of 8 mm, which was later identified as histological grade 2 ductal carcinoma;

B: The red crosses are abnormal features discovered by AI, and the blue dots are suspicious areas marked by doctors during analysis.
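To make the AI's scoring rule concrete, here is a minimal Python sketch of how a case-level score could be derived from per-feature scores, following the description above. The function names `case_score` and `is_flagged` are hypothetical, and flagging a case when its score is greater than or equal to the threshold is an assumption for illustration; none of this is taken from Lunit's software.

```python
from typing import Sequence


def case_score(feature_scores: Sequence[float]) -> float:
    """Return the score for the whole X-ray.

    The highest per-feature score is used as the case score.
    An empty sequence means no suspicious feature was found, so the score is 0.
    """
    return max(feature_scores, default=0.0)


def is_flagged(feature_scores: Sequence[float], threshold: float) -> bool:
    """Flag the case as suspicious when its score reaches the threshold.

    Using ">=" rather than ">" is an assumption made for this sketch.
    """
    return case_score(feature_scores) >= threshold


if __name__ == "__main__":
    # Hypothetical per-feature scores for one mammogram.
    print(case_score([12.5, 87.0, 40.0]))  # highest feature score
    print(case_score([]))                  # no suspicious features
```

Taking the maximum means that a single highly suspicious feature is enough to raise the score of the entire X-ray.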

Comparison results: specificity + sensitivity

A total of 552 human readers took part in the comparison, accounting for 68% of all readers in the NHSBSP, including 315 radiologists, 206 radiographers, and 31 clinicians.

The two PERFORMS test sets together covered 240 breasts: 161 were normal, 70 had malignant tumors, and 9 had benign findings. Common features of malignancy included mass (64.3%), calcification (12.9%), asymmetry (11.4%), and architectural distortion (11.4%), with a mean lesion size of 15.5 ± 9.2 mm.

Table 1: PERFORMS dataset results

The mean AUC of the human group was 0.88. The AI's AUC was 0.93, which corresponds to the 96.8th percentile of the human group, but the difference between the two AUCs was not statistically significant.

Figure 4: Doctor group AUC histogram and AI AUC (yellow line)
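As a rough illustration of the two quantities compared above, the Python sketch below computes an AUC from scores and ground-truth labels, and the percentile rank of one value within a group. It uses the pairwise-comparison definition of AUC (the probability that a malignant case scores higher than a non-malignant one, with ties counted as half). The function names and all numbers in the example are hypothetical, and the percentile definition (share of values strictly below) is an assumption; the study's own statistical procedure may differ.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


def percentile_rank(value: float, population: Sequence[float]) -> float:
    """Percentage of population values strictly below the given value."""
    below = sum(1 for v in population if v < value)
    return 100.0 * below / len(population)


if __name__ == "__main__":
    # Toy data, not taken from the study.
    toy_scores = [90.0, 55.0, 3.0, 40.0, 2.0, 1.0]
    toy_labels = [1, 1, 1, 0, 0, 0]
    print(auc(toy_scores, toy_labels))

    toy_reader_aucs = [0.80, 0.85, 0.88, 0.95]
    print(percentile_rank(0.93, toy_reader_aucs))
```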

The average sensitivity and specificity for the human panel were 90% and 76%, respectively. At the developer-recommended thresholds, the AI's sensitivity and specificity were 84% and 89%, respectively.

Table 2: Judgment results of doctor group and different threshold AI

TP: true positive;

FP: false positive;

TN: true negative;

FN: false negative;

Sensitivity = TP / total number of positives;

Specificity = TN / total number of negatives.
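The two definitions above translate directly into code. The following Python sketch is only an illustration; the function names are hypothetical. The usage example takes the figures reported in this article for one AI threshold (63 malignant tumors detected and 7 missed).

```python
def sensitivity(tp: int, fn: int) -> float:
    """Sensitivity = TP / total number of positives (TP + FN)."""
    positives = tp + fn
    if positives == 0:
        raise ValueError("no positive cases")
    return tp / positives


def specificity(tn: int, fp: int) -> float:
    """Specificity = TN / total number of negatives (TN + FP)."""
    negatives = tn + fp
    if negatives == 0:
        raise ValueError("no negative cases")
    return tn / negatives


if __name__ == "__main__":
    # 63 malignant tumors detected and 7 missed, as reported in the article.
    print(f"{sensitivity(tp=63, fn=7):.0%}")  # 90%
```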

Compared with the AI's ROC curve, 52% of doctors performed above the curve, 36% performed below it, and 12% fell on it.

Figure 5: ROC curve of AI, where the blue points are the performance of different doctors
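One way to decide whether a reader sits above, below, or on an ROC curve is to interpolate the curve at the reader's false positive rate (1 minus specificity) and compare true positive rates. The Python sketch below does this with linear interpolation. It is an assumption about how such a comparison could be made, not the study's actual procedure; the function names, the tolerance `tol`, and the toy curve are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def roc_tpr_at(fpr: float, curve: List[Point]) -> float:
    """Linearly interpolate the curve's true positive rate at a given FPR.

    The curve must be sorted by false positive rate.
    """
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        if x0 <= fpr <= x1:
            if x1 == x0:
                return max(y0, y1)  # vertical segment: take the upper end
            return y0 + (y1 - y0) * (fpr - x0) / (x1 - x0)
    raise ValueError("fpr lies outside the range covered by the curve")


def reader_position(reader_fpr: float, reader_tpr: float,
                    curve: List[Point], tol: float = 1e-9) -> str:
    """Return 'above', 'below', or 'on' relative to the ROC curve."""
    curve_tpr = roc_tpr_at(reader_fpr, curve)
    if reader_tpr > curve_tpr + tol:
        return "above"
    if reader_tpr < curve_tpr - tol:
        return "below"
    return "on"


if __name__ == "__main__":
    # Toy ROC curve, not taken from the study.
    toy_curve = [(0.0, 0.0), (0.1, 0.7), (0.3, 0.9), (1.0, 1.0)]
    # A reader with 80% specificity (FPR 0.2) and 85% sensitivity.
    print(reader_position(0.2, 0.85, toy_curve))
```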

When the AI's threshold was set to 3.06, its sensitivity matched that of the doctors: it detected 63 malignant tumors and missed only 7. At this threshold, the AI's specificity was not significantly different from that of the doctors.

When the threshold was set to 2.91, the AI's specificity matched that of the doctor group, with a sensitivity of 91%. These results show that the sensitivity and specificity of Lunit's AI in analyzing mammograms are comparable to those of human doctors.

Figure 6: The impact of different thresholds on AI judgment results

A: The blue arrow indicates the asymmetric area, which was later identified as histological grade 2 ductal carcinoma;

B: The detection result at an AI threshold of 2.91; the red cross was ultimately confirmed as a true positive;

C: The detection result at an AI threshold of 3.06; no obvious abnormal features were found.
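The effect of the threshold can be sketched in a few lines of Python: lowering the threshold flags more cases, which raises sensitivity and tends to lower specificity. The code below counts TP, FP, TN, and FN at a given threshold and searches for the highest threshold that still reaches a target sensitivity. The function names and the toy data are hypothetical, and treating a score greater than or equal to the threshold as positive is an assumption; this is not the procedure used in the study.

```python
from typing import Optional, Sequence, Tuple


def confusion_at(scores: Sequence[float], labels: Sequence[int],
                 threshold: float) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) when cases scoring >= threshold are flagged."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged and label == 0:
            fp += 1
        elif not flagged and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def highest_threshold_for(scores: Sequence[float], labels: Sequence[int],
                          target_sensitivity: float) -> Optional[float]:
    """Return the highest observed score that, used as the threshold,
    reaches the target sensitivity, or None if no threshold does."""
    n_pos = sum(labels)
    if n_pos == 0:
        return None
    for threshold in sorted(set(scores), reverse=True):
        tp, _, _, _ = confusion_at(scores, labels, threshold)
        if tp / n_pos >= target_sensitivity:
            return threshold
    return None


if __name__ == "__main__":
    # Toy data, not taken from the study: 3 malignant and 3 non-malignant cases.
    toy_scores = [95.0, 60.0, 3.0, 2.0, 40.0, 1.0]
    toy_labels = [1, 1, 1, 0, 0, 0]
    t = highest_threshold_for(toy_scores, toy_labels, target_sensitivity=1.0)
    print(t, confusion_at(toy_scores, toy_labels, t))
```

In the toy data, the malignant case with a score of 3.0 is only caught once the threshold drops to 3.0, which mirrors how the small change from 3.06 to 2.91 in Figure 6 turns a missed lesion into a true positive.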

Professor Yan Chen said, "The results of this study provide strong evidence for AI screening, indicating that AI's analysis of mammograms is equivalent to that of human doctors."

Breast Cancer: The Hidden Pink Killer

On World Cancer Day, February 4, 2021, the International Agency for Research on Cancer, part of the World Health Organization (WHO), reported that there were 2.3 million new cases of breast cancer in 2020, accounting for 11.7% of all new cancer cases and exceeding the number of new lung cancer cases for the first time. Breast cancer has become a "hidden pink killer".

The incidence of breast cancer is highest among women in high-income countries and significantly lower among women in low- and middle-income countries. In addition, about 0.5-1% of breast cancers occur in men.

However, the mortality rate of breast cancer itself is not high. Some 8 million women who were diagnosed with breast cancer between 2016 and 2020 survived, more than for any other cancer.

Currently, WHO is promoting its Global Breast Cancer Initiative around the world, aiming to reduce deaths from breast cancer through early detection, timely diagnosis, and comprehensive breast cancer management.

Figure 7: AI-assisted breast cancer screening

As a powerful tool for primary breast cancer screening, AI can pick up the early signs of breast cancer in time and is expected to stop the "pink killer" at an early stage. But it may be too early to roll out AI in clinical settings on a large scale, because changes in the operating environment and in the algorithm itself will keep affecting it, causing its sensitivity and specificity to decline over time.

Professor Yan Chen also believes that "once AI enters clinical application, we must have a mechanism to continuously evaluate and monitor it." Research teams around the world are now evaluating AI's detection results and have reported satisfactory findings. In the future, with the help of efficient AI and sound regulatory mechanisms, diseases will have "nowhere to hide" and our health will be better protected.

Reference links:

[1]https://acsjournals.onlinelibrary.wiley.com/doi/10.3322/caac.21708

[2]https://www.sciencedirect.com/science/article/pii/S2667005422000047



Origin blog.csdn.net/HyperAI/article/details/133166546