Google DeepMind returns to Science: AI predicts the pathogenicity of genetic mutations, classifying 89% versus 0.1% by human experts | Open source

By Yuyang, from Aofeisi
QbitAI | WeChat official account QbitAI

How genetic mutations affect human health remains largely a mystery.

Now the power of AI can be brought to bear on the question:

Building on AlphaFold, Google DeepMind trained AlphaMissense specifically to predict the pathogenicity of missense mutations in the human genome.

The paper was published in Science today.

A "missense mutation" is a non-synonymous single-base substitution in a DNA sequence. Put simply, one of the original letters (bases) of the DNA is replaced by another.

This means that the corresponding amino acid in the resulting protein also changes, which may cause the protein to lose its original function and lead to disease.
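
To make the definition concrete, here is a minimal Python sketch that classifies a single-base change within one codon. The four-entry codon table is a small excerpt of the standard genetic code, and the function name `classify_substitution` is hypothetical, introduced only for illustration.

```python
# Small excerpt of the standard genetic code (DNA codon -> amino acid).
CODON_TABLE = {
    "GAG": "Glu",
    "GAA": "Glu",
    "GTG": "Val",
    "TAG": "Stop",
}

def classify_substitution(codon, position, new_base):
    """Classify a single-base substitution inside one codon.

    position is the 0-based index of the base being replaced.
    """
    mutated = codon[:position] + new_base + codon[position + 1:]
    before = CODON_TABLE[codon]
    after = CODON_TABLE[mutated]
    if before == after:
        kind = "synonymous"   # same amino acid, protein unchanged
    elif after == "Stop":
        kind = "nonsense"     # premature stop codon
    else:
        kind = "missense"     # a different amino acid is produced
    return mutated, before, after, kind

print(classify_substitution("GAG", 1, "T"))  # GAG -> GTG, Glu -> Val
print(classify_substitution("GAG", 2, "A"))  # GAG -> GAA, Glu -> Glu
```

Only the first change is a missense mutation: the second leaves the amino acid untouched, which is why not every base substitution alters the protein.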

AlphaMissense’s first step was to classify all 71 million possible missense mutations.

As a result, the AI successfully classified 89% of these variants as "likely pathogenic" or "likely benign." By comparison, only 0.1% have so far been confirmed by human experts.

Using AI to predict genetic mutation pathogenicity

In a nutshell, AlphaMissense predicts, for every possible missense mutation in the human genome, whether it is pathogenic or benign.

How does it do this?

AlphaMissense is built on AlphaFold, DeepMind's protein structure prediction model.

The researchers fine-tuned AlphaFold using databases of human and primate variant frequencies. Specifically, variants that are commonly observed in these populations are treated as benign, while variants that have never appeared in the databases are treated as "pathogenic" training examples.

This training strategy can avoid bias caused by manual annotation.
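
The labeling idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' pipeline: the function name `weak_label`, the `common_cutoff` value of 0.01, and the choice to leave rare-but-observed variants unlabeled are all hypothetical.

```python
def weak_label(allele_frequency, common_cutoff=0.01):
    """Derive a training label from population frequency alone.

    allele_frequency: fraction of sampled genomes carrying the variant,
    or None if the variant never appears in the database.
    common_cutoff: hypothetical threshold for calling a variant "common".
    """
    if allele_frequency is None or allele_frequency == 0:
        return "pathogenic"  # never observed: used as a pathogenic proxy
    if allele_frequency >= common_cutoff:
        return "benign"      # commonly observed: treated as harmless
    return None              # rare but observed: unlabeled in this sketch

# Made-up variants and frequencies, for illustration only.
variants = {"V1": 0.12, "V2": None, "V3": 0.0004}
labels = {name: weak_label(freq) for name, freq in variants.items()}
print(labels)
```

No human annotator appears anywhere in this procedure, which is the sense in which the strategy sidesteps annotation bias.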

It is worth mentioning that AlphaMissense does not predict how the protein structure changes after a mutation, nor other effects of the mutation on protein stability.

Given a missense mutation, AlphaMissense combines protein structural context with a protein language model to assign the mutation a score between 0 and 1, which roughly indicates whether the mutation will cause disease.
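
As a rough illustration of how such a score might be turned into a category, the sketch below uses two cutoffs and leaves the band between them unclassified, consistent with the article's statement that 89% (not 100%) of variants receive a confident call. The function name `classify_score` and the default cutoffs are placeholder assumptions; this article does not state the values actually used.

```python
def classify_score(score, benign_below=0.34, pathogenic_above=0.564):
    """Map a pathogenicity score in [0, 1] to a coarse category.

    The two cutoffs are illustrative defaults, not values given in
    this article.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    if score < benign_below:
        return "likely benign"
    if score > pathogenic_above:
        return "likely pathogenic"
    return "uncertain"  # between the cutoffs: no confident call

for s in (0.05, 0.45, 0.97):
    print(s, classify_score(s))
```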

[Figure: AlphaMissense + AlphaFold results]

So the question is, is AlphaMissense’s classification really reliable?

The researchers verified it experimentally.

[Figure: classification performance on ClinVar, AlphaMissense compared with other methods]

On ClinVar, the authoritative genetics database, AlphaMissense showed stronger classification performance than other computational methods.

On a set of 18,924 variants, AlphaMissense reached an area under the ROC curve (auROC) of 0.94. The closer this number is to 1, the more accurately the model distinguishes positive from negative samples.
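
For readers unfamiliar with the metric, here is a small pure-Python sketch of auROC in its rank formulation: the probability that a randomly chosen pathogenic variant scores higher than a randomly chosen benign one, with ties counted as one half. The scores and labels are made up for illustration and are not taken from the paper.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for pathogenic (positive), 0 for benign (negative).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))

scores = [0.95, 0.80, 0.40, 0.70, 0.10, 0.20]
labels = [1, 1, 1, 0, 0, 0]
print(auroc(scores, labels))  # 8 of 9 pairs ranked correctly
```

A value of 1.0 means every pathogenic variant outranks every benign one, while 0.5 is what random scoring would give. The quadratic loop is fine for a toy example; real evaluations normally use a sort-based implementation.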

It is worth noting that the methods shown in gray in the figure above were trained on ClinVar itself, so their results may reflect overfitting.

In terms of prediction accuracy, AlphaMissense also reached SOTA. By adjusting the classification threshold, AlphaMissense can label variants as "likely pathogenic" or "likely benign" with an expected precision of 90%.
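
One simple way to pick a threshold for a target precision is sketched below. It assumes a labeled validation set and assumes that the 90% figure refers to precision among "pathogenic" calls. The function name `lowest_threshold_for_precision` and the data are hypothetical, and this is not presented as the authors' calibration procedure.

```python
def lowest_threshold_for_precision(scores, labels, target=0.90):
    """Return the smallest cutoff t such that variants with score >= t
    are truly pathogenic at least `target` of the time.

    labels: 1 for pathogenic, 0 for benign.
    Returns None if no cutoff reaches the target.
    """
    best = None
    for t in sorted(set(scores), reverse=True):
        called = [y for s, y in zip(scores, labels) if s >= t]
        precision = sum(called) / len(called)
        if precision >= target:
            best = t  # keep lowering the cutoff while the target holds
    return best

# Made-up validation data, for illustration only.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, 1, 0, 1, 0]
print(lowest_threshold_for_precision(scores, labels, target=0.90))
print(lowest_threshold_for_precision(scores, labels, target=0.75))
```

Lowering the cutoff classifies more variants but admits more errors, so the chosen threshold trades coverage against precision.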

DeepMind said:

We look forward to seeing AlphaMissense help solve unanswered questions in genomics and biological sciences.

To this end, they have open-sourced AlphaMissense's predictions and model code.

In addition, DeepMind released predictions for all 216 million possible single amino acid substitutions across more than 19,000 human proteins.

Reference links:
[1] Paper: https://www.science.org/doi/10.1126/science.adg7492
[2] https://www.deepmind.com/blog/alphamissense-catalogue-of-genetic-mutations-to-help-pinpoint-the-cause-of-diseases
[3] https://github.com/deepmind/alphamissense

