Precision medicine, spatial omics, cell atlases: Tencent AI Lab uses deep learning to advance life science research

Recently, three studies from Tencent AI Lab were published in the leading international journals Nature Methods and Nature Communications, once again demonstrating the lab's internationally leading technical strength at the frontier of the life sciences.

All three results belong to the field of spatial omics in cell biology research, and they are of great significance for advancing research in precision medicine, cell atlas construction, and human health.

Cells are the basic unit of life, yet our understanding of them is still limited. Spatial omics makes it possible to measure the gene expression patterns of cells across both time and space, together with the interactions between cells, and thus to understand the functions of organs and tissues at high-resolution molecular level. This information is key to cell biology, developmental biology, neurobiology, tumor biology, and related fields, and it is crucial for filling gaps in the study of position-function relationships at the tissue and organ level.

Spatial omics builds on technologies such as high-throughput transcriptome sequencing and single-cell sequencing. By adding the important dimension of "space" to cell analysis, it allows researchers to understand how biological systems operate.

In recent years, the introduction of artificial intelligence techniques such as deep learning has brought a steady stream of breakthroughs to spatial omics. Tencent AI Lab's three new results make key advances in three areas: cell type annotation, microenvironment modeling, and databases. They exceed prevailing industry levels in accuracy, data scale, and methodological innovation, and they have advanced related research in the international academic community.

They are:

  • Spatial-ID, a spatial transcriptome cell type annotation method based on transfer learning and spatial embedding

  • SOTIP, a general method for microenvironment modeling with spatial omics data

  • The industry's largest (more than 50 million cells) and most diverse (26 spatial omics technologies) spatial omics database

In spatial omics, Tencent AI Lab's core focus and strength is AI algorithm research. The lab has long collaborated with well-known research institutes and hospitals to apply its work in life science research and clinical settings. Specific applications include building cell atlases, especially primate brain atlases, to support brain science research.

On the clinical side, Tencent AI Lab will use spatial omics to study the microenvironment and developmental trajectory of tumors, helping to advance targeted precision medicine.

Tencent has carried out extensive work in medicine and the life sciences. In 2022, Tencent AI Lab and Peking Union Medical College Hospital jointly released a portable intelligent surgical navigation system, whose initial clinical application was successful. The lab also proposed scBERT, an algorithm for single-cell annotation, which was published in the leading international journal "Nature Machine Intelligence".

Nature Methods is a Nature Portfolio journal focused on cutting-edge research methods, and each year it selects a Method of the Year in the life sciences. Its 2022 impact factor is 47.99, ranking first among journals on biological research methods. Nature Communications is a multidisciplinary journal in the Nature family that publishes high-quality research from all areas of the natural sciences. Its 2022 impact factor is 17.69, ranking third among multidisciplinary journals.

Three studies in detail

Study 1: Spatial-ID, a spatial transcriptome cell type annotation method based on transfer learning and spatial embedding

English title: Spatial-ID: a cell typing method for spatially resolved transcriptomics via transfer learning and spatial embedding.

Paper link: https://www.nature.com/articles/s41467-022-35288-0

This study uses the expression profiles of cell types in single-cell transcriptome data as reference knowledge and uses a graph neural network to describe the spatial relationships among cells in spatial transcriptome data, providing a fast cell type annotation method for spatial transcriptomics.

Figure 1: Spatial-ID algorithm flow

Transfer learning carries over single-cell expression-profile knowledge from existing single-cell transcriptome datasets. Spatial embedding exploits possible interactions or co-expression patterns between each cell and its neighbors in the spatial context to improve the accuracy of cell type identification.
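To make the idea of spatial embedding concrete, here is a minimal sketch, assuming each cell is given as a 2D coordinate plus an expression vector. It builds a k-nearest-neighbor graph from the coordinates and concatenates each cell's own expression with the mean expression of its neighbors, which is one round of untrained message passing in the style of a graph neural network layer. The helpers `knn_indices` and `spatial_embed` and the parameter `k` are hypothetical; Spatial-ID's actual network uses learned weights and its own graph construction.

```python
import numpy as np

def knn_indices(coords: np.ndarray, k: int) -> np.ndarray:
    """Indices of each cell's k nearest spatial neighbors, excluding the cell itself."""
    # Dense pairwise Euclidean distances (fine for small toy examples).
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbor
    return np.argsort(dist, axis=1)[:, :k]

def spatial_embed(expr: np.ndarray, coords: np.ndarray, k: int = 3) -> np.ndarray:
    """Hypothetical helper (not Spatial-ID's API): [own expression | mean neighbor expression]."""
    neighbors = knn_indices(coords, k)              # shape (n_cells, k)
    neighbor_mean = expr[neighbors].mean(axis=1)    # shape (n_cells, n_genes)
    return np.concatenate([expr, neighbor_mean], axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    coords = rng.uniform(0, 10, size=(6, 2))             # toy x/y positions
    expr = rng.poisson(2.0, size=(6, 4)).astype(float)   # toy counts for 4 genes
    emb = spatial_embed(expr, coords, k=2)
    print(emb.shape)  # (6, 8)
```

In a trained model, the concatenated vector would be passed through learned layers; the averaging step here only shows where neighborhood context enters.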

For evaluation, benchmark tests were run on four public spatial transcriptome datasets (two mouse brain datasets, one mouse brain germ cell dataset, and one human non-small cell lung cancer dataset), comparing performance against eight existing SOTA methods (Seurat, SingleR, Scmap, Cell-ID, ScNym, SciBet, Tangram, Cell2location).

Spatial-ID achieved accuracies of 92.75%, 87.74%, 60.45%, and 69.76% on the four datasets, significantly outperforming the SOTA methods. On the three-dimensional spatial transcriptome dataset of the mouse hypothalamic preoptic area in particular, Spatial-ID improved cell type annotation accuracy by about 6.5% on average over the best SOTA method.

Figure 2: Benchmark results on the three-dimensional spatial transcriptome dataset of the mouse hypothalamic preoptic area

In addition, Spatial-ID provides a pipeline for discovering new cell types that are absent from the reference dataset.
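The two ideas of transferring labels from a single-cell reference and flagging cells that match no reference type can be illustrated with a much simpler stand-in than Spatial-ID's neural network: a nearest-centroid classifier based on Pearson correlation. This is a sketch under stated assumptions: reference centroids are averaged per cell type from labeled single-cell profiles, and a hypothetical correlation threshold `min_corr` marks poorly matching cells as `"unassigned"`, i.e. candidates for new cell types. The function names are illustrative, and Spatial-ID's own discovery pipeline works differently.

```python
import numpy as np

def reference_centroids(ref_expr, ref_labels):
    """Average the labeled single-cell reference profiles for each cell type."""
    labels = np.asarray(ref_labels)
    types = sorted(set(ref_labels))
    centroids = np.stack([ref_expr[labels == t].mean(axis=0) for t in types])
    return types, centroids

def transfer_labels(query_expr, types, centroids, min_corr=0.5):
    """Label each query cell with its most correlated reference type, or 'unassigned'.

    Hypothetical helper; min_corr is an assumed threshold, not a published setting.
    """
    assigned = []
    for cell in query_expr:
        # Pearson correlation between this cell's profile and each reference centroid.
        corrs = np.array([np.corrcoef(cell, c)[0, 1] for c in centroids])
        best = int(np.argmax(corrs))
        # Cells that resemble no reference type are candidates for new cell types.
        assigned.append(types[best] if corrs[best] >= min_corr else "unassigned")
    return assigned

# Toy reference: two cell types measured on three genes (hypothetical values).
ref_expr = np.array([[5, 0, 0], [6, 1, 0], [0, 5, 1], [1, 6, 0]], dtype=float)
ref_labels = ["A", "A", "B", "B"]
types, centroids = reference_centroids(ref_expr, ref_labels)

query = np.array([[4, 0, 0.5], [0, 4, 0], [0, 0, 5]], dtype=float)
print(transfer_labels(query, types, centroids))  # ['A', 'B', 'unassigned']
```

The third query cell expresses mainly the gene that neither reference type uses, so it correlates negatively with both centroids and is left unassigned.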

According to the paper's authors, cell type annotation based on spatial transcriptomics studies both the specific gene expression of individual cells and each cell's spatial microenvironment. It systematically classifies the cell types in a tissue and uniformly describes the molecular features of each cell type together with its location in the tissue. This will change our understanding of biology and disease and may lead to major breakthroughs in how disease is diagnosed and treated.

In the future, the Spatial-ID algorithm can annotate cell types in large-scale spatial transcriptome sequencing data and will support the construction of large-scale tissue cell atlases, such as spatial transcriptome cell atlases of the whole mouse brain and monkey brain.

Study 2: SOTIP, a general method for microenvironment modeling using spatial omics data

English title: SOTIP is a versatile method for microenvironment modeling with spatial omics data

Paper link: https://www.nature.com/articles/s41467-022-34867-5

The study applies optimal transport theory, widely used in artificial intelligence, and combines the continuity of cells in both physical space and state space to model the microenvironment.

Its key idea is to build a network of relationships between microenvironments that connects the low-dimensional manifold of cellular molecular expression profiles with local spatial topology. This makes it possible to handle several important computational tasks at once, including quantifying microenvironment heterogeneity, identifying spatial domains, and analyzing differential microenvironments.
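Optimal transport compares two distributions by the minimum cost of moving mass from one to the other. As an illustration only, the sketch below assumes each microenvironment is summarized as a histogram over a few cell states (for example, expression clusters), with the ground cost given by distances between state centroids in expression space, and approximates the transport cost with entropy-regularized Sinkhorn iterations. The function `sinkhorn_distance`, the regularization `eps`, and the toy states are assumptions of this sketch, not SOTIP's published formulation.

```python
import numpy as np

def sinkhorn_distance(a, b, cost, eps=0.05, n_iter=500):
    """Entropy-regularized optimal transport cost between histograms a and b.

    Hypothetical helper for illustration; returns (transport cost, transport plan).
    """
    a = np.asarray(a, dtype=float) / np.sum(a)
    b = np.asarray(b, dtype=float) / np.sum(b)
    cost = np.asarray(cost, dtype=float)
    K = np.exp(-cost / eps)                 # Gibbs kernel
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)                     # rescale rows toward marginal a
        v = b / (K.T @ u)                   # rescale columns to marginal b
    plan = u[:, None] * K * v[None, :]      # P = diag(u) K diag(v)
    return float(np.sum(plan * cost)), plan

# Three hypothetical cell states with centroids in a 2D expression space.
centroids = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
cost = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=-1)

niche_1 = [0.6, 0.3, 0.1]   # state composition of one microenvironment
niche_2 = [0.1, 0.3, 0.6]   # state composition of another
d, plan = sinkhorn_distance(niche_1, niche_2, cost)
print(round(d, 2))  # close to 0.5: move 0.5 of the mass from state 0 to state 2 at cost 1
```

Because the cost reflects distances between cell states, two microenvironments with similar but not identical cell states come out closer than ones with unrelated states, which a plain comparison of cell-type counts cannot capture.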

SOTIP showed good accuracy, stability, and robustness across a range of spatial transcriptomic, proteomic, and metabolomic datasets.

For quantifying spatial heterogeneity, SOTIP accurately delineated the outlines of the tumor cell nuclear membrane and endoplasmic reticulum membrane at the subcellular level (AUC = 0.85); at the tissue level, it identified the boundary between tumor and normal muscle tissue (Spearman coefficient = 0.847).

For spatial domain identification, SOTIP achieved high accuracy across various spatial proteomic and transcriptomic datasets, accurately identifying different brain regions and tumor structures. In human brain region identification it reached an adjusted Rand index (ARI) of 0.58, better than classic algorithms such as BayesSpace, SpaGCN, and STAGATE, and it can also be applied to three-dimensional spatial data.
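The adjusted Rand index measures agreement between predicted spatial domains and reference annotations: 1 means identical partitions (up to relabeling) and values near 0 mean chance-level agreement. A small pure-Python sketch of the standard pair-counting formula follows; the name `adjusted_rand_index` is our own, and in practice libraries such as scikit-learn provide `adjusted_rand_score`.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two clusterings of the same cells (needs at least 2 cells)."""
    n = len(labels_true)
    pair_counts = Counter(zip(labels_true, labels_pred))  # contingency table entries n_ij
    row_sums = Counter(labels_true)                        # a_i
    col_sums = Counter(labels_pred)                        # b_j
    index = sum(comb(c, 2) for c in pair_counts.values())
    sum_a = sum(comb(c, 2) for c in row_sums.values())
    sum_b = sum(comb(c, 2) for c in col_sums.values())
    expected = sum_a * sum_b / comb(n, 2)   # index expected under random labeling
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:               # degenerate case, e.g. one cluster in both labelings
        return 1.0
    return (index - expected) / (max_index - expected)

# Reference domains vs. a prediction that splits one domain in two.
print(round(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]), 4))  # 0.5714
```

Renaming the predicted domains does not change the score, which is why ARI suits unsupervised domain identification where cluster IDs are arbitrary.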

For differential microenvironment analysis, SOTIP used the identified microenvironments to discover two subtypes of triple-negative breast cancer with significantly different prognoses in a cohort of 34 patients (p = 9.2 × 10^-6).

SOTIP's two main application scenarios are brain science and tumor research.

A major challenge in brain science is studying the interactions between different neuronal cell types, between functional brain regions, and between neurons and functional regions. SOTIP can accurately identify distinct functional spatial domains in the brain without any manual intervention, laying a foundation for building large-scale brain atlases.

The most important cell types in tumors are immune cells and tumor cells, whose spatial proximity and interactions make up a complex tumor microenvironment. In clinical diagnosis, many diseases cannot be classified from the relative numbers of immune-cell and tumor-cell subtypes alone; the microenvironment itself must be characterized so that patients can receive targeted treatment.
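A toy example shows why cell-type ratios alone can miss the microenvironment. The sketch below is a simple illustration, not SOTIP itself: it assumes cells on a grid labeled `"tumor"` or `"immune"` and computes, for each tumor cell, the fraction of immune cells among its k nearest neighbors. The function `immune_neighbor_fraction` and the two layouts are hypothetical; both layouts contain exactly 8 tumor and 8 immune cells.

```python
import numpy as np

def immune_neighbor_fraction(coords, labels, k=4):
    """For each tumor cell, the fraction of its k nearest neighbors that are immune cells.

    Hypothetical helper for illustration only.
    """
    coords = np.asarray(coords, dtype=float)
    labels = np.asarray(labels)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)                              # exclude the cell itself
    neighbors = np.argsort(dist, axis=1, kind="stable")[:, :k]  # ties broken by cell index
    is_immune = labels == "immune"
    tumor_idx = np.flatnonzero(labels == "tumor")
    return is_immune[neighbors[tumor_idx]].mean(axis=1)

# Two hypothetical 4 x 4 tissue patches with the same cell-type counts.
grid = [(r, c) for r in range(4) for c in range(4)]
segregated = ["tumor" if c < 2 else "immune" for r, c in grid]         # immune cells excluded
mixed = ["tumor" if (r + c) % 2 == 0 else "immune" for r, c in grid]   # immune cells infiltrating

for name, labels in [("segregated", segregated), ("mixed", mixed)]:
    frac = immune_neighbor_fraction(grid, labels)
    print(name, labels.count("immune") / len(labels), round(float(frac.mean()), 3))
```

Both patches have an immune fraction of 0.5, but the infiltrated layout gives tumor cells far more immune neighbors, a difference that only spatially aware features can reveal.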

Study 3: Currently the industry's largest (more than 50 million cells) and most diverse (26 spatial omics technologies) spatial omics database

English title: SODB facilitates comprehensive exploration of spatial omics data

Paper link: https://www.nature.com/articles/s41592-023-01773-7

For biologists, new biological and pathological findings should be validated with different technologies and omics modalities to reduce false positives. SODB centralizes and systematically manages diverse spatial omics data, so researchers can quickly search, locate, and retrieve multimodal data as needed, making full use of published data and avoiding some unnecessary biological experiments.

For bioinformaticians, SODB can meet the benchmark-data needs of many computational methods, letting method developers focus on the computational models themselves.

The database provides the industry's largest (more than 50 million cells) and most diverse (26 spatial omics technologies) collection of spatial omics data, and all data are processed by standardized pipelines into the AnnData format (a standard format in the spatial omics field). It also provides a range of data analysis modules and new visualization modules for quickly visualizing whole tissues and identifying tissue regions.

In addition, a companion Python package, pysodb, loads data with a single line of code and is about 160 times faster than conventional methods (for Slide-seq data, for example, conventional loading takes 19.04 minutes, whereas pysodb needs only 7.16 seconds).
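The quoted speedup follows directly from the two timings: 19.04 minutes is 1,142.4 seconds, and 1,142.4 / 7.16 ≈ 159.6, which rounds to the reported figure of about 160 times. The arithmetic, using only the numbers given in the article:

```python
# Loading times for the Slide-seq example, as reported in the article.
legacy_seconds = 19.04 * 60      # conventional method: 19.04 minutes
pysodb_seconds = 7.16            # pysodb: 7.16 seconds
speedup = legacy_seconds / pysodb_seconds
print(f"{speedup:.1f}x")         # 159.6x
```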

The data in this database are currently open to everyone:

Dataset: https://gene.ai.tencent.com/SpatialOmics/

Python package: https://github.com/TencentAILabHealthcare/pysodb

References:

1. Shen, R., Liu, L., Wu, Z. et al. Spatial-ID: a cell typing method for spatially resolved transcriptomics via transfer learning and spatial embedding. Nat Commun 13, 7640 (2022). https://doi.org/10.1038/s41467-022-35288-0

2. Yuan, Z., Li, Y., Shi, M. et al. SOTIP is a versatile method for microenvironment modeling with spatial omics data. Nat Commun 13, 7330 (2022). https://doi.org/10.1038/s41467-022-34867-5

3. Yuan, Z., Pan, W., Zhao, X. et al. SODB facilitates comprehensive exploration of spatial omics data. Nat Methods (2023). https://doi.org/10.1038/s41592-023-01773-7

Reposted from: blog.csdn.net/sinat_37574187/article/details/131265215