DeepDRIM: A deep neural network for reconstructing cell type-specific gene regulatory networks using single-cell RNA-seq data

Summary

Single-cell RNA sequencing (scRNA-seq) captures gene activity at single-cell resolution, allowing the reconstruction of cell type-specific gene regulatory networks (GRNs). Most available GRN reconstruction algorithms, however, are designed for bulk RNA-seq data, and few are suited to scRNA-seq data, which requires handling dropout events and cellular heterogeneity. In this paper, the authors represent the joint expression distribution of a gene pair as an image and propose a supervised deep neural network called DeepDRIM, which uses the image of the target TF-gene pair together with the images of its potential neighbors to reconstruct GRNs from scRNA-seq data. By taking the neighborhood context of a TF-gene pair into account, DeepDRIM can effectively eliminate false positives caused by transitive gene-gene interactions. The authors compared DeepDRIM with nine GRN reconstruction algorithms designed for bulk or single-cell RNA-seq data and found that it achieved significantly better performance on scRNA-seq data collected from eight cell lines. Simulations show that DeepDRIM is robust to dropout rate, cell number, and training set size. The authors also examined cell type-specific GRN alterations and observed differential TF targets across cell types.

Introduction

The reconstruction of gene regulatory networks (GRNs) is critical for understanding synergistic gene effects and context-specific transcriptional dynamics, but cellular heterogeneity has largely been overlooked in previous work. Single-cell RNA sequencing (scRNA-seq) captures cell-specific gene expression, offering greater insight into cellular heterogeneity and cell type-specific gene activity.
Most available GRN reconstruction algorithms, however, are designed for bulk gene expression data and work by solving two computational challenges, both of which become harder when scRNA-seq data are used. First, putative TF-gene interactions are derived from co-expression. Bulk expression data are usually normalized toward a standard Gaussian distribution so that TF-gene correlation can be quantified by measures such as mutual information (MI) or the Pearson correlation coefficient (PCC). In scRNA-seq data, by contrast, a large fraction of expression entries are zero because of dropout events and uneven transcript sampling. Although the zero entries can be imputed before computing TF-gene co-expression, this may introduce unpredictable noise and bias, since most imputation algorithms themselves rely on gene-gene co-expression. Second, TF-gene pairs that show strong co-expression only because of transitive interactions (e.g., pairs bridged by one or more intermediate genes) should be eliminated. Several strategies remove these transitive interactions by conditioning on other confounding genes, including Gaussian graphical models, conditional MI, context-based normalization and edge removal, and tree-based ensemble methods. These algorithms were originally developed for bulk gene expression data and are not well suited to modeling scRNA-seq data.
Several GRN reconstruction algorithms tailored to scRNA-seq data have therefore been proposed, including SCODE, PIDC, SINCERITIES, and GENIE3. Although these specialized strategies were designed to address the inherent issues of scRNA-seq data, none of them yielded satisfactory results when benchmarked against cell type-specific ChIP-seq data, and some performed close to random guessing.
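As a reference point for the bulk-style co-expression measures mentioned above, here is a minimal sketch of scoring a TF-gene pair with PCC and MI on zero-inflated expression vectors. The function name and the toy data are ours, purely for illustration; they are not taken from any of the benchmarked tools.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import mutual_info_score

def coexpression_scores(tf_expr, gene_expr, n_bins=16):
    """Score co-expression of a TF and a candidate target across cells.

    tf_expr, gene_expr: 1-D arrays of normalized expression, one value per cell.
    Returns (|PCC|, MI). The heavy zero inflation typical of scRNA-seq distorts
    both measures, which is one reason bulk-oriented pipelines struggle here.
    """
    pcc, _ = pearsonr(tf_expr, gene_expr)
    # Discretize expression into equal-width bins before estimating MI.
    tf_bins = np.digitize(tf_expr, np.histogram_bin_edges(tf_expr, bins=n_bins))
    gene_bins = np.digitize(gene_expr, np.histogram_bin_edges(gene_expr, bins=n_bins))
    return abs(pcc), mutual_info_score(tf_bins, gene_bins)

# Toy example: 500 "cells" with dropout-like zeros in both genes.
rng = np.random.default_rng(0)
tf = rng.gamma(2.0, 1.0, 500) * rng.binomial(1, 0.4, 500)
target = np.clip(0.8 * tf + rng.normal(0, 0.5, 500), 0, None) * rng.binomial(1, 0.4, 500)
print(coexpression_scores(tf, target))
```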
CNNC is a supervised deep neural network that represents the joint expression of a gene pair as an image and predicts gene-gene relationships from scRNA-seq data using a convolutional neural network (CNN). The authors noticed, however, that a large proportion of the false positives produced by CNNC were concentrated in gene pairs with strong Pearson correlations.

The authors propose DeepDRIM (Deep Learning-Based Direct Regulatory Interaction Model), a supervised deep neural network that reconstructs highly accurate cell type-specific GRNs from scRNA-seq data by considering both the main image and its neighboring images. The basic principle and workflow of DeepDRIM are shown in Figure 2.

Results

Effectiveness of neighboring images in eliminating transitive interactions

The authors first generated simulated data and trained CNNC with two types of input: one containing only the main image and the other containing the main image augmented with its neighboring images. When the neighboring images are considered, the overall proportion of false positives and the proportion of false positives caused by transitive interactions drop significantly, by 40.4% and 55.4%, respectively (Figure 1B).
The rationale behind this observation is that the main image is effectively "normalized" against its neighborhood, mitigating overestimation of interaction strength. Furthermore, Figures 1C and 1D show that considering neighboring images does not impair the ability to predict direct interactions (e.g., gene 1 ⇒ gene 2 in Figure 1C and gene 1 ⇒ gene 3 in Figure 1D). In Figure 1E, gene 2 is connected to gene 3 only through the indirect path gene 2 ⇒ gene 4 ⇒ gene 3. The correlations of {gene 2, gene 4} (|PCC| = 0.81) and {gene 4, gene 3} (|PCC| = 0.83) are stronger than that of the target pair {gene 2, gene 3} (|PCC| = 0.67), clear evidence that {gene 2, gene 3} should be flagged as a false positive. By considering neighboring images, the model reduced the prediction confidence score for {gene 2, gene 3} from 0.672 to 0.001, as observed in Figure 1F. These findings underscore the importance of considering the local neighborhood in GRN reconstruction to eliminate false positives caused by transitive interactions.
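The comparison of |PCC| values above can be distilled into a simple heuristic: if both legs of an indirect path correlate more strongly than the direct pair, the direct pair is suspect. DeepDRIM learns this pattern from the images rather than applying an explicit rule; the sketch below only illustrates the intuition, with hypothetical expression vectors and a function name of our own choosing.

```python
import numpy as np

def transitive_suspect(expr, pair, bridge_genes):
    """Flag a candidate edge (g1, g2) as possibly transitive if some bridge gene m
    satisfies |PCC(g1, m)| > |PCC(g1, g2)| and |PCC(m, g2)| > |PCC(g1, g2)|.

    expr: dict mapping gene name -> 1-D expression array over cells.
    Mirrors the Figure 1E observation (|PCC| 0.81 and 0.83 vs. 0.67), but only as
    a heuristic illustration, not as the model's actual decision rule.
    """
    g1, g2 = pair
    apcc = lambda a, b: abs(np.corrcoef(expr[a], expr[b])[0, 1])
    direct = apcc(g1, g2)
    for m in bridge_genes:
        if apcc(g1, m) > direct and apcc(m, g2) > direct:
            return True, m
    return False, None
```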

DeepDRIM Overview

DeepDRIM is proposed to reconstruct cell type-specific GRNs from scRNA-seq data with high accuracy and a low false-positive rate. Figure 2 illustrates how DeepDRIM predicts the interaction between gene a and gene b. First, DeepDRIM transforms the joint expression of gene a and gene b into a two-dimensional histogram with 32 x 32 bins (the main image, Figure 2A), where the intensity of each bin is the number of cells falling into it. Second, DeepDRIM constructs 2n + 2 neighboring images: 2n images pair gene a or gene b with the n genes having the highest positive covariance with gene a (images (a, i)) or gene b (images (b, j)), and the remaining two are the self-images (a, a) and (b, b).
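A minimal NumPy sketch of how this input representation could be assembled follows. The log(1 + x) transforms and the function names are our own assumptions for readability; the details taken from the text are the 32 x 32 binning, the selection of the n genes with the highest positive covariance for each of gene a and gene b, and the two self-images.

```python
import numpy as np

def joint_expression_image(x, y, bins=32):
    """32 x 32 two-dimensional histogram of the joint expression of two genes;
    each bin holds the (log-compressed) number of cells falling into it."""
    hist, _, _ = np.histogram2d(np.log1p(x), np.log1p(y), bins=bins)
    return np.log1p(hist)

def deepdrim_like_input(expr, gene_a, gene_b, n_neighbors=10, bins=32):
    """Build the main image plus the 2n + 2 neighboring images for (gene_a, gene_b).

    expr: dict gene -> 1-D expression array over cells. For each anchor gene, the
    n genes with the largest positive covariance are selected, plus a self-image.
    """
    main = joint_expression_image(expr[gene_a], expr[gene_b], bins)
    images = []
    for anchor in (gene_a, gene_b):
        cov = {g: np.cov(expr[anchor], expr[g])[0, 1] for g in expr if g != anchor}
        top = sorted(cov, key=cov.get, reverse=True)[:n_neighbors]
        images.append(joint_expression_image(expr[anchor], expr[anchor], bins))  # self-image
        images += [joint_expression_image(expr[anchor], expr[g], bins) for g in top]
    neighbors = np.stack(images, axis=-1)           # shape (bins, bins, 2n + 2)
    return main[..., np.newaxis], neighbors
```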
These neighboring images are fed to the model to capture the neighborhood context of the main image (Figure 2B), which provides the key information needed to distinguish direct from transitive interactions. The neighboring images are organized as a tensor rather than as an augmented image, which gives better performance on real data (Supplementary Figure S2). Third, two CNNs process the main image (network A) and the neighboring-image tensor of shape 32 x 32 x (2n + 2) (network B), respectively (Figure 2C, Methods, and Supplementary Figure S3). Network A follows VGGnet [32] and is similar to CNNC; network B is a Siamese-like neural network designed to process multiple neighboring images. The neural network is trained on known TF-gene interactions taken from publicly available cell type-specific ChIP-seq data. Finally, unknown interactions are predicted as directed edges with confidence scores between 0 and 1 (Figure 2D).
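To make the two-branch design concrete, here is a compressed Keras sketch: a small VGG-style branch for the main image (network A) and a shared convolutional sub-network applied to every neighboring image before the embeddings are merged (network B, Siamese-like weight sharing). Layer counts and sizes are illustrative guesses, not the published DeepDRIM hyperparameters; see the paper's Supplementary Figure S3 and the GitHub repository for the real architecture.

```python
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return layers.MaxPooling2D()(x)

def build_deepdrim_like(n_neighbor_images=22, bins=32):
    # Network A: VGG-style CNN on the 32 x 32 main image.
    main_in = layers.Input((bins, bins, 1))
    a = layers.Flatten()(conv_block(conv_block(main_in, 32), 64))

    # Shared sub-network reused for every neighboring image (Siamese-like).
    shared_in = layers.Input((bins, bins, 1))
    shared = Model(shared_in, layers.GlobalAveragePooling2D()(conv_block(shared_in, 16)))

    # Network B: split the (bins, bins, 2n + 2) tensor and embed each slice.
    neigh_in = layers.Input((bins, bins, n_neighbor_images))
    slices = [layers.Lambda(lambda t, i=i: t[..., i:i + 1])(neigh_in)
              for i in range(n_neighbor_images)]
    b = layers.Concatenate()([shared(s) for s in slices])

    merged = layers.Dense(128, activation="relu")(layers.Concatenate()([a, b]))
    out = layers.Dense(1, activation="sigmoid")(merged)   # confidence score in [0, 1]
    return Model([main_in, neigh_in], out)

model = build_deepdrim_like()
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```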

DeepDRIM outperforms existing algorithms for reconstructing cell type-specific GRNs

The authors collected scRNA-seq datasets from eight cell lines, together with corresponding ChIP-seq data from two sources, to compare DeepDRIM with existing methods (Table 1) using TF-aware three-fold cross-validation (Methods). DeepDRIM was first evaluated against PCC, MI, CNNC, and GENIE3, the last of which is among the best-performing algorithms for GRN reconstruction on both scRNA-seq and bulk gene expression data.
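Here, "TF-aware" cross-validation is read as splitting the benchmark by TF rather than by individual TF-gene pair, so that all pairs of a given TF land in the same fold and the model is always tested on TFs unseen during training. A minimal sketch of such a split (function and variable names are ours, not from the DeepDRIM code base):

```python
import numpy as np

def tf_aware_folds(pairs, n_folds=3, seed=0):
    """Split (tf, gene, label) tuples into folds by TF, not by pair, so information
    about a TF's target profile cannot leak between training and test sets."""
    tfs = sorted({tf for tf, _, _ in pairs})
    rng = np.random.default_rng(seed)
    rng.shuffle(tfs)
    fold_of_tf = {tf: i % n_folds for i, tf in enumerate(tfs)}
    folds = [[] for _ in range(n_folds)]
    for pair in pairs:
        folds[fold_of_tf[pair[0]]].append(pair)
    return folds
```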

DeepDRIM is robust to scRNA-seq data quality and training set size

The performance of DeepDRIM may be affected by scRNA-seq data quality (dropout rate and cell number), the number of neighboring images used, and the training set size. To evaluate the robustness of DeepDRIM to these factors, the authors first selected scRNA-seq data from bone marrow-derived macrophages as a template and simulated a series of scRNA-seq datasets with different parameters. Seven scRNA-seq gene expression datasets were generated by subsampling the number of cells (from 20 to 4,000), which in turn varied the resolution of the main and neighboring images. DeepDRIM was found to be robust to low-resolution images as long as the number of cells was greater than 100 (Figure 4A). Next, the authors used MAGIC to estimate dropouts in the template and then randomly masked entries as dropouts over a range of dropout rates (Methods); DeepDRIM exhibited stable performance across the different dropout configurations. Third, the authors compared the performance of DeepDRIM while varying the number of neighboring images fed into the model and found that more neighboring images led to better performance (Figure 4C), although the computational cost also grows with the number of images. In this study, the top 10 genes with the strongest positive covariance with the target TF or gene were selected, giving a total of 22 neighboring images (unless otherwise specified) to balance these two factors. Finally, to evaluate the effect of training set size, 20%, 40%, 60%, 80%, and 100% of the benchmark TF-gene pairs were subsampled for training. The results show that training set size does not substantially affect the performance of DeepDRIM (Figure 4D); performance nearly plateaus when 40% of the training set (comprising 20,101 TF-gene pairs) is used.
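The dropout stress test can be imitated with a simple masking step: zero out a chosen fraction of the non-zero entries of the expression matrix before rebuilding the images. The sketch below is a simplification of our own; the paper first uses MAGIC to estimate which entries are likely dropouts and masks among those.

```python
import numpy as np

def mask_dropouts(expr_matrix, dropout_rate, seed=0):
    """Randomly set a fraction of the non-zero entries of a genes x cells matrix to zero.

    A simplified stand-in for the paper's procedure: here every non-zero entry is an
    equally likely masking candidate, whereas the paper restricts masking to entries
    that MAGIC flags as probable dropouts.
    """
    rng = np.random.default_rng(seed)
    masked = expr_matrix.astype(float)
    nonzero = np.argwhere(masked > 0)
    n_mask = int(dropout_rate * len(nonzero))
    chosen = nonzero[rng.choice(len(nonzero), size=n_mask, replace=False)]
    masked[chosen[:, 0], chosen[:, 1]] = 0.0
    return masked
```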

Conclusion

DeepDRIM is a supervised deep neural network for predicting GRNs from scRNA-seq data. It converts the joint expression of a TF-gene pair into a main image and uses neighboring images as the neighborhood context of the main image to eliminate false positives caused by transitive interactions. DeepDRIM also leverages the training set to capture key regions in the CNN embeddings that identify TF-gene interactions and their causal direction. The results show that DeepDRIM outperforms nine existing algorithms on the eight cell types tested and is robust to the quality of the scRNA-seq data.

Methods

1 Representation of joint expression of gene pairs
2 Network structure of DeepDRIM
3 Simulating scRNA-seq data to examine the influence of neighbor images
4 scRNA-seq data from eight cell lines
5 Comparison of DeepDRIM with existing GRN reconstruction algorithms
6 Simulating scRNA-seq data to assess robustness

DeepDRIM is available at https://github.com/jiaxchen2-c/DeepDRIM.
