Paper intensive reading (16): Deep learning enables accurate clustering and batch effect removal

Paper title: Deep learning enables accurate clustering and batch effect removal in single-cell RNA-seq analysis

Google Scholar citations: 0

Pages: 14

Publication date: 2019.1.25

Venue: preprint

Authors: Xiangjie Li1,2, Yafei Lyu1, Jihwan Park3, Jingxiao Zhang2, Dwight Stambolian4, Katalin Susztak3,5, Gang Hu1,5*, Mingyao Li1*

University of Pennsylvania Perelman School of Medicine

Abstract:

Single-cell RNA sequencing (scRNA-seq) can characterize cell types and states through unsupervised clustering, but the ever increasing number of cells imposes computational challenges. We present an unsupervised deep embedding algorithm for single-cell clustering (DESC) that iteratively learns cluster-specific gene expression signatures and cluster assignment. DESC significantly improves clustering accuracy across various datasets and is capable of removing complex batch effects while maintaining true biological variations.

Strictly speaking, this is not a paper but a conference report.

Excerpts from the main text:

  • An open-source implementation of the DESC algorithm can be downloaded from https://eleozzr.github.io/desc/.

  • ScRNA-seq clustering and batch effect removal are typically addressed through separate analyses. Commonly used approaches to remove batch effect include Seurat’s Canonical Correlation Analysis [3] (CCA) or Mutual Nearest Neighbors (MNN) approach [4]. (Note: CCA and MNN are the commonly used batch effect removal methods for scRNA-seq.)

  • After removing batch effect, clustering analysis is performed to identify cell clusters using methods such as Louvain’s method [5], Infomap [6], graph-based clustering [7], shared nearest neighbor [8], or consensus clustering with SC3 [9]. (Note: once the batch effect has been removed, a clustering method is applied.)

  • Since some cell types are more vulnerable to batch effect than others, batch effect removal should be performed jointly with clustering to achieve optimal performance. (Note: batch effect removal sometimes needs to be combined with clustering to get the best result.)

  • However, none of the existing methods are capable of simultaneously clustering cells and removing batch effect. (Note: at present, no such method exists.)

  • We developed DESC, an unsupervised deep learning algorithm that iteratively learns cluster-specific gene expression representation and cluster assignments for scRNA-seq data clustering (Fig. 1a). Using a deep neural network, DESC initializes clustering obtained from an autoencoder and learns a non-linear mapping function from the original scRNA-seq data space to a low-dimensional feature space by iteratively optimizing a clustering objective function. This iterative procedure moves each cell to its nearest cluster, balances biological and technical differences between clusters, and reduces the influence of batch effect. DESC also enables soft clustering by assigning cluster-specific probabilities to each cell, facilitating the clustering of cells with high-confidence. (Note: this is the main principle of DESC.)

  • We benchmarked DESC’s performance by analyzing the multi-tissue gene expression data in GTEx [10]. (Note: this is the dataset used to evaluate the algorithm's performance; a simulated dataset, n=11,688.)

  • Adjusted Rand index (ARI)

  • In summary, we have developed a deep learning algorithm that clusters scRNA-seq data by iteratively optimizing a clustering objective function with a self-training target distribution.

  • DESC’s memory usage and running time increase linearly with the number of cells, thus making it scalable to large datasets (Fig. 3e). DESC can further speed up computation by GPUs.

  • We analyzed a mouse brain dataset with 1.3 million cells generated by 10X, which only took about 3.5 hours with one NVIDIA TITAN Xp GPU (Supplementary Note 6).

  • Compared to existing scRNA-seq clustering methods DESC improves clustering by iteratively learning cluster-specific gene expression features from cells clustered with high confidence.

  • This iterative clustering also removes batch effect and maintains true biological differences between clusters.

  • As the growth of single-cell studies increases, DESC will be a more precise tool for clustering of large datasets.
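The excerpts above mention soft clustering with cluster-specific probabilities and a "clustering objective function with a self-training target distribution". A minimal sketch of what such an objective can look like is given below. It assumes the Student's t kernel, squared-and-renormalized target, and KL divergence loss used by Deep Embedded Clustering (DEC); the excerpts do not spell out these formulas, so this is an assumption. The function names and toy data are hypothetical and are not the API of the desc package.

```python
import math


def soft_assign(embeddings, centroids, alpha=1.0):
    """Soft cluster probabilities q_ij from a Student's t kernel (assumed form).

    Each row of the result sums to 1 and gives one cell's membership
    probabilities over all clusters.
    """
    q = []
    for z in embeddings:
        # Unnormalized similarity between this cell's embedding and each centroid.
        sims = [
            (1.0 + sum((a - b) ** 2 for a, b in zip(z, mu)) / alpha)
            ** (-(alpha + 1.0) / 2.0)
            for mu in centroids
        ]
        total = sum(sims)
        q.append([s / total for s in sims])
    return q


def target_distribution(q):
    """Self-training target p_ij: square q_ij, divide by soft cluster size, renormalize.

    Squaring emphasizes high-confidence assignments; dividing by the soft
    cluster size keeps large clusters from dominating.
    """
    n_clusters = len(q[0])
    cluster_size = [sum(row[j] for row in q) for j in range(n_clusters)]
    p = []
    for row in q:
        weights = [row[j] ** 2 / cluster_size[j] for j in range(n_clusters)]
        total = sum(weights)
        p.append([w / total for w in weights])
    return p


def kl_divergence(p, q):
    """Clustering loss KL(P || Q), summed over all cells and clusters."""
    return sum(
        p_ij * math.log(p_ij / q_ij)
        for p_row, q_row in zip(p, q)
        for p_ij, q_ij in zip(p_row, q_row)
        if p_ij > 0
    )


# Toy 2-D embeddings for four cells and two cluster centroids (made-up numbers).
embeddings = [[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [2.9, 3.2]]
centroids = [[0.0, 0.0], [3.0, 3.0]]

q = soft_assign(embeddings, centroids)
p = target_distribution(q)
loss = kl_divergence(p, q)
```

In a full training loop this loss would be minimized with respect to both the encoder weights and the centroids, and `p` would be recomputed periodically from the current `q`. That is what makes the procedure iterative and self-training; the sketch only evaluates one step on fixed embeddings.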
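The adjusted Rand index (ARI) listed in the excerpts scores how well a predicted clustering agrees with known labels, corrected for chance agreement. The sketch below is a plain-Python version of the standard pair-counting formula, included only to show what the number measures. The function name and the toy tissue labels are illustrative, not taken from the paper.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same items.

    1.0 means identical partitions (up to renaming of labels); values near 0
    mean agreement no better than chance; negative values are possible.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same items")

    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: partitions agree trivially

    # Count pairs of items that share a cluster in both, in truth, and in prediction.
    contingency = Counter(zip(labels_true, labels_pred))
    pairs_both = sum(comb(c, 2) for c in contingency.values())
    pairs_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    pairs_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = pairs_true * pairs_pred / total_pairs  # chance-level agreement
    max_index = (pairs_true + pairs_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. every item in a single cluster
    return (pairs_both - expected) / (max_index - expected)


# Toy example: known tissue labels versus cluster IDs from some method.
tissue = ["lung", "lung", "liver", "liver"]
perfect = [1, 1, 0, 0]   # same partition, different label names
crossed = [0, 1, 0, 1]   # splits every tissue across both clusters

ari_perfect = adjusted_rand_index(tissue, perfect)
ari_crossed = adjusted_rand_index(tissue, crossed)
```

Because ARI compares partitions rather than label names, `perfect` scores the same as the true labels themselves even though its cluster IDs are arbitrary.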

Reposted from blog.csdn.net/wxw060709/article/details/104171862