Paper close reading (21): Correcting for batch effects in case-control microbiome studies

Paper title: Correcting for batch effects in case-control microbiome studies

Scholar citations: 23

Pages: 18

Publication date: April 23, 2018

Journal: PLOS Computational Biology

Authors: Sean M. Gibbons, Claire Duvallet, Eric J. Alm

Cambridge

Abstract:

High-throughput data generation platforms, like mass-spectrometry, microarrays, and second-generation sequencing are susceptible to batch effects due to run-to-run variation in reagents, equipment, protocols, or personnel. Currently, batch correction methods are not commonly applied to microbiome sequencing datasets. In this paper, we compare different batch-correction methods applied to microbiome case-control studies. We introduce a model-free normalization procedure where features (i.e. bacterial taxa) in case samples are converted to percentiles of the equivalent features in control samples within a study prior to pooling data across studies. We look at how this percentile-normalization method compares to traditional meta-analysis methods for combining independent p-values and to limma and ComBat, widely used batch-correction models developed for RNA microarray data. Overall, we show that percentile-normalization is a simple, non-parametric approach for correcting batch effects and improving sensitivity in case-control meta-analyses.
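The percentile-normalization idea in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `percentile_normalize`, the toy data, and the tie-handling rule (averaging the strict and weak ranks) are all assumptions made here, and the sketch does not address zero counts.

```python
import numpy as np

def percentile_normalize(cases, controls):
    """Convert each case value to its percentile within the control values.

    cases:    array of shape (n_case_samples, n_features)
    controls: array of shape (n_control_samples, n_features)
    Returns an array shaped like `cases` with values in [0, 100].
    """
    cases = np.asarray(cases, dtype=float)
    controls = np.asarray(controls, dtype=float)
    # Compare every case value with every control value of the same feature.
    # Intermediate shape: (n_case_samples, n_control_samples, n_features).
    strict = (controls[None, :, :] < cases[:, None, :]).mean(axis=1)
    weak = (controls[None, :, :] <= cases[:, None, :]).mean(axis=1)
    # Averaging strict and weak ranks splits ties evenly (an assumed choice).
    return 100.0 * (strict + weak) / 2.0

# Hypothetical toy data: two studies whose raw values differ by a batch scale factor.
study_a_controls = np.array([[1.0], [2.0], [3.0], [4.0]])
study_a_cases = np.array([[3.5], [5.0]])
study_b_controls = study_a_controls * 10  # same pattern, different scale
study_b_cases = study_a_cases * 10

# Normalize within each study first, then pool across studies.
norm_a = percentile_normalize(study_a_cases, study_a_controls)
norm_b = percentile_normalize(study_b_cases, study_b_controls)
pooled = np.vstack([norm_a, norm_b])
```

Because each case value is expressed relative to the controls of its own study, the scale difference between the two toy studies disappears before pooling.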

Author summary

Batch effects are obstacles to comparing results across studies. Traditional meta-analysis techniques for combining p-values from independent studies, like Fisher’s method, are effective but statistically conservative. If batch-effects can be corrected, then statistical tests can be performed on data pooled across studies, increasing sensitivity to detect differences between treatment groups. Here, we show how a simple, model-free approach corrects for batch effects in case-control microbiome datasets.

Conclusions:

  • Existing batch-correction tools (e.g. limma and ComBat) are not as effective when batch effects are confounded with biological signals or when parametric assumptions do not apply, which is often the case in microbiome case-control studies.
  • Our percentile-normalization approach was much more effective than limma and ComBat in controlling false positives (Fig 4), especially in the presence of low-abundance taxa (S1 Fig). (Note: the method is particularly well suited to data with low-abundance taxa.)
  • The main conditions for applying this method are that 1) each batch must have a sizeable number of control samples (i.e. the density of the control distribution limits the resolution of the percentile-transformation of the case samples), and 2) case and control populations should be consistently defined across batches (i.e. same definition of ‘healthy’ or ‘diseased’ groups).
  • We suggest that methods like limma and ComBat are useful for studies lacking case and control groups. However, when studies have consistently defined internal controls, percentile-normalization should be the preferred batch correction approach. 
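Condition 1 above (the number of controls limits the resolution of the percentile transformation) can be illustrated with a small experiment. This is a hedged sketch on synthetic data: the helper `percentile_in_controls`, the evenly spaced toy controls, and the control counts 4 and 40 are assumptions chosen for illustration only.

```python
import numpy as np

def percentile_in_controls(value, controls):
    """Percent of control values strictly below `value` (assumes no ties)."""
    controls = np.asarray(controls, dtype=float)
    return 100.0 * np.mean(controls < value)

rng = np.random.default_rng(0)
case_values = rng.uniform(0.0, 1.0, size=1000)  # hypothetical case abundances

resolution = {}
for n_controls in (4, 40):
    # Evenly spaced toy controls strictly inside (0, 1).
    controls = np.linspace(0.0, 1.0, n_controls + 2)[1:-1]
    distinct = {percentile_in_controls(v, controls) for v in case_values}
    # With n distinct controls, a case value can land on at most n + 1 percentiles.
    resolution[n_controls] = len(distinct)

print(resolution)
```

With n distinct control values, a case value can fall into at most n + 1 percentile levels, so a batch with few controls maps many different case abundances onto the same percentile.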

Introduction:

  • Existing methods: SVA, limma, ComBat
  • All of these models are most effective when batch effects are not conflated with the true biological effects [1]. Furthermore, most batch correction methods make certain parametric assumptions.
  • In microbiome studies, batch effects are often diffuse and conflated with biological signals. (A characteristic of microbiome studies.)
  • Common approach in microbiome studies: One way to get around this issue is to calculate statistics within a given batch, and then compare significant features across batches using classic meta-analysis techniques for combining p-values, like Fisher’s and Stouffer’s methods.
  • These meta-analysis techniques are robust to batch effects across independent studies, but have less statistical power and ability to detect subtle differences than directly pooling data across studies. (The drawback of these methods.)
  • We describe a model-free data-normalization procedure for controlling batch effects in case-control microbiome studies that enables pooling data across studies.
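The p-value combination techniques named above, Fisher's and Stouffer's methods, can be written out with the Python standard library. This is a minimal sketch under stated assumptions: the function names and the example p-values are hypothetical, Stouffer's method is shown unweighted with one-sided p-values, and every p-value is assumed to lie strictly between 0 and 1.

```python
import math
from statistics import NormalDist

def fisher_combine(pvalues):
    """Fisher's method: X = -2 * sum(ln p_i) is chi-square with 2k degrees of freedom."""
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # this is X / 2
    # Closed-form chi-square survival function for an even number (2k) of degrees of freedom.
    return math.exp(-half_x) * sum(half_x**i / math.factorial(i) for i in range(k))

def stouffer_combine(pvalues):
    """Stouffer's method (unweighted): Z = sum(z_i) / sqrt(k) for one-sided p-values."""
    std_normal = NormalDist()
    z_scores = [std_normal.inv_cdf(1.0 - p) for p in pvalues]
    z = sum(z_scores) / math.sqrt(len(pvalues))
    return 1.0 - std_normal.cdf(z)

# Hypothetical per-study p-values for one taxon, each computed within its own batch.
per_study_p = [0.04, 0.10, 0.03]
fisher_p = fisher_combine(per_study_p)
stouffer_p = stouffer_combine(per_study_p)
print(fisher_p, stouffer_p)
```

Both functions use only per-study p-values, so raw abundances from different batches are never compared directly. This is why the approach is robust to batch effects, at the cost of the reduced power noted above.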

Paper outline:

1. Introduction

2. Methods

2.1 Datasets

2.2 Sequence data processing

2.3 Percentile normalization

2.4 ComBat

2.5 limma

2.6 Statistical analysis

2.7 In silico experiments

3. Results

3.1 Batch effects at OTU-level resolution

3.2 Batch effects at genus-level resolution across multiple diseases 

4. Discussion

5. Supporting information

Excerpts from the main text:

Reposted from blog.csdn.net/wxw060709/article/details/104180718