Paper title: Correcting for batch effects in case-control microbiome studies
Google Scholar citations: 23
Pages: 18
Published: April 23, 2018
Journal: PLOS Computational Biology
Authors: Sean M. Gibbons, Claire Duvallet, Eric J. Alm
Cambridge
Abstract:
High-throughput data generation platforms, like mass-spectrometry, microarrays, and second-generation sequencing are susceptible to batch effects due to run-to-run variation in reagents, equipment, protocols, or personnel. Currently, batch correction methods are not commonly applied to microbiome sequencing datasets. In this paper, we compare different batch-correction methods applied to microbiome case-control studies. We introduce a model-free normalization procedure where features (i.e. bacterial taxa) in case samples are converted to percentiles of the equivalent features in control samples within a study prior to pooling data across studies. We look at how this percentile-normalization method compares to traditional meta-analysis methods for combining independent p-values and to limma and ComBat, widely used batch-correction models developed for RNA microarray data. Overall, we show that percentile-normalization is a simple, non-parametric approach for correcting batch effects and improving sensitivity in case-control meta-analyses.
Author summary
Batch effects are obstacles to comparing results across studies. Traditional meta-analysis techniques for combining p-values from independent studies, like Fisher’s method, are effective but statistically conservative. If batch effects can be corrected, then statistical tests can be performed on data pooled across studies, increasing sensitivity to detect differences between treatment groups. Here, we show how a simple, model-free approach corrects for batch effects in case-control microbiome datasets.
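The percentile-normalization idea from the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, ties are handled with a mid-rank convention chosen for this sketch, and the controls are also scored against themselves, which the excerpt above does not specify.

```python
from bisect import bisect_left, bisect_right


def percentile_of_score(sorted_controls, x):
    """Percentile (0-100) of x within the sorted control values.

    Averages the 'strictly less than' and 'less than or equal' counts,
    so a value tied with a control lands mid-way (assumed convention).
    """
    n = len(sorted_controls)
    below = bisect_left(sorted_controls, x)
    at_or_below = bisect_right(sorted_controls, x)
    return 100.0 * (below + at_or_below) / (2 * n)


def percentile_normalize(controls, cases):
    """Normalize one feature (e.g. one taxon) within one study.

    Every abundance is replaced by its percentile in that study's own
    control distribution. Returns (normalized_controls, normalized_cases).
    """
    ref = sorted(controls)
    if not ref:
        raise ValueError("each study needs at least one control sample")
    return (
        [percentile_of_score(ref, v) for v in controls],
        [percentile_of_score(ref, v) for v in cases],
    )


# Two hypothetical studies measuring the same taxon on different scales.
_, cases_a = percentile_normalize([1, 2, 3, 4], [2.5, 5])
_, cases_b = percentile_normalize([10, 20, 30, 40], [25, 50])
print(cases_a)  # [50.0, 100.0]
print(cases_b)  # [50.0, 100.0]
```

Because each study is rescaled against its own controls, the tenfold scale difference between the two made-up studies disappears, and the normalized values can be pooled before running a single statistical test.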
Conclusions:
- These tools (limma and ComBat) are not as effective when batch effects are confounded with biological signals or when parametric assumptions do not apply, which is often the case in microbiome case-control studies.
- Our percentile-normalization approach was much more effective than limma and ComBat in controlling false positives (Fig 4), especially in the presence of low-abundance taxa (S1 Fig). (Note: particularly well suited to cases with low-abundance taxa.)
- The main conditions for applying this method are that 1) each batch must have a sizeable number of control samples (i.e. the density of the control distribution limits the resolution of the percentile-transformation of the case samples), and 2) case and control populations should be consistently defined across batches (i.e. same definition of ‘healthy’ or ‘diseased’ groups).
- We suggest that methods like limma and ComBat are useful for studies lacking case and control groups. However, when studies have consistently defined internal controls, percentile-normalization should be the preferred batch correction approach.
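The first condition above (enough control samples per batch) can be made concrete with a small sketch. It assumes distinct control values and a mid-rank percentile convention, neither of which is stated in the notes, and the function name is hypothetical. Under those assumptions, a case sample scored against n controls can only take one of 2n + 1 percentile values.

```python
def percentile_grid(n_controls):
    """All percentile values a case sample can receive when scored against
    n_controls distinct control values (mid-rank convention, assumed).

    A case value either falls in one of the n + 1 gaps around the controls
    or ties with one of the n controls, giving 2n + 1 possible outcomes.
    """
    return [100.0 * k / (2 * n_controls) for k in range(2 * n_controls + 1)]


for n in (4, 50):
    grid = percentile_grid(n)
    print(n, len(grid), grid[1] - grid[0])
# 4 9 12.5
# 50 101 1.0
```

With only 4 controls the normalized values move in steps of 12.5 percentile points, whereas 50 controls give a step of 1 point, which is why sparse control groups limit the resolution of the transformation.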
Introduction:
- Existing methods: SVA, limma, ComBat
- All of these models are most effective when batch effects are not conflated with the true biological effects [1]. Furthermore, most batch correction methods make certain parametric assumptions.
- In microbiome studies, batch effects are often diffuse and conflated with biological signals. (Note: a characteristic of microbiome studies.)
- Common approach in microbiome studies: One way to get around this issue is to calculate statistics within a given batch, and then compare significant features across batches using classic meta-analysis techniques for combining p-values, like Fisher’s and Stouffer’s methods.
- These meta-analysis techniques are robust to batch effects across independent studies, but have less statistical power and ability to detect subtle differences than directly pooling data across studies. (Note: the drawback of these methods.)
- we describe a model-free data-normalization procedure for controlling batch effects in case-control microbiome studies that enables pooling data across studies.
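The two p-value combination techniques named above, Fisher's and Stouffer's methods, can be sketched with the standard library alone. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the Stouffer variant is unweighted and treats its inputs as one-sided p-values, and all p-values must lie strictly between 0 and 1.

```python
import math
from statistics import NormalDist


def fisher_combine(pvalues):
    """Fisher's method: X = -2 * sum(ln p) follows a chi-square
    distribution with 2k degrees of freedom under the null hypothesis."""
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # X / 2
    # Closed-form chi-square survival function for even d.o.f. (2k):
    # exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


def stouffer_combine(pvalues):
    """Unweighted Stouffer's method: convert each one-sided p-value to a
    z-score, sum, and rescale by sqrt(k)."""
    nd = NormalDist()
    z = sum(nd.inv_cdf(1.0 - p) for p in pvalues) / math.sqrt(len(pvalues))
    return 1.0 - nd.cdf(z)


# Hypothetical per-study p-values for the same taxon in three studies.
study_pvalues = [0.04, 0.10, 0.07]
print(fisher_combine(study_pvalues))
print(stouffer_combine(study_pvalues))
```

Both functions see only one p-value per study, so batch effects in the underlying abundances never enter the calculation. The cost is that the raw samples are not pooled, which is the loss of power the notes describe.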
Paper outline:
1. Introduction
2. Methods
2.1 Datasets
2.2 Sequence data processing
2.3 Percentile normalization
2.4 ComBat
2.5 limma
2.6 Statistical analysis
2.7 In silico experiments
3. Results
3.1 Batch effects at OTU-level resolution
3.2 Batch effects at genus-level resolution across multiple diseases
4. Discussion
5. Supporting information
Excerpts from the main text: