Paper intensive reading (21): Correcting for batch effects in case-control microbiome studies

Paper title: Correcting for batch effects in case-control microbiome studies

Google Scholar citations: 23

Pages: 18

Publication date: April 23, 2018

Journal: PLOS Computational Biology

Authors: Sean M. Gibbons, Claire Duvallet, Eric J. Alm*

Cambridge

Abstract:

High-throughput data generation platforms, like mass-spectrometry, microarrays, and second-generation sequencing are susceptible to batch effects due to run-to-run variation in reagents, equipment, protocols, or personnel. Currently, batch correction methods are not commonly applied to microbiome sequencing datasets. In this paper, we compare different batch-correction methods applied to microbiome case-control studies. We introduce a model-free normalization procedure where features (i.e. bacterial taxa) in case samples are converted to percentiles of the equivalent features in control samples within a study prior to pooling data across studies. We look at how this percentile-normalization method compares to traditional meta-analysis methods for combining independent p-values and to limma and ComBat, widely used batch-correction models developed for RNA microarray data. Overall, we show that percentile-normalization is a simple, non-parametric approach for correcting batch effects and improving sensitivity in case-control meta-analyses.

Author summary

Batch effects are obstacles to comparing results across studies. Traditional meta-analysis techniques for combining p-values from independent studies, like Fisher’s method, are effective but statistically conservative. If batch-effects can be corrected, then statistical tests can be performed on data pooled across studies, increasing sensitivity to detect differences between treatment groups. Here, we show how a simple, model-free approach corrects for batch effects in case-control microbiome datasets.

Conclusions:

these tools are not as effective when batch effects are confounded with biological signals or when parametric assumptions do not apply, which is often the case in microbiome case-control studies.
Our percentile-normalization approach was much more effective than limma and ComBat in controlling false positives (Fig 4), especially in the presence of low-abundance taxa (S1 Fig). (Note: the method is particularly well suited to data with low-abundance taxa.)
The main conditions for applying this method are that 1) each batch must have a sizeable number of control samples (i.e. the density of the control distribution limits the resolution of the percentile-transformation of the case samples), and 2) case and control populations should be consistently defined across batches (i.e. same definition of ‘healthy’ or ‘diseased’ groups).
We suggest that methods like limma and ComBat are useful for studies lacking case and control groups. However, when studies have consistently defined internal controls, percentile-normalization should be the preferred batch correction approach.
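
To make the percentile-normalization idea concrete, here is a minimal sketch in plain Python for a single feature (one taxon) within a single batch. It is an illustration under stated assumptions, not the authors' implementation: the function names `percentile_of_score` and `percentile_normalize` are hypothetical, ties are handled by averaging the strict and weak ranks, and mapping the controls onto their own distribution is a choice made for this sketch. Any preprocessing the paper applies to the raw abundances is not reproduced here.

```python
from bisect import bisect_left, bisect_right


def percentile_of_score(reference, value):
    """Percentile (0-100) of `value` within `reference`.

    Assumption of this sketch: ties are resolved by averaging the
    strict (<) and weak (<=) ranks.
    """
    ref = sorted(reference)
    below = bisect_left(ref, value)         # number of reference values < value
    at_or_below = bisect_right(ref, value)  # number of reference values <= value
    return 100.0 * (below + at_or_below) / (2 * len(ref))


def percentile_normalize(controls, cases):
    """Express one feature's abundances in one batch as percentiles
    of that batch's control distribution."""
    if not controls:
        raise ValueError("each batch needs control samples")
    norm_controls = [percentile_of_score(controls, x) for x in controls]
    norm_cases = [percentile_of_score(controls, x) for x in cases]
    return norm_controls, norm_cases


# Two hypothetical batches measuring the same taxon on very different scales.
batch_a = percentile_normalize(controls=[1, 2, 3, 4], cases=[2.5, 5])
batch_b = percentile_normalize(controls=[10, 20, 30, 40], cases=[25, 50])

# After normalization both batches are on the same 0-100 scale,
# so their samples can be pooled before running a statistical test.
print(batch_a)
print(batch_b)
```

In this toy example the two batches differ by a tenfold scale factor, yet their normalized values coincide, which is the property that allows pooling. It also illustrates condition 1) above: with only four controls the percentiles can take very few distinct values, so the resolution of the transformation is coarse.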

Introduction:

Existing methods: SVA, limma, ComBat
All of these models are most effective when batch effects are not conflated with the true biological effects [1]. Furthermore, most batch correction methods make certain parametric assumptions.
In microbiome studies, batch effects are often diffuse and conflated with biological signals. (Note: this is a characteristic of microbiome studies.)
Common approach in microbiome studies: one way to get around this issue is to calculate statistics within a given batch, and then compare significant features across batches using classic meta-analysis techniques for combining p-values, like Fisher’s and Stouffer’s methods.
These meta-analysis techniques are robust to batch effects across independent studies, but have less statistical power and ability to detect subtle differences than directly pooling data across studies. (Note: this is the drawback of these methods.)
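
For reference, the two p-value combination methods named above can be sketched in plain Python. This is a minimal, unweighted illustration and not code from the paper: the function names are hypothetical, the inputs are assumed to be independent p-values strictly between 0 and 1, and the Stouffer version assumes one-sided p-values.

```python
import math
from statistics import NormalDist


def fisher_combine(pvalues):
    """Fisher's method: -2 * sum(ln p) follows a chi-square
    distribution with 2k degrees of freedom under the null."""
    k = len(pvalues)
    statistic = -2.0 * sum(math.log(p) for p in pvalues)
    half = statistic / 2.0
    # For even degrees of freedom (2k) the chi-square survival function
    # has a closed form: exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total


def stouffer_combine(pvalues):
    """Stouffer's method (unweighted): convert each one-sided p-value
    to a z-score, sum, and rescale by sqrt(k)."""
    norm = NormalDist()
    z = sum(norm.inv_cdf(1.0 - p) for p in pvalues) / math.sqrt(len(pvalues))
    return z, 1.0 - norm.cdf(z)


# Hypothetical per-study p-values for one taxon across three studies.
study_pvalues = [0.04, 0.10, 0.03]
print(fisher_combine(study_pvalues))
print(stouffer_combine(study_pvalues))
```

Both functions return the test statistic together with the combined p-value. Each study contributes only a single p-value per feature, so the sample-level information that a pooled analysis would use is not available to these methods.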
we describe a model-free data-normalization procedure for controlling batch effects in case-control microbiome studies that enables pooling data across studies.

Paper outline:

1. Introduction

2. Methods

2.1 Datasets

2.2 Sequence data processing

2.3 Percentile normalization

2.4 ComBat

2.5 limma

2.6 Statistical analysis

2.7 in silico experiments

3. Results

3.1 Batch effects at OTU-level resolution