Nature Methods | Evaluation of CAMI2 Metagenomic Analysis Methods


https://doi.org/10.1038/s41592-022-01431-4


Over the past 20 years, metagenomics has greatly advanced our understanding of human and environmental microbiomes and has driven the development of related data analysis techniques. New methods for analyzing metagenomic data now appear constantly, so they need fair and comprehensive evaluation before we can choose or design the best analysis workflow for our data and draw reliable conclusions. CAMI (Critical Assessment of Metagenome Interpretation) is a large-scale collaborative initiative launched to meet this need. The second round of CAMI challenges, CAMI 2, has now concluded. Hundreds of scientists in the field took part in evaluating methods on complex long-read and short-read metagenomic datasets from different environments (marine, plant roots, multi-strain mixture, clinical). The methods included dozens of software tools for metagenome assembly, binning, sequence classification, species abundance prediction, and pathogen identification. The datasets were created from approximately 1,700 new and known microbial genomes and 600 new plasmids and viruses. In total, CAMI 2 analyzed 5,002 results from 76 software tools.

Compared with the software evaluated in the first challenge, assembler performance improved by up to 30%, and among short-read assemblers HipMer and GATB performed best overall (Figure 1). However, in the presence of multiple closely related strains, assembly contiguity, genome completeness, and strain recall all decreased. This suggests that most assemblers, sometimes intentionally, do not resolve strain-level variation, which results in more fragmented and less strain-specific assemblies. In addition, genome coverage, parameter settings, and data preprocessing affected assembly quality, while performance was similar across software versions. Most of the submitted metagenome assemblies used only short reads, and long-read and hybrid assemblies did not have higher overall quality. However, for difficult-to-assemble regions such as 16S rRNA genes, hybrid assemblies were more complete than most short-read assemblies. Hybrid assemblers were also less affected by closely related strains in the sample, suggesting that long reads help distinguish strains.


Figure 1: Performance of metagenome assemblers. a, Genome fraction (completeness). b, Mismatches per 100 kb. c, Misassemblies. d, NGA50. e, Strain recall. f, Strain precision. Lines represent the different subsets of genomes analyzed, and the GSA (gold standard assembly) values represent the upper bound for each metric. Blue denotes unique genomes (ANI < 95% to the closest genome); green denotes common genomes (ANI ≥ 95% to the closest genome). Strain recall = assembled high-quality genomes / total genomes; strain precision = assembled high-quality genomes / assembled genomes.
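The two ratios in the Figure 1 caption, and the split into unique and common genomes at 95% ANI, can be illustrated with a small Python sketch. This is only a toy illustration, not the CAMI evaluation code: the `Genome` record, its `assembled` and `high_quality` flags, and the example values are all hypothetical, and the criteria that decide when a genome counts as "assembled" or "high quality" are not given here, so they are taken as precomputed inputs.

```python
from dataclasses import dataclass

# From the caption: a genome is "unique" if its ANI to the closest
# genome is below 95%, and "common" otherwise.
ANI_THRESHOLD = 0.95


@dataclass(frozen=True)
class Genome:
    """Hypothetical per-genome record used only for this illustration."""
    name: str
    ani_to_closest: float  # ANI to the closest other genome, as a fraction
    assembled: bool        # assumed flag: the genome appears in the assembly
    high_quality: bool     # assumed flag: it was assembled at high quality


def genome_group(genome: Genome) -> str:
    """Label a genome as 'unique' or 'common' by the 95% ANI cutoff."""
    return "unique" if genome.ani_to_closest < ANI_THRESHOLD else "common"


def strain_recall(genomes: list[Genome]) -> float:
    """Assembled high-quality genomes divided by all genomes."""
    if not genomes:
        return 0.0
    return sum(g.high_quality for g in genomes) / len(genomes)


def strain_precision(genomes: list[Genome]) -> float:
    """Assembled high-quality genomes divided by assembled genomes."""
    assembled = [g for g in genomes if g.assembled]
    if not assembled:
        return 0.0
    return sum(g.high_quality for g in assembled) / len(assembled)


# Invented toy data; none of these values come from the CAMI 2 results.
toy_genomes = [
    Genome("A", ani_to_closest=0.80, assembled=True, high_quality=True),
    Genome("B", ani_to_closest=0.90, assembled=True, high_quality=False),
    Genome("C", ani_to_closest=0.97, assembled=True, high_quality=True),
    Genome("D", ani_to_closest=0.99, assembled=True, high_quality=False),
    Genome("E", ani_to_closest=0.995, assembled=False, high_quality=False),
    Genome("F", ani_to_closest=0.96, assembled=False, high_quality=False),
]

if __name__ == "__main__":
    for label in ("unique", "common"):
        subset = [g for g in toy_genomes if genome_group(g) == label]
        print(label, strain_recall(subset), strain_precision(subset))
```

In this toy set the common genomes have lower recall than the unique ones because two of the four closely related genomes are not assembled at all, which mirrors the kind of drop the text describes for samples with many closely related strains.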

Compared with most individual binners, ensemble methods that combine the results of several binners (such as MetaBinner, UltraBinner, and MetaWRAP) performed substantially better on a range of metrics. The individual binner CONCOCT also performed quite well. In general, genome binners perform differently across metrics and dataset types; high strain diversity and low assembly quality pose major challenges and greatly degrade performance. In the plant-associated microbiome dataset, the plant host and 55 fungal genomes had sufficient coverage, so high-quality bins were obtained for them.

Among the sequence classification tools, MEGAN and Kraken had the best overall performance. Many abundance prediction tools that performed well in the first round of CAMI challenges, such as mOTUs and MetaPhlAn, also performed well in this evaluation (Figure 2). Classification performance was very good at the genus level and above, but dropped significantly at the species level, and the tools also performed poorly on archaea and viruses. In the clinical pathogen challenge, several submissions identified the causative pathogen accurately. However, none of these results could be reproduced manually, suggesting that these methods still need substantial improvement. Despite the great potential of clinical metagenomics for pathogen diagnosis and characterization, a variety of challenges still hinder its use in routine diagnostics.


Figure 2: Results for the marine and multi-strain mixture datasets at the genus level. a, b, Marine dataset. c, d, Strain mixture dataset.

In this second round of challenges, CAMI documents and dissects the major advances in widely used metagenomic analysis software, as well as the challenges that remain. As methods and data generation continue to evolve, continual re-evaluation will be important. We encourage every researcher interested in benchmarking and method evaluation in microbiome research to join the CAMI challenges and help microbiome researchers design the best analysis pipelines for their data and scientific questions.

Fernando Meyer, Adrian Fritz, Zhi-Luo Deng, David Koslicki, Till Robin Lesker, Alexey Gurevich, Gary Robertson, et al. 2022. Critical Assessment of Metagenome Interpretation: the second round of challenges. Nature Methods 19: 429-440. https://doi.org/10.1038/s41592-022-01431-4
