IF 21.1: Chinese scientists publish best practices for microbiome analysis in R!

        In the era of high-throughput sequencing, amplicon and metagenomic sequencing are widely used in microbiome research to study the diversity, structure, and function of microbial communities. Processing and visualizing the resulting large volumes of data has become an urgent need. However, the R packages used for analysis are numerous and complex, and many overlap in function, so choosing among them is difficult. This poses a major challenge for researchers exploring microbiome data.

        Yuan Jun's group at Nanjing Agricultural University and Liu Yongxin's group at the Chinese Academy of Agricultural Sciences jointly reviewed 324 commonly used R packages for microbiome data mining and classified them into six functional categories of microbiome research. The review also covers the common steps in microbiome data analysis, describes in detail the advantages and limitations of commonly used integrated R packages, and proposes what the authors consider the most suitable analysis workflow for microbiome data mining.

The relevant code can be obtained from:

https://github.com/taowenmicro/EasyMicrobiomeR   You can use it at any time!

picture

        In this review, the authors first introduce the amplicon-sequencing-based workflow for microbial community data analysis (Panel A). Its core files are the outputs of OTU clustering and annotation: the OTU table, the taxonomy table, the sample metadata (Metadata), the phylogenetic tree (Tree), and the representative sequences (Rep.fa). Raw data are first processed with software such as USEARCH/VSEARCH, QIIME2, or DADA2. The key output files are then saved for downstream analysis in R, typically within RStudio. Many microbiome analysis methods rely on R packages.
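        Because downstream steps depend on these core files agreeing with one another, a quick consistency check is often worthwhile before analysis. The following is a minimal sketch, not the authors' code: it is written in Python purely for illustration, and the toy data and the function name `check_consistency` are hypothetical. It checks that every OTU in the OTU table has a taxonomy entry and every sample has a metadata entry.

```python
# Toy stand-ins for three of the core files; all IDs and values are made up.
otu_table = {  # OTU ID -> read counts per sample
    "OTU_1": {"S1": 120, "S2": 30},
    "OTU_2": {"S1": 0, "S2": 75},
}
taxonomy = {  # OTU ID -> taxonomic lineage
    "OTU_1": ["Bacteria", "Proteobacteria"],
    "OTU_2": ["Bacteria", "Firmicutes"],
}
metadata = {  # sample ID -> experimental variables
    "S1": {"group": "control"},
    "S2": {"group": "treatment"},
}


def check_consistency(otu_table, taxonomy, metadata):
    """Report OTUs lacking taxonomy and samples lacking metadata (hypothetical helper)."""
    otu_ids = set(otu_table)
    sample_ids = {s for counts in otu_table.values() for s in counts}
    return {
        "otus_without_taxonomy": sorted(otu_ids - set(taxonomy)),
        "samples_without_metadata": sorted(sample_ids - set(metadata)),
    }


report = check_consistency(otu_table, taxonomy, metadata)
print(report)
```

        Both lists in the report are empty when the files are mutually consistent. Any ID that appears in one of them points to a mismatch to resolve before further analysis.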

        However, the number of R packages available for downstream analysis has become overwhelming. In the word cloud below, the font size reflects how often each R package is cited (Figure B).

        The article catalogs 88 R packages commonly used for data preprocessing and visualization (Figure C), as well as six categories of R packages for microbial community analysis (Figure D): diversity analysis, differential analysis, biomarker identification, correlation and network analysis, functional prediction, and other related analyses. For each type of research question, it describes the advantages of the corresponding software in detail.
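        To make the first category, diversity analysis, concrete: a common alpha-diversity measure is the Shannon index, H = -Σ p_i ln(p_i), where p_i is the relative abundance of OTU i in a sample. The sketch below is an illustration only, written in Python rather than R, and is not taken from the review. The function name `shannon_index` and the example counts are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon diversity (natural log) of one sample's OTU read counts."""
    total = sum(counts)
    if total == 0:
        return 0.0  # an empty sample has no diversity to measure
    # OTUs with zero reads contribute nothing, so they are skipped.
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


even_sample = [25, 25, 25, 25]   # four equally abundant OTUs
skewed_sample = [97, 1, 1, 1]    # one dominant OTU

print(shannon_index(even_sample))
print(shannon_index(skewed_sample))
```

        For a fixed number of OTUs, the index is highest when they are equally abundant, so the even sample scores higher than the skewed one.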

picture

        In addition, R packages dedicated to microbiome data processing are covered. The authors describe six commonly used integrated R packages for microbiome analysis in detail and systematically compare their functions. They include:

  • phyloseq package

  • microbiome package

  • MicrobiomeAnalystR package

  • microeco package (highly recommended!)

  • amplicon package

picture

        The sheer number of R packages can make it hard for microbiome researchers to choose effectively. The authors therefore selected efficient, widely used, and user-friendly functions for each of the six analysis categories in microbiome research: 1) diversity analysis, 2) differential analysis, 3) biomarker identification, 4) correlation and network analysis, 5) functional prediction, and 6) other microbiome analyses. The resulting set of R functions integrates most of the common analyses in microbiome research and forms the workflow the authors consider most suitable. Examples of practical results are as follows:

        Appropriate data structures can speed up microbiome data analysis. The development of R packages, and of integrated packages in particular, has continuously advanced microbiome research and deepened data mining. This review systematically lays out the functions and advantages of these R packages and evaluates software with redundant functionality, which helps researchers avoid duplicating identical or similar analyses and makes the packages' strengths clearer. This in turn supports data mining and machine learning modeling, and provides an important theoretical basis and practical reference for developing better microbiome tools in the future.


Origin blog.csdn.net/SHANGHAILINGEN/article/details/131825465