Installing and using the metagenomics analysis tool MetaWRAP 1.3.2 in a conda environment, with an automated script for the basic sequence analysis workflow

Introduction:

MetaWRAP is a pipeline for analyzing metagenomic sequencing data. It wraps a set of tools for assembly, annotation, and functional analysis of metagenomic data.

MetaWRAP features include:

  1. Data quality control: removing low-quality reads, removing contaminating sequences, and trimming adapter sequences.

  2. Genome assembly: MetaWRAP supports a variety of genome assembly algorithms, including SPAdes and MEGAHIT. Depending on the user's needs, different algorithms can be selected for assembly.

  3. Genome annotation: MetaWRAP can perform gene prediction, functional annotation, pathway prediction, and other operations. It supports annotation against multiple databases, such as KEGG, COG, and NR.

  4. Genome comparison: MetaWRAP can perform multi-genome comparison and species composition analysis. It helps users understand the similarities and differences between different samples.

  5. Niche analysis: MetaWRAP also provides some niche analysis tools that can help users understand the function and metabolic capabilities of microorganisms in samples.

MetaWRAP’s advantages include:

  1. Rich functions: MetaWRAP provides a variety of functions to help users go from raw sequencing data to final biological interpretation.

  2. Flexible usage: MetaWRAP is run from the command line as a set of independent modules, so users can run the whole workflow or only the steps they need.

  3. Efficient computing performance: MetaWRAP uses multi-threading and parallel computing to speed up analysis.

In summary, MetaWRAP is a powerful metagenomic analysis tool library that can help users assemble, annotate and functionally analyze metagenomic data. Its usage is flexible and its computing performance is efficient, making it suitable for various metagenomics research needs.

Read the paper first: "MetaWRAP - a flexible pipeline for genome-resolved metagenomic data analysis" (Microbiome)

GitHub repository: https://github.com/ursky/metaWRAP

Anaconda site: Anaconda.org

Install

This section covers installation with conda or mamba. Other installation methods may not give you the latest version and are sometimes troublesome to configure.

conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda
conda config --add channels ursky

# Or, if you use mamba:

mamba config --add channels defaults
mamba config --add channels conda-forge
mamba config --add channels bioconda
mamba config --add channels ursky

When installing, pass all of the channels explicitly on the command line; otherwise the install fails with a missing-package error:

mamba create -y --name metawrap132 -c ursky -c bioconda -c conda-forge metawrap-mg=1.3.2

After the installation finishes, check the result. The main thing to look at is the version of metawrap-mg, which here is 1.3.2 from the ursky channel:

mamba list
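The full package list is long, so it can help to pull out just the one line of interest. The following is a minimal sketch, assuming the usual four-column output of conda list or mamba list (name, version, build, channel); the helper name pkg_version_channel is made up for this example:

```shell
# Print "<version> <channel>" for one package, reading `conda list` output on stdin.
# Assumes the usual four columns: name, version, build, channel.
pkg_version_channel() {
    awk -v pkg="$1" '$1 == pkg { print $2, $4 }'
}

# Example (environment name taken from the create command above):
#   conda list -n metawrap132 | pkg_version_channel metawrap-mg
```

If the environment was created as described, the example should report version 1.3.2 and the ursky channel.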

Configure the recommended databases

This section covers each database, its size, and the modules that may use it. Configure them according to your actual needs. If a database is not configured, you need to specify or skip the corresponding parameters in the later modules:

taxonomy database:

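# First remove the original taxonomy folder
rm -rf /miniconda3/envs/metawrap132/opt/krona/taxonomy
# Create your own folder on a disk with enough space
mkdir -p /path/on/big/disk/taxonomy
# Create a symbolic link to it
ln -s /path/on/big/disk/taxonomy /miniconda3/envs/metawrap132/opt/krona/taxonomy
# Download and update the database; it is downloaded into the folder you specified
ktUpdateTaxonomy.sh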
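The same remove, create, and link pattern can be reused for other large database folders. Here is a small sketch of it as a shell function; the name relocate_dir is hypothetical, and the paths in the usage comment are the placeholders from the commands above:

```shell
# Replace directory $1 with a symlink pointing at $2 (created if missing).
# Note: anything stored in $1 is deleted, exactly as with the manual commands.
relocate_dir() {
    link_path="$1"
    target_dir="$2"
    mkdir -p "$target_dir" || return 1
    # Remove the old directory (or a stale symlink) before linking.
    rm -rf "$link_path" || return 1
    ln -s "$target_dir" "$link_path"
}

# Example:
#   relocate_dir /miniconda3/envs/metawrap132/opt/krona/taxonomy /path/on/big/disk/taxonomy
#   ktUpdateTaxonomy.sh
```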

Looking at the contents of ktUpdateTaxonomy.sh shows that it downloads the data directly from the NCBI database.

After the download and decompression is completed, a taxonomy.tab file is generated in the target directory:

head taxonomy.tab
1	0	1	no rank	root
2	2	131567	superkingdom	Bacteria
6	7	335928	genus	Azorhizobium
7	8	6	species	Azorhizobium caulinodans
9	8	32199	species	Buchnera aphidicola
10	7	1706371	genus	Cellvibrio
11	9	1707	species	Cellulomonas gilvus
13	7	203488	genus	Dictyoglomus
14	8	13	species	Dictyoglomus thermophilum
16	7	32011	genus	Methylophilus
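To confirm the file is usable, you can look up a single entry. The sketch below assumes the columns are tab-separated and ordered as taxonomy ID, depth, parent ID, rank, name, which is how the sample output above reads; the function name taxon_lookup is invented for this example:

```shell
# Print "rank: name" for one taxonomy ID from a Krona taxonomy.tab file.
# Assumes tab-separated columns: ID, depth, parent ID, rank, name.
taxon_lookup() {
    awk -F '\t' -v id="$1" '$1 == id { print $4 ": " $5 }' "$2"
}

# Example:
#   taxon_lookup 7 taxonomy.tab
```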

GRIDSS / SILVA 16S rRNA / BUSCO databases

quast-download-gridss
quast-download-silva
quast-download-busco

After downloading, the files are located in the directory:

/miniconda3/envs/metawrap/lib/python2.7/site-packages/quast_libs/

Download log:

envs/metawrap/lib/python2.7/site-packages/quast_libs/silva/blastdb.log

Some of these databases cannot be downloaded with the commands above, most likely because the version has changed and the download address is no longer valid. Download them manually from the links below:

BUSCO data directory: https://busco-data.ezlab.org/v5/data/lineages/

BUSCO official website: BUSCO - from QC to gene prediction and phylogenomics

Files that need to be downloaded: 

https://busco-data.ezlab.org/v5/data/lineages/fungi_odb10.2021-06-28.tar.gz

https://busco-data.ezlab.org/v5/data/lineages/eukaryota_odb10.2020-09-10.tar.gz

 https://busco-data.ezlab.org/v5/data/lineages/bacteria_odb10.2020-03-06.tar.gz

cd miniconda3/envs/metawrap132/lib/python2.7/site-packages/quast_libs/busco/

mv fungi_odb10.2021-06-28.tar.gz fungi.tar.gz
mv bacteria_odb10.2020-03-06.tar.gz bacteria.tar.gz
mv eukaryota_odb10.2020-09-10.tar.gz eukaryota.tar.gz
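The download and rename steps can also be combined. This is only a sketch, assuming wget is available and the three URLs listed above are still valid; the function names and the BUSCO_BASE_URL variable are made up for illustration:

```shell
BUSCO_BASE_URL="https://busco-data.ezlab.org/v5/data/lineages"

# Map a dated BUSCO archive name to the short name used above,
# e.g. fungi_odb10.2021-06-28.tar.gz -> fungi.tar.gz
busco_short_name() {
    printf '%s.tar.gz\n' "${1%%_odb*}"
}

# Download each archive into directory $1, saving it under its short name.
fetch_busco_lineages() {
    dest_dir="$1"
    for archive in fungi_odb10.2021-06-28.tar.gz \
                   eukaryota_odb10.2020-09-10.tar.gz \
                   bacteria_odb10.2020-03-06.tar.gz; do
        wget -O "$dest_dir/$(busco_short_name "$archive")" \
            "$BUSCO_BASE_URL/$archive" || return 1
    done
}

# Example:
#   fetch_busco_lineages miniconda3/envs/metawrap132/lib/python2.7/site-packages/quast_libs/busco
```

Saving directly under the short name with wget -O avoids the separate mv step.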

The archives should be decompressed automatically the next time QUAST is started.

Main database configuration

If you prefer the original instructions, refer to the official documentation:

https://github.com/bxlab/metaWRAP/blob/master/installation/database_installation.md

CheckM

mkdir MY_CHECKM_FOLDER

# Now manually download the database:
cd MY_CHECKM_FOLDER
wget https://data.ace.uq.edu.au/public/CheckM_databases/checkm_data_2015_01_16.tar.gz
tar -xvf *.tar.gz
rm *.gz
cd ../

# Now you need to tell CheckM where to find this data befo


Origin blog.csdn.net/zrc_xiaoguo/article/details/134998348