来自bioBakery Lab的宏基因组学微生物群落的代谢功能分析工具-HUMAnN 3.0的安装配置及分析使用方法-安装填坑

HUMAnN 3.0 简介：

HUMAnN 3.0 是一个用于宏基因组数据分析的工具，能够从宏基因组测序数据中推断出微生物群落的代谢功能信息。它可以识别微生物群落中存在的代谢途径，并定量这些通路的丰度。HUMAnN 3.0 依赖于多个工具和数据库来实现这些功能，其中包括 MetaPhlAn 3、DIAMOND、UniRef90 等。

原网站：humann3 – The Huttenhower Lab (harvard.edu)

仓库地址：github.com

HUMAnN 3.0 安装步骤：

通过conda或mamba安装

1. 创建并激活一个新的环境（可选步骤）

首先，您可以创建一个新的环境来安装 HUMAnN 3。在这个环境中，您可以独立管理 HUMAnN 3 及其依赖项。

# 因通过conda安装humann会默认配置MetaPhlAn，所以这里环境名称就使用bioBakery了
conda create --name biobakery3 python=3.7
# 或
mamba create --name biobakery3 python=3.7

接下来，激活新创建的环境：

conda activate biobakery3
# 或
mamba activate biobakery3

设置conda chanel

conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --add channels biobakery

2. 安装 HUMAnN 3

现在，您可以使用 mamba 来安装 HUMAnN 3。请注意，HUMAnN 3 是作为 Python 包发布的，因此您可以直接通过 pip 或 mamba 安装。

conda install -c bioconda humann
#
mamba install -c bioconda humann

这将从 bioconda 频道安装 HUMAnN 3 及其相关依赖项。

根据报错手动安装依赖环境：

#报缺少bowtie2
mamba install bowtie2 -c bioconda

# 报缺少diamond
mamba install diamond -c bioconda

使用mamba或conda安装humann3我这里安装失败了，尝试了通过bioconda安装依赖，MetaPhlAn还是安装不上，所以最终使用pypi方法安装成功，目前为3.8版本

使用以下代码从 PyPI 安装 HUMAnN 3.0：

# 官方建议方法：
pip install humann --no-binary :all:

###自动下载humann包然后配置解压就行了，我这里安装成功

通过pip这样安装后会出现找不到MetaPhlAn的错误，所以还得自己再配置安装，不然后面运行的时候会出错：

CRITICAL ERROR: The metaphlan executable can not be found. Please check the install.

其实这个就是安装不完全的原因，在前面mamba或者conda设置chanels时没有生效，下面是正确的安装方式：

### 将所有需要的chanels全部加入，这样依赖才能解析完全。
mamba install humann -c biobakery -c bioconda -c conda-forge

##### 真是醉了，连自己的bioBakery没有独立配置完整依赖，这个坑真的好大！！！！

3. 下载数据库

HUMAnN 3.0 使用了多个数据库，需要下载这些数据库文件：

先查看可用的数据库：

humann_databases --available

HUMAnN Databases ( database : build = location )
chocophlan : full = http://huttenhower.sph.harvard.edu/humann_data/chocophlan/full_chocophlan.v201901_v31.tar.gz
chocophlan : DEMO = http://huttenhower.sph.harvard.edu/humann_data/chocophlan/DEMO_chocophlan.v201901_v31.tar.gz
uniref : uniref50_diamond = http://huttenhower.sph.harvard.edu/humann_data/uniprot/uniref_annotated/uniref50_annotated_v201901b_full.tar.gz
uniref : uniref90_diamond = http://huttenhower.sph.harvard.edu/humann_data/uniprot/uniref_annotated/uniref90_annotated_v201901b_full.tar.gz
uniref : uniref50_ec_filtered_diamond = http://huttenhower.sph.harvard.edu/humann_data/uniprot/uniref_ec_filtered/uniref50_ec_filtered_201901b_subset.tar.gz
uniref : uniref90_ec_filtered_diamond = http://huttenhower.sph.harvard.edu/humann_data/uniprot/uniref_ec_filtered/uniref90_ec_filtered_201901b_subset.tar.gz
uniref : DEMO_diamond = http://huttenhower.sph.harvard.edu/humann_data/uniprot/uniref_annotated/uniref90_DEMO_diamond_v201901b.tar.gz
utility_mapping : full = http://huttenhower.sph.harvard.edu/humann_data/full_mapping_v201901b.tar.gz

### 为啥还是2019呢？ 停止更新了？？？？

下载指定数据库：

humann_databases --download chocophlan full $DIR_TO_STORE_DB humann_databases --download uniref uniref90_diamond $DIR_TO_STORE_DB
# 其中 $DIR_TO_STORE_DB 是你希望存储数据库文件的路径。

humann_databases --download chocophlan full /path/to/databases --update-config yes

humann_databases --download uniref uniref90_diamond /path/to/databases --update-config yes

humann_databases --download utility_mapping full /path/to/databases --update-config yes

手动下载数据库，可用链接直接使用前上面的humann_databases中分别对应的链接，并解压到指定文件夹：

wget -c http://huttenhower.sph.harvard.edu/humann_data/chocophlan/full_chocophlan.v201901_v31.tar.gz
wget -c http://huttenhower.sph.harvard.edu/humann_data/uniprot/uniref_annotated/uniref90_annotated_v201901b_full.tar.gz
wget -c http://huttenhower.sph.harvard.edu/humann_data/full_mapping_v201901b.tar.gz

mkdir chocophlan_v296_201901
mkdir uniref90_v201901
mkdir mapping_v201901

tar -zxvf full_chocophlan.v296_201901.tar.gz -C ./chocophlan_v296_201901/
tar -zxvf uniref90_annotated_v201901.tar.gz -C uniref90_v201901
tar -zxvf full_mapping_v201901.tar.gz -C ./mapping_v201901/

数据库设置，先查看已有设置情况：

# 查看已有数据库
humann_databases --list
### 命令不对。。。。。。。。。。

### 还是直接查看数据目录吧。
# 默认数据库目录，当然前面如果自己有设定的话看已设定目录
/miniconda3/envs/biobakery3/lib/python3.7/site-packages/humann

### 应该是 humann_config
humann_config

HUMAnN Configuration ( Section : Name = Value )
database_folders : nucleotide = /path/to/databases/chocophlan_v296_201901
database_folders : protein = /path/to/databases/uniref90_v201901/
database_folders : utility_mapping = /path/to/databases/mapping_v201901/
run_modes : resume = False
run_modes : verbose = False
run_modes : bypass_prescreen = False
run_modes : bypass_nucleotide_index = False
run_modes : bypass_nucleotide_search = False
run_modes : bypass_translated_search = False
run_modes : threads = 1
alignment_settings : evalue_threshold = 1.0
alignment_settings : prescreen_threshold = 0.01
alignment_settings : translated_subject_coverage_threshold = 50.0
alignment_settings : translated_query_coverage_threshold = 90.0
alignment_settings : nucleotide_subject_coverage_threshold = 50.0
alignment_settings : nucleotide_query_coverage_threshold = 90.0
output_format : output_max_decimals = 10
output_format : remove_stratified_output = False
output_format : remove_column_description_output = False

############################################################
humann_config --help
usage: humann_config [-h] [--print] [--update <section> <name> <value>]

HUMAnN Configuration

optional arguments:
  -h, --help            show this help message and exit
  --print               print the configuration
  --update <section> <name> <value>
                        update the section : name to the value provided

已准备好的数据库切换设置

## 更新格式：humann_config --update <section> <name> <value>
humann_config --update database_folders nucleotide /path/to/databases/chocophlan_v296_201901
humann_config --update database_folders protein /path/to/databases/uniref90_v201901/
humann_config --update database_folders utility_mapping /path/to/databases/mapping_v201901/

## 更新后查看设置
humann_config

# 还可以自己设置其他默认设置
# 比如说我的服务器都是30个线程以上，所以我将默认的运行线程数为30，这个根据自己服务器设置就行
humann_config --update run_modes threads 30
#######################
#　HUMAnN configuration file updated: run_modes : threads = 30

运行 HUMAnN 3.0

全参数帮助内容查看：

usage: humann_config [-h] [--print] [--update <section> <name> <value>]

HUMAnN Configuration

optional arguments:
  -h, --help            show this help message and exit
  --print               print the configuration
  --update <section> <name> <value>
                        update the section : name to the value provided
(biobakery3) [root@mgmt ~]# humann --help
usage: humann [-h] -i <input.fastq> -o <output> [--threads <1>] [--version]
              [-r] [--bypass-nucleotide-index] [--bypass-nucleotide-search]
              [--bypass-prescreen] [--bypass-translated-search]
              [--taxonomic-profile <taxonomic_profile.tsv>]
              [--memory-use {minimum,maximum}]
              [--input-format {fastq,fastq.gz,fasta,fasta.gz,sam,bam,blastm8,genetable,biom}]
              [--search-mode {uniref50,uniref90}] [-v]
              [--metaphlan <metaphlan>]
              [--metaphlan-options <metaphlan_options>]
              [--prescreen-threshold <0.01>] [--bowtie2 <bowtie2>]
              [--bowtie-options <bowtie_options>]
              [--nucleotide-database <nucleotide_database>]
              [--nucleotide-identity-threshold <0.0>]
              [--nucleotide-query-coverage-threshold <90.0>]
              [--nucleotide-subject-coverage-threshold <50.0>]
              [--diamond <diamond>] [--diamond-options <diamond_options>]
              [--evalue <1.0>] [--protein-database <protein_database>]
              [--rapsearch <rapsearch>]
              [--translated-alignment {usearch,rapsearch,diamond}]
              [--translated-identity-threshold <Automatically: 50.0 or 80.0, Custom: 0.0-100.0>]
              [--translated-query-coverage-threshold <90.0>]
              [--translated-subject-coverage-threshold <50.0>]
              [--usearch <usearch>] [--gap-fill {on,off}] [--minpath {on,off}]
              [--pathways {metacyc,unipathway}]
              [--pathways-database <pathways_database.tsv>] [--xipe {on,off}]
              [--annotation-gene-index <3>] [--id-mapping <id_mapping.tsv>]
              [--remove-temp-output]
              [--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
              [--o-log <sample.log>] [--output-basename <sample_name>]
              [--output-format {tsv,biom}] [--output-max-decimals <10>]
              [--remove-column-description-output]
              [--remove-stratified-output]

HUMAnN : HMP Unified Metabolic Analysis Network 3

optional arguments:
  -h, --help            show this help message and exit

[0] Common settings:
  -i <input.fastq>, --input <input.fastq>
                        input file of type {fastq,fastq.gz,fasta,fasta.gz,sam,bam,blastm8,genetable,biom} 
                        [REQUIRED]
  -o <output>, --output <output>
                        directory to write output files
                        [REQUIRED]
  --threads <1>         number of threads/processes
                        [DEFAULT: 1]
  --version             show program's version number and exit

[1] Workflow refinement:
  -r, --resume          bypass commands if the output files exist
  --bypass-nucleotide-index
                        bypass the nucleotide index step and run on the indexed ChocoPhlAn database
  --bypass-nucleotide-search
                        bypass the nucleotide search steps
  --bypass-prescreen    bypass the prescreen step and run on the full ChocoPhlAn database
  --bypass-translated-search
                        bypass the translated search step
  --taxonomic-profile <taxonomic_profile.tsv>
                        a taxonomic profile (the output file created by metaphlan)
                        [DEFAULT: file will be created]
  --memory-use {minimum,maximum}
                        the amount of memory to use
                        [DEFAULT: minimum]
  --input-format {fastq,fastq.gz,fasta,fasta.gz,sam,bam,blastm8,genetable,biom}
                        the format of the input file
                        [DEFAULT: format identified by software]
  --search-mode {uniref50,uniref90}
                        search for uniref50 or uniref90 gene families
                        [DEFAULT: based on translated database selected]
  -v, --verbose         additional output is printed

[2] Configure tier 1: prescreen:
  --metaphlan <metaphlan>
                        directory containing the MetaPhlAn software
                        [DEFAULT: $PATH]
  --metaphlan-options <metaphlan_options>
                        options to be provided to the MetaPhlAn software
                        [DEFAULT: "-t rel_ab"]
  --prescreen-threshold <0.01>
                        minimum percentage of reads matching a species
                        [DEFAULT: 0.01]

[3] Configure tier 2: nucleotide search:
  --bowtie2 <bowtie2>   directory containing the bowtie2 executable
                        [DEFAULT: $PATH]
  --bowtie-options <bowtie_options>
                        options to be provided to the bowtie software
                        [DEFAULT: "--very-sensitive"]
  --nucleotide-database <nucleotide_database>
                        directory containing the nucleotide database
                        [DEFAULT: /path/to/databases/chocophlan_v296_201901]
  --nucleotide-identity-threshold <0.0>
                        identity threshold for nuclotide alignments
                        [DEFAULT: 0.0]
  --nucleotide-query-coverage-threshold <90.0>
                        query coverage threshold for nucleotide alignments
                        [DEFAULT: 90.0]
  --nucleotide-subject-coverage-threshold <50.0>
                        subject coverage threshold for nucleotide alignments
                        [DEFAULT: 50.0]

[3] Configure tier 2: translated search:
  --diamond <diamond>   directory containing the diamond executable
                        [DEFAULT: $PATH]
  --diamond-options <diamond_options>
                        options to be provided to the diamond software
                        [DEFAULT: "--top 1 --outfmt 6"]
  --evalue <1.0>        the evalue threshold to use with the translated search
                        [DEFAULT: 1.0]
  --protein-database <protein_database>
                        directory containing the protein database
                        [DEFAULT: /path/to/databases/uniref90_v201901/]
  --rapsearch <rapsearch>
                        directory containing the rapsearch executable
                        [DEFAULT: $PATH]
  --translated-alignment {usearch,rapsearch,diamond}
                        software to use for translated alignment
                        [DEFAULT: diamond]
  --translated-identity-threshold <Automatically: 50.0 or 80.0, Custom: 0.0-100.0>
                        identity threshold for translated alignments
                        [DEFAULT: Tuned automatically (based on uniref mode) unless a custom value is specified]
  --translated-query-coverage-threshold <90.0>
                        query coverage threshold for translated alignments
                        [DEFAULT: 90.0]
  --translated-subject-coverage-threshold <50.0>
                        subject coverage threshold for translated alignments
                        [DEFAULT: 50.0]
  --usearch <usearch>   directory containing the usearch executable
                        [DEFAULT: $PATH]

[5] Gene and pathway quantification:
  --gap-fill {on,off}   turn on/off the gap fill computation
                        [DEFAULT: on]
  --minpath {on,off}    turn on/off the minpath computation
                        [DEFAULT: on]
  --pathways {metacyc,unipathway}
                        the database to use for pathway computations
                        [DEFAULT: metacyc]
  --pathways-database <pathways_database.tsv>
                        mapping file (or files, at most two in a comma-delimited list) to use for pathway computations
                        [DEFAULT: metacyc database ]
  --xipe {on,off}       turn on/off the xipe computation
                        [DEFAULT: off]
  --annotation-gene-index <3>
                        the index of the gene in the sequence annotation
                        [DEFAULT: 3]
  --id-mapping <id_mapping.tsv>
                        id mapping file for alignments
                        [DEFAULT: alignment reference used]

[6] More output configuration:
  --remove-temp-output  remove temp output files
                        [DEFAULT: temp files are not removed]
  --log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
                        level of messages to display in log
                        [DEFAULT: DEBUG]
  --o-log <sample.log>  log file
                        [DEFAULT: temp/sample.log]
  --output-basename <sample_name>
                        the basename for the output files
                        [DEFAULT: input file basename]
  --output-format {tsv,biom}
                        the format of the output files
                        [DEFAULT: tsv]
  --output-max-decimals <10>
                        the number of decimals to output
                        [DEFAULT: 10]
  --remove-column-description-output
                        remove the description in the output column
                        [DEFAULT: output column includes description]
  --remove-stratified-output
                        remove stratification from output
                        [DEFAULT: output is stratified]

humann主要功能模块

humann_barplot
humann_strain_profiler
humann_benchmark
humann_genefamilies_genus_level
humann_reduce_table
humann_rna_dna_norm
humann_build_custom_database
humann_humann1_kegg
humann_regroup_table
humann_split_stratified_table
humann_unpack_pathways
humann_associate
humann_infer_taxonomy
humann_split_table

使用以下命令来运行 HUMAnN 3.0：

单个样品分别运行

# humann3已经不需要带3了，与2不同
humann --input input.fastq.gz --output output_dir --threads NUM_THREADS
# 正反序列直接按顺序多加一个input或-i参数,或者在-i参数后面两个文件逗号隔开
# 注意文件名和文件路径相同部分不能因为相同部分就使用简写
# 另外最好是指定输入文件类型--imput-format
humann -i <input_forward.fastq> -i <input_reverse.fastq> --output <output_directory> --imput-format fastq

#在此命令中，input.fastq.gz 是宏基因组数据文件，output_dir 是输出结果的目录，NUM_THREADS 是你希望使用的线程数。

查看结果

分析完成后，你可以在 output_dir 中找到生成的结果文件，包括代谢通路丰度、物种组成等信息。