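Installing and Using NCBI BLAST+ Locally

NCBI BLAST is a widely used tool for searching protein and nucleotide sequences. For everyday queries the web interface is enough, but development work sometimes needs a local database and local programs. For this, NCBI provides the BLAST+ toolkit, which bundles a number of BLAST programs. NCBI publishes two documents (books) that introduce it.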

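Download and Installation

Downloading BLAST+

The BLAST+ package is distributed in the executables folder of the NCBI FTP site (described below) and contains the following programs: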

Program            Function
blastdbcheck       Checks the integrity of a BLAST database
blastdbcmd         Retrieves sequences or other information from a BLAST database
blastdb_aliastool  Creates database aliases (for example, to tie volumes together)
blastn             Searches a nucleotide query against a nucleotide database
blastp             Searches a protein query against a protein database
blastx             Searches a nucleotide query, dynamically translated in all six frames, against a protein database
blast_formatter    Formats a BLAST result using its request ID (RID) or its saved archive
convert2blastmask  Converts lowercase masking into data readable by makeblastdb
deltablast         Searches a protein query against a protein database using a more sensitive algorithm
dustmasker         Masks low-complexity regions in input nucleotide sequences
legacy_blast.pl    Converts a legacy BLAST command line into its BLAST+ counterpart and executes it
makeblastdb        Formats input FASTA file(s) into a BLAST database
makembindex        Indexes an existing nucleotide database for use with megablast
makeprofiledb      Creates a conserved domain database from a list of position-specific scoring matrices (scoremats) generated by psiblast
psiblast           Finds members of a protein family, identifies proteins distantly related to the query, or builds a position-specific scoring matrix for the query
rpsblast           Searches a protein query against a conserved domain database to identify the functional domains present in the query
rpstblastn         Searches a nucleotide query, first translated in all six frames, against a conserved domain database
segmasker          Masks low-complexity regions in input protein sequences
tblastn            Searches a protein query against a nucleotide database dynamically translated in all six frames
tblastx            Searches a nucleotide query, dynamically translated in all six frames, against a nucleotide database translated the same way
update_blastdb.pl  Downloads preformatted BLAST databases from NCBI
windowmasker       Masks repeats found in input nucleotide sequences
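Once the bin directory is on PATH (see the Configuration steps), a small POSIX-shell sketch like the one below reports which of the programs in the table are actually reachable. The function name check_blast_tools is hypothetical. It relies only on the standard `command -v` builtin, and its program list is copied from the table above, so other BLAST+ releases may ship a slightly different set.

```shell
# Report which BLAST+ programs from the table are reachable on PATH.
# The list is copied from the table; other releases may differ.
check_blast_tools() {
    _missing=0
    for prog in blastdbcheck blastdbcmd blastdb_aliastool blastn blastp blastx \
        blast_formatter convert2blastmask deltablast dustmasker legacy_blast.pl \
        makeblastdb makembindex makeprofiledb psiblast rpsblast rpstblastn \
        segmasker tblastn tblastx update_blastdb.pl windowmasker; do
        if command -v "$prog" >/dev/null 2>&1; then
            printf 'found:   %s\n' "$prog"
        else
            printf 'missing: %s\n' "$prog"
            _missing=$((_missing + 1))
        fi
    done
    # Return a non-zero status when at least one program is missing
    [ "$_missing" -eq 0 ]
}

check_blast_tools || echo "Some BLAST+ programs are not on PATH yet."
```

Because the function returns a non-zero status when something is missing, it can also serve as a guard at the top of a pipeline script, for example `check_blast_tools >/dev/null || exit 1`.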

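Besides BLAST+, the executables folder also provides other tools:

  • magic-blast: maps large sets of next-generation RNA and DNA sequencing reads against a whole genome or transcriptome. See Magic-BLAST.
  • IgBLAST: analyzes immunoglobulin and T cell receptor variable domain sequences. See IgBLAST and the related publications.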
  • rmblast :
  • remote-fuser :

Configuration

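  1. Add the BLAST+ bin directory to PATH, for example export PATH=$PATH:$HOME/ncbi-blast-2.8.1+/bin. This lets you run the programs by name from any directory.
  2. Manage the databases:
  • Create a folder for the databases: mkdir $HOME/blastdb
  • Point the BLASTDB environment variable at it: export BLASTDB=$HOME/blastdb
  • Download and extract the sequence databases you need yourself, or
  • use update_blastdb.pl to manage the databases.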
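The configuration steps above can be collected into a short Bash sketch. It assumes BLAST+ 2.8.1 was unpacked into $HOME, as in the example path. BLAST_HOME is a variable introduced here only for illustration. Source the snippet (`. file`) rather than executing it, because `export` only affects the current shell and the processes it starts.

```shell
# Assumption: BLAST+ 2.8.1 was unpacked into $HOME; adjust for your version.
BLAST_HOME="$HOME/ncbi-blast-2.8.1+"

# Step 1: make the BLAST+ programs callable by name.
export PATH="$PATH:$BLAST_HOME/bin"

# Step 2: create the database folder and point BLASTDB at it.
mkdir -p "$HOME/blastdb"
export BLASTDB="$HOME/blastdb"

echo "Added to PATH: $BLAST_HOME/bin"
echo "BLASTDB=$BLASTDB"
```

After this, `update_blastdb.pl --showall` lists the preformatted databases NCBI offers. Running `update_blastdb.pl --decompress refseq_rna` from inside $BLASTDB downloads and unpacks one of them. Options can change between releases, so check `update_blastdb.pl --help` for your version.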

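Downloading Databases

The NCBI FTP server has a dedicated BLAST folder, ftp://ftp.ncbi.nlm.nih.gov/blast/, which holds both the BLAST programs and the databases. It contains the following subfolders:

  • db: the databases (the most important folder)
  • executables: executable programs, including BLAST+
  • documents: documentation
  • demo: demonstration packages for developers
  • matrices: different supported and experimental scoring matrices
  • WGS_TOOLS: tools for building databases from WGS (whole-genome shotgun) projects
  • temp: miscellaneous files
  • windowmasker_files: a collection of windowmasker files for various organisms/genomes, each in its own subdirectory named after its taxonomic ID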
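The db folder distributes each preformatted database as one or more compressed archives, typically with a checksum file alongside each one. For the manual route ("download and extract yourself" in the Configuration list), a sketch like the one below checks an archive before unpacking it into $BLASTDB. It assumes GNU md5sum and tar, and it assumes each archive has a matching `<archive>.md5` file in md5sum's `<hash>  <filename>` format. The function name extract_blastdb_archive and the example file names are illustrative.

```shell
# Hypothetical helper: verify a downloaded BLAST database archive against
# its .md5 file, then unpack it into $BLASTDB (or the current directory).
# Assumes the .md5 file uses md5sum's "<hash>  <filename>" format.
extract_blastdb_archive() {
    archive=$1
    dest=${BLASTDB:-.}
    dir=$(dirname "$archive")
    base=$(basename "$archive")
    # md5sum -c resolves the file name relative to the current directory,
    # so run the check from the archive's own directory (in a subshell).
    if ! (cd "$dir" && md5sum -c --quiet "$base.md5"); then
        echo "checksum mismatch: $archive" >&2
        return 1
    fi
    mkdir -p "$dest"
    tar -xzf "$archive" -C "$dest"
}

# Example (archive names are illustrative):
# wget ftp://ftp.ncbi.nlm.nih.gov/blast/db/refseq_rna.00.tar.gz
# wget ftp://ftp.ncbi.nlm.nih.gov/blast/db/refseq_rna.00.tar.gz.md5
# extract_blastdb_archive refseq_rna.00.tar.gz
```

Checking the checksum first avoids silently building searches on a truncated download, which is easy to get with multi-gigabyte volumes.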

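Configuration

Add the executables to PATH by appending the bin folder of the BLAST installation to the PATH environment variable. The exact method depends on your shell; for example, in Bash: export PATH=$PATH:/usr/local/ncbi/blast/bin

The other important setting is the BLASTDB environment variable, which tells the BLAST programs where to look for databases during a search. Set it to wherever your databases are stored, for example: export BLASTDB=$HOME/blastdb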
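The `export` commands above only last for the current session. One common way to make them permanent is to append them to a shell startup file. The sketch below assumes Bash with ~/.bashrc as that file. RC_FILE and add_line_once are hypothetical names, and the paths are the example paths from this section. The check before appending keeps re-runs from adding duplicate lines.

```shell
# Startup file to edit; assumption: Bash reads ~/.bashrc for new shells.
RC_FILE=${RC_FILE:-$HOME/.bashrc}

# Append a line only if the file does not already contain it verbatim.
add_line_once() {
    # -x: match the whole line, -F: fixed string (no regex), -q: quiet
    grep -qxF "$1" "$RC_FILE" 2>/dev/null || printf '%s\n' "$1" >> "$RC_FILE"
}

# Single quotes keep $PATH and $HOME unexpanded here, so they are
# expanded each time a new shell starts.
add_line_once 'export PATH=$PATH:/usr/local/ncbi/blast/bin'
add_line_once 'export BLASTDB=$HOME/blastdb'
```

Open a new terminal, or run `. ~/.bashrc`, for the settings to take effect.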

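Examples

Official simple example 1
  • Use blastdbcmd to extract the sequence nm_000122 from the installed database (refseq_rna.00) into the file test_query.fa.
  • Run blastn to search that nucleotide query against the same local database.

Example from Standalone BLAST Setup for Unix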

$ blastdbcmd -db refseq_rna.00 -entry nm_000122 -out test_query.fa
$ blastn -query test_query.fa -db refseq_rna.00 -task blastn -dust no -outfmt "7 qseqid sseqid evalue bitscore" -max_target_seqs 2
# BLASTN 2.2.29+
# Query: gi|263191547|ref|NM_000122.3| Homo sapiens mutL homolog 1 (MLH1), transcript variant 1, mRNA
# Database: refseq_rna.00
# Fields: query id, subject id, evalue, bit score
# 2 hits found
gi|263191547|ref|NM_000122.3|   gi|263191547|ref|NM_000122.3|   0.0      4801
gi|263191547|ref|NM_000122.3|   gi|332816398|ref|XM_001170433.2|        0.0      4758
# BLAST processed 1 queries
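Output format 7 is BLAST's tabular format with `#` comment lines, so the hits are easy to post-process with standard tools. Below is a sketch using awk, assuming the column order requested in the command above (qseqid sseqid evalue bitscore). The function name filter_hits and the E-value cutoff are illustrative. It prints the subject ID and bit score of every hit whose E-value is at most the cutoff.

```shell
# Hypothetical helper: filter "-outfmt 7" hits with the columns
# qseqid sseqid evalue bitscore by an E-value cutoff.
# Usage: filter_hits CUTOFF [FILE...]   (reads stdin when no FILE is given)
filter_hits() {
    cutoff=$1
    shift
    # awk's default field splitting handles the tab-separated columns;
    # adding 0 forces numeric comparison, including values like 1e-50.
    awk -v max="$cutoff" '
        /^#/ { next }                                  # outfmt 7 comment lines
        NF >= 4 && ($3 + 0) <= (max + 0) { print $2 "\t" $4 }
    ' "$@"
}
```

For example: `blastn -query test_query.fa -db refseq_rna.00 -task blastn -dust no -outfmt "7 qseqid sseqid evalue bitscore" | filter_hits 1e-10`.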


Reposted from blog.csdn.net/weixin_34072159/article/details/87581265