DIAMOND study notes

Reference: Benjamin Buchfink, Chao Xie & Daniel H. Huson, Fast and Sensitive Protein Alignment using DIAMOND, Nature Methods, 12, 59–60 (2015), doi:10.1038/nmeth.3176.

Download: https://github.com/bbuchfink/diamond

Key feature: speed. The paper reports that DIAMOND runs up to 20,000 times faster than blastx on short reads.

Basic usage (aligning nucleotide queries against a protein database):

Build the database:

diamond makedb --in nr.fa -d nr

--in : reference protein sequences (FASTA format)
-d   : name of the database to create (DIAMOND writes it as nr.dmnd)

Alignment:

diamond blastx -e 1e-5 --db $ref/nr -q $query.fa -o $out.diamond -p 20 -f 6 qseqid qlen qstart qend qcovhsp slen sstart send score evalue positive length ppos sseqid stitle nident mismatch gaps gapopen bitscore pident 

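    -e   : maximum e-value for an alignment to be reported (1e-5 here)
    --db : DIAMOND database built with makedb (short form: -d)
    -q   : query sequence file (nucleotide sequences for blastx)
    -o   : output file
    -p   : number of CPU threads
    -f   : output format (6 is BLAST tabular)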
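
The command above relies on shell variables ($ref, $query, $out) that these notes never define. As a minimal sketch, the same call can be wrapped in a function that takes them as arguments. The function name run_blastx, the DIAMOND override variable, the default of 4 threads, and the shortened field list are assumptions made for illustration; they are not part of DIAMOND itself.

```shell
#!/bin/sh
# Output columns for -f 6; the keywords come from the field list in these notes.
OUTFMT="6 qseqid sseqid pident length evalue bitscore"

# run_blastx DB QUERY OUT [THREADS]   (hypothetical helper)
# Set DIAMOND=echo for a dry run that only prints the arguments;
# by default the real "diamond" binary on PATH is used.
run_blastx() {
    db=$1; query=$2; out=$3; threads=${4:-4}
    # $OUTFMT is deliberately unquoted so each keyword becomes its own argument.
    "${DIAMOND:-diamond}" blastx \
        --db "$db" -q "$query" -o "$out" \
        -e 1e-5 -p "$threads" -f $OUTFMT
}
```

For example, `run_blastx nr reads.fa reads.diamond 20` would run the alignment with 20 threads, assuming nr.dmnd was built beforehand with makedb.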
    

Value 6 may be followed by a space-separated list of these keywords:

  qseqid means Query Seq-id

  qlen means Query sequence length

  sseqid means Subject Seq-id

  sallseqid means All subject Seq-id(s), separated by a ';'

  slen means Subject sequence length

  qstart means Start of alignment in query

  qend means End of alignment in query

  sstart means Start of alignment in subject (the reference sequence)

  send means End of alignment in subject (the reference sequence)

  qseq means Aligned part of query sequence 

  sseq means Aligned part of subject sequence

  evalue means Expect value

  bitscore means Bit score

  score means Raw score

  length means Alignment length

  pident means Percentage of identical matches

  nident means Number of identical matches

  mismatch means Number of mismatches

  positive means Number of positive-scoring matches

  gapopen means Number of gap openings

  gaps means Total number of gaps

  ppos means Percentage of positive-scoring matches

  qframe means Query frame

  stitle means Subject Title

  salltitles means All Subject Title(s), separated by a '<>'

  qcovhsp means Query Coverage Per HSP

  Default: qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
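
The tabular output has no header row, so column positions depend entirely on the keyword order given to -f 6. A minimal sketch of one way to handle this: keep the keyword list in a variable, look up column numbers by name, and keep the best-scoring hit per query. The helper names col and best_hits are hypothetical, and the sketch assumes tab-separated output with the default 12 fields.

```shell
#!/bin/sh
# Field list in the same order as passed to -f 6 (here: the default 12 fields).
FIELDS="qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore"

# col NAME: print the 1-based column number of NAME within $FIELDS.
col() {
    i=1
    for f in $FIELDS; do
        if [ "$f" = "$1" ]; then echo "$i"; return 0; fi
        i=$((i + 1))
    done
    return 1
}

# best_hits FILE: keep only the highest-bitscore row for each query.
best_hits() {
    q=$(col qseqid); b=$(col bitscore)
    # Sort by query id, then by bitscore descending; awk keeps the first row per query.
    LC_ALL=C sort -t "$(printf '\t')" -k"${q},${q}" -k"${b},${b}nr" "$1" |
        awk -F '\t' -v q="$q" '!seen[$q]++'
}
```

If a different keyword list was used for the alignment, only FIELDS needs to change, provided qseqid and bitscore are still among the columns.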

Reposted from blog.csdn.net/rojyang/article/details/81233727