IDBA-UD, a de novo (reference-free) assembler for metagenomic sequences: introduction and detailed usage

Introduction

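IDBA-UD is a de novo assembler: it builds genome sequences from high-throughput sequencing reads without a reference genome. It extends the original IDBA assembler and targets sequencing data with highly uneven depth, such as metagenomic samples that mix many different genomes.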

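Its job is to turn sequencing reads into assembled sequences. According to its usage message it accepts short reads (-r, read length up to 600) and long reads (-l, read length over 600), and it is designed to cope with highly heterogeneous data. It is also multi-threaded (--num_threads), so it can make use of multi-core machines to speed up assembly.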

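The tool grew out of the need for reference-free assembly in biology. When studying some species there is no suitable reference sequence to align against, so the genome has to be reconstructed from the reads alone. Because genomes differ in their characteristics, IDBA-UD was tuned for diverse, mixed data of this kind.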

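IDBA-UD builds on earlier work in de novo assembly. It is based on the de Bruijn graph: reads are cut into k-mers, overlapping k-mers are linked into a graph, and unambiguous paths through the graph are read out as contigs. As the name says (Iterative de Bruijn Graph Assembler), it repeats this over a range of k values (--mink, --maxk, --step) rather than a single k. To cope with heterogeneous data it also uses error correction, bubble merging, local assembly and iteration on coverage, each of which can be switched off with the --no_* options in the usage message at the end of this post.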
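To make the de Bruijn graph idea more concrete, here is a minimal Python sketch. It is not IDBA-UD's implementation: the function names build_de_bruijn and extend_unambiguous are made up for this example, and it ignores reverse complements, sequencing errors, coverage and the iteration over several k values.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Link each (k-1)-mer to the (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # A k-mer is an edge from its prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_unambiguous(graph, start):
    """Walk forward from start while there is exactly one way to continue."""
    contig = start
    node = start
    seen = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop on a cycle instead of looping forever
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


# Three toy reads that overlap along the sequence ATGGCGTGCAA.
reads = ["ATGGCGT", "GGCGTGC", "CGTGCAA"]
graph = build_de_bruijn(reads, k=4)
print(extend_unambiguous(graph, "ATG"))  # prints ATGGCGTGCAA
```

A small k joins reads more easily but collapses repeats, while a large k resolves repeats but needs more coverage, which is the usual motivation for iterating over k.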

In short, IDBA-UD assembles genome sequences without a reference and so provides basic data for downstream biological research. Its main strengths are multi-threading and tolerance of diverse, heterogeneous data.

Installation

$ git clone https://github.com/loneknightpy/idba.git
$ cd idba
$ ./build.sh    # only for a git checkout: runs autotools to generate ./configure
$ ./configure
$ make

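Whether to add it to your PATH is up to you. Personally I just call the binaries by absolute path (after the build they are in the bin directory of the source tree).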

Usage

Sequence conversion

idba takes FASTA files as input by default, so FASTQ files, including paired-end FASTQ files, have to be converted with fq2fa first.

fq2fa read.fq read.fa

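# paired-end conversion: --merge combines the two files into one FASTA, --filter drops reads containing N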
fq2fa --merge --filter read_1.fq read_2.fq read.fa
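For readers who want to see what the merged file looks like, the conversion can be sketched in a few lines of Python. This is an illustration, not fq2fa's code: the function names are made up, it assumes plain four-line FASTQ records, and it assumes that filtering discards the whole pair when either mate contains an N.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().strip()
        if not header:
            return
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line
        handle.readline()  # quality line, not needed for FASTA
        yield header[1:], sequence  # drop the leading '@'


def merge_pairs_to_fasta(fq1, fq2, out, drop_n=True):
    """Write mate 1 and mate 2 of every pair next to each other as FASTA."""
    for (name1, seq1), (name2, seq2) in zip(read_fastq(fq1), read_fastq(fq2)):
        # Assumption: a pair is dropped if either mate contains an N.
        if drop_n and ("N" in seq1 or "N" in seq2):
            continue
        out.write(f">{name1}\n{seq1}\n>{name2}\n{seq2}\n")


# Tiny in-memory example: the second pair has an N and is skipped.
fq1 = io.StringIO("@r1/1\nACGT\n+\nIIII\n@r2/1\nANGT\n+\nIIII\n")
fq2 = io.StringIO("@r1/2\nTTGA\n+\nIIII\n@r2/2\nCCGA\n+\nIIII\n")
fasta = io.StringIO()
merge_pairs_to_fasta(fq1, fq2, fasta)
print(fasta.getvalue(), end="")
```

The point to take away is the layout: the two mates of a pair sit on consecutive records in a single FASTA file, which is the file then passed to -r.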

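Assembly:

Super simple, right? Do keep an eye on the machine's memory, though: the tool is not especially memory-hungry, but even a moderately large dataset will use quite a lot.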

idba_ud -r read.fa -o idba_assembly

# -r input reads in FASTA format
# -o output directory (the final contigs are written to contig.fa inside it)

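Full option list (idba_ud has no --help flag; an unrecognized option simply makes it print the usage message):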

idba_ud --help
idba_ud: unrecognized option '--help'
uknown option
IDBA-UD - Iterative de Bruijn Graph Assembler for sequencing data with highly uneven depth.
Usage: idba_ud -r read.fa -o output_dir
Allowed Options: 
  -o, --out arg (=out)                   output directory
  -r, --read arg                         fasta read file (<=600)
      --read_level_2 arg                 paired-end reads fasta for second level scaffolds
      --read_level_3 arg                 paired-end reads fasta for third level scaffolds
      --read_level_4 arg                 paired-end reads fasta for fourth level scaffolds
      --read_level_5 arg                 paired-end reads fasta for fifth level scaffolds
  -l, --long_read arg                    fasta long read file (>600)
      --mink arg (=20)                   minimum k value (<=312)
      --maxk arg (=100)                  maximum k value (<=312)
      --step arg (=20)                   increment of k-mer of each iteration
      --inner_mink arg (=10)             inner minimum k value
      --inner_step arg (=5)              inner increment of k-mer
      --prefix arg (=3)                  prefix length used to build sub k-mer table
      --min_count arg (=2)               minimum multiplicity for filtering k-mer when building the graph
      --min_support arg (=1)             minimum supoort in each iteration
      --num_threads arg (=0)             number of threads
      --seed_kmer arg (=30)              seed kmer size for alignment
      --min_contig arg (=200)            minimum size of contig
      --similar arg (=0.95)              similarity for alignment
      --max_mismatch arg (=3)            max mismatch of error correction
      --min_pairs arg (=3)               minimum number of pairs
      --no_bubble                        do not merge bubble
      --no_local                         do not use local assembly
      --no_coverage                      do not iterate on coverage
      --no_correct                       do not do correction
      --pre_correction                   perform pre-correction before assembly
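As a rough illustration of how the k-related options fit together, the sketch below lists the k values an iteration would visit and builds a command line from Python. The helper names k_schedule and idba_ud_command are hypothetical, only flags shown in the usage message above are used, and the schedule rests on my assumption that k starts at --mink, advances by --step and finishes at --maxk.

```python
import shlex


def k_schedule(mink=20, maxk=100, step=20):
    """List the k values visited, assuming the last one is capped at maxk."""
    ks = []
    k = mink
    while k < maxk:
        ks.append(k)
        k += step
    ks.append(maxk)
    return ks


def idba_ud_command(reads, out_dir, mink=20, maxk=100, step=20,
                    num_threads=0, exe="idba_ud"):
    """Build an argument list using only options from the usage message."""
    return [
        exe,
        "-r", reads,
        "-o", out_dir,
        "--mink", str(mink),
        "--maxk", str(maxk),
        "--step", str(step),
        "--num_threads", str(num_threads),
    ]


print(k_schedule())  # [20, 40, 60, 80, 100] with the default options
cmd = idba_ud_command("read.fa", "idba_assembly", num_threads=8)
print(shlex.join(cmd))
# To actually run it (idba_ud must be installed and on PATH):
# import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than one shell string avoids quoting problems when file names contain spaces.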