A summary of the four major categories of algorithms for identifying structural variations from whole-genome sequencing data


Diagram of different types of genomic variation (Source: labspaces)

 

Last time we summarized methods for identifying single nucleotide polymorphisms (SNPs) in the genome. Today we introduce the types of structural variation (SV) and the methods for identifying them from genome sequencing data.

Structural variation is a major cause of phenotypic differences in species and, in particular, is closely associated with the onset and progression of various diseases such as cancer, so studying it is very important.

 

Structural variation generally refers to genomic sequence variation longer than 1 kb, and includes several different types: insertion, deletion, inversion, translocation, and copy number variation (CNV, or duplication).

These types are shown in the diagram below:


Schematic of the different types of structural variation (Source: Alkan et al., Nature Reviews Genetics, 2011). Note: Ref denotes the reference genome.

 

Previously, microarrays (chips) were a very popular means of detecting structural variation genome-wide. Now, with falling sequencing prices and the advantages of sequencing technology (especially single-base resolution), whole-genome sequencing has become the preferred approach for detecting structural variation across the genome.

Below we describe the four categories of methods for detecting structural variation from genome sequencing data:

1. paired-end mapping (PEM), based on the mapping of paired-end sequencing reads;

2. split read mapping (SRM), based on the mapping of split reads;

3. depth of coverage (DOC), based on read coverage;

4. assembly-based approach (ASA), based on sequence assembly.

The details are shown in the figure below:


Schematic of the applicability of the detection methods to different types of structural variation (Source: Alkan et al., Nature Reviews Genetics, 2011)

 

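As the figure above shows, these four categories of methods are not all suitable for detecting every type of genomic structural variation. Specifically: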

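1. Read pair, i.e. paired-end mapping (PEM), is suitable for detecting all types of genomic structural variation;

2. Read depth, i.e. depth of coverage (DOC), is mainly suitable for detecting two types of structural variation: deletion and duplication (repeats or copy number variation);

3. Split read, i.e. split read mapping (SRM), is also suitable for detecting all types of genomic structural variation;

4. Assembly, i.e. the assembly-based approach (ASA), is also suitable for detecting all types of genomic structural variation.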
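
To make the paired-end mapping idea more concrete, here is a minimal Python sketch of how a single read pair might be mapped to an SV signature. It is not taken from any particular tool. It assumes a standard short-insert library in which concordant pairs map in forward/reverse orientation, and the `ReadPair` fields, the function name `classify_pair`, and the insert-size parameters (mean 400 bp, SD 40 bp, 3 SD cutoff) are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    chrom1: str
    pos1: int      # leftmost mapped position of mate 1
    strand1: str   # "+" or "-"
    chrom2: str
    pos2: int      # leftmost mapped position of mate 2
    strand2: str   # "+" or "-"
    span: int      # observed outer distance between the mates on the reference


def classify_pair(pair, mean_insert=400, sd_insert=40, n_sd=3):
    """Return the SV signature suggested by one read pair.

    Thresholds are illustrative assumptions, not values from any real caller.
    """
    # Mates on different chromosomes hint at a translocation.
    if pair.chrom1 != pair.chrom2:
        return "translocation"

    # Order the mates by position so orientation can be compared.
    if pair.pos1 <= pair.pos2:
        left, right = pair.strand1, pair.strand2
    else:
        left, right = pair.strand2, pair.strand1

    # Both mates on the same strand hint at an inversion.
    if left == right:
        return "inversion"
    # Everted pairs (reverse mate first) hint at a tandem duplication.
    if left == "-" and right == "+":
        return "tandem_duplication"

    # Normal orientation: compare the span with the expected insert size.
    if pair.span > mean_insert + n_sd * sd_insert:
        return "deletion"    # mates map farther apart than expected
    if pair.span < mean_insert - n_sd * sd_insert:
        return "insertion"   # mates map closer together than expected
    return "concordant"
```

In practice a caller would require a cluster of pairs with the same signature before reporting a variant; a single discordant pair is weak evidence.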

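Although these four categories of methods can be used to detect different types of genomic structural variation, each has its own strengths and weaknesses. They differ in detection accuracy, in the size range of the structural variations they can detect, and in complexity.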
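
As one illustration of where such differences come from, split-read methods work from reads whose alignment stops partway through the read, so the reference position where the alignment ends is a candidate breakpoint at single-base resolution. The sketch below extracts such positions from a SAM-style POS and CIGAR string. The function name `clipped_breakpoints` and the `min_clip` cutoff of 20 bases are assumptions made for this example, not part of any specific tool.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def clipped_breakpoints(pos, cigar, min_clip=20):
    """Return candidate breakpoints implied by soft-clipped read ends.

    pos   : 1-based leftmost mapped position (SAM POS field)
    cigar : SAM CIGAR string, e.g. "60M40S"
    Returns a list of (reference_position, clipped_side) tuples.
    """
    # Parse the CIGAR and ignore hard clips, which carry no read sequence.
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar) if op != "H"]
    if len(ops) < 2:
        return []  # unaligned ("*") or no clipping possible

    ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
    breakpoints = []

    # Leading soft clip: the alignment starts right at a candidate breakpoint.
    if ops[0][1] == "S" and ops[0][0] >= min_clip:
        breakpoints.append((pos, "left"))
    # Trailing soft clip: the last aligned base is a candidate breakpoint.
    if ops[-1][1] == "S" and ops[-1][0] >= min_clip:
        breakpoints.append((pos + ref_len - 1, "right"))
    return breakpoints
```

A real split-read caller would go on to realign the clipped portion elsewhere in the genome to find the partner breakpoint, which is where much of the complexity lies.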

 

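For example, although all four categories of methods can detect copy number variation (CNV), their detection accuracy and the sizes of CNV they can detect differ markedly, as shown in the figure below: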


Comparison of the accuracy of the four categories of methods in detecting CNVs and the sizes of CNV they detect (Source: Hehir-Kwa et al., Expert Rev Mol Diagn, 2015)

 

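As the figure above shows, the four categories of methods differ markedly in their accuracy and in the sizes of CNV they can detect. Methods based on depth of coverage (DOC) can detect relatively large CNVs but with lower accuracy, whereas methods based on split read mapping (SRM) detect CNVs with high accuracy, but the CNVs they detect are usually short.

The four categories of methods therefore each have their own strengths and weaknesses in detecting genomic structural variation. They are complementary and can be used together to improve both the range and the accuracy of structural variation detection.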
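
The depth-of-coverage idea can be sketched in a few lines of Python: count reads in fixed-size windows, compare each window with a baseline, and flag windows whose depth is unusually low or high. This is a simplified illustration rather than the method of any real caller. The function names, the 1 kb window, the median baseline, and the log2-ratio cutoffs (0.4 and -0.6) are assumptions. The sketch also shows one reason for the limited precision: a breakpoint can be located no more precisely than the window size.

```python
import math
from statistics import median


def window_depths(read_starts, region_length, window=1000):
    """Count reads whose 0-based start falls in each fixed-size window."""
    n_windows = -(-region_length // window)  # ceiling division
    counts = [0] * n_windows
    for start in read_starts:
        if 0 <= start < region_length:
            counts[start // window] += 1
    return counts


def call_cnv_windows(counts, gain=0.4, loss=-0.6):
    """Label each window DEL, DUP, or normal from its log2 depth ratio.

    The baseline is the median window count; cutoffs are illustrative only.
    """
    baseline = median(counts)
    labels = []
    for count in counts:
        # A small pseudo-count avoids taking the log of zero.
        ratio = math.log2((count + 0.5) / (baseline + 0.5))
        if ratio <= loss:
            labels.append("DEL")
        elif ratio >= gain:
            labels.append("DUP")
        else:
            labels.append("normal")
    return labels
```

Real tools additionally correct for GC content and mappability and merge adjacent windows into segments, which this sketch omits.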
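
One simple way to combine the output of several methods is to keep calls that are supported by more than one of them, judged by reciprocal overlap. The sketch below is a hedged illustration of that idea: the `SVCall` record, the caller labels, and the thresholds (50% reciprocal overlap, at least two callers) are assumptions for the example, not a recommendation from the cited papers.

```python
from typing import NamedTuple


class SVCall(NamedTuple):
    chrom: str
    start: int
    end: int
    svtype: str   # e.g. "DEL", "DUP"
    caller: str   # hypothetical label for the method that made the call


def reciprocal_overlap(a, b):
    """Fraction of overlap relative to the longer call (0.0 if incompatible)."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))


def supported_calls(calls, min_ro=0.5, min_callers=2):
    """Keep calls backed by at least `min_callers` distinct callers."""
    kept = []
    for a in calls:
        supporters = {a.caller}
        for b in calls:
            if b is not a and reciprocal_overlap(a, b) >= min_ro:
                supporters.add(b.caller)
        if len(supporters) >= min_callers:
            kept.append(a)
    return kept
```

Requiring agreement between callers favors precision; taking the union of all calls instead would favor the range of detectable variants. Which trade-off is appropriate depends on the study.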

 

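So which software or tools are good at detecting each type of structural variation? The figure below lists some of the better-performing structural variation detection programs: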


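Programs that perform relatively well at detecting each type of structural variation in simulated and real data (Source: Kosugi et al., Genome Biology, 2019). Note: DEL is deletion, DUP is duplication, INS is insertion, INS Unspecified is insertions of unspecified sequence, MEI is mobile element insertions, and INV is inversion.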
 

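In summary, the accuracy of a given program or algorithm in detecting structural variation depends mainly on the type and size of the structural variation being detected. Different programs have their own strengths and weaknesses, and combining different methods can effectively improve detection accuracy and cover a wider range of structural variation lengths. (For more, see AIPuFu, a large free comprehensive bioinformatics resource and tool platform: www.aipufu.com, or follow the WeChat public account AIPuFuBio.)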

 

I hope today's content is useful to everyone. More classic content will follow, and comments are welcome!

 

Origin www.cnblogs.com/aipufu/p/11577407.html