A Benchmark Study on Error Assessment and Quality Control of CCS Reads Derived from the PacBio RS

PacBio RS, a newly emerging third-generation DNA sequencing platform, is based on a real-time, single-molecule, nano-nitch sequencing technology that can generate very long reads (up to 20 kb), in contrast to the shorter reads produced by first- and second-generation sequencing technologies. Because the platform is new, it is important to assess its sequencing error rate as well as the quality control (QC) parameters associated with PacBio sequence data. In this study, a mixture of 10 previously known, closely related DNA amplicons was sequenced on the PacBio RS platform. After aligning the Circular Consensus Sequence (CCS) reads from this experiment to the known reference sequences, we found a median error rate of 2.5% without read QC, which improved to 1.3% with an SVM-based multi-parameter QC method. In addition, de novo assembly was used as a downstream application to evaluate the effects of different QC approaches. This benchmark study indicates that even though CCS reads are already error-corrected, appropriate QC of CCS reads is still necessary to produce successful downstream bioinformatics analyses.
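
The error rates above come from aligning each CCS read to the known amplicon it was derived from. The abstract does not define the metric or the aligner, so the following is only a minimal sketch under stated assumptions: each read is assigned to its closest reference on either strand, and the per-read error rate is taken to be the Levenshtein edit distance (substitutions + insertions + deletions) divided by the reference length. The function names, toy sequences, and the choice of denominator are all hypothetical.

```python
from statistics import median

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string (non-ACGT characters are left as-is)."""
    return seq.translate(_COMPLEMENT)[::-1]


def edit_distance(a, b):
    """Levenshtein distance: each substitution, insertion or deletion costs 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # skip a base of a
                            curr[j - 1] + 1,            # skip a base of b
                            prev[j - 1] + (ca != cb)))  # align the two bases (match or substitution)
        prev = curr
    return prev[-1]


def read_error_rate(read, references):
    """Assign a read to its closest known reference (either strand) and return
    (reference name, edit distance / reference length). The denominator is an
    assumption; alignment length is another common choice."""
    best_name, best_rate = None, float("inf")
    for name, ref in references.items():
        dist = min(edit_distance(read, ref), edit_distance(reverse_complement(read), ref))
        rate = dist / len(ref)
        if rate < best_rate:
            best_name, best_rate = name, rate
    return best_name, best_rate


def median_error_rate(reads, references):
    """Median per-read error rate over a set of CCS reads."""
    return median(read_error_rate(r, references)[1] for r in reads)


# Hypothetical toy data: two "amplicons" and three reads
# (exact copy, one substitution, copy from the reverse strand).
references = {"ampA": "ACGTACGTAC", "ampB": "TTTTGGGGCC"}
reads = ["ACGTACGTAC", "ACGTTCGTAC", "GTACGTACGT"]
print(median_error_rate(reads, references))  # prints 0.0
```

Quadratic-time edit distance is only practical for short amplicons; a real analysis would use a dedicated long-read aligner. Reporting the median rather than the mean keeps a handful of very poor reads from dominating the summary.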
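
The abstract credits an SVM-based multi-parameter QC method with lowering the median error rate, but it does not list the parameters or how the classifier was trained. The sketch below is therefore an assumption-laden illustration, not the authors' method: it uses three hypothetical per-read features (number of full passes, mean base QV, read length), labels each training read "good" (+1) when its error rate against the known reference is at most a hypothetical 2% cutoff, and fits a linear SVM with the Pegasos stochastic sub-gradient algorithm in pure Python. A production pipeline would more likely use a library implementation such as scikit-learn's `SVC`.

```python
import random


def apply_scaling(row, scaling):
    """Scale one feature row with the means/SDs learned on the training set."""
    means, sds = scaling
    return [(x - m) / s for x, m, s in zip(row, means, sds)]


def standardize(rows):
    """Z-score each feature column; returns the scaled rows and (means, sds)."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[c] for r in rows) / n for c in range(k)]
    sds = [(sum((r[c] - means[c]) ** 2 for r in rows) / n) ** 0.5 or 1.0 for c in range(k)]
    return [apply_scaling(r, (means, sds)) for r in rows], (means, sds)


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Linear SVM trained with Pegasos (stochastic sub-gradient descent on the
    L2-regularised hinge loss). A constant 1.0 is appended to every row so the
    last weight acts as an (also regularised) bias term."""
    rng = random.Random(seed)
    data = [(row + [1.0], label) for row, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # regulariser shrink step
            if margin < 1.0:                          # hinge loss is active
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def passes_qc(w, scaled_row):
    """Keep a read when it falls on the +1 ("good") side of the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, scaled_row + [1.0])) > 0


# Hypothetical training reads: [full passes, mean base QV, read length],
# labelled +1 if the error rate vs. the known amplicon is <= 2%, else -1.
features = [[10, 35, 1500], [12, 38, 1450], [9, 33, 1520], [11, 36, 1480],
            [2, 18, 1300], [1, 15, 900], [3, 20, 1200], [2, 17, 1100]]
error_rates = [0.004, 0.002, 0.008, 0.005, 0.041, 0.063, 0.030, 0.052]
labels = [1 if e <= 0.02 else -1 for e in error_rates]

X, scaling = standardize(features)
w = train_linear_svm(X, labels)
keep_good = passes_qc(w, apply_scaling([11, 37, 1500], scaling))
keep_bad = passes_qc(w, apply_scaling([1, 16, 1200], scaling))
print(keep_good, keep_bad)
```

Deriving labels from known references is only possible in a benchmark like this one, where the true amplicon sequences are available; presumably the fitted hyperplane would then be applied to reads from samples whose sequences are unknown. Standardizing the features first matters because read length would otherwise dominate the dot product.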

Reposted from blog.csdn.net/u010608296/article/details/103460903