Study Notes on Genome Collinearity

What is collinearity?

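Changes in genome size are relatively frequent events, but they are not necessarily tied to changes in gene number or gene order. The conservation of gene number and order is called collinearity (also spelled colinearity).
    Originally, the term described the positional relationship of genes on the same chromosome. Today it more often refers to the conservation of gene types and their relative order between species that diverged from a common ancestor.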

What is collinearity used for?

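     Its main uses are identifying orthologous genes, annotating protein-coding genes, and detecting evolutionary events. (In eukaryotes, the methods used to infer orthologous genes are based on protein sequence similarity.)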

Approaches to collinearity analysis

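    1. Algorithms for finding conserved sequences between species, mainly Smith-Waterman (local alignment) and Needleman-Wunsch (global alignment). Both use dynamic programming to score and evaluate the aligned sequences. Limitations: they do not scale to large datasets, they have long run times, and they cannot distinguish paralogous genes.

    2. Gene-level collinearity algorithms. These use protein alignment to find homologous proteins, then use the relative chromosomal positions of the genes encoding those proteins to obtain collinear segments. Limitation: they have difficulty inferring collinear segments in plant genomes, where genes are highly duplicated.

    3. Collinearity algorithms based on whole-genome sequence alignment.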
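
To make the dynamic-programming scoring in item 1 concrete, here is a minimal Python sketch of the Smith-Waterman local alignment score. The function name and the scoring values (match=2, mismatch=-1, gap=-2) are illustrative assumptions, not taken from any tool mentioned in these notes. The full score matrix has (len(a)+1) x (len(b)+1) cells, which hints at why this family of methods becomes slow on genome-scale input.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Scoring values are illustrative assumptions, not a standard matrix.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # gap in sequence b
            left = H[i][j - 1] + gap    # gap in sequence a
            # Clamping at 0 is what makes the alignment local rather than global.
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

Dropping the clamp at 0 and initializing the first row and column with cumulative gap penalties would turn this into a Needleman-Wunsch style global score.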

Tools:

Sibelia is a highly accurate and easy-to-use software tool for comparing two closely related bacterial genomes, which can be presented as either finished sequences or fragmented assemblies. C-Sibelia takes as input two FASTA files and produces: (1) a VCF file containing all identified single nucleotide variations and indels; (2) an XMFA file containing alignment information. The software also produces Circos diagrams visualizing high-level genomic architecture for rearrangement analyses.

Download: https://sourceforge.net/projects/sibelia-bio/files
Sibelia -s loose example.fasta ref.fasta
Then, in the circos folder, run:
circos -conf circos.conf

MCScan is an algorithm to scan multiple genomes or subgenomes to identify putative homologous chromosomal regions, then align these regions using genes as anchors. MCScanX toolkit implements an adjusted MCScan algorithm for detection of synteny and collinearity and extends the software by incorporating 15 utility programs for display and further analyses.
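
The idea of using genes as anchors can be illustrated with a small Python sketch. This is a simplified chaining heuristic written for these notes, not the MCScan or MCScanX implementation: it assumes each anchor is a pair of gene-order indices (position in genome A, position in genome B) for one chromosome pair, and it only finds blocks in the same orientation. The names `collinear_chains`, `max_gap`, and `min_size` are hypothetical.

```python
def collinear_chains(anchors, max_gap=5, min_size=3):
    """Group homologous gene pairs (anchors) into collinear chains.

    anchors: iterable of (i, j) gene-order indices on one chromosome pair.
    A chain is a run of anchors in which both indices strictly increase
    and neighbouring anchors are at most max_gap genes apart in each genome.
    Simplified illustration only; inverted blocks are not handled.
    """
    anchors = sorted(set(anchors))
    n = len(anchors)
    score = [1] * n    # length of the best chain ending at each anchor
    prev = [-1] * n    # predecessor of each anchor in that chain
    for k in range(n):
        ik, jk = anchors[k]
        for p in range(k):
            ip, jp = anchors[p]
            if 0 < ik - ip <= max_gap and 0 < jk - jp <= max_gap:
                if score[p] + 1 > score[k]:
                    score[k] = score[p] + 1
                    prev[k] = p
    # Trace back from the highest-scoring chain ends; each anchor is used once,
    # so a trace-back stops early if it reaches an already-used anchor.
    used = [False] * n
    chains = []
    for k in sorted(range(n), key=lambda x: -score[x]):
        chain = []
        cur = k
        while cur != -1 and not used[cur]:
            chain.append(anchors[cur])
            used[cur] = True
            cur = prev[cur]
        chain.reverse()
        if len(chain) >= min_size:
            chains.append(chain)
    return chains


if __name__ == "__main__":
    demo = [(1, 1), (2, 2), (3, 4), (10, 50), (20, 7), (21, 8), (22, 9)]
    for block in collinear_chains(demo):
        print(block)
```

In the demo input, the isolated anchor (10, 50) is dropped because it cannot be chained to any neighbour within `max_gap`, while the other anchors form two blocks of three genes each.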

Reference: https://www.jianshu.com/p/740cb9eccf2b

Python version of MCscan:

https://mp.weixin.qq.com/s?__biz=MzIyNzIyNTczNA==&mid=2247485631&idx=1&sn=ffb7236f0eba698a7e5d7eb969414e46&chksm=e8653156df12b8405450470a55b7ce86b27210e6e63df3c65257333a513f154b0b1754884d04&mpshare=1&scene=1&srcid=1107272I98VWYTq9HQx4jGbF&from=singlemessage&ascene=1&devicetype=android-25&version=26060739&nettype=WIFI&abtest_cookie=BAABAAgACgALAA0ABACehh4AI5ceAFeZHgCKmR4AAAA%253D&lang=zh_CN&pass_ticket=tbZBSrl9ul0h8GAPgiOZuA1U3Nm6Q%252Bbf88RRTS%252FVh4svU1qzC%252FN%252FukjF8ITH3Y5z&wx_header=1

Documentation:
https://github.com/tanghaibao/jcvi/wiki/MCscan-(Python-version)#dependencies

Visualization tool (VGSC)

VGSC is a software package that generates different kinds of plots to support downstream analysis in synteny and collinearity studies.

Download: https://bio.njfu.edu.cn/vgsc-web/

Command:
#!/bin/sh
java -jar VGSC.jar -tp DualSynteny -ig data/os_sb.gff -is data/os_sb.collinearity -ic data/dual_synteny.ctl -os result/dual_synteny.png

(1) -tp PLOT_TYPE is the type of plot; it should be one of the following four: Bar, Dot, Circle, DualSynteny;
(2) -ig GFF_FILE is the input gene annotation file;
(3) -is SYNTENY_FILE is the input synteny file;
(4) -ic CONTROL_FILE is the input control file;
(5) -os OUTPUT_FILE is the output file.
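
When several plots are needed, the command line can be assembled from a script. The following Python sketch builds the argument list from the five options listed above and checks the plot type before anything is run. The helper name `build_vgsc_command` and the default jar path are assumptions made for this example; only the flags shown in these notes are used.

```python
VALID_PLOT_TYPES = ("Bar", "Dot", "Circle", "DualSynteny")


def build_vgsc_command(plot_type, gff, synteny, control, output, jar="VGSC.jar"):
    """Build the VGSC command line as a list of arguments.

    Hypothetical helper: it only assembles the flags described in these
    notes (-tp, -ig, -is, -ic, -os) and does not run anything itself.
    """
    if plot_type not in VALID_PLOT_TYPES:
        raise ValueError(
            "plot_type must be one of %s, got %r" % (", ".join(VALID_PLOT_TYPES), plot_type)
        )
    return [
        "java", "-jar", jar,
        "-tp", plot_type,
        "-ig", gff,
        "-is", synteny,
        "-ic", control,
        "-os", output,
    ]


if __name__ == "__main__":
    cmd = build_vgsc_command(
        "DualSynteny",
        "data/os_sb.gff",
        "data/os_sb.collinearity",
        "data/dual_synteny.ctl",
        "result/dual_synteny.png",
    )
    print(" ".join(cmd))
    # To actually run it (requires Java and VGSC.jar in the working directory):
    # import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when file paths contain spaces.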


Reposted from blog.csdn.net/rojyang/article/details/84027724