Pooled genome sequencing strategies | Representative genome assembly approaches | Domestication | GERP | Selective sweep | Hybridization | Introgression | iHS | SNP genotyping arrays | Haplotype

Design based on biology

Comparative genomics uses vertebrate genome data to address a range of biological questions. Annotating new regulatory elements (those that appeared during vertebrate evolution) can enrich what the species tree tells us, for example about differences in the evolutionary rate of protein function (protein-coding genes and early gene evolution were the first things to be studied).

Sequencing design combines two strategies:

1. Pooled genome sequencing strategies: sequence several individuals of the same species and superimpose (pool) the data from the different individuals.

2. Representative genome assembly approaches: given good-quality sequence fragments (a reasonable contig N50), this approach is applicable even when long-range sequence is absent. If the assembly is of good quality, it can be used as the reference sequence.
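Since contig N50 is the quality measure used above, here is a minimal sketch of how it can be computed. The function name and the contig lengths are made up for illustration: N50 is the length of the contig at which the running total of lengths, taken from longest to shortest, first reaches half of the assembly size.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running total to half
        # of the assembly size defines the N50.
        if running >= half:
            return length


# Hypothetical contig lengths (kb) for a small assembly.
contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(contigs))  # 70
```

A longer N50 means that fewer, longer contigs cover half of the assembly, which is why it is used as a rough measure of contiguity.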

Domestication: humans alter the natural course of change, so that part of the observed change is due to human (artificial) selection rather than natural selection.

 

Project design

 

 

 

 

Because the biological analysis relies on the assembly data, assembly requires attention to detail to reduce errors (confounding effects).

Data acquisition

Flowchart: 1. Each kind of research question corresponds to a particular sequencing mode. 2. Beyond being resource-oriented, comparative genomics can help uncover the underlying mechanism. 3.

Statistics: fixation index (FST)
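As a rough illustration of the fixation index, the sketch below computes FST for one biallelic locus from per-population allele frequencies, using the textbook form FST = (HT - HS) / HT. The function name and the frequencies are hypothetical, populations are assumed to be equally weighted, and no sample-size correction is applied, so this is not the estimator a real pipeline would use.

```python
def fst(freqs):
    """FST for one biallelic locus from per-population allele frequencies.

    Assumes equally weighted populations and no sample-size correction.
    """
    if not freqs:
        raise ValueError("need at least one population frequency")
    p_bar = sum(freqs) / len(freqs)
    # Expected heterozygosity in the pooled (total) population.
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # locus is monomorphic everywhere
    # Mean expected heterozygosity within subpopulations.
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    return (h_t - h_s) / h_t


# Hypothetical frequencies: strongly differentiated vs. identical populations.
print(round(fst([0.9, 0.1]), 2))
print(round(fst([0.5, 0.5]), 2))
```

Values near 0 indicate that the populations share allele frequencies; values near 1 indicate that they are close to fixed for different alleles.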

GERP (Genomic Evolutionary Rate Profiling): "GERP identifies constrained elements in multiple alignments by quantifying substitution deficits. These deficits represent substitutions that would have occurred if the element were neutral DNA, but did not occur because the element has been under functional constraint. We refer to these deficits as 'Rejected Substitutions'. Rejected substitutions are a natural measure of constraint that reflects the strength of past purifying selection on the element."
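The idea of a substitution deficit can be written down in a few lines. This is only a sketch of the arithmetic, not the GERP software itself: the per-column expected (neutral) and observed substitution counts are hypothetical numbers, and in practice both would be estimated from a multiple alignment and a phylogenetic tree.

```python
def rejected_substitutions(expected, observed):
    """Per-column deficit: expected neutral substitutions minus observed."""
    if len(expected) != len(observed):
        raise ValueError("expected and observed must have the same length")
    return [e - o for e, o in zip(expected, observed)]


# Hypothetical values for four alignment columns.
expected = [2.5, 2.5, 2.5, 2.5]   # substitutions expected under neutrality
observed = [0.25, 0.5, 2.5, 3.0]  # substitutions estimated from the alignment

per_column = rejected_substitutions(expected, observed)
# Treat the first two columns as one candidate constrained element
# and score it by summing its per-column deficits.
element_score = sum(per_column[:2])
print(per_column, element_score)
```

A positive score means fewer substitutions than expected (constraint); a score near zero or below means the column evolves at or above the neutral rate.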

GWAS: genome-wide association studies

The effect of genome content: if the sequencing is of good quality, the scope of variation data that can be captured expands. Sequence quality depends on the sequencing technology (earlier technologies: radiation hybrid and BAC maps, BACs and fosmids; now: PacBio, Dovetail and BioNano).

Because most genes can be assembled, conserved synteny among closely related species can be used to study genome structure. Combining the new technologies yields better assemblies (longer N50), which resolves problems that earlier technology could not. Vertebrate genomes are complex and have their own characteristic features: 1. high repeat content, 2. high GC content, 3. minichromosomes (very small chromosomes). These features make the new technologies necessary, because a single read from them can span a repeat region on its own.

 

Standing variation, imputation and mapping:

Variation: sample selection is key to variant discovery. Which samples are chosen matters as much as which variants are later chosen as probes, so combining many individuals sequenced at low coverage (with software performing comprehensive variant detection across them) is advantageous.
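To see why combining many low-coverage individuals helps, the sketch below pools hypothetical per-individual read counts at one site. It assumes each read is an independent draw, ignores sequencing error, and gives every read equal weight; real variant callers model all of these.

```python
def pooled_alt_frequency(read_counts):
    """Estimate the alternate-allele frequency from pooled read counts.

    read_counts is a list of (ref_reads, alt_reads) pairs, one per individual.
    """
    ref_total = sum(ref for ref, _ in read_counts)
    alt_total = sum(alt for _, alt in read_counts)
    depth = ref_total + alt_total
    if depth == 0:
        raise ValueError("no reads cover this site")
    return alt_total / depth


# Hypothetical site: six individuals, each covered by only 2-3 reads.
reads = [(2, 0), (1, 1), (0, 2), (3, 0), (1, 1), (2, 1)]
print(round(pooled_alt_frequency(reads), 3))
```

No single individual has enough reads for a confident genotype, but the pooled depth of 14 reads gives a usable population-level frequency estimate.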

Imputation: determine the origin of variants using 1. genetic distance and 2. a sliding-window model, to identify a sweep, hybridization (crossing) or introgression (backcrossing).

Selective sweep: selection acting on a locus reduces diversity in the region around that site.

 

 

A selective sweep

Under natural selection, a new beneficial mutation will rise in frequency (prevalence) in a population. A schematic of polymorphisms along a chromosome, including the selected allele, before and after selection. Ancestral alleles are shown in gray and derived (non-ancestral) alleles are shown in blue. As a new positively selected allele (red) rises to high frequency, nearby linked alleles on the chromosome 'hitchhike' along with it to high frequency, creating a 'selective sweep.'
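The reduced diversity around a swept site is what a sliding-window scan looks for. The sketch below computes per-site nucleotide diversity for biallelic sites and averages it in windows; the haplotypes are invented, with the three middle sites fixed to mimic a completed sweep, and the function names are hypothetical.

```python
def site_diversity(column):
    """Pairwise diversity at one biallelic site (alleles coded 0/1)."""
    n = len(column)
    k = sum(column)  # number of haplotypes carrying allele 1
    return 2 * k * (n - k) / (n * (n - 1))


def windowed_diversity(haplotypes, window, step):
    """Mean per-site diversity in windows of `window` sites."""
    n_sites = len(haplotypes[0])
    per_site = [site_diversity([h[i] for h in haplotypes])
                for i in range(n_sites)]
    return [(start, sum(per_site[start:start + window]) / window)
            for start in range(0, n_sites - window + 1, step)]


# Hypothetical haplotypes: sites 3-5 are fixed (no variation), as after a sweep.
haplotypes = [
    [0, 1, 0, 1, 1, 1, 1, 0, 0],
    [1, 0, 0, 1, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 1, 1, 0, 0, 1],
    [1, 0, 1, 1, 1, 1, 1, 1, 1],
]
windows = windowed_diversity(haplotypes, window=3, step=3)
for start, pi in windows:
    print(start, round(pi, 3))
```

The middle window has zero diversity while the flanking windows do not, which is the valley pattern a sweep scan flags for follow-up.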

Hybridization: crossing between different parental lineages.

Introgression: backcrossing, i.e. crossing offspring back to a parental lineage.

Integrated haplotype score: iHS (integrated haplotype score) is a statistic developed to detect evidence of recent positive selection at a locus. It is based on the differential levels of linkage disequilibrium (LD) surrounding a positively selected allele compared to the background allele at the same position.
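A simplified sketch of the quantity behind iHS follows. It computes extended haplotype homozygosity (EHH) around a core SNP separately for haplotypes carrying the ancestral (0) and derived (1) allele, integrates EHH over position, and takes the log ratio. This is the unstandardized score only: the usual EHH cutoff, genetic-map distances and the standardization by allele-frequency bin are left out, and the data and function names are hypothetical.

```python
from collections import Counter
from math import comb, log


def ehh(haplotypes, core, idx):
    """Chance that two haplotypes are identical from the core SNP to SNP idx."""
    n = len(haplotypes)
    lo, hi = min(core, idx), max(core, idx)
    classes = Counter(tuple(h[lo:hi + 1]) for h in haplotypes)
    return sum(comb(c, 2) for c in classes.values()) / comb(n, 2)


def ihh(haplotypes, positions, core):
    """Trapezoid integral of EHH over position, in both directions."""
    total = 0.0
    for step in (-1, 1):
        prev_idx, prev_ehh = core, 1.0  # EHH is 1 at the core itself
        idx = core + step
        while 0 <= idx < len(positions):
            cur = ehh(haplotypes, core, idx)
            width = abs(positions[idx] - positions[prev_idx])
            total += 0.5 * (prev_ehh + cur) * width
            prev_idx, prev_ehh = idx, cur
            idx += step
    return total


def unstandardized_ihs(haplotypes, positions, core):
    """ln(iHH_ancestral / iHH_derived) at the core SNP."""
    anc = [h for h in haplotypes if h[core] == 0]
    der = [h for h in haplotypes if h[core] == 1]
    if len(anc) < 2 or len(der) < 2:
        raise ValueError("need at least two haplotypes per core allele")
    return log(ihh(anc, positions, core) / ihh(der, positions, core))


# Hypothetical data: derived haplotypes are identical (long shared haplotype),
# ancestral haplotypes break down quickly away from the core SNP (index 2).
positions = [0, 100, 200, 300, 400]
core = 2
haps = [
    [0, 1, 1, 0, 1], [0, 1, 1, 0, 1], [0, 1, 1, 0, 1], [0, 1, 1, 0, 1],
    [0, 0, 0, 0, 0], [1, 0, 0, 1, 0], [0, 1, 0, 0, 1], [1, 1, 0, 1, 1],
]
print(round(unstandardized_ihs(haps, positions, core), 3))
```

With this sign convention, a strongly negative value means that the derived allele sits on unusually long haplotypes, the pattern expected under recent positive selection.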

 

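Mapping: linking genotype to trait. With (1) SNP microarray technology and (2) the falling cost of high-throughput sequencing, haplotype models (given sufficiently dense SNP data) are used to study population history and the correspondence between genotype and trait.

SNP genotyping arrays: a SNP array is a type of DNA microarray used to detect polymorphisms within a population. A single-nucleotide polymorphism (SNP) is a variation at a single site in DNA and is the most common type of variation in the genome. Around 335 million SNPs have been identified in the human genome, of which 15 million are present at frequencies of 1% or higher across different populations worldwide.

Haplotype: the genotype formed by several tightly linked genes that determine the same trait.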

 

Complex mutation types: the good with the bad

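The bad: because of the limits of the technology (short-read sequencing, SRS), many highly localized variants (regions of high local heterozygosity and genomic breakpoints) cannot be found; only in a few cases can rearrangements be dissected through precise alignment.

The good: PacBio SMRT technology is now available and can resolve structural variation.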

 

Layering complexity: gene and transcript annotation

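1. DNA annotation > transcript annotation (method 1: compare against the genomes of similar species; method 2: map transcriptomes from RefSeq), which yields RNA sequences (a species-specific transcriptome).

2. Annotation using regulatory element information: variants in regulatory elements can underlie traits (GWAS identifies variants in noncoding regions; because GWAS signals can map outside genes, they can still be used to detect disease associations).

3. Noncoding regions: genome-wide association studies (GWAS) have been widely used to analyze the genetic loci of complex diseases. However, most of the complex-disease-associated variants found by GWAS, i.e. single-nucleotide polymorphism (SNP) sites, lie in noncoding regions, and the number of linked variants (SNPs) in the same region can reach hundreds or thousands.

4. Annotation using specialized data sets from specialized biological data platforms (those with their own classification criteria).

5. Annotation by identifying conserved elements (across different species).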
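The GWAS signals mentioned in items 2 and 3 come from per-SNP association tests. As a minimal sketch, assuming a simple case/control design and made-up allele counts, the allelic chi-square test on a 2x2 table looks like this; real studies also correct for population structure and multiple testing, which is omitted here.

```python
from math import erfc, sqrt


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Chi-square statistic and p-value (1 df) for a 2x2 allele-count table."""
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column of the table is empty")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical allele counts at one SNP: 100 case and 100 control alleles.
chi2, p_value = allelic_chi_square(case_alt=60, case_ref=40,
                                   ctrl_alt=40, ctrl_ref=60)
print(round(chi2, 2), round(p_value, 4))
```

Because many linked SNPs in a region give similar test results, such a test points to an associated region rather than to the causal variant, which is why the annotation layers above matter.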

 

Vertebrate comparative genomics

Natural disease models: domestic animals

 

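Species are divided into model organisms and natural (non-model) organisms. Natural organisms are the preferred choice for studying homeostasis and health traits, so when a study has multiple aims, both domesticated and wild animals can serve as models.

Domestication results in phenotypic uniformity and an enrichment of disease. Domestic animals have recently been found to have human-like diseases (not arising in a laboratory setting, but still a result of human selection). Using these animals not only makes it possible to study human disease but can also benefit the species itself.

Comparing genomes across species: 1. helps annotation; 2. finds SNPs through GWAS; 3. helps find sweeps.

Intraspecies comparison: a tool to study recent phenotypic adaptations

Comparing multiple individuals within a species reveals selective sweeps, which are characterized by clusters of multiple genes and polymorphisms. E.g. in fish (from different seasons), long haplotype sequences (coding and noncoding regions acting together) control a class of related traits; e.g. sheep at different altitudes (the trait differs within one species, and different species are used for validation).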

 

adaptations (microevolution)

Origin www.cnblogs.com/yuanjingnan/p/11116492.html