The GO enrichment analysis tool on our cloud platform is very simple to use: the form to fill in and the files to upload are minimal. Many students, however, do not understand the principle behind it or how to interpret the results, so this post explains both in detail.
First, an introduction to GO enrichment:
Gene Ontology (GO for short) is an internationally standardized classification system for gene function. It provides a dynamically updated controlled vocabulary that comprehensively describes the properties of genes and gene products in any organism. GO has three ontologies, which describe a gene's molecular function, its cellular component, and the biological process it takes part in. The basic unit of GO is the term (entry, node), and each term corresponds to one property.
What enrichment means:
Each gene corresponds to one or more GO terms (that is, GO functions).
Enrichment involves two concepts: foreground genes and background genes. The foreground genes are the gene set you want to focus on; the background genes are the set of all genes. For example, if transcriptome sequencing is done on two samples, a control group and a treatment group, the foreground genes are the genes differentially expressed between treatment and control, and the background genes are all genes expressed in the two samples. As another example, suppose I want to know whether Shenzhen has significantly more college students than Guangdong Province as a whole ("college student" plays the role of a GO term for the people of Shenzhen). Then the foreground is the population of Shenzhen, the background is the population of Guangdong Province, and every individual carries a label (college student, high school student, and so on).
Enrichment means that the proportion of genes annotated to a given GO term in the foreground gene set is significantly higher than the proportion annotated to it in the background gene set. In the example above, saying that college students are significantly enriched in Shenzhen means that the share of college graduates in Shenzhen's total population is significantly higher than the share of college graduates in Guangdong Province's total population. In the figure, for instance, what we test is whether 10% is significantly higher than 2%.
So how is this "significance" calculated? As we all know, with a P value. The P value is calculated with the hypergeometric test, which in its usual upper-tail form is:
$$P = 1 - \sum_{i=0}^{m-1} \frac{\binom{M}{i}\binom{N-M}{n-i}}{\binom{N}{n}}$$
where N is the number of all Unigenes with GO annotation; n is the number of differentially expressed genes among those N; M is the number of all Unigenes annotated to a particular GO term; and m is the number of differentially expressed genes annotated to that GO term.
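The hypergeometric test described above can be sketched in a few lines of Python using only the standard library. This is an illustrative sketch, not the platform's actual code: the function names are made up here, the upper-tail sum is computed directly as P(X >= m), and the Benjamini-Hochberg procedure is only one common choice for the multiple-testing correction that yields a Q value (the post does not say which method the platform uses).

```python
from math import comb


def hypergeom_pvalue(N, n, M, m):
    """Upper-tail hypergeometric P value, P(X >= m).

    N: background genes with any GO annotation
    n: foreground (differentially expressed) genes among those N
    M: background genes annotated to the GO term
    m: foreground genes annotated to the GO term
    """
    upper = min(n, M)  # X cannot exceed either n or M
    favourable = sum(comb(M, i) * comb(N - M, n - i) for i in range(m, upper + 1))
    return favourable / comb(N, n)


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P values (assumed method), in input order."""
    k = len(pvalues)
    order = sorted(range(k), key=lambda i: pvalues[i])
    adjusted = [0.0] * k
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(k, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * k / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy numbers matching the 10% vs 2% illustration: 10 of 100 foreground
# genes carry the term, against 20 of 1000 background genes.
p = hypergeom_pvalue(N=1000, n=100, M=20, m=10)
print(p)
```

In a real analysis one P value is computed per GO term, and the whole list is passed to the correction step at once, since the adjustment depends on how many terms were tested.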
The calculated P value is then corrected for multiple testing to give the corrected P value (that is, the Q value). We normally use Q value ≤ 0.05 as the threshold: a GO term that meets this condition is defined as significantly enriched among the differentially expressed genes.
Second, data preparation:
Once you understand the principle of GO enrichment, there are only two things to prepare: the foreground gene file and the background gene file.
Target gene file for enrichment (foreground gene file): the gene set you want to analyze for enrichment. In the example above, this is the set of genes differentially expressed between treatment and control. The format is one gene ID per row, in a tab-delimited text file (*.txt).
GO background gene file: the set of all genes. In the example above, this is all genes expressed in the control and treatment groups.
1) If the species you study is a model species with a reference genome, you can use the reference gene file already in the database as the background. The species currently available are rice, Arabidopsis thaliana, mouse, rat, zebrafish, chicken, C. elegans, Drosophila, and human. The ID type can be gene ID or transcript ID; choose it according to the ID type of the genes you want to enrich. If you do not know which ID type your file uses, click "Preview reference document" to see the IDs. The "version" shown after it is the latest Ensembl version.
2) If the species you study is not among these choices, you have to prepare the GO background gene file yourself.
Since one gene can have multiple GO terms, two file formats are accepted. In the first, the first column is the gene ID and the second column is one corresponding GO term, as follows:
In the other, the first column is the gene ID and the second and later columns hold all GO IDs of that gene. This is the result format produced by our Ji Diao (Gene Denovo) de novo transcriptome pipeline, as in the following table:
Either of the two formats can be uploaded ~ Remember: every gene ID in the foreground gene file must also be present in the background gene file!
How to get the GO annotation of genes?
Some students do not know how to get the GO IDs of the background genes. In general, if your data came from a sequencing company, the sequencing report will include them. If you are processing the data yourself and the species has no reference genome, you need to annotate the unigenes against Nr and then use the Blast2GO software to derive the GO annotation of the unigenes from that Nr annotation. For details on using Blast2GO, see our second online exchange class: http://www.omicshare.com/forum/thread-176-1-1.html . If the species has a reference genome, the GO annotation can be downloaded from the official GO website or, per species, from BioMart; we have a tutorial for that as well: http://www.omicshare.com/forum/thread-437-1-1.html .
After uploading these two files, click "Submit" and you are done. Then just wait for the results ~~
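Before uploading, the two files can be sanity-checked locally. The sketch below is only an illustration under stated assumptions: the function names and gene IDs are hypothetical, and both background layouts are assumed to be tab-delimited as described above. It reads either background layout into the same gene-to-GO mapping and lists any foreground gene IDs that are missing from the background.

```python
from collections import defaultdict


def parse_background(lines):
    """Map gene ID -> set of GO IDs from either tab-delimited layout.

    Layout A: one gene/GO pair per row (the gene ID repeats across rows).
    Layout B: gene ID followed by all of its GO IDs on the same row.
    """
    gene2go = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if not fields[0]:
            continue  # skip blank rows
        gene2go[fields[0]].update(f for f in fields[1:] if f)
    return dict(gene2go)


def missing_from_background(foreground_ids, gene2go):
    """Foreground gene IDs that do not appear in the background file."""
    return sorted(set(foreground_ids) - set(gene2go))


# Hypothetical gene IDs; the GO IDs are the three ontology root terms.
layout_a = ["gene1\tGO:0008150", "gene1\tGO:0003674", "gene2\tGO:0005575"]
layout_b = ["gene1\tGO:0008150\tGO:0003674", "gene2\tGO:0005575"]

background = parse_background(layout_b)
print(parse_background(layout_a) == background)
print(missing_from_background(["gene1", "gene3"], background))
```

With real files, pass an open file handle (for example `open("background.txt")`, a placeholder name) instead of the in-memory lists. An empty result from `missing_from_background` means the foreground file satisfies the inclusion rule.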
Third, interpretation of the results:
This time I did a trial run with the result data from the lychee trend analysis in the trend analysis post: the genes in profile1 serve as the foreground gene set, and the genes in all trends serve as the background gene set for the GO enrichment analysis.
1. GO level-2 classification figure (out.secLevel.png / svg)
This figure shows which GO terms the genes in profile1 fall into and how many genes each contains. The abscissa shows the next finer level of classification under the three GO ontologies, that is, the level-2 classification; the ordinate shows the number of genes in each category. Because one gene often corresponds to several GO terms, the same gene appears in different categories and is counted more than once, so if you add up the gene counts of all the bars, the sum will certainly exceed the total number of genes in profile1.
2. GO enrichment result table (out.[PFC].html)
The three ontologies (C, F, P) are shown separately. Taking biological process as an example, the table is as follows:
Reposted from: https://www.cnblogs.com/wangshicheng/p/10122797.html