OS Tools: a detailed tutorial on using the GO enrichment analysis tool and interpreting its results

Original link: http://www.cnblogs.com/wangshicheng/p/10122797.html

The GO enrichment analysis tool on our cloud platform needs only very simple input files and parameters, but many users do not understand the principle behind it or how to interpret the results. This post explains both in detail.

I. Introduction to GO enrichment:
       Gene Ontology (GO) is an international standardized classification system for gene function. It provides a dynamically updated controlled vocabulary to comprehensively describe the properties of genes and gene products in any organism. GO has three ontologies, which describe a gene's molecular function, cellular component, and biological process. The basic unit of GO is the term (an entry, or node); each term corresponds to one property.

What enrichment means:
       Each gene corresponds to one or more GO terms (i.e. GO functions).
       Enrichment involves two concepts: foreground genes and background genes. The foreground genes are the gene set you want to focus on; the background genes are the full gene set. For example, if transcriptome sequencing is done on a control sample and a treatment sample, the foreground is the genes differentially expressed between treatment and control, and the background is all genes expressed in the two samples. As another example, suppose I want to know whether Shenzhen has significantly more college graduates than Guangdong Province as a whole ("college graduate" plays the role of a GO term for a Shenzhen resident). Then the foreground is the population of Shenzhen, the background is the population of Guangdong Province, and every individual carries a label (college graduate, high school graduate, and so on).
       Enrichment means that the proportion of foreground genes annotated to a given GO term is significantly higher than the proportion of background genes annotated to that term. In the example above, saying that college graduates are significantly enriched in Shenzhen means that the share of college graduates in Shenzhen's total population is significantly higher than the share of college graduates in Guangdong Province's total population. In the figure's example, the question is whether 10% is significantly higher than 2%.


       So how is this "significance" calculated? By a P value, as everyone knows. The P value is computed with the hypergeometric test, using the following formula:

$$P = 1 - \sum_{i=0}^{m-1} \frac{\binom{M}{i}\binom{N-M}{n-i}}{\binom{N}{n}}$$

       where N is the number of all Unigenes with GO annotation; n is the number of differentially expressed genes among those N; M is the number of Unigenes annotated to a particular GO term; and m is the number of differentially expressed genes annotated to that particular GO term.
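As a rough sketch of the hypergeometric test described above, the following Python function computes the probability of seeing at least m annotated genes among the n differentially expressed genes, using N, n, M and m as defined above. The function name and the example counts are illustrative assumptions, not part of the OmicShare tool, whose own implementation may differ.

```python
from math import comb


def enrichment_p_value(N: int, n: int, M: int, m: int) -> float:
    """Upper-tail hypergeometric probability P(X >= m).

    N: genes with GO annotation in the background
    n: differentially expressed (foreground) genes among those N
    M: background genes annotated to the GO term of interest
    m: foreground genes annotated to that GO term
    """
    total = comb(N, n)
    upper = min(n, M)  # X cannot exceed either the draw size or M
    # Sum the exact probabilities of observing i = m .. upper annotated genes.
    tail = sum(comb(M, i) * comb(N - M, n - i) for i in range(m, upper + 1))
    return tail / total


# Hypothetical counts: 10 of 100 foreground genes (10%) carry the term,
# versus 20 of 1000 background genes (2%).
p = enrichment_p_value(N=1000, n=100, M=20, m=10)
print(f"P = {p:.3g}")
```

Summing the upper tail directly gives the same quantity as one minus the lower tail, but avoids subtracting two nearly equal numbers when the P value is very small.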
       The calculated P values are then corrected for multiple testing to give a corrected P value (the Q value). We normally use Q value ≤ 0.05 as the threshold; GO terms that meet this condition are defined as GO terms significantly enriched among the differentially expressed genes.

II. Data preparation:
       Once you understand the principle of GO enrichment, you can see that only two files need to be prepared: the foreground gene file and the background gene file.

Target gene file for enrichment (foreground gene file): the gene set you want to analyze. In the example above, it is the genes differentially expressed between the treatment and control groups. The format is one gene ID per row, in a tab-delimited text file (*.txt).

Background gene GO file: the set of all genes. In the example above, it is all genes expressed in the control and treatment groups.

1) If the species you study is a model species with a reference genome, you can use the reference gene file already in our database as the background. The species currently available are rice, Arabidopsis thaliana, mouse, rat, zebrafish, chicken, C. elegans, Drosophila, and human. The ID type can be gene ID or transcript ID; choose it according to the ID type of the genes in your enrichment target file. If you do not know which IDs the reference file uses, click "Preview reference file" to view them. The "version" shown after it is the latest Ensembl release.


2) If your species is not among these options, you must prepare the background gene GO file yourself.
Because one gene can have several GO terms, two file formats are accepted. In the first, column 1 is the gene ID and column 2 is one corresponding GO term, so a gene with several GO terms takes several rows.

In the second, column 1 is the gene ID and the columns from the second onward list all GO IDs for that gene. This is the result format produced by our (Ji Diao) de novo transcriptome pipeline.

Either format can be uploaded. Remember: every gene ID in the foreground gene file must also be present in the background gene file!

How do I get GO annotations for my genes?
       Some users do not know how to obtain GO IDs for the background genes. In general, if your data came from a sequencing company, the sequencing report will include them. If you processed the data yourself and the species has no reference genome, annotate the unigenes against Nr and then use the Blast2GO software to derive GO annotations for the unigenes from the Nr annotation. For details on using Blast2GO, see the second session of our online class: http://www.omicshare.com/forum/thread-176-1-1.html . If the species has a reference genome, you can download GO annotations from the official GO website or from BioMart; we have a tutorial for that as well: http://www.omicshare.com/forum/thread-437-1-1.html .
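The two background-file layouts described above, and the rule that every foreground gene must appear in the background, can be checked before uploading with a short Python sketch like the one below. It assumes tab-separated columns, and the function names, gene IDs and GO IDs are hypothetical placeholders rather than anything the platform provides.

```python
from collections import defaultdict


def parse_background(lines):
    """Map each gene ID to its set of GO IDs.

    Accepts both layouts: one gene/GO pair per row, or a gene ID
    followed by all of its GO IDs in the remaining columns.
    """
    gene_to_go = defaultdict(set)
    for line in lines:
        fields = line.strip().split("\t")
        if len(fields) < 2 or not fields[0]:
            continue  # skip blank rows and genes with no GO ID listed
        gene_to_go[fields[0]].update(f for f in fields[1:] if f)
    return dict(gene_to_go)


def missing_from_background(foreground_ids, gene_to_go):
    """Return foreground gene IDs that are absent from the background, sorted."""
    return sorted(set(foreground_ids).difference(gene_to_go))


# Placeholder data in the two layouts (IDs are illustrative only).
pair_rows = ["g1\tGO:0000001", "g1\tGO:0000002", "g2\tGO:0000001"]
wide_rows = ["g1\tGO:0000001\tGO:0000002", "g2\tGO:0000001"]

background = parse_background(wide_rows)
print(missing_from_background(["g1", "g3"], background))
```

Any ID reported by `missing_from_background` would need to be added to the background file, or removed from the foreground file, before submitting.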








Once both files are uploaded, click "Submit" and wait for the results.


III. Interpreting the results:
       For this run I used the lychee data from the trend analysis post: the genes in profile1 served as the foreground gene set, and the genes from all trends served as the background gene set for GO enrichment analysis.

1. GO second-level classification chart (out.secLevel.png / svg)
       This chart shows the GO terms that the profile1 genes fall into and the number of genes in each. The x-axis shows the finer subdivisions of the three GO ontologies, i.e. the second-level classification; the y-axis shows the number of genes in each category. Because one gene often corresponds to several GO terms, the same gene can appear under different categories and be counted more than once. So if you add up the gene counts of all the bars, the sum will certainly exceed the total number of genes in profile1.

2. GO enrichment result table (out.[PFC].html)
       The three ontologies (C, F, P) are shown separately. Taking biological process as an example, the table looks like this:









       The first column is the GO term ID. Click the GO ID to display all the genes annotated to that GO term:

 

 

 

Click the GO ID again to go to the official site http://amigo.geneontology.org , where you can view detailed information about the GO term.



       The second column is the functional description of the GO term;

 

       In the third column, the first number is the number of differentially expressed genes annotated to this GO term, and the second number is the total number of differentially expressed genes;

 

       In the fourth column, the first number is the number of background genes annotated to this GO term, and the second number is the total number of background genes;

 

       The fifth column is the P value, i.e. whether the percentage in the third column differs significantly from the percentage in the fourth column. P values below 0.05 are highlighted in red;

 

       The sixth column is the Q value obtained after multiple-testing correction; Q values below 0.05 are likewise highlighted in red. The GO terms are sorted by P value from smallest to largest, so that users can easily find the enriched results. In this example, "microtubule-based process" is the GO term most significantly enriched among the differentially expressed genes, which indicates that the genes in profile1 are significantly enriched in this function.
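To make the step from P value to Q value concrete, here is a minimal Python sketch of one common multiple-testing correction, the Benjamini-Hochberg procedure. The source does not state which correction the tool applies, so treating the Q value as a Benjamini-Hochberg adjusted P value is an assumption, and the input P values are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted values in the input order."""
    k = len(p_values)
    # Indices of the P values from smallest to largest.
    order = sorted(range(k), key=lambda i: p_values[i])
    adjusted = [0.0] * k
    running_min = 1.0
    # Walk from the largest P value down, so each adjusted value is
    # capped by those ranked above it (keeps the result monotone).
    for rank in range(k, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * k / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical P values for four GO terms.
p_values = [0.01, 0.04, 0.03, 0.005]
q_values = benjamini_hochberg(p_values)
significant = [q <= 0.05 for q in q_values]
print(q_values, significant)
```

Each adjusted value is at least as large as the raw P value, which is why a term can be red in the P value column but not in the Q value column.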




3. GO directed acyclic graph (out.C/P/F.png)
       Taken as a whole, the GO annotation system is a directed acyclic graph (DAG). The relationships between GO terms are one-directional, and there are three kinds of relationship between GO terms: is a, part of, and regulates. For a detailed explanation, see this post: http://www.omicshare.com/forum/thread-538-1-1.html . The enrichment analysis produces one directed acyclic graph for each of the three GO ontologies (cellular component, molecular function, biological process). Below is the directed acyclic graph for biological process:
 

 

In this graph, GO terms closer to the root are more general, and the GO terms on the branches further down are annotated at a finer level. Let's look at what each GO term node contains:
 

 

In the Pvalue row, NA is shown if the value exceeds 0.05; that is, only significant P values are displayed.

 

Meaning of the shapes: by default, the program draws the 10 most significant GO terms as squares and all other GO terms as circles.

 

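Meaning of the colors: the darker the color, the more significant the GO term. From light to dark, the colors are: no color, light yellow, dark yellow, red.
So, judging by color, the most significant GO term in the molecular function ontology is GO:0003774. Follow-up work can therefore start from this GO term, and the other GO terms on the same branch are also worth studying.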

 

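Significance:
The GO directed acyclic graph shows the classification relationships between GO terms and, from another angle, helps users find significantly enriched GO terms.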
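The idea that terms nearer the root are more general can be illustrated with a small Python sketch that walks upward through parent links. The graph below is a made-up toy, not real GO data: the term names, the relations attached to them, and the function name are all hypothetical.

```python
def ancestors(term, parents):
    """Collect every term reachable by following parent links upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relation, parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Hypothetical toy graph: child -> [(relation, parent), ...].
# term_D has two parents, which is what makes this a DAG rather than a tree.
toy_parents = {
    "term_D": [("is_a", "term_B"), ("part_of", "term_C")],
    "term_B": [("is_a", "root")],
    "term_C": [("is_a", "root")],
}

print(sorted(ancestors("term_D", toy_parents)))
```

Under the hierarchy described above, a gene annotated to a fine-level term is implicitly covered by that term's more general ancestors, which is why a significant term's whole branch is worth inspecting.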



IV. Citation
If you used our OmicShare tools cloud platform while processing your data, you can cite it in the Methods section of your paper like this: GO enrichment analysis was performed using the OmicShare tools, a free online platform for data analysis (www.omicshare.com/tools).
 
V. Detailed version: English Methods text with citation

Gene Ontology (GO) is an international standardized gene functional classification system which offers a dynamically updated controlled vocabulary and strictly defined concepts to comprehensively describe the properties of genes and their products in any organism. GO has three ontologies: molecular function, cellular component and biological process. The basic unit of GO is the GO term. Each GO term belongs to one type of ontology.
GO enrichment analysis provides all GO terms that are significantly enriched in DEGs compared to the genome background, and filters the DEGs that correspond to biological functions. GO enrichment analysis was performed using the OmicShare tools, a free online platform for data analysis (www.omicshare.com/tools). Firstly, all DEGs were mapped to GO terms in the Gene Ontology database (http://www.geneontology.org/), and gene numbers were calculated for every term; significantly enriched GO terms in DEGs compared to the genome background were identified by hypergeometric test. The calculated p-values were subjected to FDR correction, taking FDR ≤ 0.05 as the threshold. GO terms meeting this condition were defined as significantly enriched GO terms in DEGs. This analysis was able to recognize the main biological functions that DEGs exercise.
