It turns out that gene functional enrichment analysis is so simple

This issue mainly introduces some basic concepts of gene functional enrichment analysis, and also introduces how to use the DAVID online analysis tool to perform GO/KEGG functional enrichment analysis of genes.

What is gene functional enrichment analysis?

Gene functional enrichment analysis takes a set of genes, annotates them with functional categories from one or more databases, and uses statistical tests to find the categories that are significantly associated with the biological question under study. Note, however, that the same combination of genes can have different biological functions in different biological contexts. The gene sets therefore have to be chosen to fit the actual situation and linked to the functional changes in the object of study, so that a long list of differential genes can be classified and narrowed down to the key differential genes related to the biological problem, which provides direction and a basis for subsequent experimental verification. All in all, gene functional enrichment analysis is in essence a way of grouping genes by shared function, used to interpret the biological knowledge behind a group of genes and reveal their roles inside or outside the cell.
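To make the "statistical analysis" part concrete, here is a minimal sketch of the over-representation test that many enrichment tools are built around: a one-sided hypergeometric test. The function name and all the numbers are made up for illustration, and real tools may use a modified version of this test.

```python
from math import comb

def hypergeom_enrichment_p(list_hits, list_total, pop_hits, pop_total):
    """P(X >= list_hits) when list_total genes are drawn without replacement
    from a background of pop_total genes, pop_hits of which carry the term."""
    max_hits = min(list_total, pop_hits)
    denom = comb(pop_total, list_total)
    tail = sum(
        comb(pop_hits, k) * comb(pop_total - pop_hits, list_total - k)
        for k in range(list_hits, max_hits + 1)
    )
    return tail / denom

# Hypothetical numbers: 20 background genes, 4 annotated with a term;
# our list has 5 genes, 3 of which carry that term.
p = hypergeom_enrichment_p(list_hits=3, list_total=5, pop_hits=4, pop_total=20)
print(f"{p:.4f}")  # 0.0320
```

A small p value means that seeing this many annotated genes in the list by chance alone would be unlikely, which is what "enriched" means here.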

Why perform functional enrichment analysis?

With the development of high-throughput sequencing technology, biological research has entered the omics era. However, the sheer volume of omics sequencing data can be daunting, and extracting and analyzing the information in it effectively has become a key focus for many researchers. Taking transcriptome sequencing (RNA-seq) as an example, the results usually yield a series of differentially expressed genes, and the key question is how to connect these genes to the biological problem being studied and to potential regulatory mechanisms. Researchers can therefore run functional enrichment analysis on the genes using multiple functional annotation databases, divide the gene set into different functional categories, and search for the biological pathways that play a key role in biological processes, thereby revealing and understanding the underlying molecular mechanisms of these processes. In fact, the starting point at the molecular level is the gene, but there are a great many genes, and gene functional enrichment analysis is one of the most effective ways to understand their biological significance.

Depending on how the genes are selected and which annotation database is used, commonly used enrichment analyses can be divided into the following types: GO functional enrichment, KEGG pathway enrichment, GSEA gene set enrichment, etc.

GO functional enrichment analysis

The Gene Ontology (GO) database is a database built by the GO Consortium in 2000. It aims to establish a semantic vocabulary standard that applies to all species, defines and describes the functions of genes and proteins, and is continually updated as research deepens. GO annotation covers three aspects: molecular function (MF), biological process (BP) and cellular component (CC). Through these three major categories, the function of a gene is defined and described from multiple angles. GO annotation is currently one of the most widely used gene annotation systems. The GO knowledge base is the world's largest source of information about gene function. This knowledge is both human-readable and machine-readable, and is the basis for computational analysis of large-scale molecular biology and genetics experiments in biomedical research.

Molecular function (MF):

It describes the function or functions of a gene product at the molecular level, such as catalytic activity, transport activity, binding activity, etc. Molecular function mostly refers to the function of a single gene product, and in a few cases to the function of a complex formed by that gene product.

Biological process (BP):

It describes the biological processes in which the gene participates, such as transcriptional regulation, rRNA processing, DNA replication, cell growth and maintenance, signal transduction, and the transport of various factors. A biological process is made up of multiple molecular functions carried out in an ordered, multi-step manner. A biological process is not exactly the same as a biological pathway, so GO does not describe the complex regulatory mechanisms within a pathway.

Cellular component (CC):

It describes where a gene product is located in the cell, for example in the cytoplasm, nucleus, organelles, mitochondrial membrane or matrix, or which larger complex it belongs to, such as the proteasome.
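As a small illustration of how the three aspects sit side by side for one gene, here is a sketch that groups annotations by aspect. The gene names GENE_A and GENE_B are hypothetical placeholders and the assignments are invented for the example; the GO IDs and term names are real GO terms.

```python
from collections import defaultdict

# Hypothetical annotations for illustration: (gene, GO ID, term name, aspect)
annotations = [
    ("GENE_A", "GO:0003677", "DNA binding", "MF"),
    ("GENE_A", "GO:0006260", "DNA replication", "BP"),
    ("GENE_A", "GO:0005634", "nucleus", "CC"),
    ("GENE_B", "GO:0005737", "cytoplasm", "CC"),
]

def group_by_aspect(rows):
    """Return {gene: {aspect: [(go_id, term_name), ...]}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for gene, go_id, name, aspect in rows:
        grouped[gene][aspect].append((go_id, name))
    return grouped

by_gene = group_by_aspect(annotations)
for aspect in ("MF", "BP", "CC"):
    print(aspect, by_gene["GENE_A"][aspect])
```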

GENE ONTOLOGY RESOURCE:http://geneontology.org/

KEGG pathway enrichment

Kyoto Encyclopedia of Genes and Genomes (KEGG) is a database for the systematic analysis of gene function and genome information. It integrates genomic, biochemical and systems-level functional information, including pathways (KEGG PATHWAY), drugs (KEGG DRUG), diseases (KEGG DISEASE), functional modules (KEGG MODULE), gene sequences (KEGG GENES) and genomes (KEGG GENOME), etc. The KO (KEGG ORTHOLOGY) system links the various KEGG annotation systems together. KEGG has established a complete KO annotation system that can be used for functional annotation of the genome or transcriptome of newly sequenced species. KEGG helps researchers study genes and expression information as a whole.

KEGG:https://www.kegg.jp/

GSEA enrichment analysis:

Gene Set Enrichment Analysis (GSEA) tests whether a group of genes is over-represented at a certain functional node compared with random expectation. GSEA includes all genes and can take into account effects that are weak and individually non-significant. It does not require a prior differential analysis and can directly use expression information to find pathways or functional gene sets related to a trait. In this way, key information is retained rather than filtered out, and functional gene sets whose genes show no obvious individual differences but a consistent trend of change can still be found.
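The core of GSEA is a running-sum statistic over the whole ranked gene list. Below is a minimal sketch of that enrichment score, assuming the genes are already ranked by some score (for example a fold change). The gene names, scores and the function name are invented for illustration, and the sketch leaves out the permutation step that real GSEA uses to judge significance.

```python
def enrichment_score(ranked, gene_set, weight=1.0):
    """ranked: list of (gene, score) pairs sorted by score, highest first.
    Walk down the list, stepping up at genes in gene_set (weighted by
    |score| ** weight) and down at all other genes; return the largest
    deviation of the running sum from zero."""
    hits = [gene in gene_set for gene, _ in ranked]
    n_miss = len(ranked) - sum(hits)
    hit_norm = sum(abs(s) ** weight for (_, s), h in zip(ranked, hits) if h)
    if hit_norm == 0 or n_miss == 0:
        raise ValueError("need at least one hit with non-zero weight and one miss")
    running, best = 0.0, 0.0
    for (_, score), is_hit in zip(ranked, hits):
        if is_hit:
            running += abs(score) ** weight / hit_norm
        else:
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

# Hypothetical ranked list and gene set
ranked = [("G1", 3.0), ("G2", 2.5), ("G3", 1.0),
          ("G4", -0.5), ("G5", -2.0), ("G6", -3.0)]
gene_set = {"G1", "G2", "G4"}
es = enrichment_score(ranked, gene_set)
print(round(es, 3))  # 0.917
```

Because every gene contributes to the running sum, a gene set whose members all drift toward the top of the list can score well even if no single member would pass a differential expression cutoff.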

In the next few issues, I will mainly introduce how to use the DAVID online analysis tool, the R package clusterProfiler, etc., to perform GO and KEGG functional enrichment analysis of genes and visualize the results.

DAVID online analysis tool performs GO/KEGG functional enrichment analysis of genes

Step 1-2

First open the DAVID official website (DAVID Functional Annotation Bioinformatics Microarray Analysis) and click "Functional Annotation".

Step 3

Import data: (1) paste the gene list directly into "Paste a list"; or (2) upload a file through "Choose From a File", which supports txt format.
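Before pasting or uploading, it helps to tidy the list first. Here is a minimal sketch that strips whitespace, drops blanks and duplicates, and writes one identifier per line to a txt file. The gene symbols are examples only, the file name gene_list.txt is my own choice, and the one-per-line layout is an assumption about what the input box expects.

```python
from pathlib import Path
import tempfile

def clean_gene_list(raw_symbols):
    """Strip whitespace, drop blanks, and remove duplicates while keeping order."""
    seen = set()
    cleaned = []
    for symbol in raw_symbols:
        symbol = symbol.strip()
        if symbol and symbol not in seen:
            seen.add(symbol)
            cleaned.append(symbol)
    return cleaned

# Example symbols only; replace with your own differentially expressed genes.
raw = ["TP53", " BRCA1", "EGFR ", "", "TP53", "MYC"]
genes = clean_gene_list(raw)

# One identifier per line: paste the text, or upload the .txt file.
out_path = Path(tempfile.gettempdir()) / "gene_list.txt"
out_path.write_text("\n".join(genes) + "\n", encoding="utf-8")
print(out_path.read_text(encoding="utf-8"))
```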

Step 4

Select the identifier type of your genes in "Select Identifier". I uploaded gene names (gene symbols), so I selected "OFFICIAL_GENE_SYMBOL". (This step depends on the type of identifier you import.)

Step 5

Select the species you are studying in "Select species". I am studying human here, so I chose "Homo sapiens".

Step 6

Select the type of list you are entering in "List Type". What I entered here are the genes under study, so I selected "Gene List".

Step 7

Click "Submit List" to run

Step 8

View data enrichment analysis results

Step 9

Export enrichment analysis results (copy and paste into Excel)

The enrichment results obtained from DAVID mainly consist of these columns: Category, Term (the GO term or pathway), Count (number of genes from the list annotated with the term), % (proportion of genes in the list), P-Value (P value), Genes (gene names), List Total, Pop Hits, Pop Total, Fold Enrichment, Bonferroni (multiple testing correction), Benjamini (multiple testing correction) and FDR (corrected P value).
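To see how some of these columns relate to each other, here is a minimal sketch that computes Fold Enrichment from Count, List Total, Pop Hits and Pop Total, and applies a textbook Benjamini-Hochberg adjustment to a list of P values. All numbers are hypothetical, the function names are my own, and the adjusted values are not guaranteed to match DAVID's own Benjamini column, since DAVID reports a modified Fisher exact P value (the EASE score) and applies its own correction.

```python
def fold_enrichment(count, list_total, pop_hits, pop_total):
    """Proportion of the term in the gene list divided by its proportion
    in the background population."""
    return (count / list_total) / (pop_hits / pop_total)

def benjamini_hochberg(p_values):
    """Textbook Benjamini-Hochberg adjusted p values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical row: Count=12, List Total=200, Pop Hits=150, Pop Total=18000
fe = fold_enrichment(12, 200, 150, 18000)
print(round(fe, 2))  # 7.2

# Hypothetical raw P values for four terms
adj = benjamini_hochberg([0.001, 0.01, 0.03, 0.5])
print([round(a, 3) for a in adj])  # [0.004, 0.02, 0.04, 0.5]
```

A fold enrichment above 1 means the term is more frequent in the gene list than in the background; the corrected P value then tells you whether that excess is still convincing after testing many terms at once.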

References

[1] Sherman, B. T. et al. DAVID: a web server for functional enrichment analysis and functional annotation of gene lists (2021 update). Nucleic Acids Res. 50, W216-221, doi:10.1093/nar/gkac194 (2022).

Well, this sharing ends here. In the next issue, we will share the method of visualizing these functional enrichment results, so stay tuned.

 

