[Lecture] Zhan Xianquan: Proteoforms in Tumors

The concept of Proteoforms / Protein species is highly significant.

The sequencing of the human structural genome is nearing completion, and research has shifted from structural genomics to functional genomics, that is, to the study of the transcriptome and the proteome. The concepts of "proteome" and "proteomics" were formally proposed in 1995, giving the field a history of about 25 years.

The main technologies of proteomics are proteome separation technology, identification technology, and proteomics informatics technology.

The main proteome separation techniques are two-dimensional gel electrophoresis (2DE) and two-dimensional (multidimensional) liquid chromatography (2DLC).

Proteome identification technology is based mainly on mass spectrometry (MS), which falls into two approaches: peptide mass fingerprinting (PMF) and tandem mass spectrometry (MS/MS). The two major ion sources used to analyze protein macromolecules are matrix-assisted laser desorption/ionization (MALDI) and electrospray ionization (ESI). Mass spectrometry has developed rapidly, mainly toward higher sensitivity and higher throughput.
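
The idea behind peptide mass fingerprinting can be illustrated with a short Python sketch: digest each candidate protein in silico, compute the peptide masses, and count how many observed peaks match. It assumes trypsin as the protease (cleaving after K or R unless the next residue is P), no missed cleavages, no modifications, singly protonated [M+H]+ ions as commonly seen with MALDI, and a fixed 0.02 Da tolerance. The protein sequences and the helper names (`tryptic_digest`, `mh_plus`, `pmf_score`) are hypothetical, and the score is a plain count of matched peaks rather than the probability-based scoring used by real search engines such as Mascot.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O added to the residue sum of every peptide
PROTON = 1.00728   # proton carried by a singly charged [M+H]+ ion


def tryptic_digest(sequence: str) -> list[str]:
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def mh_plus(peptide: str) -> float:
    """Monoisotopic mass of the singly protonated peptide, [M+H]+."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def pmf_score(observed: list[float], sequence: str, tol: float = 0.02) -> int:
    """Count observed peaks that match any theoretical tryptic peptide within tol Da."""
    theoretical = [mh_plus(p) for p in tryptic_digest(sequence)]
    return sum(any(abs(m - t) <= tol for t in theoretical) for m in observed)


# Hypothetical two-entry database and a simulated peak list generated from protein_A.
database = {"protein_A": "MKWVTFISLLFLFSSAYSR", "protein_B": "GASDEKHHLLPR"}
observed = [mh_plus(p) for p in tryptic_digest(database["protein_A"])]
scores = {name: pmf_score(observed, seq) for name, seq in database.items()}
print(scores)  # {'protein_A': 2, 'protein_B': 0}
```

The candidate with the most matched peaks is reported as the identification. Real PMF also has to handle missed cleavages, modifications, and noise peaks, which is one reason MS/MS sequencing of individual peptides gives more confident identifications.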

Proteomics informatics technology mainly refers to the technologies used to construct protein interaction networks.

Combining the different proteome separation techniques with mass spectrometry yields two main types of proteomic analysis: 2DE-MS and 2DLC-MS. 2DE-MS includes 2DE-MALDI-PMF and 2DE-ESI-LC-MS/MS, and it was the main technology during the first 10-15 years of proteomics research. However, the conventional view holds that 2DE has low throughput: a single 2D gel spot is thought to contain only 1-2 proteins, so one experiment can usually identify only dozens to about one thousand proteins. As a result, its position in proteomics has gradually declined.

2DLC-MS mainly includes iTRAQ- or TMT-based SCX-LC-MS/MS and label-free LC-LC-MS/MS, which is what is usually called "Bottom-up" proteomics. For 15 years it has served as the core technology of proteomics. Its throughput is much higher, identifying thousands to 10,000 proteins in one experiment, but the output of this method is a set of protein groups. In essence, it identifies the genes that encode proteins rather than proteins in the true sense, that is, protein forms (Proteoforms or Protein species).
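
Why Bottom-up results reduce to protein groups rather than Proteoforms can be shown with a toy Python example. It assumes a hypothetical protein with two modifiable sites lying on different tryptic peptides; the names `PEPTIDE1` and `PEPTIDE2` are placeholders, not real sequences. Two samples containing different Proteoforms yield exactly the same pool of peptides after digestion, so peptide-level data alone cannot tell them apart.

```python
from collections import Counter

# Hypothetical protein: site 1 lies on PEPTIDE1, site 2 on PEPTIDE2 (placeholder names).
PEPTIDES = ("PEPTIDE1", "PEPTIDE2")


def digest(proteoform: tuple[bool, bool]) -> list[tuple[str, bool]]:
    """Cut one proteoform into its peptides, keeping each peptide's modification state."""
    return list(zip(PEPTIDES, proteoform))


def peptide_pool(mixture: list[tuple[bool, bool]]) -> Counter:
    """Bottom-up view: pool the peptides from every proteoform in the sample.

    Once pooled, the information about which peptides came from the same
    molecule (the connectivity between site 1 and site 2) is lost.
    """
    pool = Counter()
    for proteoform in mixture:
        pool.update(digest(proteoform))
    return pool


# Sample 1: one molecule modified only at site 1, one modified only at site 2.
sample_1 = [(True, False), (False, True)]
# Sample 2: one molecule modified at both sites, one unmodified.
sample_2 = [(True, True), (False, False)]

print(peptide_pool(sample_1) == peptide_pool(sample_2))  # True
```

The two samples contain four different Proteoforms in total, yet their peptide pools are identical. Measuring intact proteins ("Top-down") or first separating intact Proteoforms (as in 2DE) preserves that connectivity.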

Proteoforms are the basic unit of the proteome. There are about 20,000 human genes and at least 100,000 human transcripts. Each transcript directs the ribosome to add one amino acid residue per triplet codon, synthesizing the protein's amino acid sequence. A newly synthesized amino acid sequence has no function on its own: only when it reaches its designated location (inside the cell, outside the cell, or in a particular subcellular organelle), folds into a specific three-dimensional structure, and interacts with surrounding molecules to form complexes can it perform its function. On the way from the ribosome to its designated location, a protein can undergo many post-translational modifications (PTMs; an estimated 400-600 types of PTM exist in the human body). As a result, the human proteome contains at least 1 million and possibly as many as 1 billion proteins (Proteoforms) (Figure 1).

Figure 1: Proteoforms concept and formation model (Zhan et al., Med One, 2018; Zhan et al., Proteomes, 2019)
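
The combinatorics behind these large numbers can be sketched in a few lines of Python. Assuming that modification sites are independent and that each site can adopt a fixed number of states (counting "unmodified" as one state), the number of possible Proteoforms from a single amino acid sequence is the product of the per-site state counts. The site counts and the average of 10 Proteoforms per transcript below are illustrative assumptions, not measured data; they only show how 100,000 transcripts can plausibly reach the lower estimate of 1 million Proteoforms.

```python
from math import prod


def ptm_combinations(states_per_site: list[int]) -> int:
    """Upper bound on Proteoforms from one amino acid sequence.

    states_per_site[i] is the number of states site i can adopt, counting
    "unmodified" as one state. Sites are assumed to be independent.
    """
    return prod(states_per_site)


# Hypothetical examples, not measured data:
print(ptm_combinations([2] * 10))   # 10 sites, each phosphorylated or not -> 1024
print(ptm_combinations([3, 3, 3]))  # 3 sites, each unmodified or one of 2 PTMs -> 27

# Lower estimate: ~100,000 transcripts, each assumed to average ~10 Proteoforms.
transcripts = 100_000
mean_proteoforms_per_transcript = 10  # assumed average, for illustration only
print(transcripts * mean_proteoforms_per_transcript)  # 1000000
```

In a real cell only a fraction of the theoretical combinations is actually present, so the product is an upper bound for one sequence; even so, a handful of sites is enough to push the count into the thousands.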

Detecting, identifying, and quantifying such a huge number of Proteoforms / Protein species on a large scale is a crucial problem. At present there are two strategies for studying Proteoforms: "Top-down" MS, and the combination of "Top-down" and "Bottom-up" approaches, that is, 2DE-LC/MS (Figure 2).

Figure 2: Comparison of Proteoforms research techniques (Zhan et al., Med One, 2018; Zhan et al., Proteomes, 2019)

"Top-down" MS can detect, identify, and quantify Proteoforms, yielding both the amino acid sequence and the PTM information of each protein. However, its throughput is low: the largest study so far identified 5,700 Proteoforms, corresponding to 860 proteins.

Recently, Professor Zhan Xianquan's team found that 2DE-LC/MS is an ultra-high-throughput platform that can detect, identify, and quantify hundreds of thousands to one million Proteoforms. Taking advantage of the large gains in mass spectrometry sensitivity, the team found from 2015 onward that each 2D gel spot contains on average at least 50, and sometimes hundreds, of Proteoforms, most of them low in abundance. Over the past two years the team has published papers that fully explain the new concepts and practice of 2DE-LC/MS, completely overturning the traditional understanding of two-dimensional electrophoresis, held for more than 40 years, that a 2D spot generally contains only 1-2 proteins. This provides the technical basis for large-scale Proteoforms research.
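
The throughput claim can be checked with simple arithmetic: the number of Proteoforms identified is roughly the number of resolved gel spots multiplied by the average number of Proteoforms identified per spot. The spot count of 2,000 and the helper name `estimated_proteoforms` are assumptions for illustration (the real spot count depends on gel size, pH range, and sample); the per-spot averages of 50 and "hundreds" come from the figures in the paragraph above.

```python
def estimated_proteoforms(spots: int, proteoforms_per_spot: int) -> int:
    """Rough throughput: resolved spots times average Proteoforms identified per spot."""
    return spots * proteoforms_per_spot


spots_on_gel = 2_000  # assumed number of resolved spots; varies with gel and sample

print(estimated_proteoforms(spots_on_gel, 50))   # lower average per spot -> 100000
print(estimated_proteoforms(spots_on_gel, 500))  # "hundreds" per spot -> 1000000
```

Under these assumptions the estimate spans roughly one hundred thousand to one million Proteoforms per gel, which is the range described above and far beyond what the 1-2 proteins per spot assumption would allow.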

The development of the concept of Proteoforms / Protein species has greatly enriched the scope of proteomics, represents a higher level of proteomics research, and lies at the frontier of international science. It helps in finding reliable and effective disease markers, in understanding the molecular mechanisms of disease and identifying drug targets, and in effective prediction, diagnosis, and prognostic evaluation.

In addition, the proteome is an important component of the phenotype and the ultimate executor of genomic functions, and it cannot be replaced by genomics or transcriptomics. Proteomics research cannot be bypassed if true personalized medicine and precision medicine are to be achieved.

Source: www.cnblogs.com/jessepeng/p/12732591.html