Applications of transcriptome technology (in biology, medicine and agriculture)

Technical applications of the transcriptome

Preface: The Application of Transcriptome Technology in Biology, Medicine and Agriculture

With the rapid development of second-generation sequencing technology, its high throughput, speed and low cost have made it the first choice of more and more biologists when tackling a research problem, and it shows particularly great potential in transcriptome sequencing. The transcriptome is the complete set of gene transcripts of a specific organism in a given state. Transcriptome research is an important part of functional genomics, because the transcriptome is the essential link between the genetic information in the genome and biological function (the proteome). Moreover, compared with whole-genome sequencing of eukaryotes, the sequences obtained by transcriptome sequencing largely exclude introns and other non-coding sequences, which gives transcriptome sequencing an unmatched cost advantage. Studying the complexity of genome structure and the fundamental rules of the genetic language requires accurate and comprehensive analysis of the massive data produced by sequencing. Bioinformatics has therefore become a rapidly emerging interdisciplinary field: sitting at the intersection of biology, computer science, mathematics and other disciplines, it continues to uncover the biological meaning behind base-sequence data. Current transcriptome sequencing and analysis technology can address a wide range of problems, including in-depth discovery of new genes, detection of low-abundance transcripts, construction of transcriptome maps, study of alternative splicing regulation, determination of metabolic pathways, identification of gene families and evolutionary analysis. Transcriptome research is the foundation and starting point for studies of gene function and structure, and it has been widely applied in biology, medicine, agriculture and many other fields.



Chapter 1 The Application of Transcriptome Technology in Biology

What biological problems can high-throughput transcriptome sequencing technology solve for you?

Quickly obtain the types and abundance of mRNAs in the cells, tissues or organisms you are interested in, and discover new mRNA isoforms produced by alternative splicing or alternative polyadenylation site selection. Quickly obtain differential expression information for mRNAs across the different cells or tissues you are interested in. Through functional analysis of differentially expressed genes, you can discover the overall features of changes in gene expression regulation during biological processes such as cell differentiation (especially of embryonic and neural stem cells), development and signal transduction. If you want to study how a particular gene alters the cell's gene regulatory network to carry out its biological function, you can mutate, knock out or knock down that gene, then compare RNA-seq data from control and experimental cells by differential expression analysis to get the information you need.
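The differential expression comparison described above can be illustrated with a minimal Python sketch. It assumes each sample is a dictionary of raw read counts per gene (the gene names and counts are hypothetical), normalizes each sample to counts per million (CPM), and reports a log2 fold change between group means with an assumed pseudocount of 1. It shows only the effect-size calculation; dedicated tools such as DESeq2 or edgeR also model variability between replicates and test for significance, which this sketch does not attempt.

```python
import math
from statistics import mean


def cpm(counts):
    """Scale one sample's raw read counts to counts per million mapped reads."""
    total = sum(counts.values())
    return {gene: n * 1e6 / total for gene, n in counts.items()}


def log2_fold_changes(control, treated, pseudocount=1.0):
    """Per-gene log2 fold change of mean CPM (treated vs. control).

    control, treated: lists of samples; each sample maps gene -> raw count.
    All samples are assumed to contain the same genes.
    """
    control_cpm = [cpm(s) for s in control]
    treated_cpm = [cpm(s) for s in treated]
    result = {}
    for gene in sorted(control[0]):
        c = mean(s[gene] for s in control_cpm)
        t = mean(s[gene] for s in treated_cpm)
        # The pseudocount avoids log2(0) for genes with no reads in one group.
        result[gene] = math.log2(t + pseudocount) - math.log2(c + pseudocount)
    return result


# Hypothetical counts for a control and a knockdown group (two replicates each).
control = [{"geneA": 100, "geneB": 900}, {"geneA": 120, "geneB": 880}]
knockdown = [{"geneA": 400, "geneB": 600}, {"geneA": 380, "geneB": 620}]
for gene, lfc in log2_fold_changes(control, knockdown).items():
    print(gene, round(lfc, 2))
```

A positive value means higher expression in the knockdown group; ranking genes by this value alone, without a statistical test, is only a first look at the data.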



Case 1. Expression differences between different cells

Marc Sultan, et al. A Global View of Gene Activity and Alternative Splicing by Deep Sequencing of the Human Transcriptome. Science, 2008, 321: 956-960.


The functional complexity of the human transcriptome has not yet been fully elucidated. Marc Sultan and colleagues performed high-throughput transcriptome sequencing on a HEK cell line and a B cell line. Bioinformatics analysis showed that 50% of the reads mapped to a unique genomic position, and 80% of these matched known exons. Of the polyadenylated transcripts, 66% mapped to known genes and 34% mapped to unannotated genomic regions. For known transcripts, high-throughput transcriptome sequencing detected 25% more than microarrays. The study also surveyed mRNA splicing genome-wide, identifying 94,241 splice junctions (4,096 of them previously unknown), and found exon skipping to be the most common form of alternative splicing. Differential expression analysis showed that 55 genes overexpressed in the lymphocyte line are enriched in the Ras signaling pathway and the immune system; that 271 genes highly active in the B cell line are mainly MHC class II-related receptors and signaling factors such as CD38, LCK (lymphocyte-specific protein tyrosine kinase), ZAP70 (zeta chain-associated protein kinase 70 kDa), CD19 and BLK (B lymphoid tyrosine kinase); and that among 2,669 HEK-specific genes, the top 1,000 most highly expressed are involved in DNA anchoring and in binding of the cytoskeleton to the extracellular matrix.
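To make the exon-skipping result concrete, the following Python sketch classifies splice junctions within one gene by the exons they connect. It assumes, hypothetically, that each junction read has already been reduced to a (donor exon, acceptor exon) pair of exon numbers in transcript order; real pipelines work from genomic coordinates and also detect alternative 5'/3' splice sites and intron retention, which this simplification ignores.

```python
from collections import Counter


def classify_junction(donor_exon, acceptor_exon):
    """Classify a junction joining two exons numbered 1..n along one gene."""
    if acceptor_exon == donor_exon + 1:
        return "adjacent exons"  # joins consecutive exons
    if acceptor_exon > donor_exon + 1:
        return "exon skipping"   # one or more exons are spliced out
    return "other"               # e.g. backward joins, not handled here


def junction_summary(junctions):
    """Count junction classes for a list of (donor_exon, acceptor_exon) pairs."""
    return Counter(classify_junction(d, a) for d, a in junctions)


# Hypothetical junction reads for a five-exon gene.
reads = [(1, 2), (2, 3), (3, 4), (1, 3), (2, 5)]
print(junction_summary(reads))
```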



High-throughput transcriptome sequencing makes it possible to explore the complexity and dynamics of the human transcriptome directly and comprehensively. Its ability to compare alternative splicing within and between cells while simultaneously measuring gene expression is unprecedented, and its results go far beyond existing mammalian genome annotation.



Case 2. Transcriptome analysis of a single cell

Fuchou Tang, Catalin Barbacioru, Yangzhou Wang, et al. mRNA-Seq whole-transcriptome analysis of a single cell. Nat Methods, 2009, 6: 377-382.



High-throughput transcriptome sequencing methods had been developed before, but they required a considerable amount of RNA that had to be collected from many cells, and such pooled analysis cannot reflect events occurring in a single cell. The researchers solved this problem by selectively amplifying RNA with poly(A) tails, an approach that captures some non-coding RNAs as well as mRNA. They also slightly modified the library construction protocol to improve its reproducibility, then sequenced the RNA and mapped the reads to transcripts in the mouse RefSeq database.



The researchers performed mRNA-Seq on a single mouse blastomere. Counting genes supported by at least 5 reads, they detected 75% more expressed genes (5,270 more) than microarray technology, and identified 1,753 previously unknown splice junctions. In addition, 8-19% of the expressed genes in a single blastomere or oocyte had at least two different transcript isoforms, clearly revealing the genome-wide complexity of splice isoforms in a single cell. Finally, compared with wild-type oocytes, 1,696 and 1,553 genes were abnormally up-regulated in oocytes lacking Dicer and Ago2 (Eif2c2), respectively, with 619 genes up-regulated in both mutants; knockout of Dicer and Ago2 decreased the expression of 1,571 and 1,121 genes, respectively, with 589 genes down-regulated in both mutants. Single-cell mRNA-Seq will greatly improve our ability to analyze the transcriptional complexity of single cells in mammalian development, especially in early embryos and stem cells, which are rare cell populations in the body.
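The two set operations behind these numbers, calling a gene "detected" when it has at least 5 reads and finding the genes shared by two mutant comparisons, can be sketched in a few lines of Python. The read counts and gene names below are hypothetical placeholders, and the up-regulated gene sets are assumed to come from an earlier differential expression step.

```python
def detected_genes(read_counts, min_reads=5):
    """Genes supported by at least `min_reads` reads (5 in the study described above)."""
    return {gene for gene, n in read_counts.items() if n >= min_reads}


def shared_and_unique(set_a, set_b):
    """Return (genes in both sets, genes only in set_a, genes only in set_b)."""
    return set_a & set_b, set_a - set_b, set_b - set_a


# Hypothetical per-gene read counts from one single-cell library.
counts = {"gene1": 2500, "gene2": 40, "gene3": 5, "gene4": 3}
print(sorted(detected_genes(counts)))

# Hypothetical sets of genes up-regulated in two knockout-vs-wild-type comparisons.
up_in_dicer_ko = {"g1", "g2", "g3", "g4"}
up_in_ago2_ko = {"g3", "g4", "g5"}
both, dicer_only, ago2_only = shared_and_unique(up_in_dicer_ko, up_in_ago2_ko)
print(len(both), len(dicer_only), len(ago2_only))
```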



Case 3. Transcriptome analysis of mammalian tissue

Mortazavi A, Williams BA, McCue K, et al. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods, 2008, 5(7): 621-628.


The size and complexity of mammalian genomes pose severe challenges for transcriptome research. Mortazavi and colleagues used the Illumina sequencing platform to perform high-throughput transcriptome sequencing and analysis of RNA from adult mouse brain, liver and skeletal muscle, obtaining 140 million reads. About 90% of the mapped reads fell within known exons. Previously unreported sequences were also found: about 3,000 newly identified 3' UTRs, which may play an important role in microRNA-mediated post-transcriptional and translational regulation, and about 3,000 newly identified 5' exons, suggesting the use of new promoters. For RNA splicing in particular, the study compared the massive sequencing data against a database of about 200,000 possible splice junctions and identified about 145,000 distinct splice events, of which alternative splicing was predominant; about 3,500 genes have at least one alternative splice form.
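This paper is widely credited with introducing RPKM (reads per kilobase of exon model per million mapped reads), an expression measure normalized for both gene length and sequencing depth. A minimal Python version of the formula, RPKM = 10^9 × reads / (exon length in bp × total mapped reads), is sketched below; the gene lengths and read counts are hypothetical.

```python
def rpkm(gene_reads, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads."""
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (per kilobase) * 1e6 (per million mapped reads)
    return gene_reads * 1e9 / (exon_length_bp * total_mapped_reads)


# Hypothetical example: two genes with the same read count but different lengths.
library_size = 20_000_000
print(rpkm(1000, 2000, library_size))  # shorter gene
print(rpkm(1000, 4000, library_size))  # longer gene: half the RPKM
```

Because RPKM divides by exon length, a long gene with the same read count as a short gene receives a lower value; many later analyses prefer TPM, which normalizes for length before normalizing for library size.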



These results show that high-throughput sequencing can not only detect low-abundance transcripts but also discover unknown transcripts, accurately identify alternative splice sites and provide comprehensive transcriptome information. Neither microarray hybridization nor SAGE library sequencing can match these capabilities, which make RNA-seq a powerful tool for in-depth study of transcriptome complexity.


Case 4. Differences in expression between species

Augix Guohua Xu, Liu He, Zhongshan Li, et al. Intergenic and Repeat Transcription in Human, Chimpanzee and Macaque Brains Measured by RNA-Seq. PLoS Comput Biol, 2010, 6(7): e1000843.


Among human tissues and organs, the brain has one of the most complex transcriptomes. Augix Guohua Xu and colleagues used high-throughput sequencing to study gene expression in the brains of humans, chimpanzees and rhesus macaques at different ages. In all three species, only 20–28% of transcripts mapped to annotated exons and 20–23% mapped to introns, while transcripts from repetitive sequences in introns and intergenic regions accounted for 40–48% of the brain transcriptome. Some repeat families showed increased transcript copy numbers. In non-repetitive intergenic regions, the researchers identified 1,093 regions that are significantly highly expressed in the human brain. These regions are conserved both at the level of RNA expression in primates and at the level of DNA sequence in mammals. About 20% of these transcripts are extensions of the 3' UTRs of known genes, which may contribute to alternative microRNA regulation of gene expression. Finally, the researchers found that expression differences between species increased with evolutionary divergence time, and that intergenic transcripts showed greater expression divergence than exons. The study shows that the human brain contains many previously unidentified, evolutionarily conserved transcripts, some of which may be involved in transcriptional regulation and in the evolution of human-specific phenotypes, and it provides an effective approach for revealing the molecular mechanisms of brain development.
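The exon/intron/intergenic breakdown reported above comes from comparing read positions with a gene annotation. A deliberately simplified Python sketch of that bookkeeping follows; it assumes each read is represented by a single genomic coordinate on one chromosome and that exon and gene intervals are hypothetical half-open (start, end) pairs. It does not model repeats, strands or reads that span several features.

```python
from collections import Counter


def classify_position(pos, exons, genes):
    """Label a read position as exon, intron (inside a gene but not an exon) or intergenic."""
    if any(start <= pos < end for start, end in exons):
        return "exon"
    if any(start <= pos < end for start, end in genes):
        return "intron"
    return "intergenic"


def category_fractions(read_positions, exons, genes):
    """Fraction of reads falling into each annotation category."""
    counts = Counter(classify_position(p, exons, genes) for p in read_positions)
    total = len(read_positions)
    return {cat: counts[cat] / total for cat in ("exon", "intron", "intergenic")}


# Hypothetical annotation: one gene spanning 100-500 with two exons.
genes = [(100, 500)]
exons = [(100, 200), (400, 500)]
reads = [150, 300, 450, 700, 900, 120, 350, 50]
print(category_fractions(reads, exons, genes))
```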



Chapter 2 The Application of Transcriptome Technology in Medicine

What medical problems can high-throughput transcriptome sequencing technology solve for you?

During the onset and progression of cancer and other complex diseases, gene expression patterns in cells change significantly. If you are a clinician or researcher who wants to quickly and comprehensively characterize the changes in gene expression during the development of cancer or another disease of interest, and thereby find strategies for its diagnosis and treatment, RNA-seq can help: by identifying genes whose expression patterns differ significantly between normal and diseased samples and analyzing their functions, it can quickly provide you with answers. Bacterial and viral infections also cause significant changes in cellular gene expression patterns, and these changes are critical to the body's defense against infection. If you are a doctor or researcher who wants to quickly and comprehensively characterize how cellular gene expression changes during viral or bacterial infection, and to develop effective strategies against pathogens, RNA-seq can likewise compare genes whose expression changes significantly between normal and infected samples and analyze their functions to provide answers.



Case 1. Gene fusion in cancer cells and tissues

Maher CA, Kumar-Sinha C, Cao X, et al. Transcriptome sequencing to detect gene fusions in cancer. Nature, 2009, 458: 97-101.



In 2009, Christopher A. Maher and colleagues at the University of Michigan Medical School used high-throughput transcriptome sequencing to sequence and analyze cancer cells in search of new gene fusions. The study successfully "rediscovered" the BCR-ABL1 fusion in chronic myelogenous leukemia cells and the TMPRSS2-ERG fusion in prostate cancer cells and prostate cancer tissues. In addition, the researchers validated a new gene fusion, SLC45A3-ELK4, which produces chimeric transcripts in cancer cells and tumor tissues.

Characterizing specific genomic aberrations in cancer cells is important for identifying therapeutic targets, so finding the genetic aberrations that drive cancer is a major focus of cancer research. Gene fusions resulting from chromosomal rearrangements in cancer cells are thought to give rise to some of the best-known cancer genes. Because they drive tumorigenesis and are restricted to cancer cells, fusion genes make ideal diagnostic markers and rational therapeutic targets. Recurrent gene fusions are closely associated with hematological malignancies, rare bone tumors and soft-tissue tumors, and they have recently been found in some common solid tumors, such as prostate cancer and lung cancer. By sequencing the transcriptomes of different cell lines and validating the results with qRT-PCR, FISH, array CGH or high-density SNP arrays, Christopher A. Maher et al. showed that transcriptome sequencing is a very effective tool for detecting gene fusions. In their transcriptome sequencing with the Illumina Genome Analyzer, they also merged long-read and short-read sequence data in a single analysis, in order to eliminate false positives, compensate for the limited depth of long reads, and reduce the difficulty short reads have with unambiguous mapping and assembly. The results show that this integrated approach is extremely effective at reducing false candidates and greatly increases the proportion of candidates that can be validated experimentally.
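One common signal for fusion detection in paired-end RNA-seq is a read pair whose two mates map to different genes. The Python sketch below counts such discordant pairs per gene pair and keeps those above a minimum support; it is a simplified illustration rather than the pipeline Maher et al. used, and the gene assignments and the `min_support` threshold are assumptions. Because each gene pair is sorted alphabetically, the sketch also discards which partner is 5' and which is 3'.

```python
from collections import Counter


def fusion_candidates(mate_genes, min_support=2):
    """Count read pairs whose mates map to two different genes.

    mate_genes: iterable of (gene_of_mate1, gene_of_mate2); None means unassigned.
    Returns {(geneX, geneY): supporting_pairs} for pairs with enough support.
    """
    support = Counter()
    for g1, g2 in mate_genes:
        if g1 and g2 and g1 != g2:
            support[tuple(sorted((g1, g2)))] += 1  # orientation is ignored
    return {pair: n for pair, n in support.items() if n >= min_support}


# Hypothetical mate assignments from a tumor library.
pairs = [
    ("TMPRSS2", "ERG"), ("ERG", "TMPRSS2"), ("TMPRSS2", "ERG"),
    ("GAPDH", "GAPDH"),     # concordant pair, ignored
    ("SLC45A3", "ELK4"),    # only one supporting pair
    ("ACTB", None),         # one mate unassigned
]
print(fusion_candidates(pairs))
```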



An important limitation is that transcriptome sequencing cannot detect fusions in which two adjacent genes are joined only through their regulatory sequences rather than their transcribed sequences. Nevertheless, the study established a reliable workflow for discovering new gene fusions with high-throughput transcriptome sequencing, opening an important route to the systematic characterization of cancer-related mutations.



Case 2. Research on pathogenesis

Vincent M. Bruno, Zhong Wang, Sadie L. Marjani, et al. Comprehensive annotation of the transcriptome of the human fungal pathogen Candida albicans using RNA-seq. Genome Res, 2010, 20: 1451-1458.



Candida albicans is the major fungal pathogen of humans. It causes superficial mucosal infections and can also spread through the bloodstream to cause systemic infections, which are often life-threatening. Its prevalence is rising, as is resistance to existing antifungal therapies. A better understanding of the molecular mechanisms of pathogenesis and drug resistance offers a promising route to new treatment options, and a complete description of transcriptional regulation in C. albicans is essential for a comprehensive understanding of its pathogenesis.



The ability of Candida albicans to cause disease depends largely on its stress responses: when stimulated by different environments, it adjusts its transcriptome to ensure its survival in different host niches. The researchers sequenced the C. albicans transcriptome under 9 different environmental conditions with quantitative coverage of all the regions examined, constructed a high-resolution map of the transcriptome under these conditions, and identified 602 new transcriptionally active regions as well as many introns not yet annotated in the genome. Interestingly, the expression of these transcriptionally active regions is regulated by specific environments. The researchers performed cluster analysis and functional enrichment analysis on the data and validated 41 genes (26 new transcripts and 15 annotated transcripts) by real-time quantitative PCR. This comprehensive transcriptional analysis not only substantially improves the existing genome annotation of C. albicans but also provides a necessary framework for a fuller understanding of the molecular mechanisms of this important eukaryotic pathogen.



Case 3. Changes in cell gene expression patterns during virus infection

Yang Z, Bruno DP, Martens CA, et al. Simultaneous high-resolution analysis of vaccinia virus and host cell transcriptomes by deep RNA sequencing. Proc Natl Acad Sci USA, 2010, 107: 11513-11518.



During viral infection, activation of viral genes is usually accompanied by dysregulation of host cell gene expression. New DNA sequencing technologies allow viral and cellular mRNAs from infected cells to be analyzed simultaneously and at high resolution. Analysis of the viral transcriptome is complicated, however, by the dense overlap of viral genes and by read-through transcription, which produces mRNAs extending into downstream open reading frames (ORFs). High-resolution deep RNA sequencing makes it possible to resolve partially overlapping ORFs within such read-through transcripts.



Zhilong Yang and colleagues used deep RNA sequencing to study the transcriptomes of vaccinia virus (VACV) and HeLa cells simultaneously at different times after infection. Sequencing produced about 500 million short cDNA reads, from which the complete VACV transcriptome and more than 14,000 host mRNAs were characterized at different infection times. Before viral DNA replication, transcripts from 118 ORFs were detected; after replication, transcripts from another 93 ORFs were detected. High-throughput sequencing allowed the boundaries of many mRNAs to be defined precisely. Temporal analysis relative to DNA synthesis distinguished two clusters of early mRNAs, both synthesized in the presence of a protein synthesis inhibitor, suggesting that, unlike other DNA viruses, these early genes are not divided into sequential stages. At 4 h, virus-encoded mRNAs accounted for 25-55% of total transcripts. This rapid shift caused a large decrease in host mRNA content, which in turn led to a sharp decline in host protein synthesis and a loss of antiviral capacity. At 2 h, however, a small number of host mRNAs increased. These up-regulated RNAs belong to the NF-κB cascade, apoptosis, signal transduction and ligand-mediated signaling, and appear to be involved in the host response to viral infection.
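The 25-55% figure is a simple proportion: reads assigned to the viral genome divided by all reads assigned to either virus or host at a given time point. A minimal Python sketch of that calculation is shown below, assuming each read has already been labeled "virus", "host" or "unmapped" by alignment to a combined reference; the labels and counts are hypothetical, and leaving unmapped reads out of the denominator is an assumption.

```python
from collections import Counter


def viral_share_by_time(labels_by_time):
    """Percentage of mapped reads assigned to the virus at each time point.

    labels_by_time: dict mapping a time label to a list of per-read labels,
    each "virus", "host" or "unmapped". Unmapped reads are left out.
    """
    shares = {}
    for time_point, labels in labels_by_time.items():
        counts = Counter(labels)
        mapped = counts["virus"] + counts["host"]
        shares[time_point] = 100.0 * counts["virus"] / mapped if mapped else 0.0
    return shares


# Hypothetical read labels at two time points after infection.
example = {
    "2h": ["host"] * 9 + ["virus"],
    "4h": ["virus"] * 2 + ["host"] * 2 + ["unmapped"],
}
print(viral_share_by_time(example))
```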



In short, this study shows that deep RNA sequencing provides a more complete and comprehensive analysis of virus-host interactions and of the viral transcriptome than previous methods. Applying deep RNA sequencing to studies of cytolytic or non-cytolytic viruses infecting different types of cells (including quiescent cells) holds great research potential and value.



Case 4. Tissue transcriptome sequencing data revealed a large number of housekeeping genes

Ramsköld D, Wang ET, Burge CB, Sandberg R. An Abundance of Ubiquitously Expressed Genes Revealed by Tissue Transcriptome Sequence Data. PLoS Comput Biol, 2009, 5(12): e1000598. doi:10.1371/journal.pcbi.1000598.



A basic question in molecular biology is how different cells and tissues differ in gene expression, and how these differences reflect different biological functions. The parts of the genome transcribed in a tissue or cell reflect the biological processes and functions it carries out: for example, which functions are housekeeping functions required by all cells, and how many genes encode them. Mammalian tissue transcriptomes can be studied by a variety of methods, such as reassociation kinetics (Rot analysis), serial analysis of gene expression (SAGE), microarrays, ESTs and deep RNA sequencing. Among these, deep RNA sequencing (RNA-Seq) outperforms microarrays thanks to its high sensitivity and reproducibility: it can detect lowly expressed and differentially expressed genes, and its expression values correlate better with protein levels.



Ramsköld and colleagues characterized mammalian tissue transcriptomes at the gene level by analyzing deep RNA sequencing (RNA-Seq) data from human and mouse tissues and cell lines. About 8,000 protein-coding genes were found to be ubiquitously expressed, and in most tissues they account for about 75% of all mRNAs by message copy number. These mRNAs mostly encode intracellular proteins involved in metabolism, transcription, RNA processing or translation. In contrast, genes for secreted or plasma membrane proteins are generally expressed in only a subset of tissues. The distribution of gene expression levels is broad but fairly continuous, and the data do not support the idea of distinct expression classes of genes. Expression estimates that count only reads mapped to coding exons correlate better with qRT-PCR data than estimates that also include 3' untranslated regions (UTRs). Muscle and liver have the least complex transcriptomes: they express mainly housekeeping genes, and a large fraction of their transcripts comes from a small number of highly expressed genes. Brain, kidney and testis express more complex transcriptomes, in which most genes are expressed and the most highly expressed genes contribute relatively little. mRNAs expressed in the brain have unusually long 3' UTRs, and mean 3' UTR length is also greater for genes involved in growth, morphogenesis and signal transduction, suggesting that UTR-based regulation adds complexity for these genes.
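The notions of "ubiquitously expressed" genes and transcriptome complexity used above can be expressed as short calculations over an expression table. The Python sketch below assumes a hypothetical table of per-tissue expression values proportional to mRNA copy number and an assumed detection threshold of 1.0; it is not the authors' exact procedure. It returns the genes expressed in every tissue, the share of a tissue's expression they account for, and the share contributed by the top n genes, where a higher top-n share indicates a less complex transcriptome (as in muscle and liver).

```python
def ubiquitous_genes(expr, threshold=1.0):
    """Genes at or above `threshold` in every tissue.

    expr: dict tissue -> dict gene -> expression value.
    """
    tissues = list(expr)
    common = set.intersection(*(set(expr[t]) for t in tissues))
    return {g for g in common if all(expr[t][g] >= threshold for t in tissues)}


def share_of_expression(tissue_expr, genes):
    """Fraction of a tissue's total expression contributed by `genes`."""
    total = sum(tissue_expr.values())
    return sum(v for g, v in tissue_expr.items() if g in genes) / total


def top_n_share(tissue_expr, n):
    """Fraction of total expression from the n most highly expressed genes."""
    values = sorted(tissue_expr.values(), reverse=True)
    return sum(values[:n]) / sum(values)


# Hypothetical expression table for two tissues.
expr = {
    "liver": {"a": 50.0, "b": 30.0, "c": 20.0, "d": 0.0},
    "brain": {"a": 10.0, "b": 10.0, "c": 40.0, "d": 40.0},
}
housekeeping = ubiquitous_genes(expr)
for tissue, values in expr.items():
    print(tissue, share_of_expression(values, housekeeping), top_n_share(values, 1))
```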



Because housekeeping genes are unlikely candidates for the determinants of cell identity and cellular responses, they can be set aside or filtered out once identified, for example when clustering samples by gene expression. Likewise, since housekeeping genes are rarely involved in genetic diseases, filtering them out can simplify the screening of candidate disease genes in genetic linkage or association studies. The features of housekeeping genes characterized in this study, including their 3' UTR lengths, are therefore valuable for identifying housekeeping genes and for simplifying certain gene expression studies.




Chapter 3 Application of Transcriptome Technology in Agriculture

What agricultural problems can high-throughput transcriptome sequencing technology solve for you?

The gene expression patterns of plant cells change significantly during normal growth, in drought and stress resistance, and during the breeding of superior varieties. If you are a scientist or agronomist engaged in agricultural research, RNA-seq can quickly and comprehensively identify the genes that matter for the plant traits you are interested in, by comparing genes whose expression patterns differ significantly between normal samples and your samples of interest, giving your breeding or applied agricultural research a boost.



Case 1. Analysis of rice transcriptome

Guojie Zhang, Guangwu Guo, Xueda Hu, et al. Deep RNA sequencing at single base-pair resolution reveals high complexity of the rice transcriptome. Genome Res, 2010, 20: 646-654.

In eukaryotes, understanding the dynamics of the transcriptome is essential for studying the complexity of transcriptional regulation and its influence on phenotype. Until then, few studies had analyzed transcriptomes at single-base resolution, especially in plants. Rice has been studied extensively over the past few decades because of its agricultural value. Zhang and colleagues used high-throughput bidirectional deep RNA sequencing to present, for the first time, a complete picture of transcript expression profiles in eight rice tissues; the approach detected very low-abundance transcripts and uncovered a considerable number of new transcripts, exons and non-coding regions. Basic and in-depth bioinformatic analyses revealed that alternative cis-splicing occurs in more than 33% of rice genes, that 234 putative chimeric transcripts may be produced by trans-splicing, and that many fusion transcripts may be by-products of alternative splicing. The amount of data and information far exceeds that of previous reports.



One of the most interesting findings of the study is that the rice transcriptome contains a large number of chimeric transcripts (CTs), indicating that transcript fusion is more common than previously expected. Detailed sequence analysis of the CTs indicated that trans-splicing may be an important mechanism for generating fusion transcripts. The researchers also found that many open reading frames in these fusion transcripts combine specific protein domains from different genes, suggesting that proteins with new modes of interaction may be formed. Although more analysis is needed to establish the true function of fusion transcripts in rice, the study points to a new mechanism by which post-transcriptional fusion of different genes in rice can generate more complex exon combinations.

High-throughput sequencing technology has greatly enriched rice transcriptome information, helping researchers to understand the diversity and complexity of transcripts more comprehensively, and expanding the field of future agricultural research.



Case 2. Studies on alternative splicing in Arabidopsis

Sergei A. Filichkin, Henry D. Priest, Scott A. Givan, et al. Genome-wide mapping of alternative splicing in Arabidopsis thaliana. Genome Res, 2010, 20: 45-58.



Sergei A. Filichkin and colleagues used high-throughput transcriptome sequencing to analyze alternative splicing in Arabidopsis thaliana and found that more than 42% of intron-containing genes have alternatively spliced forms, far higher than estimates from EST sequencing (20-30%). Most alternatively spliced transcripts contain premature termination codons (PTC+). PTC+ transcripts can be targets of the nonsense-mediated mRNA decay (NMD) surveillance mechanism, or can modulate the levels of functional transcripts through regulated unproductive splicing and translation (RUST). The study also found that under different environmental stresses, the relative proportions of PTC+ transcripts and their related splice variants change accordingly. The results suggest that, as in animals, NMD and RUST are widespread in plant gene expression and play very important roles.
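Whether a PTC+ isoform is a likely NMD target is often predicted with the "50-nucleotide rule", derived mainly from mammalian studies: a stop codon lying more than about 50 nt upstream of the last exon-exon junction is expected to trigger NMD. The Python sketch below applies that rule to a spliced transcript. The sequence, the exon lengths, the 50-nt cutoff and the choice to measure from the end of the stop codon are all assumptions, and plant NMD also responds to other features, such as long 3' UTRs, that the sketch ignores.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_in_frame_stop(seq, start):
    """0-based index of the first in-frame stop codon at or after `start`, else None."""
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


def last_junction(exon_lengths):
    """Transcript coordinate where the last exon begins (the last exon-exon junction)."""
    return sum(exon_lengths[:-1])


def predicted_nmd_target(seq, start, exon_lengths, cutoff_nt=50):
    """50-nt rule: the stop codon ends more than cutoff_nt upstream of the last junction."""
    if len(exon_lengths) < 2:
        return False  # no exon-exon junction, so the rule does not apply
    stop = first_in_frame_stop(seq, start)
    if stop is None:
        return False
    distance = last_junction(exon_lengths) - (stop + 3)
    return distance > cutoff_nt


# Hypothetical spliced transcript: short ORF (ATG AAA TAA) followed by 200 nt.
transcript = "ATGAAATAA" + "G" * 200
print(predicted_nmd_target(transcript, 0, [60, 100, 49]))  # junction far downstream
print(predicted_nmd_target(transcript, 0, [40, 169]))      # junction close to the stop
```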


Case 3. Transcriptome analysis of tea

Shi CY, Yang H, Wei CL, Yu O, Zhang ZZ, Jiang CJ, Sun J, Li YY, Chen Q, Xia T, Wan XC. Deep sequencing of the Camellia sinensis transcriptome revealed candidate genes for major metabolic pathways of tea-specific compounds. BMC Genomics, 2011, 12: 131.


Tea contains many secondary metabolites, such as polyphenols, theanine and essential oils, that make it a health-promoting beverage. Tea is also an economically important horticultural crop and a model organism for studying self-incompatibility. However, because of its large genome (about 4 Gb) and limited genetic research tools (such as tissue culture and transformation), little genomic information is available, which seriously limits research on tea. Most previously available genetic information on tea was obtained by EST sequencing, which has major drawbacks: low throughput, high cost and the inability to quantify gene expression. These shortcomings led researchers to turn to the transformative RNA-seq technology.


Shi and colleagues used Illumina high-throughput sequencing to characterize the C. sinensis transcriptome. Sequencing at unprecedented depth (2.59 gigabase pairs) yielded approximately 34.5 million reads, which were assembled into 127,094 unigenes with an average length of 355 bp, about 10 times the amount of tea sequence data then in GenBank. Sequence similarity searches against six public databases found 55,088 unigenes that could be annotated with gene descriptions, conserved protein domains or Gene Ontology terms, and some unigenes were assigned to putative metabolic pathways. Targeted searches identified most of the genes involved in several primary metabolic pathways and in natural product pathways related to tea quality, such as the flavonoid, theanine and caffeine biosynthesis pathways, and also uncovered new candidate genes in these secondary metabolic pathways. Much of the data obtained in this study was absent from previous databases. Thirteen unigenes related to theanine and flavonoid synthesis were validated, and their expression patterns in different tea tissues were determined by RT-PCR and real-time quantitative PCR.
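Unigene counts and average lengths such as those above are standard summary statistics for a de novo assembly. The Python sketch below computes them, together with N50 (the length such that sequences of that length or longer contain at least half of all assembled bases), from a list of hypothetical unigene lengths; N50 is not reported in the text and is included only as a common companion metric.

```python
def assembly_stats(lengths):
    """Count, total length, mean length and N50 for a list of sequence lengths (bp)."""
    if not lengths:
        raise ValueError("no sequences")
    total = sum(lengths)
    running = 0
    n50 = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # reached half of all assembled bases
            n50 = length
            break
    return {
        "count": len(lengths),
        "total_bp": total,
        "mean_bp": total / len(lengths),
        "n50_bp": n50,
    }


# Hypothetical unigene lengths in base pairs.
print(assembly_stats([100, 200, 300, 400, 1000]))
```

As a rough consistency check on the reported figures, 127,094 unigenes × 355 bp is about 45.1 Mb of assembled sequence (approximate, because the mean length is rounded).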


The tea transcriptome data obtained in this study by high-throughput sequencing will serve as an important public information platform for gene expression and functional genomics research on this species, providing a rich reference for subsequent studies. The work also provides an important basis for research on improving tea quality, and thus contributes to human health.


Case 4. Transcriptome analysis of sesame

Wenliang Wei, Xiaoqiong Qi, et al. Characterization of the sesame (Sesamum indicum L.) global transcriptome using Illumina paired-end sequencing and development of EST-SSR markers. BMC Genomics, 2011, 12: 451.



Because sesame is an important oil crop, understanding the molecular mechanisms of its fatty acid biosynthesis and metabolism is particularly important. However, current transcriptome and genome data are insufficient to support further research, and the shortage of molecular markers (EST-SSRs) also limits the efficiency and accuracy of sesame genetic breeding. Acquiring high-throughput transcriptome data is therefore a crucial step in advancing sesame research.



Wenliang Wei and colleagues used Illumina paired-end sequencing to sequence the transcriptome of five sesame samples. Sequence analysis yielded 86,222 unigenes with an average length of 629 bp. Of these, 54.03% were annotated against the NCBI and SwissProt databases, and 25.52% were mapped to 119 pathways in KEGG (the Kyoto Encyclopedia of Genes and Genomes pathway database). In addition, 44,750 unigenes are homologous to 15,460 Arabidopsis genes. The study also identified 7,702 new SSRs (EST-SSRs); 50 of them were randomly selected for amplification to validate them and to assess their polymorphism in genomic DNA. Of these, 40 primer pairs successfully amplified DNA fragments, and 24 loci showed significant polymorphism.
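EST-SSR discovery amounts to scanning assembled unigenes for short tandem repeats. The Python sketch below uses regular expressions to find mono- to hexanucleotide repeats. The minimum repeat numbers (10 copies for mononucleotides, 6 for dinucleotides, 5 for longer motifs) are an assumption loosely modeled on settings often used with tools such as MISA, and the example sequence is hypothetical. Motifs that are themselves repeats of a shorter unit (for example "AA" or "ATAT") are skipped so that each repeat is reported with its shortest unit; primer design and amplification testing, as done in the study, are outside the sketch.

```python
import re

# Assumed minimum number of tandem copies per motif length (1-6 nt).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_compound_motif(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    k = len(motif)
    return any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq):
    """Return sorted (start, motif, copies) tuples for simple sequence repeats."""
    seq = seq.upper()
    hits = []
    for k, min_copies in MIN_REPEATS.items():
        # A k-mer followed by at least (min_copies - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_compound_motif(motif):
                continue
            hits.append((match.start(), motif, (match.end() - match.start()) // k))
    return sorted(hits)


# Hypothetical unigene fragment with a (CAG)5 repeat and an (A)12 run.
unigene = "GGC" + "CAG" * 5 + "T" + "A" * 12 + "GC"
print(find_ssrs(unigene))
```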



This study shows that fast and efficient Illumina paired-end sequencing plays a key role in gene discovery and molecular marker development for non-model organisms, and it provides a large data resource for future research.


Appendix: Sequenced Species

Sequenced organisms are organisms whose genomes have been completely sequenced. For some of them, the DNA sequence has been fully annotated and functional elements (such as genes) have been mapped.



Animals

✓ Homo sapiens - human

✓ Pan troglodytes - chimpanzee

✓ Mus musculus - mouse (model organism)

✓ Rattus norvegicus - rat

✓ Danio rerio - zebrafish (model organism)

✓ Drosophila melanogaster - fruit fly (model organism)

✓ Caenorhabditis elegans - nematode (model organism)

✓ Caenorhabditis briggsae - a nematode

✓ Gallus gallus - chicken

✓ Bos taurus - cattle

✓ Bubalus bubalis - water buffalo

✓ Canis lupus familiaris - dog

✓ Felis catus - cat

✓ Ornithorhynchus anatinus - platypus

✓ Fugu rubripes - pufferfish

✓ Apis mellifera - honey bee

✓ Anopheles gambiae - malaria mosquito

✓ Ailuropoda melanoleuca - giant panda



Plants

✓ Arabidopsis thaliana - thale cress (model organism)

✓ Guillardia theta - a cryptophyte

✓ Oryza sativa - rice

✓ Glycine max - soybean



Fungi

✓ Ashbya gossypii - a fungus that infects citrus and cotton

✓ Aspergillus fumigatus - a human pathogen

✓ Aspergillus nidulans - a filamentous fungus

✓ Debaryomyces hansenii - a salt-tolerant yeast

✓ Encephalitozoon cuniculi - a unicellular microsporidian

✓ Kluyveromyces lactis - a yeast with potential for pharmaceutical production

✓ Kluyveromyces waltii - a yeast

✓ Magnaporthe grisea - rice blast fungus

✓ Neurospora crassa - orange bread mold (model organism)

✓ Phanerochaete chrysosporium - a white-rot fungus

✓ Saccharomyces cerevisiae - baker's yeast (model organism)

✓ Schizosaccharomyces pombe - fission yeast

✓ Yarrowia lipolytica - a yeast

✓ Candida albicans - a human pathogen




Protists

✓ Leishmania infantum - a Leishmania species

✓ Leishmania major - a Leishmania species

✓ Plasmodium falciparum - a human malaria parasite

✓ Plasmodium yoelii yoelii - a Plasmodium species that causes malaria in rodents

✓ Thalassiosira pseudonana - a marine diatom

✓ Trypanosoma brucei - a trypanosome

✓ Trypanosoma congolense - a trypanosome

✓ Trypanosoma cruzi - a trypanosome

✓ Trypanosoma vivax - a trypanosome

Disclaimer: The above information comes from the website of Wuhan Life Beauty Technology Co., Ltd. (Transcriptome Collection): http://www.ablife.cc/book/bookContentAction?id=38


Origin blog.csdn.net/u010608296/article/details/113123560