A complete walkthrough of the m6A RNA methylation MeRIP-seq experiment and data analysis

This article explains in detail how to perform a methylated RNA immunoprecipitation sequencing (MeRIP-seq/m6A-seq) experiment, covering four aspects: technical principles, library construction and sequencing, the bioinformatics analysis workflow, and research strategies.

1. Principle of methylated RNA co-immunoprecipitation (MeRIP-seq/m6A-seq) sequencing technology

The epitranscriptome refers to chemical modifications on RNA that regulate gene expression without changing the RNA sequence. More than 100 types of modification are found in cellular RNA, and most of them occur on tRNA and other non-coding RNAs [1]. Modifications on mRNA are comparatively few in both type and number. The figure below shows some of the chemical modifications that occur on eukaryotic mRNA [2]:

Figure 1: Schematic diagram of mRNA methylation modification

The most abundant modification on mRNA is methylation at the N6 position of adenosine (N6-methyladenosine, m6A). m6A is involved in all aspects of the mRNA life cycle [2]. m6A modification mainly occurs on motifs such as RRACH (R = G or A; H = A, C, or U). This type of motif is mainly enriched around the stop codon and in the 3'UTR; that is, m6A mainly occurs near the stop codon and in the 3'UTR. m6A modification is catalyzed by m6A methyltransferases (writers), removed by m6A demethylases (erasers), and recognized by m6A-binding proteins (readers), which mediate its function [3]. m6A modification is widely found in various organisms, including viruses, yeast, plants, and higher animals such as humans [1,3].

Research shows that the most important known function of m6A is to regulate mRNA stability: m6A-modified mRNA in the cytoplasm can be recognized by YTHDF2, which directs it to the processing body (P-body) and thereby accelerates mRNA degradation. In addition, m6A modification can change the secondary structure of RNA and affect target recognition by microRNA, which also regulates mRNA stability. In the nucleus, m6A modification can regulate RNA splicing and nuclear export, thereby regulating gene expression. m6A may also interact with DNA methylation. The figure below shows some of the currently known relationships between m6A and biological functions:

Figure 2: m6A and biological functions

Early experimental methods identified the presence of m6A in rRNA and mRNA, but because of technical limitations the study of m6A made no major progress for a long time. In recent years, the report that FTO acts as an RNA demethylase brought m6A modification back into view. Subsequently, the establishment of the MeRIP-seq/m6A-seq method made it possible to study the distribution and function of this modification in depth [1].

MeRIP-seq/m6A-seq is used to identify m6A-modified regions genome-wide. The principle is to immunoprecipitate m6A-modified RNA fragments from cells with an antibody that specifically recognizes m6A. By performing high-throughput sequencing on the precipitated RNA fragments and combining it with bioinformatics analysis, the status of m6A modification can be studied systematically across the entire genome. MeRIP-seq/m6A-seq is currently one of the most widely used technologies for studying m6A modifications.

2. Methylated RNA co-immunoprecipitation (MeRIP-seq/m6A-seq) experimental process

Every step from sample processing to final data acquisition will have an impact on data quality and quantity, and data quality will directly affect the results of subsequent information analysis. In order to ensure the accuracy and reliability of sequencing data from the source, every step of sample processing, library construction, and sequencing must be strictly controlled to fundamentally ensure the output of high-quality data. The flow chart is as follows:

(1) Total RNA sample detection

RNA samples are checked by three methods:

(1) Agarose gel electrophoresis is used to assess RNA degradation and contamination; an obvious 18S or 28S main band should be detected, and the band should be clear;

(2) Qubit 2.0 is used to accurately quantify RNA concentration; the total amount of total RNA should be not less than 10 μg;

(3) Agilent 2100 is used to accurately assess RNA integrity; the RIN value should be not less than 7.5.
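The two numeric acceptance criteria above (at least 10 μg of total RNA and a RIN of at least 7.5) can be checked programmatically. The following is a minimal sketch, not part of the original protocol: the `RnaSample` record, its field names and the example values are hypothetical, and the total amount is assumed to be concentration multiplied by volume.

```python
from dataclasses import dataclass

MIN_TOTAL_RNA_UG = 10.0  # threshold stated in the text
MIN_RIN = 7.5            # threshold stated in the text


@dataclass
class RnaSample:
    """Hypothetical record of one sample's QC measurements."""
    name: str
    concentration_ng_per_ul: float  # e.g. from a Qubit reading
    volume_ul: float
    rin: float                      # e.g. from an Agilent 2100 run


def total_rna_ug(sample: RnaSample) -> float:
    """Total RNA in micrograms (ng/ul * ul = ng; divide by 1000 for ug)."""
    return sample.concentration_ng_per_ul * sample.volume_ul / 1000.0


def passes_qc(sample: RnaSample) -> bool:
    """True when both the amount and the integrity thresholds are met."""
    return total_rna_ug(sample) >= MIN_TOTAL_RNA_UG and sample.rin >= MIN_RIN
```

The gel electrophoresis check is visual and is not modelled here.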

(2) Library construction

1. RNA fragmentation:

Take 10 μg of total RNA, add RNA Fragmentation Reagents (Invitrogen), and incubate in a Thermomixer at 70°C for 10 minutes to break the RNA into fragments of about 100 nt. Precipitate the fragmented RNA with ethanol;

2. m6A enrichment:

Wash the magnetic beads carrying protein A and protein G with IP buffer (150 mM NaCl, 10 mM Tris-HCl pH 7.5), then incubate them with 5 μg of m6A antibody (Millipore) for 2 hours at 4°C. Wash twice with IP buffer, resuspend the beads in IP buffer, add the fragmented RNA, and rotate at 4°C for 4 hours. Wash the beads 3 times with IP buffer at 4°C, then add m6A competitive elution buffer and incubate at 4°C for 1 hour. Collect the supernatant containing the eluted m6A RNA into a new tube and purify it with phenol:chloroform:isoamyl alcohol (125:24:1);

3. Library construction:

Use the SMARTer® Stranded Total RNA-seq Kit v2 - Pico Input Mammalian, following the user manual, to perform reverse transcription and library construction for the IP and Input samples; then use AMPure XP beads for fragment size selection to obtain the final library.

The enrichment schematic is shown below: m6A antibodies are used to pull down m6A-modified RNA fragments.

Figure 3: Schematic diagram of enrichment process

(3) Library quality inspection

After library construction is complete, first use Qubit 2.0 for preliminary quantification and dilute the library to 1 ng/μl, then use Agilent 2100 to check the insert size of the library. Once the insert size meets expectations, use qPCR to accurately quantify the effective concentration of the library (effective concentration > 2 nM) to ensure library quality.

(4) Sequencing

After the libraries pass quality inspection, they are pooled according to their effective concentrations and the target data output, and then sequenced on the Illumina NovaSeq platform. The sequencing strategy is PE150.

3. Methylated RNA co-immunoprecipitation (MeRIP-seq/m6A-seq) information analysis process

(1) Quality control of raw sequencing data

The raw sequencing data is in FASTQ format, the standard format for high-throughput sequencing. Every four lines in a FASTQ file form one unit that describes one sequenced read. The first line of the unit is the read ID, starting with the @ symbol; the second line is the read sequence; the third line is usually a + sign, optionally followed by the same information as the first line; the fourth line holds the base quality values, which describe the accuracy of each base in the second line. Each base has one quality value, so the fourth line has the same length as the second. The following is an example of one read record:

Figure 4: Example of FASTQ format
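The four-line structure described above can be made concrete with a small parser. This is a minimal sketch rather than a production reader: the example record is made up, and the quality decoding assumes the Phred+33 encoding (an assumption; the original text does not state the encoding).

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, qualities) for each 4-line FASTQ record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # Assumed Phred+33: quality = ASCII code of the character minus 33.
        yield header[1:], seq, [ord(c) - 33 for c in qual]


# Made-up record: four high-quality bases followed by one low-quality N.
example = "@read1\nACGTN\n+\nIIII#\n"
records = list(read_fastq(io.StringIO(example)))
```

Here `records[0]` holds the ID `read1`, the sequence `ACGTN`, and one integer quality score per base.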

The raw sequencing data contains adapter sequences introduced during library construction as well as low-quality bases. Both reduce the number of reads that can later be aligned to the genome, so less information is obtained. The data therefore needs to be filtered.

Use Trimmomatic software to perform quality control steps on the original data to remove adapter sequences and low-quality bases. The parameters used are: ILLUMINACLIP:MeRIP-PE.fa:2:30:10:1:true SLIDINGWINDOW:30:15 AVGQUAL:15 LEADING:15 TRAILING:15 MINLEN:30
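As an illustration of how these trimming steps might be passed to Trimmomatic in paired-end mode, the sketch below only assembles the command line as a list of arguments; it does not run anything. The jar path, input and output file names, the thread count and the `-phred33` flag are assumptions that do not appear in the original text; only the trimming steps are taken from it.

```python
def trimmomatic_pe_command(r1, r2, out_prefix, jar="trimmomatic.jar", threads=4):
    """Build a Trimmomatic paired-end command (argument list, not executed).

    jar, threads and all file names are hypothetical placeholders.
    """
    steps = [
        "ILLUMINACLIP:MeRIP-PE.fa:2:30:10:1:true",  # adapter clipping
        "SLIDINGWINDOW:30:15",                      # window size 30, mean quality 15
        "AVGQUAL:15",                               # drop reads with mean quality < 15
        "LEADING:15",                               # trim low-quality leading bases
        "TRAILING:15",                              # trim low-quality trailing bases
        "MINLEN:30",                                # drop reads shorter than 30 nt
    ]
    return [
        "java", "-jar", jar, "PE",
        "-threads", str(threads), "-phred33",
        r1, r2,
        f"{out_prefix}_1P.fq.gz", f"{out_prefix}_1U.fq.gz",  # paired / unpaired mate 1
        f"{out_prefix}_2P.fq.gz", f"{out_prefix}_2U.fq.gz",  # paired / unpaired mate 2
        *steps,
    ]


cmd = trimmomatic_pe_command("sample_R1.fq.gz", "sample_R2.fq.gz", "sample_clean")
```

The resulting list could be handed to `subprocess.run(cmd, check=True)` on a machine where Java and Trimmomatic are installed.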

(2) Read alignment

The filtered data needs to be aligned to the reference genome. m6A-modified segments on the genome attract a large number of aligned reads and thus form a "peak". From the positions of the peaks, one can determine which positions are methylated.

Alignment is performed with HISAT2 [4]. This software quickly aligns short sequences to the reference genome and can handle splice sites, which makes it especially suitable for RNA data. Default parameters are used. After alignment, the results are filtered to remove multi-mapped reads, low-quality alignments and so on, to obtain more accurate alignment results.
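The post-alignment filtering can be sketched directly on SAM text lines. This is a simplified illustration, not the pipeline's actual filter: the MAPQ cutoff of 20 is an assumed value, and multi-mapped reads are recognised through the `NH:i:` optional tag, which HISAT2 writes to report the number of alignments found for a read.

```python
def keep_alignment(sam_line, min_mapq=20):
    """Return True for a mapped, primary, uniquely aligned, high-MAPQ record.

    min_mapq=20 is an assumed cutoff, not a value from the original text.
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    mapq = int(fields[4])
    if flag & 0x4:      # read is unmapped
        return False
    if flag & 0x100:    # secondary alignment
        return False
    if mapq < min_mapq:
        return False
    for tag in fields[11:]:  # optional tags start at column 12
        if tag.startswith("NH:i:") and int(tag[5:]) > 1:
            return False     # read aligned to more than one location
    return True


def filter_sam(lines, min_mapq=20):
    """Yield header lines unchanged and alignment lines that pass the filter."""
    for line in lines:
        if line.startswith("@") or keep_alignment(line, min_mapq):
            yield line
```

In practice the same filter is usually applied to BAM files with dedicated tools; the pure-Python version only shows the logic.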

(3) Detection of m6A modified region

MeRIP-seq enriches the m6A-modified regions of RNA and then sequences them. Therefore, in m6A-modified regions the read coverage of the IP library is significantly higher than that of the input library, which forms a "peak". Detecting the positions of these peaks reveals where on the RNA the m6A modification occurs.

Once the m6A-modified regions have been identified, they can be analyzed further, for example by annotation, distribution statistics, and motif identification.
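The idea of a peak as a region where IP coverage exceeds input coverage can be sketched with sliding windows. This is a toy illustration only: real peak callers apply statistical tests, and the window width, step, fold threshold and pseudocount used here are illustrative defaults (the first three chosen to match the exomePeak settings quoted later in this article). Read positions are assumed to be read midpoints on a single transcript.

```python
def windows(region_length, width=200, step=30):
    """Yield (start, end) for sliding windows fully inside [0, region_length)."""
    start = 0
    while start + width <= region_length:
        yield start, start + width
        start += step


def count_in_window(positions, start, end):
    """Number of read midpoints falling in the half-open window [start, end)."""
    return sum(1 for p in positions if start <= p < end)


def enriched_windows(ip_pos, input_pos, ip_total, input_total, region_length,
                     width=200, step=30, min_fold=1.5, pseudocount=1.0):
    """Return (start, end, fold) for windows where IP is enriched over input.

    Counts are scaled by library size (ip_total, input_total) before the
    ratio is taken; the pseudocount avoids division by zero.
    """
    hits = []
    for start, end in windows(region_length, width, step):
        ip_rate = (count_in_window(ip_pos, start, end) + pseudocount) / ip_total
        in_rate = (count_in_window(input_pos, start, end) + pseudocount) / input_total
        fold = ip_rate / in_rate
        if fold >= min_fold:
            hits.append((start, end, fold))
    return hits
```

Overlapping enriched windows would normally be merged into a single peak; that step is left out to keep the sketch short.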

(4) Identification of differential m6A modified regions

Differential m6A modifications (i.e. differential peaks) are identified with the R package exomePeak [5,6]. This package first merges the peaks of the samples to be compared, then counts the reads that fall within each merged peak in each sample, then normalizes these counts so that the two groups of samples are comparable, and finally tests whether the read counts within the peak differ significantly between the two groups. The parameters used are: WINDOW_WIDTH=200 SLIDING_STEP=30 DIFF_PEAK_ABS_FOLD_CHANGE=1.5 FOLD_ENRICHMENT=1.5 FRAGMENT_LENGTH=200.
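To make the merge, count, normalize and compare logic tangible, the sketch below computes a fold change of IP-over-input enrichment between two conditions for one merged peak. It is deliberately simplified and is not exomePeak's statistical test: library-size scaling, the pseudocount and all function and argument names are assumptions, and no p-value is computed. Only the 1.5 fold-change threshold comes from the parameters above.

```python
import math


def enrichment(ip_count, input_count, ip_total, input_total, pseudocount=1.0):
    """Library-size-normalized IP/input ratio for one peak in one condition."""
    ip_rate = (ip_count + pseudocount) / ip_total
    input_rate = (input_count + pseudocount) / input_total
    return ip_rate / input_rate


def differential_log2fc(treated, control):
    """log2 of (treated enrichment / control enrichment).

    Each argument is a dict with keys ip_count, input_count, ip_total,
    input_total (hypothetical field names).
    """
    return math.log2(enrichment(**treated) / enrichment(**control))


def is_differential(log2fc, min_abs_fold=1.5):
    """True when the absolute fold change reaches the threshold."""
    return abs(log2fc) >= math.log2(min_abs_fold)
```

A positive value indicates stronger methylation in the treated condition (hypermethylation), a negative value weaker methylation (hypomethylation).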

After identifying the differential m6A modifications, the differential m6A modifications can then be analyzed such as annotation, distribution statistics, and motif identification.

(5) Analysis of mRNA gene expression levels (the input library serves as the background control)

The input library of m6A-seq is equivalent to an RNA-seq library and can be used to analyze gene expression and identify differentially expressed genes. Gene expression levels are estimated by counting the sequencing reads that map to genomic regions or to gene exon regions. Besides being directly proportional to the true expression level of the gene, the read count is also positively related to gene length and sequencing depth. To make expression levels comparable between different genes and different experiments, TPM is used to normalize the read counts. This measure takes into account the effect of both sequencing depth and gene length on the counts, and is currently one of the most common methods for calculating gene expression levels.

Figure 5: TPM calculation formula
Note: Ri is the read count of gene i; Li is the length of gene i (kbp), with the longest transcript taken as the gene length. Expression is calculated with StringTie [7] using default parameters, and gene expression is reported as TPM.
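Using the symbols from the note, the standard TPM definition is TPM_i = (R_i / L_i) / sum over all genes j of (R_j / L_j), multiplied by 10^6. The sketch below implements that definition on made-up counts; it is a worked illustration of the formula, not a substitute for StringTie, which estimates expression from the alignments themselves.

```python
def tpm(read_counts, lengths_kb):
    """Transcripts per million from read counts (R_i) and gene lengths in kb (L_i)."""
    # Step 1: length-normalized rate R_i / L_i for each gene.
    rates = {gene: read_counts[gene] / lengths_kb[gene] for gene in read_counts}
    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in rates}
    # Step 2: scale so that the values of one sample sum to one million.
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


# Made-up example: gene B has three times the reads but is three times longer.
values = tpm({"A": 100, "B": 300}, {"A": 1.0, "B": 3.0})
```

In this example both genes receive the same TPM, because the extra reads of gene B are explained entirely by its length.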

(6) lncRNA screening, identification and expression calculation

By assembling transcripts, new transcripts outside of the annotation files can be discovered. These new transcripts were screened under a series of stringent conditions and non-coding RNAs were identified.
(1) Use StringTie [7] to assemble transcripts, obtain all transcripts in each sample, and mark transcripts that do not appear in the annotation file as new transcripts. The filtering conditions for new transcripts are: a) in the assembly annotation file of each sample, filter out transcripts with exon number >= 2 but FPKM <= 0.5, or transcripts with exon number >= 1 but FPKM <= 1 [8]; b) filter out transcripts shorter than 200 bp from each single-sample assembly; c) merge the filtered single-sample assemblies into the final assembly with StringTie [7]; d) use Cuffcompare [9] to analyze the structural relationship between unknown and known transcripts.
(2) Prediction of candidate lncRNA coding ability. lncRNA is a type of non-coding RNA. By predicting the coding ability of candidate lncRNA, the real lncRNA can be further identified. Use a variety of mainstream coding ability prediction software to predict the coding ability of candidate lncRNAs. Prediction software includes CNCI [10] and CPC [11].

(3) Calculate the expression of known lncRNA based on the annotated lncRNA gene information in the genome.
(4) Expression calculation. The expression of lncRNA was normalized with TPM.

(7) circRNA identification and expression calculation

Two mainstream tools, find_circ [13] and CIRCexplorer2 [14], are used to identify circRNA [12], and the union of their predictions is taken as the final result.
(1) The prediction principle of find_circ is shown in the figure below:

Figure 6: Principle of identification of find_circ software


The basic principle is to extract 20-nt anchor sequences from both ends of each read that does not align to the reference sequence, and to align each pair of anchors to the reference again. If the 5' anchor aligns to the reference (its start and end sites are marked A3 and A4), the 3' anchor aligns upstream of this site (its start and end sites are marked A1 and A2), and there is a splice site (GT-AG) between A2 and A3 on the reference, the read is regarded as a candidate circRNA. Finally, candidate circRNAs supported by at least 2 reads are regarded as identified circRNAs. The aligner used is bowtie2 [15].

(2) The identification principle of CIRCexplorer2 is shown in the figure below:
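The anchor logic described above can be sketched in a few functions. This is a simplified illustration and not find_circ's implementation: alignment itself is not performed (anchor hit coordinates are taken as given), the GT-AG splice-signal check against the reference is omitted, and all function names are hypothetical.

```python
from collections import Counter


def extract_anchors(read_seq, anchor_len=20):
    """Return the (5' anchor, 3' anchor) of an unaligned read, or None if too short."""
    if len(read_seq) < 2 * anchor_len:
        return None
    return read_seq[:anchor_len], read_seq[-anchor_len:]


def anchors_in_reverse_order(five_prime_hit, three_prime_hit):
    """True when the 3' anchor (A1, A2) aligns upstream of the 5' anchor (A3, A4).

    Each hit is a (start, end) pair on the same chromosome and strand.
    """
    a3, _a4 = five_prime_hit
    _a1, a2 = three_prime_hit
    return a2 <= a3


def supported_junctions(candidate_junctions, min_reads=2):
    """Keep candidate junctions supported by at least min_reads reads."""
    counts = Counter(candidate_junctions)
    return {junction: n for junction, n in counts.items() if n >= min_reads}
```

A read whose two anchors map in this reversed order is the signature of a back-splice junction, which is what distinguishes circRNA reads from linear spliced reads.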

Figure 7: CIRCexplorer2 software identification principle

The software supports a variety of RNA aligners such as TopHat2/TopHat-Fusion, STAR, MapSplice, BWA and segemehl. Here STAR [16] is used for alignment.

Expression calculation: because expression is calculated from the read counts of circRNA anchor sequences, SRPBM [17] is used for normalization. The formula is:
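Taking the union of the two tools' predictions, as described at the start of this subsection, can be sketched as follows. The representation of a circRNA call as a `(chrom, start, end, strand)` tuple is an assumption, as is the premise that both tools' coordinates have already been converted to the same convention (for example, both 0-based).

```python
def union_circrnas(find_circ_calls, circexplorer2_calls):
    """Merge two call sets and record which tool(s) reported each circRNA.

    Each call is a (chrom, start, end, strand) tuple in a shared,
    already harmonized coordinate convention (assumed).
    """
    merged = {}
    for tool, calls in (("find_circ", find_circ_calls),
                        ("CIRCexplorer2", circexplorer2_calls)):
        for key in calls:
            merged.setdefault(key, set()).add(tool)
    return merged


# Made-up calls: one circRNA is shared, the other two are tool-specific.
a = [("chr1", 100, 900, "+"), ("chr2", 50, 400, "-")]
b = [("chr1", 100, 900, "+"), ("chr3", 10, 700, "+")]
merged = union_circrnas(a, b)
```

Keeping the supporting tool names alongside each call makes it easy to also report the intersection, which is often treated as the higher-confidence subset.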

Figure 8: SRPBM expression calculation formula
Note: Ri represents the number of read counts of the circRNA anchor sequence; Li represents the length of the circRNA anchor sequence (k bp).

(8) Identification of differentially expressed genes

After obtaining the gene expression levels of each sample, differential gene analysis is performed between groups to determine which genes change significantly in expression in the treatment group compared with the control group. Based on the read counts of genes in each group of samples, widely used differential expression software is applied: DESeq2 [18] for experiments with biological replicates, and edgeR for experiments without biological replicates.

(9) Differentially expressed gene enrichment analysis

clusterProfiler was used to perform GO and KEGG enrichment analysis and plotting on differentially expressed mRNA, source genes of differentially expressed circRNA, and target genes of differentially expressed lncRNA.
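Over-representation analysis of this kind is commonly based on the hypergeometric test: given N background genes of which K are annotated to a term, how likely is it that a list of n genes contains at least k annotated ones by chance? The sketch below computes that upper-tail probability with the standard library only. It illustrates the underlying test and is not the clusterProfiler API; multiple-testing correction is omitted, and the argument names are hypothetical.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) for a hypergeometric draw.

    N: background genes; K: background genes annotated to the term;
    n: genes in the query list; k: query genes annotated to the term.
    """
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)
```

For example, with 10 background genes, 3 of them annotated to a term, and a query list of 3 genes that are all annotated, the probability is 1/120.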

(10) Overlap analysis of differential m6A-associated genes and differentially expressed genes

In order to understand the overlapping relationship between differential m6A-associated genes and differentially expressed genes, the following statistics were performed:
(1) Obtain the differential peaks associated with each gene (both significant and non-significant). One gene may correspond to multiple peaks, in which case the gene appears multiple times; a gene may also have no corresponding differential peak, in which case it does not appear.
(2) Identify the overlap between the genes obtained in (1) and the differentially expressed genes.
(3) Display the result in a four-quadrant diagram.
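The overlap statistics above can be sketched as a join between per-peak m6A changes and per-gene expression changes, followed by assignment to one of four quadrants. The data layout (a list of `(gene, log2FC)` pairs for peaks and a dict for expression), the quadrant labels and the example values are all assumptions made for illustration.

```python
def quadrant(m6a_log2fc, expr_log2fc):
    """Label a point by the sign of its m6A change and its expression change."""
    if m6a_log2fc == 0 or expr_log2fc == 0:
        return "on-axis"
    methylation = "hyper" if m6a_log2fc > 0 else "hypo"
    expression = "up" if expr_log2fc > 0 else "down"
    return f"{methylation}-{expression}"


def four_quadrant_table(peak_log2fc, expr_log2fc):
    """Join peaks to gene expression changes.

    peak_log2fc: list of (gene, m6A log2FC), one entry per peak, so a gene
    with several peaks appears several times.
    expr_log2fc: dict mapping gene -> expression log2FC.
    Genes without expression data are skipped.
    """
    rows = []
    for gene, m6a_fc in peak_log2fc:
        if gene in expr_log2fc:
            expr_fc = expr_log2fc[gene]
            rows.append((gene, m6a_fc, expr_fc, quadrant(m6a_fc, expr_fc)))
    return rows


# Made-up values: GeneA has two peaks, GeneC has no expression estimate.
peaks = [("GeneA", 1.2), ("GeneA", -0.8), ("GeneB", -1.5), ("GeneC", 2.0)]
expression = {"GeneA": 0.9, "GeneB": -1.1}
table = four_quadrant_table(peaks, expression)
```

Each row of `table` corresponds to one point in the four-quadrant diagram, with the m6A change on one axis and the expression change on the other.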

(11) Schematic diagram of information analysis process

Figure 9: Schematic diagram of analysis process

4. Summary: Methylated RNA co-immunoprecipitation (MeRIP-seq/m6A-seq) research ideas

MeRIP studies adenosine methylation of RNA through enrichment with an m6A-specific antibody followed by sequencing. YiGene has independently developed a low-input RNA methylation detection technology: the starting amount of sample can be reduced to 10-20 μg, with a minimum of only 5 μg of total RNA.

m6A research ideas

  1. Characterize the m6A methylation map as a whole: changes in the number of m6A peaks, changes in the number of m6A modified genes, analysis of the number of m6A peaks in a single gene, distribution of m6A peaks on gene elements, motif analysis of m6A peaks, and functional analysis of m6A peak modified genes;
  2. Screen specific differential m6A peaks and genes: identification of differential m6A peaks, analysis strategies for non-sequential data, analysis strategies for sequential data, functional analysis of differential m6A modified genes, PPI analysis of differential m6A modified genes, and visual display of m6A modifications of candidate genes;
  3. m6A methylome & transcriptomic association analysis: Meta genes overall association, DMG-DEG correspondence association, screening strategy for m6A modification target genes;
  4. Further validation or late-stage trials.

5. Methylated RNA co-immunoprecipitation (MeRIP-seq/m6A-seq) research project case

Title: Sevoflurane impairs m6A-mediated mRNA translation and leads to fine motor and cognitive deficits

Time: 2021

Journal: Cell Biology and Toxicology

Impact factor: IF 6.691

Technology platform: m6A-seq (MeRIP-seq), RIP-seq (RNA-binding protein immunoprecipitation), scRNA-seq

Research summary:

This study explores how general anesthetics damage fine motor abilities in infants and young children, and is the first to focus on the role and mechanism of m6A methylation in the fine motor impairment caused by sevoflurane anesthesia.

Previous studies have found that the mechanisms of neurodevelopmental toxicity caused by general anesthetics differ between non-human primate and rodent models. For example, it has been observed clinically that repeated general anesthesia and surgery can cause long-term speech problems in infants and young children. Reduced social ability and impairment of various social behaviors after anesthesia have also been observed in non-human primate models, but impaired social ability is difficult to observe in young rats that have been anesthetized multiple times. The researchers reasoned that, in non-human primate and rodent models of cognitive impairment caused by general anesthetics, some genes must change in the same direction and others in opposite directions. Their approach was therefore to start from the key clinical questions of anesthetic-induced neurodevelopmental toxicity, use non-human primates (macaques) to find mechanistic clues, identify target genes that change in the same way in macaques and mice, and then verify them in mouse models.

Based on this concept, the researchers used non-human primates and rodents to study the mechanism linking general anesthesia to long-term postoperative fine motor impairment. m6A is the most abundant type of RNA methylation, and YTHDF1 is one of the proteins that recognize it. Recent research has found that YTHDF1 can be involved in the formation and development of neurocognition. In the mechanistic experiments, the expression of the m6A-binding protein YTHDF1 was found to be significantly down-regulated in the brains of juvenile primates and rodents after sevoflurane anesthesia. Single-cell transcriptome sequencing (scRNA-seq) showed that the decrease in YTHDF1 expression was most pronounced in Sp8-positive interneurons, which subsequently develop into VIP interneurons. The main function of YTHDF1 is to recognize methylation sites on RNA. RIP experiments (RNA-binding protein immunoprecipitation) and m6A-seq analysis showed that m6A is highly enriched on Synaptophysin mRNA and that this mRNA carries a binding site for YTHDF1. Previous studies have found that Synaptophysin is closely related to the neurodevelopmental toxicity of general anesthetics. Overexpression of YTHDF1 rescued the sevoflurane-induced deficits in fine motor ability and cognitive function, as well as the changes in Synaptophysin, in young mice. The results suggest that YTHDF1 regulates the expression of its downstream target gene Synaptophysin in an m6A-dependent manner, and that disruption of this regulation by sevoflurane damages fine motor ability and cognitive function in mice. The study is expected to provide new ideas for preventing or treating the neurodevelopmental toxicity of general anesthetics.

Figure 1: Research summary

Figure 2: scRNA-seq reveals that sevoflurane reduces YTHDF1 expression in interneurons

Figure 3: MeRIP-seq reveals that YTHDF1 regulates synaptophysin in an m6A-dependent manner

The above is an introduction to the experimental procedures and analysis strategies of methylated RNA co-immunoprecipitation (MeRIP-seq/m6A-seq). Yigene Technology provides a comprehensive overall solution for RNA methylation research. For technical details, please call Yigene at 0755-28317900.
