Master MeRIP-seq Data Analysis: Step-by-Step Process for RNA Methylation Studies

At A Glance

01 Data pre-processing
02 Data Comparison
03 Assessment and treatment of results
04 Preparation for subsequent analysis
05 Methylation site identification
06 Peak Annotation
07 Peak Difference Analysis
08 Visualisation
09 Motif Analysis
10 Interpretation of biological significance

MeRIP-seq analysis is the process of processing, analysing and interpreting high-throughput sequencing data generated by methylated RNA immunoprecipitation sequencing (MeRIP-seq). It comprises several key steps aimed at providing an in-depth understanding of methylation modifications in RNA and their functions in the regulation of gene expression. By integrating experimental data with bioinformatics analyses, researchers can reveal the role that RNA methylation plays in a variety of biological processes.

The following is the basic flow of MeRIP-seq analysis.

Data pre-processing

Quality assessment

Quality assessment of raw sequencing data is performed using tools such as FastQC. The main areas to check are as follows:

Read length: confirm that the reads are the expected length.
Quality score distribution: Look at the per-base quality scores and make sure that the majority of bases are above the threshold (typically Q20 or Q30).
Adapter contamination: detect the presence of adapter sequences in the reads.
rRNA abundance: check whether the rRNA fraction is too high, since rRNA reads reduce the sequencing depth available for the target mRNA. Residual rRNA can confound subsequent methylation analyses and lead to inaccurate results.
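
The checks above can be sketched in a few lines of Python. This is only an illustration of what a tool such as FastQC reports, not a replacement for it; the function names, the Phred+33 encoding and the toy reads are assumptions made for the example.

```python
from statistics import mean

def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(ch) - offset for ch in quality_line]

def summarise_fastq(lines, threshold=20):
    """Return read count, mean read length and the fraction of bases >= threshold."""
    lengths, total_bases, passing_bases = [], 0, 0
    # A FASTQ record spans four lines: header, sequence, '+', quality.
    for i in range(0, len(lines) - 3, 4):
        seq, qual = lines[i + 1].strip(), lines[i + 3].strip()
        scores = phred_scores(qual)
        lengths.append(len(seq))
        total_bases += len(scores)
        passing_bases += sum(s >= threshold for s in scores)
    return {
        "reads": len(lengths),
        "mean_length": mean(lengths) if lengths else 0,
        "fraction_passing": passing_bases / total_bases if total_bases else 0.0,
    }

# Toy input (hypothetical reads): 'I' encodes Q40 and '#' encodes Q2.
example = [
    "@read1", "ACGT", "+", "IIII",
    "@read2", "ACGTAC", "+", "II##II",
]
summary = summarise_fastq(example)
print(summary)
```

For real data the lines would be read from a (possibly gzip-compressed) FASTQ file rather than from a list.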

Data cleaning

Cleaning of low quality data usually includes the following steps:

Remove adapter sequences: Use tools such as Trimmomatic or Cutadapt to remove adapter sequences and low-quality bases.
Remove low-quality reads: Filter out low-quality reads based on quality score and length.
Remove rRNA: Commercially available rRNA removal kits, such as Ribo-Zero, can effectively deplete rRNA during library preparation. Alternatively, rRNA can be captured and removed with specific probes.
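
As a rough illustration of the first two cleaning steps, the sketch below trims a 3' adapter by exact match and then filters reads on length and mean quality. It is a deliberately simplified stand-in for Trimmomatic or Cutadapt, which also handle mismatches and partial adapters; the adapter string, thresholds and function names are assumptions for the example.

```python
def trim_adapter(seq, qual, adapter):
    """Cut the read at the first exact occurrence of the adapter (3' trimming)."""
    pos = seq.find(adapter)
    if pos == -1:
        return seq, qual
    return seq[:pos], qual[:pos]

def passes_filter(qual, min_length=20, min_mean_quality=20, offset=33):
    """Keep a read only if it is long enough and its mean Phred score is high enough."""
    if len(qual) < min_length:
        return False
    mean_quality = sum(ord(ch) - offset for ch in qual) / len(qual)
    return mean_quality >= min_mean_quality

def clean_reads(records, adapter, **filter_kwargs):
    """records: iterable of (name, seq, qual) tuples; yields the reads that survive."""
    for name, seq, qual in records:
        seq, qual = trim_adapter(seq, qual, adapter)
        if passes_filter(qual, **filter_kwargs):
            yield name, seq, qual

# Illustrative adapter prefix; replace it with the sequence for your library kit.
ADAPTER = "AGATCGGAAGAGC"
reads = [
    ("r1", "ACGTACGTAC" + ADAPTER, "I" * 23),  # adapter at the 3' end
    ("r2", "ACGTACGTACGT", "#" * 12),          # low quality throughout
    ("r3", ADAPTER + "ACGT", "I" * 17),        # adapter dimer, nothing left after trimming
]
cleaned = list(clean_reads(reads, ADAPTER, min_length=8))
print(cleaned)
```

Only the first toy read survives: the second fails the quality filter and the third is empty once the adapter is removed.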

Data Comparison (Read Alignment)

Selecting a Reference Genome

Download the reference genome: Download the reference genome sequence for the desired species from a public database (e.g. UCSC Genome Browser or Ensembl).
Prepare annotation files: Obtain the corresponding gene annotation file (e.g., GTF or GFF format) for subsequent analysis.

Selection of alignment tools

Select a suitable alignment tool according to the data characteristics, such as:

STAR: a splice-aware aligner suited to RNA-seq data.
HISAT2: optimised for spliced alignment.
Bowtie2: fast and efficient short-read alignment, but not splice-aware.

Perform alignment

Set parameters: Set alignment parameters according to the data characteristics, such as the maximum number of mismatches allowed and whether to keep multi-mapping reads.
Execute alignment: Run the selected alignment tool to align the cleaned reads to the reference genome.
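
One way to keep the alignment parameters explicit and reproducible is to assemble the command in a small script. The sketch below builds a STAR command line without running it; the index directory, file names and the chosen mismatch and multi-mapping limits are placeholders, not recommendations.

```python
import shlex

def star_command(genome_dir, fastq, prefix, threads=8, max_mismatches=2, max_multimap=1):
    """Assemble a STAR argument list; nothing is executed here."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", fastq,
        "--outFilterMismatchNmax", str(max_mismatches),  # maximum mismatches per read
        "--outFilterMultimapNmax", str(max_multimap),    # 1 keeps uniquely mapped reads only
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", prefix,
    ]

# Hypothetical paths for one IP replicate.
cmd = star_command("star_index/", "ip_rep1.clean.fastq", "ip_rep1_")
print(shlex.join(cmd))
# To run it for real: subprocess.run(cmd, check=True)
```

The same parameters should be used for the IP and input libraries so that their alignments remain comparable.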

Assessment and treatment of results

Evaluate the results: Use tools (e.g. Samtools) to check the mapping rate and the integrity of the alignment files. Calculate the percentage of reads that were successfully mapped; a mapping rate higher than 70% is usually expected.
Coverage: assess the coverage of the target mRNA to ensure that there is sufficient read coverage.
Visual Verification: Use tools such as IGV to visualise the alignment results, confirm the signal strength and coverage of the target region, and detect anomalies.
Statistical analysis: Plot the distributions of read length, quality score and other related parameters to confirm the quality improvement after cleaning.
BAM File Processing: Output the results in BAM format, then sort them and remove duplicates (using Samtools).
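
The mapping rate mentioned above can be derived directly from the FLAG field of SAM records. The following sketch counts primary records only and compares the result with the 70% guideline; the SAM lines are made-up toy records, and in practice the same figure is available from Samtools' own statistics.

```python
UNMAPPED, SECONDARY, SUPPLEMENTARY = 0x4, 0x100, 0x800  # SAM FLAG bits

def mapping_rate(sam_lines):
    """Fraction of primary records that are mapped; header lines start with '@'."""
    total = mapped = 0
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue
        flag = int(line.split("\t")[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count each primary record once
        total += 1
        if not flag & UNMAPPED:
            mapped += 1
    return mapped / total if total else 0.0

# Toy SAM records (hypothetical): three mapped reads, one unmapped, one secondary hit.
sam = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r4\t0\tchr2\t50\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r4\t256\tchr3\t70\t0\t4M\t*\t0\t0\tACGT\tIIII",
]
rate = mapping_rate(sam)
print(f"mapping rate: {rate:.1%}", "OK" if rate >= 0.70 else "below 70%")
```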

Preparation for subsequent analysis

Indexing: index the BAM files for subsequent use.
Prepare coverage files: Use tools to generate coverage files (e.g. bedGraph or bigWig format) suitable for downstream visualisation and quantitative analysis.
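
To make the idea of a coverage file concrete, the sketch below turns a list of aligned read intervals into bedGraph lines (chromosome, start, end, depth). It is a conceptual illustration with made-up intervals; dedicated tools would normally generate bedGraph or bigWig files straight from the indexed BAM.

```python
from collections import defaultdict

def bedgraph_lines(alignments):
    """alignments: iterable of (chrom, start, end), 0-based half-open as in BED."""
    deltas = defaultdict(lambda: defaultdict(int))
    for chrom, start, end in alignments:
        deltas[chrom][start] += 1  # coverage rises where a read starts
        deltas[chrom][end] -= 1    # and falls where it ends
    lines = []
    for chrom in sorted(deltas):
        depth, previous = 0, None
        for pos in sorted(deltas[chrom]):
            # Emit the stretch since the previous change point if it was covered.
            if previous is not None and depth > 0:
                lines.append(f"{chrom}\t{previous}\t{pos}\t{depth}")
            depth += deltas[chrom][pos]
            previous = pos
    return lines

# Two overlapping toy reads on chr1.
coverage = bedgraph_lines([("chr1", 0, 10), ("chr1", 5, 15)])
print("\n".join(coverage))
```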

Methylation site identification

Enrichment analysis

Input control samples: Prepare input control samples (samples without MeRIP treatment) as background signals.
Calculation of fold enrichment: Compare the signal intensity of the MeRIP (IP) sample with that of the input sample and calculate the fold enrichment, usually as the ratio of the normalised IP signal to the normalised input signal over the same region.
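
A minimal sketch of such a calculation is shown here, assuming counts-per-million (CPM) normalisation by library size and a pseudocount to avoid division by zero. Both choices are assumptions for the example; published pipelines differ in how they normalise.

```python
import math

def cpm(count, library_size):
    """Counts per million mapped reads."""
    return count * 1_000_000 / library_size

def fold_enrichment(ip_count, input_count, ip_library, input_library, pseudocount=1.0):
    """Ratio of normalised IP signal to normalised input signal for one region."""
    ip_signal = cpm(ip_count, ip_library) + pseudocount
    input_signal = cpm(input_count, input_library) + pseudocount
    return ip_signal / input_signal

# Toy numbers: 70 IP reads out of 10 million, 20 input reads out of 20 million.
fe = fold_enrichment(70, 20, 10_000_000, 20_000_000)
print(f"fold enrichment = {fe:.2f}, log2 = {math.log2(fe):.2f}")
```

With these numbers the IP signal is 7 CPM and the input signal is 1 CPM, so after adding the pseudocount the ratio is 8 / 2 = 4.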

Peak calling

Using peak calling tools: Identify significantly methylation-enriched regions (i.e. methylation peaks). Commonly used tools include:
MACS (Model-based Analysis of ChIP-Seq): developed for ChIP-seq and one of the most commonly used tools for enrichment experiments such as MeRIP-seq.
Piranha: a peak caller designed for CLIP-seq and RIP-seq data that can identify RNA-modification enrichment regions.
HPeak: Suitable for peak detection in high-throughput sequencing data.
Parameter setting: adjust parameters according to the experimental design.
p-value threshold: sets the significance level, e.g. 0.01 or 0.05.
Minimum peak height: specifies the minimum signal intensity of peaks to filter out low-intensity signals.
Extension size: for ChIP-seq and MeRIP-seq data, it may be necessary to set the size to which reads are extended to approximate the fragment length.
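
To show how a p-value threshold and a minimum peak height interact, here is a deliberately simplified window-based caller. It tests each window's IP count against a Poisson background derived from the input and merges adjacent significant windows. This is not the algorithm used by MACS, Piranha or HPeak; the window size, thresholds and counts are assumptions for illustration only.

```python
import math

def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 minus the lower tail."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cumulative = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cumulative += term
    return max(0.0, 1.0 - cumulative)  # very small tails round to 0.0

def call_peaks(ip_counts, input_counts, scale=1.0, window=50, p_threshold=0.01, min_ip=5):
    """Counts are per consecutive window; returns merged (start, end) intervals in bases."""
    peaks = []
    for index, (ip, control) in enumerate(zip(ip_counts, input_counts)):
        expected = max(control * scale, 1.0)  # floor avoids a zero background
        significant = ip >= min_ip and poisson_upper_tail(ip, expected) < p_threshold
        if not significant:
            continue
        start, end = index * window, (index + 1) * window
        if peaks and peaks[-1][1] == start:
            peaks[-1] = (peaks[-1][0], end)  # extend the adjacent peak
        else:
            peaks.append((start, end))
    return peaks

# Toy window counts along one transcript (hypothetical).
ip_counts = [2, 3, 30, 25, 4, 1, 20]
input_counts = [2, 3, 3, 2, 4, 1, 18]
peaks = call_peaks(ip_counts, input_counts)
print(peaks)
```

The last window has a high IP count but an equally high input count, so it is not called. This is the reason the input control is needed as background.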

Methylation site identification

Extract peak position information: Extract the relevant position information from the peak calling results and organise it into a table (e.g. chromosome, start position, end position, enrichment value). Then analyse the peaks together with genome annotation information to identify the specific RNA sequences they correspond to.
Data integration: integrate data from different samples to compare methylation site changes under different conditions.

Peak Annotation

Obtaining genome annotation information

Download genome annotation files: obtain genome annotation files (e.g. GTF/GFF format) of related species, which can be downloaded from public databases (e.g. Ensembl, UCSC, etc.).

Intersection of peaks and genome annotations

Intersection calculation using tools: use tools such as bedtools, Homer or ChIPseeker to intersect MeRIP-seq peaks with genome annotations.
bedtools intersect: a command line tool to quickly find which peaks fall within or near known genes.
Homer annotatePeaks: you can directly annotate the peaks, and the output includes gene name, location and other information.
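
Conceptually, the intersection step reduces to an interval-overlap test between each peak and each annotated feature. The sketch below illustrates this with made-up peaks and genes; it ignores strand and is far slower than bedtools on real data, so it is meant only to show the logic.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True when two 0-based half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end

def annotate_peaks(peaks, genes):
    """peaks: (chrom, start, end); genes: (chrom, start, end, gene_name, gene_type)."""
    rows = []
    for chrom, start, end in peaks:
        hits = [g for g in genes if g[0] == chrom and overlaps(start, end, g[1], g[2])]
        if not hits:
            rows.append((chrom, start, end, None, None))  # intergenic or unannotated
        for _, _, _, name, gene_type in hits:
            rows.append((chrom, start, end, name, gene_type))
    return rows

# Hypothetical peaks and gene models.
peaks = [("chr1", 150, 250), ("chr1", 900, 950), ("chr2", 10, 60)]
genes = [
    ("chr1", 100, 500, "GeneA", "protein_coding"),
    ("chr1", 200, 300, "GeneB", "lncRNA"),
    ("chr2", 60, 200, "GeneC", "protein_coding"),
]
for row in annotate_peaks(peaks, genes):
    print(row)
```

A peak that overlaps several genes produces one row per gene, which mirrors the tabular output described in the next step.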

Annotation results collation

Collate data: the intersection results are collated into a tabular format, containing information such as the location of the peak, the corresponding gene, the type of gene (e.g. coding gene, non-coding RNA, etc.), the distance, and so on.

Functional annotation

Functional analysis of genes: Use online tools (e.g. DAVID, g:Profiler, etc.) to perform functional enrichment analysis of the annotated list of genes to understand the role of these genes in biological processes.
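
Over-representation tests of this kind are commonly based on the hypergeometric distribution. As a hedged illustration of the underlying calculation (the exact statistics used by DAVID or g:Profiler may differ), the sketch below computes the probability of seeing at least k annotated genes by chance; the numbers are toy values.

```python
from math import comb

def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k): k of the n peak genes carry the term, K of the N background genes do."""
    upper = min(n, K)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

# Toy example: 10 background genes, 5 annotated with the term,
# 2 genes with peaks, both annotated.
p = hypergeom_enrichment_p(k=2, n=2, K=5, N=10)
print(f"enrichment p-value = {p:.4f}")
```

Because many terms are tested at once, the resulting p-values still need multiple-testing correction before interpretation.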

Figure: Mettl3 induced upregulation of m6A methylation level in myocardial tissue, H9c2 cells, and HUVECs. Source: "Validated Impacts of N6-Methyladenosine Methylated mRNAs on Apoptosis and Angiogenesis in Myocardial Infarction Based on MeRIP-Seq Analysis" (Zhang Y et al., 2022).

Peak Difference Analysis

Peak Consolidation

Integrate Peaks: Integrate peak files from all samples to create a global list of peaks. This can be done with tools such as DiffBind or csaw.
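
The simplest form of peak consolidation is to take the union of all peaks and merge those that overlap. The sketch below shows that approach on toy data; it is an assumption-laden simplification, since packages such as DiffBind or csaw apply their own consensus rules.

```python
def merge_peaks(peak_sets):
    """peak_sets: one list of (chrom, start, end) per sample; returns the merged union."""
    all_peaks = sorted(peak for peaks in peak_sets for peak in peaks)
    merged = []
    for chrom, start, end in all_peaks:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], end))  # extend the open interval
        else:
            merged.append((chrom, start, end))
    return merged

# Hypothetical peak calls from two samples.
sample_a = [("chr1", 100, 200), ("chr2", 50, 80)]
sample_b = [("chr1", 150, 260), ("chr1", 400, 450)]
consensus = merge_peaks([sample_a, sample_b])
print(consensus)
```

Reads from every sample are then counted over this shared list so that all samples are compared on the same regions.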

Differential analysis

Count-based models: tools such as edgeR or DESeq2 model peak read counts with a negative binomial distribution and test them for differences between conditions.
Linear models: the limma package analyses differences between multiple sample groups by means of linear models.
Calculate differences: Using the methods above, a statistical test is performed on the enrichment level of each peak under the different conditions, and the p-value and FDR (false discovery rate) adjusted p-value for each peak are output.
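
The FDR-adjusted p-values reported by these packages are typically obtained with the Benjamini-Hochberg procedure. The sketch below shows that adjustment on toy p-values; it illustrates the correction step only and does not reproduce the statistical models of edgeR, DESeq2 or limma.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted

# Toy p-values for four peaks (hypothetical).
p_values = [0.001, 0.02, 0.04, 0.5]
fdr = benjamini_hochberg(p_values)
significant = [i for i, q in enumerate(fdr) if q < 0.05]
print(fdr, significant)
```

In this example the third peak has a raw p-value below 0.05 but an adjusted value of about 0.053, so it is no longer significant after correction.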

Screening for significantly different peaks

Setting thresholds: Set significance thresholds (e.g., p-value < 0.05, FDR < 0.05) according to the research needs to screen out significantly different peaks.

Functional annotation and enrichment analysis

Gene annotation: Combine the significant differential peaks with genomic annotation information to understand which genes are affected.
Functional enrichment analysis: Use tools such as DAVID, g:Profiler, etc. to analyse the functional enrichment of genes corresponding to the differential peaks and explore their significance in biological processes.

Visualisation

Data preparation

Organise data: make sure that data such as differential peaks, gene annotation information, expression matrix, etc. have been organised to facilitate the production of graphs.

Volcano Plot

Demonstrate variability: Volcano plots are used to demonstrate the significance (p-value) and fold change of each peak. Usually the horizontal axis is log2 fold change and the vertical axis is -log10 p-value.
Tools: can be plotted using ggplot2 or EnhancedVolcano package in R.
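
Before plotting, each peak needs its two volcano coordinates and usually a category for colouring. The sketch below computes them in Python for toy values; the pseudocount, cutoffs and peak names are assumptions, and the plotting itself would still be done with a package such as those named above.

```python
import math

def volcano_point(mean_a, mean_b, p_value, pseudocount=1.0):
    """Return (log2 fold change of B over A, -log10 p-value); p_value must be > 0."""
    log2_fc = math.log2((mean_b + pseudocount) / (mean_a + pseudocount))
    return log2_fc, -math.log10(p_value)

def classify(log2_fc, p_value, fc_cutoff=1.0, p_cutoff=0.05):
    """Label a peak for colouring in the plot."""
    if p_value >= p_cutoff or abs(log2_fc) < fc_cutoff:
        return "not significant"
    return "up" if log2_fc > 0 else "down"

# Hypothetical peaks: (name, mean signal in condition A, mean signal in B, p-value).
table = [("peak1", 7, 31, 0.001), ("peak2", 15, 3, 0.01), ("peak3", 10, 12, 0.6)]
labels = []
for name, mean_a, mean_b, p in table:
    x, y = volcano_point(mean_a, mean_b, p)
    labels.append(classify(x, p))
    print(name, round(x, 2), round(y, 2), labels[-1])
```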

Heatmap

Show clustering: A heatmap can show the expression pattern of differentially expressed genes; cluster analysis groups genes or samples with similar expression patterns together.
Tools: Use pheatmap or ComplexHeatmap package in R.

Bar Plot

Compare differences between groups: bar plots can be used to show the expression levels or modification levels of specific genes under different conditions.
Tools: Again, ggplot2 or other statistical plotting tools can be used.

Scatter Plot

Correlation analysis: Scatter plots can be used to show the correlation of expression between different samples and are suitable for comparing differences between two groups.
Tools: Use ggplot2 or plotly, etc.

IGV visualisation

Genome Viewer: Use tools such as Integrative Genomics Viewer (IGV) to combine signals from MeRIP-seq with genome annotation to visually display modification distribution.
Steps: Import the processed BAM file into IGV and select the gene region of interest to view.

Visualisation of enrichment analysis results

Functional enrichment plot: Use bubble or bar graph to display the results of enrichment analysis, i.e., the biological functions or pathways of the genes corresponding to the differential peaks.
Tools: Use ggplot2 or clusterProfiler package.

Report generation

Integrate results: Integrate all visualisation results into a report, using tools such as R Markdown or Jupyter Notebook, which can generate dynamic documents.

Motif Analysis

Sequence extraction

Extract the corresponding RNA sequences from the differentially methylated sites, often using tools such as BEDTools or bedops to convert coordinates to sequences.
Ensure that an appropriate upstream and downstream sequence window (e.g. 100-200 nucleotides) is selected to capture potential cis-acting elements.
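
The coordinate-to-sequence conversion can be illustrated as follows: take a window around each site, reverse-complement it for minus-strand features, and report it as RNA. The in-memory genome dictionary, the function names and the tiny sequence are assumptions for the example; real workflows read an indexed FASTA file.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Reverse-complement an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def window_sequence(genome, chrom, summit, strand="+", flank=100):
    """genome: dict of chrom -> DNA string; summit: 0-based position of the site."""
    chrom_seq = genome[chrom]
    start = max(0, summit - flank)                  # clip at the chromosome start
    end = min(len(chrom_seq), summit + flank + 1)   # and at its end
    seq = chrom_seq[start:end].upper()
    if strand == "-":
        seq = reverse_complement(seq)
    return seq.replace("T", "U")                    # report the sequence as RNA

# Tiny hypothetical genome.
genome = {"chr1": "AAACCCGGGTTT"}
plus = window_sequence(genome, "chr1", summit=9, strand="+", flank=1)
minus = window_sequence(genome, "chr1", summit=9, strand="-", flank=1)
print(plus, minus)
```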

Motif search

Use tools such as HOMER, MEME Suite or DREME for Motif analysis.
MEME: Used to discover new Motifs.
HOMER: Provides Motif analysis and visualisation.
Input the extracted sequences and run the Motif discovery algorithm.
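
At its core, motif discovery asks which short sequence patterns occur more often in peak sequences than in background sequences. The sketch below ranks k-mers by that ratio on toy sequences; it is a crude illustration of the idea, not the algorithm implemented by MEME, DREME or HOMER, and the sequences are invented.

```python
from collections import Counter

def kmer_counts(sequences, k):
    """Count every k-mer across a list of sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts

def kmer_enrichment(foreground, background, k=5, pseudocount=1.0):
    """Rank foreground k-mers by their frequency ratio against the background."""
    fg, bg = kmer_counts(foreground, k), kmer_counts(background, k)
    fg_total, bg_total = sum(fg.values()), sum(bg.values())
    scores = {}
    for kmer, count in fg.items():
        fg_freq = (count + pseudocount) / (fg_total + pseudocount)
        bg_freq = (bg[kmer] + pseudocount) / (bg_total + pseudocount)
        scores[kmer] = fg_freq / bg_freq
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Invented peak-centred sequences and background sequences.
foreground = ["UUGGACUAA", "CAGGACUGC", "AAGGACUUU"]
background = ["UUUUAAAAC", "CCAUGCAUG", "AGCUAGCUA"]
ranked = kmer_enrichment(foreground, background)
print(ranked[:3])
```

Background sequences should be matched to the peaks (for example, taken from the same transcripts) so that composition biases are not mistaken for motifs.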

Motif enrichment analysis

Perform enrichment analysis on identified Motifs to assess their significance in differentially methylated sites.
Use functional enrichment tools, such as GREAT, to assess the relationship of Motifs to known gene regulatory networks.

Validation of results

Cross-validation: Compare identified Motifs with Motifs in known databases (e.g. JASPAR or TRANSFAC) to confirm their biological significance.
Validate the biological functions of Motifs through literature research.

Visualisation of results

Motif presentation: Generate sequence logo plots of Motifs using tools such as WebLogo or ggseqlogo to visualise the conserved nature of Motifs.
Annotate & Analyse Results: Correlate identified Motifs with relevant biological processes or pathways to enhance the biological interpretation of the results.
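
The letter heights in a sequence logo come from a position frequency matrix and the information content of each position. The following sketch computes both for a handful of invented aligned sites; it omits the small-sample correction that logo tools may apply, so treat the values as approximate.

```python
import math

def position_frequency_matrix(sites, alphabet="ACGU"):
    """sites: equal-length aligned sequences; returns one frequency dict per position."""
    matrix = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        matrix.append({base: column.count(base) / len(sites) for base in alphabet})
    return matrix

def information_content(column):
    """Bits at one position: log2(alphabet size) minus the Shannon entropy."""
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return math.log2(len(column)) - entropy

# Invented aligned motif instances.
sites = ["GGACU", "AGACA", "GGACU", "AAACC"]
pfm = position_frequency_matrix(sites)
bits = [information_content(column) for column in pfm]
print([round(b, 2) for b in bits])
```

Fully conserved positions (the A and C in the middle of these toy sites) reach the 2-bit maximum, while a position split evenly between two bases carries 1 bit.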

Interpretation of biological significance

Discuss the function of the Motif: In a research paper or thesis, explore how these Motifs affect RNA stability, splicing, translation, or other biological processes.
Design follow-up experiments: Based on the results of the Motif analysis, design further experiments (e.g., point mutation, reporter gene experiments) to verify the function of the Motifs.

For more articles on the topic of MeRIP-seq, please refer to "Overview of MeRIP-seq," "MeRIP-seq Protocol," and "Principle and Applications of MeRIP-seq."
