RNA sequencing (RNA-Seq) is a powerful and widely used transcriptome analysis technique that allows us to study RNA expression and structural information in biological samples to gain insights into various aspects of gene expression, splicing and isoforms. There are several RNA-Seq technologies available, each designed for different experimental purposes. The following is an overview of the major RNA-Seq technologies:
Bulk RNA-Seq, also known as large-scale RNA-Seq, profiles the RNA of a whole sample. RNA is extracted from cells or tissues and pooled, so the RNA from all cells is sequenced together. The result is an average expression profile: the measured level of each gene represents its average expression across the entire cell population. This helps identify differentially expressed genes (DEGs) between tissues, conditions, or time points.
Lateral branch angle (LBA) is one of the most important agronomic traits in peanut, but its underlying molecular mechanisms have not been elucidated. Using two peanut varieties with contrasting growth habits, Tifrunner and Ipadur, researchers applied Bulk RNA-Seq to identify DEGs associated with LBA. More than 3,000 DEGs were identified, and the researchers focused on more than a dozen of them related to gravitropism and phytohormones, which may play important roles in LBA formation in peanut (Ahmad N et al., 2022).
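As a rough illustration of how pooled counts are compared between conditions, the sketch below normalizes raw counts to counts per million (CPM) and computes a log2 fold change per gene. The gene names, counts, and pseudocount are invented for illustration, and real DEG analyses use biological replicates and statistical models rather than a single ratio.

```python
import math


def counts_per_million(counts):
    """Scale raw read counts so that each sample sums to one million."""
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}


def log2_fold_change(cpm_a, cpm_b, pseudocount=1.0):
    """log2(B / A) per gene; the pseudocount (an assumption) avoids division by zero."""
    return {
        gene: math.log2((cpm_b[gene] + pseudocount) / (cpm_a[gene] + pseudocount))
        for gene in cpm_a
    }


# Hypothetical pooled counts for one control and one treated sample.
control = {"geneA": 500, "geneB": 300, "geneC": 200}
treated = {"geneA": 500, "geneB": 100, "geneC": 400}

lfc = log2_fold_change(counts_per_million(control), counts_per_million(treated))
for gene, value in sorted(lfc.items()):
    print(f"{gene}: log2FC = {value:+.2f}")
```

Each value describes the whole cell population; a change confined to a small subpopulation would be diluted in this average.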
For a more in-depth understanding of Total RNA-Seq, refer to our article "Overview of Total RNA-Seq"
Single-cell RNA sequencing (scRNA-seq) reveals gene expression differences between individual cells and thus captures cellular heterogeneity. The basic process converts the RNA transcripts of a single cell into sequenceable cDNA, sequences the cDNA at high throughput, and yields the transcriptome of that cell.
The first step in scRNA-seq is to capture single cells at high throughput. Commonly used isolation methods are microwell plates and droplet-based microfluidics, both of which rely on DNA tags (barcodes) to identify cells. Each cell receives a distinct barcode sequence, and during library construction and sequencing, nucleic acid molecules carrying the same barcode are treated as coming from the same cell, so the transcripts of each cell can be distinguished among thousands of cells. Currently, the most popular scRNA-seq platform is 10× Genomics Chromium. Cells are mixed with reverse transcription reagents and passed through a microfluidic chip, which partitions them into nanoliter-scale droplets so that each droplet ideally contains one cell and one barcoded Gel Bead; the barcode marks the transcripts of that cell in subsequent sequencing. The cells in each droplet are then lysed to release RNA, which is bound by the capture primers on the Gel Bead and reverse transcribed into cDNA carrying the barcode and a unique molecular identifier (UMI), followed by conventional RNA-seq library construction. scRNA-seq allows transcriptome data to be analyzed cell by cell, revealing heterogeneity between cells, especially in complex tissues or tumor cell populations, and can be used to identify rare cell types and to study cellular differentiation, development, and other processes.
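The barcode logic can be sketched in a few lines. The example below assumes each read has already been reduced to a (cell barcode, UMI, gene) triple; the barcodes, UMIs, and gene names are invented. Reads are grouped by barcode, and reads that share a barcode, gene, and UMI are counted once, on the assumption that they are PCR copies of the same original molecule. It is a simplified illustration, not the algorithm of any particular pipeline.

```python
from collections import defaultdict


def count_umis(reads):
    """Build a per-cell gene count table from (barcode, umi, gene) triples.

    Reads sharing a barcode are assigned to the same cell; within a cell,
    each distinct UMI observed for a gene counts as one molecule.
    """
    seen = defaultdict(set)
    for barcode, umi, gene in reads:
        seen[(barcode, gene)].add(umi)

    matrix = defaultdict(dict)
    for (barcode, gene), umis in seen.items():
        matrix[barcode][gene] = len(umis)
    return dict(matrix)


# Hypothetical reads: two cells, one PCR duplicate in the first cell.
reads = [
    ("AAAC", "UMI1", "Gapdh"),
    ("AAAC", "UMI1", "Gapdh"),  # same barcode, gene and UMI: counted once
    ("AAAC", "UMI2", "Gapdh"),
    ("AAAC", "UMI3", "Actb"),
    ("TTTG", "UMI1", "Gapdh"),
]

counts = count_umis(reads)
for barcode in sorted(counts):
    print(barcode, counts[barcode])
```

Production pipelines additionally correct sequencing errors in barcodes and UMIs and discard droplets without a cell, which this sketch omits.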
Commonly used scRNA-seq data processing pipeline (Lu J et al., 2023)
scRNA-seq has been applied in many areas of research. For example, researchers used scRNA-seq to analyze the commonalities and differences between astrocytes differentiated from mouse embryonic stem cells (mESCs) and from human induced pluripotent stem cells (hiPSCs). They sampled tens of thousands of human and mouse astrocytes and nuclei at different time points and performed scRNA-seq to assess the consistency of cell differentiation, identify possible glial lineage-committed precursors, identify genes mediating astrocyte heterogeneity, and explore genes related to neurodevelopmental fate in both mice and humans (Frazel PW et al., 2023).
Previous studies have shown that colorectal cancer (CRC) patients from different regions differ in cancer histology, genetic features, and molecular subtypes. To explore the immune characteristics of CRC in different regions and to search for potential therapeutic targets, the investigators analyzed several tens of thousands of single-cell transcriptomes from 19 CRC samples and normal tissues from four regions, focusing on the tumor environment and especially the immune microenvironment. scRNA-seq was used to explore the roles of immune cells such as CD8+, CD4+, and CD20+ cells and MDSCs within the tumor and to assess their potential as targets for tumor therapy and prognosis. The findings indicate that CD20+ B cell infiltration is associated with tumorigenesis and prognosis in CRC patients and has the potential to promote PD-1 antibody-mediated tumor suppression (Ji L et al., 2024).
In summary, Bulk RNA-Seq and scRNA-seq have different focuses: the former measures gene expression across a cell population, while the latter measures gene expression in individual cells and reveals cellular heterogeneity. In practice, they are often used together. For example, acute pancreatitis (AP) is an acute inflammatory disease whose pathogenesis is not yet understood. To explore the immune microenvironment of AP, researchers used Bulk RNA-seq and scRNA-seq data to characterize immune cell infiltration in AP. Several thousand normal pancreatic cells and tens of thousands of AP pancreatic cells were profiled by scRNA-seq. Analysis of the dataset identified tens of thousands of immune cells and revealed a significant increase in the activity of various signaling pathways in these cells, such as apoptosis, oxidative stress, and inflammatory response, while Bulk RNA-seq revealed the characteristics of a single species of immune cells. Integrating the Bulk RNA-seq data further revealed AP-specific gene markers and key AP target genes such as Clic1, Sat1, and Serpina3n. The study demonstrated that macrophages play an important role in the immune microenvironment of AP and revealed regulatory genes mediating TNF signaling between macrophages and acinar cells, such as Tnfsf12-Tnfrsf12a (Fang Z et al., 2022).
In secondary spinal cord injury (SCI), the immune microenvironment of the injured spinal cord, which is shaped by various immune cells, plays an important role in spinal cord regeneration. Among its components, macrophages/microglia play a dual pro-inflammatory and anti-inflammatory role in the subacute phase of SCI. To explore the immune-centered (hub) genes of macrophages/microglia, the researchers combined Bulk RNA-seq and scRNA-seq to screen key immune genes and, using various pathway analyses, identified B2m, Itgb5, and Vav1 as immune-centered genes in spinal cord injury. The inhibitory effect of decitabine on these three genes was verified in vivo (Zhang Q et al., 2023).
Spatial transcriptome sequencing (st-seq) captures both gene expression data and the spatial location of cells, which makes it possible to study gene expression in situ within tissue. It compensates for the loss of spatial information in scRNA-seq, but st-seq generally does not reach single-cell resolution. The two methods are therefore usually used together, with st-seq supplying the tissue structure information that scRNA-seq cannot provide.
Currently, the most widely used st-seq technology is the 10× Genomics Visium platform. Microarray technology is used to deposit an ordered array of barcoded oligonucleotides on a slide, and a thin tissue section is placed over the array. The section is then permeabilized so that the RNA released from the cells diffuses and is captured by the barcoded oligonucleotides. In situ reverse transcription generates spatially indexed cDNA, which is subjected to high-throughput sequencing, and the location of gene expression is recovered from the address information carried by each sequenced read.
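The way a spatial barcode acts as an address can be sketched as a simple lookup. In the example below, the barcode-to-coordinate table, the barcodes, and the gene names are all hypothetical; on a real array the table is defined by the slide layout. Each read carries a spot barcode and a gene assignment, and counts are accumulated per array position.

```python
from collections import defaultdict


def build_spatial_counts(reads, spot_coordinates):
    """Aggregate (spot_barcode, gene) reads into counts per (x, y) position.

    Reads whose barcode is not on the array are ignored.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for barcode, gene in reads:
        if barcode not in spot_coordinates:
            continue
        counts[spot_coordinates[barcode]][gene] += 1
    return {position: dict(genes) for position, genes in counts.items()}


# Hypothetical array layout: barcode -> (x, y) position on the slide.
spot_coordinates = {"SPOT_A": (0, 0), "SPOT_B": (0, 1)}

# Hypothetical reads after alignment and gene assignment.
reads = [
    ("SPOT_A", "Cd79a"),
    ("SPOT_A", "Cd79a"),
    ("SPOT_B", "Col1a1"),
    ("SPOT_X", "Cd79a"),  # barcode not on the array, skipped
]

spatial_counts = build_spatial_counts(reads, spot_coordinates)
for position in sorted(spatial_counts):
    print(position, spatial_counts[position])
```

Because one spot can cover more than one cell, each position holds a small mixed profile rather than a single-cell profile, which is why the data are often paired with scRNA-seq.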
Schematic representation of different probe hybridization and imaging strategies (Wang Y et al., 2023).
The presence and density of tertiary lymphoid structures (TLS) within tumor tissue are closely related to the treatment and prognosis of tumor patients. It is generally believed that the presence of TLS indicates a better prognosis and a better clinical outcome of immunotherapy. To explore the mechanisms of TLS further, researchers performed st-seq on primary tumor tissue samples from patients with primary renal cell carcinoma to investigate the spatial organization of B-cell responses within TLS. By integrating the st-seq data, Bulk RNA-seq data, and multicolor immunofluorescence data, they demonstrated that the TLS is a key site of B-cell maturation, clonal expansion, and antibody production, that plasma cells occur frequently within the TLS, and that CXCL12+ fibroblasts can guide plasma cell clones produced within the TLS to other tumor regions (Meylan M et al., 2022).
Studies have shown that diabetic patients are prone to diabetic kidney disease (DKD) and that DKD is one of the leading causes of chronic kidney disease worldwide. Using st-seq and scRNA-seq, the investigators analyzed different cell populations in kidney biopsy specimens from DKD patients and healthy controls, focusing on DEGs and on cell-cell relationships between fibroblasts and epithelial cells. The st-seq data showed that fibroblast subpopulations are enriched in DKD patients and that most immune cells are enriched in areas of renal fibrosis. Together with the single-cell transcriptome data, they revealed interactions between signaling molecules and between cell populations in patients with DKD, providing potential therapeutic targets (Chen D et al., 2023).
Long-read RNA-Seq (LRS) can identify full-length transcripts and distinguish and quantify different isoforms, thus providing more comprehensive and accurate transcriptome information. Currently, there are two main sequencing platforms for LRS, Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio). A single long read can span multiple exon-exon junctions, which makes it possible to reconstruct full-length transcripts accurately and is especially useful for genes with complex structures. In addition, LRS can clearly identify transcript variants such as alternative isoforms and exon-skipping events.
Microglia are the innate immune cells of the central nervous system and have been linked to a variety of neurodegenerative diseases. Previous work based on short-read sequencing (SRS) mapped the genetic regulation of microglial gene expression and mRNA splicing and identified several common variants in microglia associated with disease risk loci. However, because SRS cannot fully resolve transcript structure, the researchers used LRS to profile tens of thousands of microglial isoforms, discovering more than 30,000 novel isoforms and 2,000 novel genes, and characterized the association of these novel isoforms and genes with genetic risk loci for Alzheimer's disease (AD) and Parkinson's disease (PD), as well as the differences between the two (Humphrey J et al., 2023).
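Because a long read can cover a whole transcript, isoforms can in principle be told apart by the ordered list of exons each read contains. The sketch below assumes every read has already been aligned and summarized as an exon chain; the exon labels and reads are invented, and real tools also have to handle truncated reads and alignment errors.

```python
from collections import Counter


def quantify_isoforms(read_exon_chains):
    """Count reads per isoform, where an isoform is an ordered tuple of exons."""
    return Counter(tuple(chain) for chain in read_exon_chains)


def skipped_exons(isoform, reference_exons):
    """Reference exons that are missing from an isoform (exon skipping)."""
    return [exon for exon in reference_exons if exon not in isoform]


# Hypothetical full-length reads for one gene with reference exons E1-E3.
reference_exons = ("E1", "E2", "E3")
long_reads = [
    ["E1", "E2", "E3"],
    ["E1", "E3"],  # E2 is skipped
    ["E1", "E2", "E3"],
]

isoform_counts = quantify_isoforms(long_reads)
for isoform, n_reads in isoform_counts.most_common():
    print(isoform, n_reads, "skipped:", skipped_exons(isoform, reference_exons))
```

With short reads, the same information would have to be inferred from separate junction-spanning fragments, which is why isoform assignment is more ambiguous there.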
Iso-Seq is a PacBio-based sequencing technology capable of producing full-length cDNA reads up to 15 kb in length without fragmenting the RNA, allowing direct sequencing of full-length transcripts (including the 5' UTR and the 3' poly(A) tail). This helps discover large numbers of previously unannotated transcripts and confirm earlier gene predictions that were based on cross-species homology. The process involves using reverse transcriptase to convert high-quality RNA into full-length cDNA, followed by PCR amplification and construction of PacBio single-molecule, real-time (SMRT) libraries. Transcripts from 1 to 4 kb in length can be selected and sequenced together, so that long and short transcripts in this range have a similar chance of being sequenced. After end repair and SMRT adapter ligation, long-read sequencing can be performed; the size preference of the sequenced fragments can be further controlled by adjusting the loading conditions of the sequencing chip.
Differences between grape varieties are largely genetic, yet grapevine transcriptome studies mostly rely on the available reference genome, PN40024, and a single reference genome cannot represent the genomic information of every variety. Since a transcriptome can reveal variety-specific genetic information more quickly, the researchers used Iso-Seq to sequence full-length cDNAs from Cabernet Sauvignon berries during ripening and built a transcriptome to identify variety-specific isoforms. More than 1,000 transcripts expressed during berry development were identified by Iso-Seq, many novel genes not found in PN40024 were characterized, and several DEGs representing unique genes related to berry development were identified (Minio A et al., 2019).
Gossypium australe F. Mueller (2n = 2x = 26, G2 genome) has high economic value because its altered timing of gland development makes cottonseed oil edible, and because it is resistant to biotic stress. However, the alternative splicing (AS) events that occur in G. australe were unknown. To explore AS, the researchers performed full-length transcriptome sequencing of G. australe, using Iso-Seq to identify and integrate transcripts from 10 tissues at different developmental times. Iso-Seq reconstructed more than 25,000 genes, as well as 80 pre-miRNAs and 1,460 lncRNAs. Approximately 50% of these genes exhibited two or more isoforms. The analysis indicated five broad categories of AS events, of which intron retention was the most common, accounting for 68%. Half of the more than 25,000 genes have at least one polyadenylation site, and more than 7,900 genes have alternative polyadenylation (APA) sites. The researchers also confirmed the AS events identified by Iso-Seq using RT-PCR amplification (Feng S et al., 2019).
Small RNA-Seq is the high-throughput sequencing of small RNAs (including miRNAs, siRNAs, piRNAs, etc.) from a given tissue of an animal or plant in a specific state or at a specific time. Using second-generation, high-throughput sequencing technology, it can sequence and quantify all small RNAs from multiple samples in one run and can obtain millions of small RNA sequences at a time, even when no reference genome is available for the organism. In the small RNA-Seq workflow, small RNAs are generally extracted from tissues or cells with specialized small RNA extraction kits. After extraction, the small RNAs are modified at their ends, usually the 3' end, by adding a polyadenylate (poly-A) tail or by ligating a specific adapter sequence, so that they can be captured and quantified.
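Because small RNAs are shorter than a typical read, the added 3' sequence is read through and has to be removed before quantification. The sketch below shows one simple way this could be done: exact-match adapter trimming, a length filter, and collapsing of identical sequences into counts. The adapter string, the reads, and the 18 to 30 nt window are illustrative assumptions, and dedicated trimming tools tolerate mismatches that this version does not.

```python
from collections import Counter

# Illustrative 3' adapter sequence (an assumption for this example).
ADAPTER = "TGGAATTCTCGG"


def trim_adapter(read, adapter=ADAPTER):
    """Return the insert before the first exact adapter match, or None if absent."""
    index = read.find(adapter)
    return read[:index] if index != -1 else None


def profile_small_rnas(reads, min_len=18, max_len=30):
    """Trim reads, keep inserts in the expected size range, count identical inserts."""
    inserts = (trim_adapter(read) for read in reads)
    return Counter(
        seq for seq in inserts if seq is not None and min_len <= len(seq) <= max_len
    )


# Hypothetical reads: two copies of a 22 nt insert, one short fragment,
# and one read without any adapter.
example_insert = "TGAGGTAGTAGGTTGTATAGTT"
raw_reads = [
    example_insert + ADAPTER + "GTGC",
    example_insert + ADAPTER,
    "ACGT" + ADAPTER,
    "ACGTACGTACGTACGTACGTACGT",
]

profile = profile_small_rnas(raw_reads)
for sequence, count in profile.most_common():
    print(sequence, len(sequence), count)
```

Counting identical sequences directly does not require a reference genome, which matches the point that small RNAs can be profiled in organisms without one.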
Type 1 diabetes (T1D) is a chronic autoimmune disease, and β-cell transplantation may be a better therapy for T1D patients, but β-cell survival is critical. Studies have shown that miRNAs are associated with T1D pathogenesis and β-cell survival, and that circulating miRNA profiles in T1D patients differ from those of healthy individuals. The researchers used Small RNA-Seq and RT-qPCR to identify and quantify miRNAs in a streptozotocin-induced mouse model of diabetes and, in vitro, in mouse and human pancreatic islets exposed to varying degrees of hypoxia and cytokine stress. Ultimately, a set of 47 miRNAs was identified that could serve as biomarkers of β-cell stress and death and as markers for early T1D surveillance (Aljani B et al., 2024).
Communication between the conceptus and the uterus, mediated by extracellular vesicles (EVs) in uterine flushing fluids (UFs), plays an important role in porcine embryo implantation. Previous studies have shown that small RNAs function in porcine embryo implantation, but the role of small RNAs derived from porcine UFs-EVs in this process was unknown. The researchers extracted small RNAs from cup-shaped EVs of porcine UFs at different gestational stages (D10, D13, and D18). Using Small RNA-Seq, they identified a variety of known miRNAs and piRNAs as well as dozens of new ones and, combined with RT-qPCR results, found that ssc-let-7f-5p, ssc-let-7i-5p, and ssc-let-7g were significantly differentially expressed across the three gestational stages. The results indicate that the differentially expressed miRNAs are involved in several pathways, such as immunity and uterine development, and play an important role in embryo implantation (Hua R et al., 2021).
Targeted RNA sequencing enriches and sequences specific genes or transcripts by using capture probes or primers designed for them, so that most background RNA is excluded and sensitivity for the region of interest is improved. This approach allows efficient and accurate gene expression analysis across a large number of samples and is particularly suitable for studies that focus on certain genes or specific regions, such as the detection of fusion genes that contribute to cancer development. In a typical workflow, capture probes specific for the genes of interest are designed, and after RNA extraction the RNA sample is incubated with the probes to enrich for the target genes.
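One simple way to express the effect of enrichment is the fraction of reads that fall on the targeted genes before and after capture. The sketch below computes that fraction and the resulting fold enrichment; the panel, the gene labels, and the read assignments are invented for illustration and do not come from any of the cited studies.

```python
def on_target_fraction(read_genes, panel):
    """Fraction of reads assigned to a gene in the targeted panel."""
    reads = list(read_genes)
    if not reads:
        return 0.0
    hits = sum(1 for gene in reads if gene in panel)
    return hits / len(reads)


def fold_enrichment(fraction_after, fraction_before):
    """How many times more on-target reads the captured library contains."""
    return fraction_after / fraction_before


# Hypothetical panel and per-read gene assignments.
panel = {"NTRK3", "ESR1"}
before_capture = ["NTRK3"] + ["background_gene"] * 9
after_capture = ["NTRK3"] * 5 + ["ESR1"] * 3 + ["background_gene"] * 2

before = on_target_fraction(before_capture, panel)
after = on_target_fraction(after_capture, panel)
print(f"on-target before: {before:.0%}, after: {after:.0%}")
print(f"fold enrichment: {fold_enrichment(after, before):.1f}")
```

A higher on-target fraction means the same sequencing depth yields more reads for the genes of interest, which is where the gain in sensitivity and the reduction in cost per sample come from.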
High-grade endometrial stromal sarcoma (HGESS) is a high-grade malignant tumor of endometrial stromal origin characterized by YWHAE-NUTM2A/B fusion, ZC3H7B-BCOR fusion, or BCOR internal tandem duplication (ITD). Studies have reported that HGESS is associated with pan-Trk overexpression. To characterize the role of pan-Trk, the investigators performed targeted RNA-Seq comparing 11 HGESS with 48 other uterine sarcomas. The data revealed the specific expression of NTRK3 and ESR1 together with pan-Trk immunohistochemical expression, mapped target gene expression profiles across the different HGESS genotypes, and indicated that pan-Trk could be used as a tumor diagnostic marker (Momeni-Boroujeni A et al., 2020).
Jasmonic acid (JA) and salicylic acid (SA), as well as ABA, play important roles in plant defense responses such as stress tolerance, and ABA has been studied extensively. To understand how JA and SA organize defenses under different temperature conditions, the researchers examined 1,056 samples using DeLTa-Seq, a combination of Direct-RT and targeted RNA-Seq, to reveal the roles of both hormones in the temperature response of Arabidopsis. DeLTa-Seq revealed how JA and SA act at high temperature: they promote defense-related changes in gene expression through different pathways and also synergistically regulate the expression of AT5G64040 and AT1G75690. The analysis showed that target genes downstream of the SA pathway were much less tolerant of high temperature than those downstream of the JA pathway. With this study, the researchers showed that targeted RNA-seq improves RNA quality, is reproducible, and can reduce costs. By extension, DeLTa-Seq can be used to study not only plants but also animals and microorganisms (Kashima M et al., 2022).
For more applications, refer to "Overview of RNA Sequencing Applications"
References: