In the past decade, RNA sequencing (RNA-seq) has emerged as a transformative technology, revolutionizing our understanding of RNA-related biology. Initially employed for differential gene expression and mRNA splicing studies, RNA-seq has evolved rapidly alongside high-throughput sequencing technologies. Today, it encompasses a wide range of applications, including single-cell gene expression, RNA translation, RNA structure, spatial transcriptomics, whole transcriptome analyses, RNA-protein interactions, and more.
The diversity and complexity of transcripts challenge the conventional "one gene, one transcript" paradigm, as many genes produce multiple isoforms. In short-read transcriptome sequencing, RNA (or cDNA) molecules are fragmented before sequencing, so full-length transcripts must be reconstructed computationally by assembly. Because each short read covers only a small portion of a transcript, reads from exons shared by several isoforms, or from repetitive regions, cannot always be placed unambiguously, and assemblies can become chimeric or incomplete. Consequently, recovering complete transcript structures accurately is difficult, which can affect downstream analyses such as expression profiling, alternative splicing, and gene fusion detection.
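A toy Python sketch makes the assignment problem concrete. The exon sequences and isoform names below are hypothetical, and exact substring matching stands in for real read alignment. The sketch only shows that a short read from a shared exon is compatible with both isoforms, while a read spanning the whole molecule matches exactly one.

```python
# Hypothetical exons of one gene; isoform_B skips exon e2 (exon skipping).
EXONS = {
    "e1": "ATGGCTAGCTAGGATCC",
    "e2": "TTGACCGGTAAACTG",
    "e3": "GGCATTCCAGTAATGA",
}
ISOFORMS = {
    "isoform_A": ["e1", "e2", "e3"],
    "isoform_B": ["e1", "e3"],
}


def transcript_seq(exon_ids):
    """Concatenate exon sequences into a spliced transcript sequence."""
    return "".join(EXONS[e] for e in exon_ids)


def compatible_isoforms(read):
    """Return the isoforms whose sequence contains the read exactly."""
    return sorted(
        name for name, exons in ISOFORMS.items()
        if read in transcript_seq(exons)
    )


# A short read lying entirely inside the shared exon e3 fits both isoforms...
short_read = EXONS["e3"][2:12]
# ...whereas a full-length read covering the whole molecule fits only one.
long_read = transcript_seq(ISOFORMS["isoform_B"])

print(compatible_isoforms(short_read))  # ['isoform_A', 'isoform_B']
print(compatible_isoforms(long_read))   # ['isoform_B']
```

Real isoform resolution uses spliced alignment and statistical assignment, but the ambiguity illustrated here is the same one that read length imposes.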
Recognizing the limitations of short-read sequencing, researchers have developed long-read sequencing technologies to address the complexity of transcriptomes. Long-read cDNA sequencing, performed on platforms such as PacBio single-molecule real-time (SMRT) sequencing and Oxford Nanopore Technologies (ONT), sequences full-length cDNA molecules directly, without fragmentation or assembly. This approach offers significant advantages for studying alternative splicing, novel transcripts, and long non-coding RNAs (lncRNAs).
Short-read sequencing is well suited to gene quantification and differential expression analysis. In this method, cDNA is fragmented into short pieces, typically around 100-300 base pairs, which are then sequenced in a massively parallel fashion. It provides a cost-effective and efficient way to measure gene expression across large numbers of samples.
Long-read cDNA sequencing, on the other hand, is better suited to investigating transcript structure, including isoforms, alternative splicing, and gene fusions. Direct RNA sequencing can reveal RNA modifications in addition to transcript structure, although it requires higher-quality, intact RNA samples.
Multiplexed high-throughput full-length transcriptome sequencing randomly concatenates multiple cDNA molecules during library construction. Each circular consensus sequencing (CCS) read therefore contains several transcripts, which are separated computationally. This makes fuller use of the PacBio platform's long read lengths and substantially increases the number of full-length reads obtained per Sequel run. In addition, incorporating unique molecular identifiers (UMIs) allows PCR duplicates to be collapsed, enabling absolute (molecule-level) gene quantification and more efficient use of the data.
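The two computational steps behind this approach are splitting a concatenated CCS read back into individual molecules and counting distinct UMIs per gene. Both can be sketched in Python as below. This is a simplified illustration, not any vendor's pipeline. The linker sequence, the 8-nt UMI at the start of each molecule, and the `gene_of` lookup (a stand-in for alignment) are all hypothetical assumptions.

```python
from collections import defaultdict

# Hypothetical linker between concatenated molecules; real concatenation
# protocols define their own adapter and linker designs.
LINKER = "ACGTACGTTT"
UMI_LEN = 8  # hypothetical: each molecule begins with an 8-nt UMI


def split_concatenated_read(ccs_read):
    """Split one CCS read into its individual molecules at the linker."""
    return [seg for seg in ccs_read.split(LINKER) if seg]


def parse_molecule(segment):
    """Separate the leading UMI from the transcript sequence."""
    return segment[:UMI_LEN], segment[UMI_LEN:]


def count_molecules(ccs_reads, gene_of):
    """Count distinct UMIs per gene so that PCR duplicates count once.

    `gene_of` maps a transcript sequence to a gene name; it stands in for
    a real alignment or isoform-classification step.
    """
    umis_per_gene = defaultdict(set)
    for read in ccs_reads:
        for segment in split_concatenated_read(read):
            umi, transcript = parse_molecule(segment)
            gene = gene_of.get(transcript)
            if gene is not None:
                umis_per_gene[gene].add(umi)
    return {gene: len(umis) for gene, umis in umis_per_gene.items()}


gene_of = {"GGGCCCAAATTT": "geneX", "TTTGGGCCCAAA": "geneY"}
reads = [
    # One CCS read carrying three concatenated molecules.
    "AAAAAAAA" + "GGGCCCAAATTT" + LINKER
    + "CCCCCCCC" + "GGGCCCAAATTT" + LINKER
    + "GGGGGGGG" + "TTTGGGCCCAAA",
    # A PCR duplicate of the first geneX molecule (same UMI).
    "AAAAAAAA" + "GGGCCCAAATTT",
]
print(count_molecules(reads, gene_of))  # {'geneX': 2, 'geneY': 1}
```

Three geneX segments are observed, but two of them share a UMI. Collapsing on the UMI counts that molecule once, which is what makes molecule-level counts possible.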
Direct RNA sequencing, by contrast, reads through the entire poly(A) tail, so poly(A) tail length can be analyzed alongside full-length transcript data.
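A rough Python sketch shows what tail-length information looks like. It naively estimates poly(A) length from a basecalled read, assumed to be oriented 5' to 3'. The function name and the mismatch tolerance are hypothetical choices. Dedicated tools such as `nanopolish polya` instead estimate tail length from the raw nanopore current signal, because basecalled homopolymer lengths are unreliable.

```python
def estimate_polya_length(seq, max_mismatches=1):
    """Naively estimate the 3' poly(A) tail length of a basecalled read.

    Assumes `seq` is oriented 5' to 3'. Walks back from the 3' end,
    tolerating up to `max_mismatches` interior non-A bases (to absorb
    isolated basecalling errors); a mismatch at the tail's inner boundary
    is not counted as part of the tail.
    """
    bases = seq.upper()
    if not bases.endswith("A"):
        return 0  # the read does not end in a poly(A) stretch
    length = 0      # bases scanned so far from the 3' end
    mismatches = 0  # non-A bases tolerated so far
    tail_len = 0    # length up to and including the innermost A seen
    for base in reversed(bases):
        if base == "A":
            length += 1
            tail_len = length
        elif mismatches < max_mismatches:
            mismatches += 1
            length += 1
        else:
            break
    return tail_len


read = "ACGUGC" + "A" * 12 + "C" + "A" * 8
print(estimate_polya_length(read))                    # 21
print(estimate_polya_length(read, max_mismatches=0))  # 8
```

Allowing one mismatch keeps a single miscalled base from truncating the tail. Disallowing mismatches reports only the final uninterrupted A run.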
Table 1. Comparison of Short-Read cDNA Sequencing, Long-Read cDNA Sequencing, and Direct RNA Sequencing
Sequencing Technology | Short-Read cDNA Sequencing | Long-Read cDNA Sequencing | Direct RNA Sequencing |
---|---|---|---|
Platform | Illumina, Ion Torrent | PacBio, ONT | ONT |
Advantages | High throughput and high sequencing accuracy; wide range of established research methods and computational workflows; can accommodate degraded RNA | Long reads cover most full-length transcripts, enabling direct detection of transcripts without assembly | RNA is sequenced directly, without reverse transcription or PCR, reducing bias; detection of RNA modifications; provides poly(A) tail length information |
Disadvantages | Sample preparation involves reverse transcription, PCR, and fragment size selection, which introduce bias; limited ability to detect isoforms and accurately quantify transcripts | Medium to low throughput and higher cost; sample preparation involves reverse transcription, PCR, etc., which introduce bias; sample preparation and sequencing biases are not yet well understood; not recommended for degraded RNA | Low throughput and higher cost; not recommended for degraded RNA |
In response to diverse RNA research needs, other specialized sequencing platforms have emerged, offering targeted solutions for specific biological questions. Notable platforms include: