RNA sequencing (RNA-Seq) based on next-generation sequencing (NGS) technologies is now the industry standard for gene expression profiling, especially in large-scale, high-throughput studies. NGS technologies include high-throughput, low-cost short-read RNA-Seq, as well as emerging single-molecule, long-read RNA-Seq technologies, which have opened new avenues for studying the transcriptome and its function. Recent improvements in throughput and cost have made long-read sequencing increasingly popular for transcriptome characterization, particularly for de novo transcriptome assembly, isoform expression quantification, and in-depth analysis of RNA species. These analyses are difficult with standard short-read methods because the transcriptome is complex: transcripts vary in length, most genes have numerous alternatively spliced isoforms, and highly abundant RNA species, such as rRNAs, share high sequence similarity.
Figure 1. Mechanism of lncRNAs/circRNAs regulating cancer biological activities. (Yang, 2018)
Long Non-coding RNA Sequencing
lncRNA sequencing (lncRNA-seq) is a powerful next-generation sequencing (NGS) tool for studying the functional roles of long non-coding RNAs (lncRNAs) in a variety of biological processes and human diseases, including cancer and neurological disorders. LncRNA-seq has opened up new research opportunities, such as (1) profiling known and novel transcripts and defining their variations, (2) predicting lncRNA target genes, (3) identifying biomarkers for cancer and disease diagnostics and classification, and (4) revealing lncRNA-mRNA regulation.
The lncRNA-seq workflow begins with sample preparation and quality assessment. To enrich target transcripts, ribosomal RNA (rRNA) is depleted. The remaining RNA is fragmented and reverse-transcribed into cDNA. Strand-specific libraries are then constructed and sequenced on the Illumina platform using a paired-end 150 bp strategy.
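To make the paired-end 150 bp strategy concrete, the arithmetic behind raw data yield can be sketched as follows. This is a minimal illustration, not part of any vendor pipeline: the function names and the example figure of 40 million read pairs are hypothetical, and only the read length and the paired-end layout come from the text.

```python
READ_LENGTH_BP = 150     # from the text: paired-end 150 bp
READS_PER_FRAGMENT = 2   # paired-end: one read from each end of a fragment


def total_bases(read_pairs: int, read_length: int = READ_LENGTH_BP) -> int:
    """Raw bases produced by a given number of paired-end read pairs."""
    return read_pairs * READS_PER_FRAGMENT * read_length


def read_pairs_for_yield(target_bases: int, read_length: int = READ_LENGTH_BP) -> int:
    """Read pairs needed to reach a target raw yield in bases, rounded up."""
    bases_per_pair = READS_PER_FRAGMENT * read_length
    return -(-target_bases // bases_per_pair)  # integer ceiling division


# Hypothetical example: 40 million read pairs per sample.
example_pairs = 40_000_000
example_gb = total_bases(example_pairs) / 1e9
print(f"{example_pairs:,} read pairs at 2 x {READ_LENGTH_BP} bp = {example_gb:.1f} Gb raw data")
```

Each read pair contributes 300 bases, so the same calculation can be run in reverse to estimate how many read pairs a target raw yield requires. Usable yield after quality filtering will be lower than the raw figure.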
Circular RNA sequencing (circRNA-seq) helps uncover the role of circRNAs in regulating gene expression. Applications include (1) profiling known and novel circRNAs and predicting their target genes, (2) identifying biomarkers for cancer or disease diagnostics and classification, (3) discovering regulatory networks between ncRNA and miRNA, and (4) understanding tissue or organism development mechanisms based on transcript profiles.
The circRNA-seq workflow begins with an evaluation of sample quality. After ribosomal RNA (rRNA) depletion, samples are digested with RNase R to eliminate linear transcripts. Sequencing uses a strand-specific library (also known as a stranded or directional library) and a paired-end 150 bp strategy. To ensure reliable results, each step includes a quality assessment of the samples, the library, and the data. Bioinformatics pipelines can then analyze the data to meet a variety of requirements and produce high-quality, publication-ready results.
Transcriptomic studies are increasingly using single-molecule long-read technologies. These tools reveal new information about full-length transcript sequences, alternative splicing, gene structure, and alternative polyadenylation sites. Long-read sequencing is a valuable tool for capturing the complexity of structural variation at the genomic and transcriptomic levels, and its adoption is expected to grow as costs fall. Furthermore, full-length transcript sequence data are extremely useful for genome annotation and gene function research.
References: