The main goal of transcriptome profiling is to determine the number of RNA transcripts present in a sample. Hybridization-based approaches such as DNA microarrays provide only a relative, analog measure of transcript abundance. Sequencing-based methods such as RNA sequencing (RNA-Seq) eliminate hybridization bias among genes, provide a more accurate measure of transcript abundance, and offer the promise of true digital quantification.
Figure 1. Principle of digital RNA-Seq. (Shiroguchi, 2011)
Sequence-dependent bias and amplification noise from reverse transcription, adapter ligation, library amplification by PCR, solid-phase clonal amplification, and sequencing complicate the interpretation of traditional RNA-Seq. Removing PCR and directly sequencing single molecules of RNA, or sequencing single molecules or clonal populations of cDNA, are some methods to lessen bias in RNA-Seq. Small samples or single cells, on the other hand, benefit from library amplification.
PCR is used in traditional library amplification, but its exponential amplification introduces noise, particularly at low copy numbers. Digital PCR was developed to get around this problem: it distributes DNA molecules into many containers so that each receives zero or one molecule, and the molecules are then amplified and detected by PCR. This method has been used to count RNA, but it requires specific primers for each gene, which makes high-throughput measurements difficult.
Digital RNA sequencing, also known as digital RNA-seq or UMI-RNA-Seq, is an absolute quantitative transcriptome sequencing technology in which a unique molecular identifier (UMI) is added to each cDNA fragment prior to library amplification. The UMI stays attached to its fragment throughout amplification, sequencing, and analysis. After sequencing, the UMI is used to determine the origin of each read, and reads from the same source (same sequence and same UMI) are merged to eliminate PCR duplicates and reconstruct the sample's original state before amplification. Errors introduced during PCR amplification and sequencing can also be corrected using this method.
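The merging step described above can be illustrated with a minimal Python sketch. It assumes each read has already been reduced to a hypothetical `(gene, fragment_sequence, umi)` tuple, and it treats two reads as copies of the same molecule only when both the sequence and the UMI match exactly. The function name `count_molecules` and the example reads are invented for illustration and are not taken from any published pipeline.

```python
from collections import defaultdict


def count_molecules(reads):
    """Return {gene: (raw_read_count, unique_molecule_count)}.

    `reads` is an iterable of (gene, fragment_sequence, umi) tuples.
    Reads sharing both sequence and UMI are assumed to be PCR copies
    of one original molecule (exact matching only, no error tolerance).
    """
    raw = defaultdict(int)
    unique = defaultdict(set)
    for gene, seq, umi in reads:
        raw[gene] += 1
        unique[gene].add((seq, umi))
    return {gene: (raw[gene], len(unique[gene])) for gene in raw}


# Hypothetical reads: three copies of one geneA molecule, a second geneA
# molecule with the same sequence but a different UMI, and one geneB read.
reads = [
    ("geneA", "ACGTTGCA", "AACGT"),
    ("geneA", "ACGTTGCA", "AACGT"),
    ("geneA", "ACGTTGCA", "AACGT"),
    ("geneA", "ACGTTGCA", "GGTCA"),
    ("geneB", "TTGACCGA", "CCATG"),
]

counts = count_molecules(reads)
for gene, (n_reads, n_molecules) in sorted(counts.items()):
    print(f"{gene}: {n_reads} reads -> {n_molecules} molecules")
```

In this toy input, geneA has four reads but only two distinct (sequence, UMI) pairs, so it is counted as two molecules. Real tools typically also tolerate small UMI mismatches caused by sequencing errors, which this sketch deliberately leaves out.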
RNA sequencing (RNA-seq), which employs next-generation sequencing (NGS) technology, is the prime tool for mapping and quantifying transcriptomes. The transcriptome is a cell's entire set of transcripts, and it provides information on transcript levels for a particular developmental stage or physiological condition. In conventional RNA-seq, however, different cDNA fragments are amplified unevenly because of PCR amplification bias. Easily amplified fragments become strongly over-represented during the sequencing process, while some low-abundance fragments or fragments with severe base bias are lost completely, which affects the accuracy of the sequencing results. Conventional RNA-seq therefore reveals only the overall trend of gene expression; it does not allow the original gene expression level to be quantified in absolute terms.
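To see how quickly uneven amplification distorts read counts, consider a rough Python sketch based on the common idealized model in which each PCR cycle multiplies the copy number by (1 + efficiency). The fragment names, the efficiencies (0.95 and 0.80), and the cycle count (15) are hypothetical values chosen only for illustration.

```python
def amplify(molecules, efficiency, cycles):
    """Idealized PCR: every cycle multiplies copies by (1 + efficiency)."""
    return molecules * (1 + efficiency) ** cycles


cycles = 15                                   # assumed number of PCR cycles
start = {"fragX": 100, "fragY": 100}          # equal starting molecule counts
efficiency = {"fragX": 0.95, "fragY": 0.80}   # assumed per-cycle efficiencies

copies = {name: amplify(start[name], efficiency[name], cycles) for name in start}
ratio = copies["fragX"] / copies["fragY"]

print(f"starting ratio fragX/fragY: {start['fragX'] / start['fragY']:.2f}")
print(f"ratio after {cycles} cycles: {ratio:.2f}")
```

Under these assumptions, two fragments that start at equal abundance end up more than threefold apart after amplification, even though nothing about the underlying expression differs. Counting distinct UMIs rather than reads is meant to recover the starting ratio.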
The UMI approach labels every single molecule with a UMI (Unique Molecular Identifier) before library construction, so that each molecule carries its own distinguishing sequence. Ideally, each template molecule can be identified by its unique combination of UMI and template sequence. After PCR amplification, PCR copies can be identified and removed from the dataset, virtually eliminating the effects of uneven amplification and the artifacts generated during PCR. Quantitative results obtained through UMIs are therefore more accurate. Combined with NGS, UMIs enable high-precision sequencing and quantification for studying the gene expression of a particular species or tissue in a specific spatiotemporal state.
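Whether the "ideal" case of one distinct label per molecule holds depends on how many different UMIs are available relative to the number of identical template molecules. The following Python sketch estimates this under a simplifying assumption that is not stated in the text: UMIs are random, uniformly distributed sequences over A, C, G, and T. The helper names `umi_space` and `collision_probability` are hypothetical.

```python
def umi_space(length):
    """Number of distinct UMIs of the given length over A, C, G, T."""
    return 4 ** length


def collision_probability(n_molecules, umi_length):
    """Probability that one molecule shares its UMI with at least one of
    the other n_molecules - 1 molecules carrying the same template sequence,
    assuming each UMI is drawn uniformly at random."""
    n_umis = umi_space(umi_length)
    return 1 - (1 - 1 / n_umis) ** (n_molecules - 1)


# Assumed example: 100 identical template molecules, UMIs of 6 or 10 bases.
for length in (6, 10):
    p = collision_probability(100, length)
    print(f"UMI length {length}: {umi_space(length)} labels, "
          f"collision probability {p:.2e}")
```

Longer UMIs give a larger label space and a lower chance that two distinct molecules with the same sequence are mistaken for PCR copies of each other. Highly abundant transcripts are the ones most exposed to this kind of undercounting.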
This is particularly important for diagnostics and for applications involving small amounts of starting material. UMIs are highly recommended for nucleotide populations that contain many similar sequences, such as small RNAs, ChIP-Seq tags, aptamers, RAD-Seq tags, and GBS tags.
References: