Detect Transcribed Pseudogenes by RNA Sequencing

What Is a Pseudogene?

A pseudogene is a segment of genomic DNA that closely resembles a functional gene but has lost the capacity to carry out that gene's original function. Pseudogenes are common in eukaryotic genomes and are regarded as vestiges of gene families that emerged over the course of evolution. Initially perceived as inert remnants of genes that had lost their protein-coding capability, pseudogenes have more recently attracted research interest because of their potential functional roles, notably as non-coding RNAs (ncRNAs). They have been implicated in a diverse array of biological processes and diseases, including cancer.

CD Genomics provides sequencing-based non-coding RNA solutions that help reveal the complex networks of interactions between ncRNAs and DNA, other RNAs, and proteins.

Pseudogenes vs Genes

Aspect | Genes | Pseudogenes
Definition | Segments of DNA that encode functional proteins or RNA molecules. | Gene-like sequences that have lost the ability to produce the functional protein or RNA of their parent gene.
Function | Determine traits and characteristics; involved in protein synthesis. | Provide genetic variation, regulatory functions, and evolutionary insights.
Protein Production | Encode functional proteins or RNA molecules. | Generally unable to produce functional proteins or RNA molecules.
Evolution | Subject to natural selection; mutations can contribute to genetic disorders. | Arise from genes through duplication or retrotransposition.
Expression | Actively transcribed into RNA and, for protein-coding genes, translated into protein. | May retain promoter regions, exons, and introns, but mutations often reduce or prevent expression.
Biological Significance | Essential for cellular functions; determine traits; affected by mutations. | Can still have residual functions, regulate other genes, or produce non-coding RNAs.

Processed And Non-processed Pseudogenes

Pseudogenes arise through different mechanisms. The two main classes are non-processed (duplicated) pseudogenes and processed (retrotransposed) pseudogenes.

Non-processed pseudogenes, also known as duplicated or unprocessed pseudogenes, typically arise through gene duplication. A functional gene is duplicated, producing two copies; over time, mutations accumulate in one copy and render it non-functional. These mutations may be deletions, insertions, or point mutations that disrupt the coding sequence or the regulatory regions of the gene. As a result, the pseudogene can no longer produce a functional protein and is often not transcribed. Because they derive from genomic DNA, non-processed pseudogenes usually retain the intron-exon structure of the parent gene.

Processed pseudogenes, also referred to as retrotransposed pseudogenes, originate from the reverse transcription of mRNA and the insertion of the resulting DNA copy into the genome. Retrotransposons, a type of mobile genetic element, play a key role: they encode the reverse transcriptase that copies a mature mRNA back into DNA, and this copy is then inserted at a new genomic location, potentially giving rise to a processed pseudogene. Because the template is a spliced mRNA, processed pseudogenes typically lack introns and the parent gene's promoter. Like non-processed pseudogenes, they accumulate mutations and structural changes over time, which leads to loss of function.
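The structural difference between the two classes can be illustrated with a small heuristic. The following is a minimal sketch, not an established classifier: the `PseudogeneCandidate` fields and the decision rule are assumptions made for illustration, based on the general observation that copies derived from spliced mRNA tend to lack introns and carry a poly(A) tract, whereas duplicated copies tend to keep the parent gene's introns.

```python
from dataclasses import dataclass


@dataclass
class PseudogeneCandidate:
    """Hypothetical summary of a gene-like locus (all fields are illustrative)."""
    name: str
    parent_intron_count: int    # introns annotated in the parent gene
    retained_intron_count: int  # parent introns still present at this locus
    has_poly_a_tract: bool      # A-rich tract found at the 3' end of the locus


def classify(candidate: PseudogeneCandidate) -> str:
    """Rough rule: intron loss plus a poly(A) tract suggests retrotransposition."""
    intronless_copy = (
        candidate.parent_intron_count > 0
        and candidate.retained_intron_count == 0
    )
    if intronless_copy and candidate.has_poly_a_tract:
        return "processed"
    if candidate.retained_intron_count > 0:
        return "non-processed"
    # Single-exon parents or degraded loci cannot be resolved by this rule.
    return "ambiguous"


# Toy candidates, not real annotations.
print(classify(PseudogeneCandidate("toy_retrocopy", 8, 0, True)))
print(classify(PseudogeneCandidate("toy_duplicate", 8, 7, False)))
```

Real annotation pipelines weigh more evidence than this (for example flanking sequence and alignment quality), so the rule is only a starting point for reasoning about the two classes.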

Figure: Pseudogenes. (Cheetham et al., 2020)

The Function of Pseudogenes

The human genome contains an estimated 15,000 to 18,000 pseudogenes. Dismissing them as mere genetic remnants or "junk" would be inaccurate: recent research indicates that many pseudogenes, particularly processed pseudogenes, have meaningful biological functions, which they carry out through several mechanisms.

First, pseudogene transcripts can act as antisense RNAs that reduce the expression of the corresponding functional genes.

Second, pseudogene transcripts can give rise to endogenous small interfering RNAs (endo-siRNAs), which in turn modulate the expression of functional genes.

Several studies have shown that pseudogenes can regulate gene expression by various mechanisms, including acting as competing endogenous RNAs (ceRNAs) that sponge microRNAs, thereby modulating the expression of target genes. Pseudogenes can also serve as precursors for small interfering RNAs (siRNAs) or long non-coding RNAs (lncRNAs), which are involved in gene regulation and other cellular processes. They can function as oncogenes or tumor suppressors, impacting key cellular pathways involved in cancer biology. Furthermore, pseudogene expression patterns have been associated with patient prognosis and different cancer subtypes, suggesting their potential as biomarkers for diagnosis, prognosis, and therapeutic stratification.

Third, pseudogene transcripts can compete with the transcripts of functional genes for miRNA binding. A well-known example is the tumor suppressor gene PTEN and its pseudogene, PTENP1, whose transcripts share common miRNA binding sites. By competitively binding these miRNAs, PTENP1 transcripts shield PTEN mRNA from miRNA-mediated repression and help sustain PTEN expression.
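The competition described here can be made concrete with a toy calculation. This is a deliberately simplified sketch, not a published model: it assumes that the miRNA pool is limiting and that miRNA molecules distribute across gene and pseudogene binding sites in proportion to site abundance (equal affinity). The function names and all numbers are illustrative assumptions, not measurements for PTEN or PTENP1.

```python
def mirna_bound_to_gene(mirna: float, gene_sites: float, decoy_sites: float) -> float:
    """miRNA molecules ending up on gene transcripts under proportional sharing."""
    total_sites = gene_sites + decoy_sites
    if total_sites == 0:
        return 0.0
    bound = mirna * gene_sites / total_sites
    # A transcript pool cannot bind more miRNA than it has sites.
    return min(bound, gene_sites)


def repressed_fraction(mirna: float, gene_sites: float, decoy_sites: float) -> float:
    """Fraction of gene binding sites occupied by miRNA."""
    if gene_sites == 0:
        return 0.0
    return mirna_bound_to_gene(mirna, gene_sites, decoy_sites) / gene_sites


# Arbitrary illustrative numbers: 100 miRNA molecules, 200 sites on gene transcripts.
without_decoy = repressed_fraction(mirna=100, gene_sites=200, decoy_sites=0)
with_decoy = repressed_fraction(mirna=100, gene_sites=200, decoy_sites=200)
print(without_decoy)  # 0.5
print(with_decoy)     # 0.25
```

Under these assumptions, adding as many pseudogene sites as gene sites halves the occupied fraction of gene sites, which is the qualitative effect attributed to a ceRNA. Real systems depend on binding affinities and relative abundances that this sketch ignores.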

Lastly, some pseudogenes produce functional proteins, which challenges the conventional view that they are mere genetic relics and underscores how varied pseudogene function can be.

Transcriptomic Insights: Expressing Pseudogenes' Voice

RNA sequencing (RNA-seq) is a valuable approach for identifying pseudogenes and studying their functions. It is a high-throughput sequencing technique that provides a comprehensive snapshot of the transcriptome, including both coding and non-coding RNA molecules, in a given sample. RNA-seq can be used to study pseudogenes in the following ways:

  • Pseudogene Identification: RNA-seq data can be used to identify transcribed pseudogenes computationally, based on transcript abundance and sequence characteristics. Aligning RNA-seq reads to the reference genome reveals transcripts that map to known gene loci as well as transcripts that resemble known coding genes but lack protein-coding potential because of mutations or deletions; the latter are candidate pseudogene transcripts. Long-read cDNA sequencing can further help by providing full-length sequence information for RNA molecules. Unlike short-read technologies, which generate fragmented reads, long-read platforms produce reads that can span entire transcripts, including pseudogene transcripts. This enables accurate characterization of pseudogene transcripts and helps distinguish them from the transcripts of their functional counterparts.
  • Differential Expression Analysis: RNA-seq data enables differential expression analysis, facilitating the comparison of pseudogene expression levels across distinct conditions or sample groups. By contrasting pseudogene expression in healthy and diseased tissues or different cancer subtypes, researchers can uncover differentially expressed pseudogenes associated with specific biological processes or disease phenotypes.

Figure: Pseudogene expression analysis. (Kalyana-Sundaram et al., 2012)

  • Functional Characterization: Characterizing the function of candidate pseudogenes is another crucial step. Computational techniques include analyzing the secondary structure of pseudogene transcripts, predicting their interactions with other RNA molecules, and identifying potential microRNA binding sites within pseudogene sequences. Experimental approaches, such as RNA pull-down assays or cross-linking and immunoprecipitation (CLIP), can identify the proteins or RNA molecules that interact with pseudogene transcripts.
  • Regulatory Roles: Pseudogenes can additionally exhibit regulatory roles by functioning as competing endogenous RNAs (ceRNAs), which modulate gene expression by competitively binding to microRNAs. RNA-seq data enables researchers to gain insights into ceRNA networks involving pseudogenes and their potential impact on gene expression regulation.
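One of the computational steps mentioned in the list above, identifying potential microRNA binding sites, can be sketched as a simple seed-match scan. This is a minimal illustration that assumes a site is any exact match to the reverse complement of miRNA nucleotides 2 to 8 (the seed region). The function names and all sequences below are made-up toy values, not real PTEN, PTENP1, or miRNA sequences, and dedicated target-prediction tools consider more than seed pairing.

```python
from typing import List


def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence (A, C, G, U only)."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def seed_site(mirna: str) -> str:
    """Transcript-side motif complementary to miRNA nucleotides 2-8."""
    return reverse_complement_rna(mirna[1:8])


def find_seed_sites(transcript: str, mirna: str) -> List[int]:
    """0-based start positions of exact seed matches in a transcript."""
    motif = seed_site(mirna)
    rna = transcript.upper().replace("T", "U")  # accept cDNA-style input
    return [
        i
        for i in range(len(rna) - len(motif) + 1)
        if rna[i:i + len(motif)] == motif
    ]


toy_mirna = "UACGGAUCCAUGCAAGUCGAUU"      # hypothetical miRNA
toy_gene_utr = "AAAGAUCCGUCCC"            # hypothetical parent-gene 3' UTR
toy_pseudogene = "GGGAUCCGUAAGAUCCGU"     # hypothetical pseudogene transcript

print(seed_site(toy_mirna))                        # GAUCCGU
print(find_seed_sites(toy_gene_utr, toy_mirna))    # [3]
print(find_seed_sites(toy_pseudogene, toy_mirna))  # [2, 11]
```

A pseudogene and its parent gene that both return hits for the same miRNA are candidates for the ceRNA relationship described in the Regulatory Roles item, which would then need experimental confirmation.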

By integrating RNA-seq data with complementary techniques and datasets, researchers can comprehensively understand pseudogenes, encompassing their expression patterns, functional roles in normal development, and their implications in disease, including cancer.
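As one concrete piece of such an analysis, the comparison of pseudogene expression between two conditions can be sketched as a log2 fold change on library-size-normalized counts. This is a minimal sketch under stated assumptions: counts are normalized to counts per million (CPM), a pseudocount of 1 avoids division by zero, and the feature names and count values are invented for illustration. It compares two single samples and is not a statistical test; real studies use replicates and dedicated differential expression tools such as DESeq2 or edgeR.

```python
import math
from typing import Dict


def cpm(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale raw read counts to counts per million mapped reads."""
    total = sum(counts.values())
    return {feature: n * 1e6 / total for feature, n in counts.items()}


def log2_fold_change(
    case: Dict[str, int],
    control: Dict[str, int],
    pseudocount: float = 1.0,
) -> Dict[str, float]:
    """log2(case CPM / control CPM) per feature, with a pseudocount."""
    case_cpm = cpm(case)
    control_cpm = cpm(control)
    return {
        feature: math.log2(
            (case_cpm[feature] + pseudocount)
            / (control_cpm.get(feature, 0.0) + pseudocount)
        )
        for feature in case_cpm
    }


# Invented toy counts for one pseudogene and one gene.
control_counts = {"toy_pseudogene": 10, "toy_gene": 990}
tumor_counts = {"toy_pseudogene": 40, "toy_gene": 960}

for feature, lfc in log2_fold_change(tumor_counts, control_counts).items():
    print(feature, round(lfc, 2))
```

In this toy example the pseudogene's share of reads rises about fourfold, giving a log2 fold change close to 2, while the gene's value is slightly negative.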

References
  1. Kalyana-Sundaram, Shanker, et al. "Expressed pseudogenes in the transcriptional landscape of human cancers." Cell 149.7 (2012): 1622-1634.
  2. Cheetham, Seth W., Geoffrey J. Faulkner, and Marcel E. Dinger. "Overcoming challenges and dogmas to understand the functions of pseudogenes." Nature Reviews Genetics 21.3 (2020): 191-201.
