RNA Sequencing Data Facilitates Pathway Analysis


Gene expression is interpreted through both the functions of individual genes and their roles in pathways, because genes are interconnected in all biological processes. Furthermore, while a small expression change in a single gene may not be significant on its own, modest coordinated changes across many genes in a pathway can have dramatic biological consequences. As a result, a set of differentially expressed pathways often has more explanatory power than a long list of seemingly unrelated genes.

One traditional analysis takes a gene list of interest, identified using genomics methods or curated by biologists, and tests each annotated gene set for enrichment using statistical methods such as Fisher's exact test on contingency tables. Such methods can be applied directly to the list of differentially expressed genes identified from RNA-seq data. Another type of analysis ranks all expressed genes by an expression difference metric, then performs Kolmogorov–Smirnov-like tests to determine enrichment significance. GSEA (gene set enrichment analysis) is one such highly effective method for studying functional enrichment between two biological states.
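The list-based enrichment test described above can be sketched as follows. This is a minimal illustration rather than any particular tool's implementation: it computes the one-sided (over-representation) p-value of Fisher's exact test from the hypergeometric distribution, using only the Python standard library. The function names, the gene identifiers, and the choice of a one-sided test are assumptions made for the example.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when drawing n genes from N, of which K belong to the gene set.

    This equals the one-sided Fisher's exact test p-value for the 2x2 table
    (in list / not in list) x (in gene set / not in gene set).
    """
    upper = min(n, K)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)


def enrichment_pvalue(de_genes, gene_set, background):
    """Over-representation p-value of gene_set within the DE gene list.

    All three arguments are iterables of gene identifiers. Genes outside the
    background are ignored, so the background defines the universe of the test.
    """
    background = set(background)
    de = set(de_genes) & background
    members = set(gene_set) & background
    overlap = len(de & members)
    return hypergeom_upper_tail(overlap, len(de), len(members), len(background))


# Toy data (hypothetical gene names): 20 expressed genes, a 5-gene pathway,
# and a 5-gene differentially expressed list that shares 3 genes with it.
background = [f"g{i}" for i in range(20)]
pathway = ["g0", "g1", "g2", "g3", "g4"]
de_list = ["g0", "g1", "g2", "g10", "g11"]
p_value = enrichment_pvalue(de_list, pathway, background)
```

In practice the test is repeated for every annotated gene set, so the resulting p-values would normally be corrected for multiple testing; that step is omitted here. The choice of background (for example, all expressed genes rather than all annotated genes) can change the result noticeably.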

Tools and Bioinformatics

Many studies have adapted microarray data analysis tools, or developed new ones, for use with RNA-seq data. For example, to fit RNA-seq data characteristics, a non-parametric competitive GSA approach called Gene Set Variation Analysis (GSVA) was developed. On lymphoblastoid cell line samples that had been profiled using both technologies, it produced highly correlated results between the microarray and RNA-seq sample sets. SeqGSEA scores differential expression using count data modeling with negative binomial distributions, then performs gene set enrichment analysis to gain biological insights. In practical use, SeqGSEA identifies more biologically significant gene sets with no bias toward longer or more highly expressed genes. GAGE is another pathway analysis method that can be used with both microarray and RNA-seq data. It is designed to be generally applicable regardless of sample size, experimental design, assay platform, and other types of heterogeneity. GSAASeqSP adapts and combines multiple gene-level and gene set-level statistics for RNA-seq count-based data to provide a variety of statistical procedures. Examples of such statistics include Weighted KS, L2Norm, Mean, WeightedSigRatio, SigRatio, GeometricMean, TruncatedProduct, FisherMethod, MinP, and RankSum. GSAASeqSP is a powerful tool for analyzing molecular activity differences in biological pathways.
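To make the weighted KS statistic mentioned above more concrete, here is a minimal sketch of a GSEA-style running-sum enrichment score in plain Python. It is not the code used by GSEA, SeqGSEA, or GSAASeqSP; the function name, the `weight` parameter, and the toy genes and scores are assumptions made for illustration. Walking down the ranked list, the score increases at genes in the set (in proportion to their ranking metric raised to `weight`) and decreases at genes outside it; the enrichment score is the largest deviation from zero.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Weighted Kolmogorov-Smirnov-like running-sum statistic.

    ranked_genes: gene identifiers sorted by decreasing ranking metric.
    scores: dict mapping each gene to its ranking metric.
    gene_set: collection of gene identifiers forming the set under test.
    weight: exponent on |metric|; 0 gives the classic unweighted KS form.
    """
    gene_set = set(gene_set)
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")

    hit_weights = [abs(scores[g]) ** weight if h else 0.0
                   for g, h in zip(ranked_genes, hits)]
    total = sum(hit_weights)
    if total == 0.0:
        # All set members have a zero metric: fall back to equal weights.
        hit_weights = [1.0 if h else 0.0 for h in hits]
        total = float(n_hits)

    running = 0.0
    best = 0.0
    for w, is_hit in zip(hit_weights, hits):
        running += w / total if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running  # keep the signed maximum deviation from zero
    return best


# Toy ranking (hypothetical genes and metrics), most up-regulated first.
toy_scores = {"a": 4.0, "b": 3.0, "c": -1.0, "d": -2.0}
toy_ranking = sorted(toy_scores, key=toy_scores.get, reverse=True)
es_top = enrichment_score(toy_ranking, toy_scores, {"a", "b"}, weight=0)
es_bottom = enrichment_score(toy_ranking, toy_scores, {"c", "d"}, weight=0)
```

A positive score indicates that the set is concentrated at the top of the ranking, a negative score that it is concentrated at the bottom. Significance is usually assessed by comparing the observed score with scores from permuted data, which this sketch does not include.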

Figure 1. A tool for differential expression and pathway analysis of RNA-Seq data. (Ge, 2018)


The limitations of gene set analysis methods developed for microarrays have been thoroughly evaluated in the context of RNA-seq data. The performance of multivariate tests was investigated using a variety of commonly used RNA-seq normalization strategies. Data transformations were also investigated, in an effort to extend other approaches beyond microarray data analysis. Log counts normalized for sequencing depth were found to be a good data transformation prior to pathway analysis.
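The depth-normalized log transformation mentioned above can be sketched as follows. The text does not specify an exact formula, so this example assumes a common variant: counts are scaled to counts per million (CPM) using each sample's total count, a small pseudocount is added to avoid taking the log of zero, and the result is log2-transformed. The function name, the `prior_count` parameter, and the sample data are hypothetical.

```python
from math import log2


def log_cpm(counts, prior_count=1.0):
    """Return log2(CPM + prior_count) for every gene in every sample.

    counts: dict mapping sample name -> dict mapping gene -> raw read count.
    prior_count: pseudocount added to the CPM value (an assumption of this
        sketch); it must be positive if any raw count is zero.
    """
    transformed = {}
    for sample, gene_counts in counts.items():
        library_size = sum(gene_counts.values())
        if library_size <= 0:
            raise ValueError(f"sample {sample!r} has no reads")
        transformed[sample] = {
            gene: log2(count / library_size * 1e6 + prior_count)
            for gene, count in gene_counts.items()
        }
    return transformed


# Two hypothetical samples with the same composition but a tenfold
# difference in sequencing depth.
raw = {
    "shallow": {"g1": 10, "g2": 30, "g3": 0},
    "deep": {"g1": 100, "g2": 300, "g3": 0},
}
normalized = log_cpm(raw)
```

After the transformation, the two samples have matching values for each gene, because the difference in depth has been divided out. More elaborate normalizations account for composition effects as well, and are not covered by this sketch.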

Applying methods developed for microarray data analysis to RNA-seq data without taking into account the specific features of the data could lead to biases. Moreover, despite the availability of numerous pathway databases, high-resolution annotation of such knowledge bases is still lacking. More than 90% of human genes, for example, are alternatively spliced, and transcripts from the same gene can have different, even opposing, functions. Current knowledge bases, however, are curated only at the gene level, so knowledge of pathway-specific transcript activity is also needed. Furthermore, despite the vast number of annotations available in the public domain, high-quality gene annotations are still required.


  1. Vishnubalaji R, Sasidharan Nair V, Ouararhni K, et al. Integrated transcriptome and pathway analyses revealed multiple activated pathways in breast cancer. Frontiers in Oncology. 2019:910.
  2. Ge SX, Son EW, Yao R. iDEP: an integrated web application for differential expression and pathway analysis of RNA-Seq data. BMC Bioinformatics. 2018 Dec;19(1).
  3. Han Y, Gao S, Muegge K, et al. Advanced applications of RNA sequencing and challenges. Bioinformatics and Biology Insights. 2015 Jan;9.
