Abstract
Classical algorithms for identifying biological pathways significantly associated with the conditions under study frequently reduce pathways to gene sets, ignoring the inherent non-equivalence of the genes within a defined pathway. Here we designed a network-based method that quantifies this non-equivalence as gene weights. The resulting gene weights are biologically consistent and robust to network perturbations. By integrating the gene weights into classical gene set analysis, with a subsequent correction for the “over-counting” bias associated with multi-subunit proteins, we developed a novel gene-weighted pathway analysis approach, implemented in an R package called “Gene Association Network-based Pathway Analysis” (GANPA). Through analysis of several microarray datasets, including the p53 dataset, an asthma dataset and three breast cancer datasets, we demonstrated that our approach is biologically reliable and reproducible, and therefore helpful for microarray data interpretation and hypothesis generation.
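The sketch below illustrates the general idea of gene-weighted pathway scoring described above. Each gene's absolute differential-expression statistic contributes in proportion to its weight, and the pathway score is then tested against a sample-label permutation null. It is not GANPA's implementation. The Welch t-statistic, the permutation scheme, the toy data and weights, and the hypothetical helper `split_subunit_weights` are all illustrative assumptions. That helper shares a weight evenly among the subunits of one complex and stands in for the multi-subunit correction. In the actual method, the gene weights come from the network-based procedure described in the paper.

```python
import random
from statistics import mean, variance


def welch_t(x, y):
    """Welch two-sample t-statistic (each group needs >= 2 non-constant values)."""
    se = (variance(x) / len(x) + variance(y) / len(y)) ** 0.5
    return (mean(x) - mean(y)) / se


def gene_stats(expr, labels):
    """Gene-level t-statistics; expr maps gene -> per-sample values, labels are 0/1."""
    stats = {}
    for gene, values in expr.items():
        case = [v for v, lab in zip(values, labels) if lab == 1]
        ctrl = [v for v, lab in zip(values, labels) if lab == 0]
        stats[gene] = welch_t(case, ctrl)
    return stats


def w_mean_abs(stats, weights):
    """Weighted mean of |statistic| over pathway genes that have a statistic."""
    genes = [g for g in weights if g in stats]
    total = sum(weights[g] for g in genes)
    if total <= 0:
        raise ValueError("no positively weighted pathway genes with statistics")
    return sum(weights[g] * abs(stats[g]) for g in genes) / total


def split_subunit_weights(weights, complexes):
    """Hypothetical over-counting correction (an assumption, not GANPA's rule):
    divide each subunit's weight by the number of subunits of its (disjoint)
    complex that are present in the pathway."""
    corrected = dict(weights)
    for subunits in complexes:
        present = [g for g in subunits if g in weights]
        for g in present:
            corrected[g] = weights[g] / len(present)
    return corrected


def permutation_p(expr, labels, weights, n_perm=1000, seed=0):
    """Observed weighted score and a sample-label permutation p-value."""
    rng = random.Random(seed)
    observed = w_mean_abs(gene_stats(expr, labels), weights)
    perm = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)  # permute sample labels, keeping group sizes fixed
        if w_mean_abs(gene_stats(expr, perm), weights) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy usage: hypothetical genes; B and C are treated as subunits of one complex.
expr = {
    "A": [5.0, 5.2, 4.9, 1.0, 1.1, 0.9],
    "B": [2.0, 2.1, 1.9, 2.05, 1.95, 2.0],
    "C": [3.0, 3.2, 2.9, 3.1, 2.8, 3.0],
}
labels = [1, 1, 1, 0, 0, 0]
weights = split_subunit_weights({"A": 1.0, "B": 0.8, "C": 0.8}, [["B", "C"]])
score, p = permutation_p(expr, labels, weights, n_perm=500, seed=1)
print(weights, round(score, 3), round(p, 3))
```

Dividing by the sum of weights keeps the score on the scale of the gene-level statistic, so pathways of different sizes remain comparable. With equal weights, the score reduces to the ordinary mean absolute statistic of an unweighted gene set analysis.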
Acknowledgements
We thank Lei Bao (University of California, San Diego, USA), Yun Li (SIBS, China), Tao Huang (SIBS, China), Michael Wu (Harvard University, USA), Brad Efron (Stanford University, USA), Fei He (Shanghai Center for Bioinformatics Technology, China), Yan Feng, Zuoyun Wang, Wenjing Zhang, Jingqi Chen and Junrui Li (SIBS, China) for helpful communications or suggestions. We thank Yijun Gao and Chao Zheng for their help in manuscript preparation. This work was supported by the National Basic Research Program of China (2012CB910800, 2010CB912102), the National Natural Science Foundation of China (30871284, 30971461, 30971643 and 31071113), the Chinese Academy of Sciences (KSCX1-YW-22), and the Science and Technology Commission of Shanghai Municipality (09JC1416300, 09PJ1401000). HJ is a scholar of the Hundred Talents Program of the Chinese Academy of Sciences.
Additional information
(Supplementary information is linked to the online version of the paper on the Cell Research website.)
Supplementary information
Supplementary information, Data S1
Gene functional association network-based pathway gene weighting. (PDF 659 kb)
Supplementary information, Data S2
A list of GEO datasets used for gene co-expressions. (PDF 129 kb)
Supplementary information, Data S3
Gene weights for 833 pathways without correction for multi-subunit genes. (PDF 934 kb)
Supplementary information, Data S4
A list of multi-subunit proteins. (PDF 116 kb)
Supplementary information, Data S5
Gene weights for 833 pathways after correction for multi-subunit genes. (PDF 934 kb)
Supplementary information, Table S1
p53 weights in different pathways. (PDF 129 kb)
Supplementary information, Table S2
EGFR weights in different pathways. (PDF 67 kb)
Supplementary information, Table S3
Pathways significant in the p53 dataset by MeanAbs and W-MeanAbs. (PDF 67 kb)
Supplementary information, Table S4
Pathways significant in the p53 dataset by GSEA and W-GSEA. (PDF 126 kb)
Supplementary information, Figure S1
Conserved pathways across breast cancer datasets. (PDF 338 kb)
Cite this article
Fang, Z., Tian, W. & Ji, H. A network-based gene-weighting approach for pathway analysis. Cell Res 22, 565–580 (2012). https://doi.org/10.1038/cr.2011.149
DOI: https://doi.org/10.1038/cr.2011.149