Background: Pathway analysis helps researchers understand biological aberrations at a systems level. Finding the important pathways that underlie a biological problem lets researchers focus on the most relevant sets of genes. Pathways resemble networks with complicated structures, but most existing pathway enrichment tools ignore the topological information embedded within pathways, which limits their applicability.

Methods: In the current analysis, we focus mainly on a centrality-based extension of the widely applied Over-Representation Analysis (ORA) method. The pathway score is defined as the sum of the weights of the differentially affected nodes in the pathway, and the non-parametric null distribution of the pathway score is generated through simulation from background genes. In the model, nodes are weighted by network centralities. In addition to the centralities already available in network analysis, we define one additional centrality, largest reach, to measure whether nodes lie in the upstream or downstream part of the pathway.

Results: We propose a systematic and extensible pathway enrichment method in which nodes are weighted by network centrality. We demonstrate how the choice of pathway structure and centrality measure, as well as the presence of key genes, affects pathway significance. We emphasize two improvements of our method over current methods. First, given the diversity of gene characteristics and the difficulty of capturing gene importance from all aspects, we set centrality as an optional parameter in the model. Second, nodes rather than genes form the basic unit of pathways, such that one node can be composed of several genes and one gene may reside in different nodes. By comparing our methodology with the original enrichment method on both simulated and real-world data, we demonstrate the efficacy of our method in finding new pathways from a biological perspective.

Conclusions: Our method can benefit the systematic analysis of biological pathways and help to extract more meaningful information from gene expression data. The algorithm has been implemented as an R package, CePa, which is available on CRAN; a web-based version of CePa is also provided.
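
The scoring scheme described under Methods can be illustrated with a minimal Python sketch. It is not the CePa implementation (CePa is an R package); every function name, the toy pathway, and the gene labels are hypothetical. Three assumptions are made that the abstract does not spell out: a node counts as affected when at least one of its genes is in the differential list, the out-directed largest reach of a node is taken as the longest shortest-path distance from that node to any node it can reach, and each simulation draws a random gene list of the same size as the observed one from the background.

```python
import random
from collections import deque


def out_largest_reach(nodes, edges):
    """Assumed definition: longest shortest-path distance from each node
    to any node reachable from it (0 for nodes with no outgoing edges)."""
    successors = {n: [] for n in nodes}
    for src, dst in edges:
        successors[src].append(dst)
    reach = {}
    for start in nodes:
        dist = {start: 0}
        queue = deque([start])
        while queue:  # breadth-first search gives shortest path lengths
            cur = queue.popleft()
            for nxt in successors[cur]:
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
        reach[start] = max(dist.values())
    return reach


def pathway_score(nodes, weights, gene_list):
    """Sum the weights of nodes that contain at least one listed gene."""
    return sum(weights[n] for n, genes in nodes.items() if genes & gene_list)


def simulated_p_value(nodes, weights, de_genes, background, n_sim=1000, seed=0):
    """Non-parametric null: score random gene lists drawn from the background."""
    rng = random.Random(seed)
    observed = pathway_score(nodes, weights, de_genes)
    pool = sorted(background)  # sorted so a fixed seed gives reproducible draws
    hits = 0
    for _ in range(n_sim):
        simulated = set(rng.sample(pool, len(de_genes)))
        if pathway_score(nodes, weights, simulated) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_sim + 1)


# Toy pathway: nodes map to gene sets, so gene g3 sits in two nodes (B and C)
# and nodes B and C each hold more than one gene.
nodes = {"A": {"g1"}, "B": {"g2", "g3"}, "C": {"g3", "g4"}, "D": {"g5"}}
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
background = {f"g{i}" for i in range(1, 21)}
de_genes = {"g1", "g3"}

weights = out_largest_reach(nodes, edges)  # upstream nodes get larger weights
score, p_value = simulated_p_value(nodes, weights, de_genes, background)
print(score, p_value)
```

Because the centrality enters only through the `weights` dictionary, swapping in another measure (for example degree, or a constant weight of 1 to recover a plain count of affected nodes) requires no change to the scoring or simulation code, which mirrors the abstract's point that centrality is an optional parameter.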