Causal network analysis of big genomic, epigenomic and imaging data
The current paradigm for analyzing multi-omic, imaging and phenotype data in complex diseases lacks breadth (the number of variables analyzed at a time) and depth (the synthesis of multiple omics). Most approaches analyze each data type separately. Traditional platforms for analyzing biological datasets rely on association or correlation analysis. However, systematic analysis of omic and imaging data needs to uncover the causal relationships among the molecular components that form complex biological systems. A key issue is how to identify the causal relationships inherent in multiscale omics and clinical phenotypes, and how to perform an integrated analysis of whole-genome sequencing (WGS), other omics data, and clinical data. Deep analysis of high-dimensional, heterogeneous and correlated omic and clinical data poses huge challenges. We develop a unified analytical framework for systematic causal decomposition, based on novel statistical methods for trans-omic networks, that integrates heterogeneous genomic, environmental, RNA-seq, DNA methylation and phenotypic data into multilayer networks underlying disease and health. The proposed method was applied to genetic and epigenetic studies of five diseases, namely hypertension, obesity, type 2 diabetes, Alzheimer’s disease (AD), and Lewy body disease, to infer causal genotype-environment-expression-phenotype-disease networks. The inferred network consists of 58,207 nodes and 192,939 edges. In the causal networks, 960 genes were directly connected to phenotype nodes and 1,501 genes were directly connected to disease nodes. A total of 395 genes and 2,088 methylated genes were connected to 609 gene expression nodes, and 14,368 paths were identified from these genes to diseases. Several remarkable features emerged. First, 50% of the edges in the gene regulatory network were consistent with the pathway structure in the KEGG pathway database. Second, several diseases may share common disease risks, pathways and genes. For example, hypertension, obesity, and AD share the gene CREBBP, and hypertension, obesity, and Lewy body disease share the gene KMT2C. Third, causal omic network analysis can identify causal paths from gene to disease via environment, gene expression and phenotype (risk factors) that are supported by the literature.
This is joint work with Zixin Hu, Nan Lin, Rong Jiao, Panpan Wang, Yun Zhu, Jinying Zhao, David A Bennett, and Li Jin.