Integrative genetical genomics analysis incorporating network structures
Genetical genomics data provide promising opportunities for the integrative analysis of gene expression and genotype data. Lin et al. (2015) recently proposed an instrumental variables (IV) regression framework for selecting important genes from high-dimensional genetical genomics data. IV regression addresses the endogeneity problem that arises from correlation between gene expression levels and the error term, and hence improves gene selection performance. Because genes act together in networks to carry out their joint functions, incorporating network or graph structure into a regression model can further improve gene selection. In this work, we propose a graph-constrained penalized IV regression framework that addresses endogeneity and improves selection performance by incorporating a gene network structure. We develop a two-step estimation procedure that adopts a network-constrained regularization method to obtain better variable selection and estimation, and we establish its selection consistency. Simulations and a real data analysis are conducted to demonstrate the utility of the method.
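The exact estimator is not spelled out here, so the following is only a minimal sketch of what a two-step, network-penalized IV fit can look like, under our own assumptions. Step 1 predicts each gene's expression from the genotypes. We substitute a plain ridge fit for this step; the method described above may use a different, sparsity-inducing first-stage penalty. Step 2 regresses the phenotype on the predicted expressions by coordinate descent on the assumed objective (1/2n)||y - X̂b||² + λ₁||b||₁ + λ₂ bᵀLb, where L = D - A is the Laplacian of a gene network with adjacency matrix A. This is one common form of network-constrained regularization. All function names (`first_stage_ridge`, `network_lasso`, etc.), tuning values, and the simulated data are hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def first_stage_ridge(Z, X, alpha=1.0):
    """Step 1 (assumed ridge): predict every expression column of X from genotypes Z.

    Returns X_hat = Z @ B with B = (Z'Z + alpha*I)^{-1} Z'X, fitted for all genes at once.
    """
    q = Z.shape[1]
    B = np.linalg.solve(Z.T @ Z + alpha * np.eye(q), Z.T @ X)
    return Z @ B


def laplacian(A):
    """Unnormalized graph Laplacian L = D - A of a symmetric adjacency matrix A."""
    return np.diag(A.sum(axis=1)) - A


def network_lasso(X, y, L, lam1, lam2, n_iter=200, tol=1e-8):
    """Step 2: coordinate descent for the assumed objective
    (1/2n)||y - X b||^2 + lam1 * ||b||_1 + lam2 * b' L b   (L symmetric).
    """
    n, p = X.shape
    b = np.zeros(p)
    r = y.astype(float).copy()           # current residual y - X b
    col_sq = (X ** 2).sum(axis=0) / n    # (1/n) ||x_j||^2
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # network coupling from neighbouring genes (diagonal entry excluded)
            off = L[j] @ b - L[j, j] * b[j]
            # (1/n) x_j' (partial residual with gene j added back) minus graph pull
            z = X[:, j] @ r / n + col_sq[j] * b[j] - 2 * lam2 * off
            a = col_sq[j] + 2 * lam2 * L[j, j]
            new = soft_threshold(z, lam1) / a if a > 0 else 0.0
            if new != b[j]:
                r -= X[:, j] * (new - b[j])  # keep the residual in sync with b
                max_change = max(max_change, abs(new - b[j]))
                b[j] = new
        if max_change < tol:
            break
    return b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, q, p = 300, 40, 8
    Z = rng.binomial(2, 0.3, size=(n, q)).astype(float)  # SNP genotypes coded 0/1/2
    U = rng.normal(size=n)                                # hidden confounder -> endogeneity
    X = Z @ rng.normal(scale=0.5, size=(q, p)) + U[:, None] + rng.normal(size=(n, p))
    beta = np.array([1.0, 1.0, 1.0, 0, 0, 0, 0, 0])
    y = X @ beta + 2.0 * U + rng.normal(size=n)
    Z, X, y = Z - Z.mean(0), X - X.mean(0), y - y.mean()  # centre instead of fitting an intercept
    A = np.zeros((p, p))
    for i, k in [(0, 1), (1, 2), (3, 4), (5, 6)]:         # hypothetical gene network edges
        A[i, k] = A[k, i] = 1.0
    X_hat = first_stage_ridge(Z, X, alpha=1.0)
    b = network_lasso(X_hat, y, laplacian(A), lam1=0.05, lam2=0.1)
    print(np.round(b, 3))
```

The simulated confounder `U` enters both the expressions and the phenotype, which mimics the endogeneity that motivates IV regression. Regressing `y` on the genotype-driven predictions `X_hat` instead of on `X` removes that correlation. The Laplacian term penalizes differences between coefficients of connected genes, since bᵀLb equals the sum of (b_i - b_k)² over network edges. This encourages linked genes to be selected together.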