Back to the future: 'simple' regression models for complex genetic association studies
Linear regression remains an important framework in the era of big and complex data. In this talk I present some recent examples where we return to the classical simple linear regression model and its celebrated extensions in novel settings. The eureka moment came while reading Wu and Guan's (2015) comments on our generalized Kruskal-Wallis (GKW) test (Elif Acar and Sun 2013, Biometrics). Wu and Guan presented an alternative "rank linear regression model and derived the proposed GKW statistic as a score test statistic". Indeed, the regression framework eases the derivation and facilitates further extensions. More recently, we turned our attention to extending Levene's variance test to data with genotype uncertainty and related individuals; this test is useful for GxE interaction studies when data on E are not available. While a direct modification of the original test statistic is challenging, I will demonstrate that establishing a two-stage regression framework for the original Levene's test makes the ensuing method development quite straightforward, eventually leading to a generalized joint location-scale test (David Soave and Sun 2017, Biometrics). Finally, I will discuss some ongoing projects, including how to robustify the allelic association test against Hardy-Weinberg disequilibrium and generalize it for quantitative traits, how to develop a flexible association test for the complex X-chromosome, and how to unify, in an analytical sense, methods developed for rare variants with polygenic risk score analyses, among others. In each case, the crux of the work is reformulating the problem as a regression!
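
As a rough illustration of the two-stage regression view of Levene's test mentioned above, here is a minimal sketch in plain Python. It assumes independent individuals, known genotype groups, and the mean-centred form of the test; it is not the generalized method of the abstract, which handles genotype uncertainty and relatedness. The function name `levene_two_stage` and the toy data are hypothetical.

```python
from statistics import mean


def levene_two_stage(y, g):
    """Mean-centred Levene statistic computed as two regressions on group labels.

    y: trait values; g: group labels (e.g. genotype 0/1/2), same length as y.
    Returns the stage-2 F statistic. Assumes at least two groups, more
    observations than groups, and some within-group spread in stage 2.
    """
    groups = sorted(set(g))
    n, k = len(y), len(groups)

    # Stage 1: least-squares regression of y on group indicators.
    # The fitted value for each observation is its group mean.
    mu = {j: mean(yi for yi, gi in zip(y, g) if gi == j) for j in groups}
    d = [abs(yi - mu[gi]) for yi, gi in zip(y, g)]  # absolute residuals

    # Stage 2: least-squares regression of d on the same indicators,
    # followed by the usual F-test that all group effects are equal.
    d_bar = {j: mean(di for di, gi in zip(d, g) if gi == j) for j in groups}
    n_j = {j: sum(1 for gi in g if gi == j) for j in groups}
    grand = mean(d)
    ss_between = sum(n_j[j] * (d_bar[j] - grand) ** 2 for j in groups)
    ss_within = sum((di - d_bar[gi]) ** 2 for di, gi in zip(d, g))
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Toy data: three genotype groups with increasing spread.
y = [1, 2, 3, 2, 4, 6, 0, 5, 10]
g = [0, 0, 0, 1, 1, 1, 2, 2, 2]
print(levene_two_stage(y, g))
```

Writing the test this way is what makes extensions natural: each stage is an ordinary regression, so in principle either stage can be swapped for a model that accommodates other data structures.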