Synthesizing Knowledge from Multi-institution EHR Data
The wide adoption of electronic health record (EHR) systems has made large clinical datasets available for discovery research. EHR data linked with biorepositories are a valuable new source for deriving real-world, data-driven prediction models of disease risk and progression. Yet these data also bring analytical difficulties, especially when the aim is to leverage multi-institutional EHR data. Synthesizing information across healthcare systems is challenging due to heterogeneity and privacy constraints. Statistical challenges also arise from the high dimensionality of the feature space and from model misspecification. In this talk, I’ll discuss analytical approaches for mining EHR data with a focus on transfer learning and federated learning. These methods will be illustrated using EHR data from Mass General Brigham and the Veterans Health Administration.