Abstracts
Arlene Ash, University of Massachusetts
Medical School
Seeking needles of health wisdom in haystacks of big data
My purpose will be to imagine a world in which we have overcome the many political
and technical obstacles that my co-panelists will be discussing, and to try
to envision the kinds of data views and interactions that could enable individuals,
clinicians, health systems, communities and governments to participate in
a "learning laboratory" where we simultaneously draw on, and contribute
to, an evolving body of evidence about how to achieve and maintain human health.
I will consider points of comparison and contrast with the extraordinary success
of Google, Amazon and others in mining huge, unstructured streams of data
to extract valuable business intelligence.
Peter Austin, Institute for Clinical Evaluative
Sciences
Propensity score methods for estimating treatment effects using observational
data
Participants will be introduced to the concept of the propensity score, which
allows for estimation of the effect of treatment when using observational
data. Four different methods of using the propensity score will be discussed:
matching, weighting, stratification, and covariate adjustment. The strengths
and limitations of the four approaches will be highlighted and their use will
be illustrated using a case study. We will contrast the use of propensity
score-based methods with conventional regression-based approaches.
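To make two of these approaches concrete, here is a minimal Python sketch on simulated observational data; it is illustrative only (not workshop material), and every variable name and parameter value is hypothetical.

```python
# A minimal sketch of propensity-score weighting and stratification on simulated data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=(n, 3))                                 # hypothetical baseline covariates
p_treat = 1 / (1 + np.exp(-(x @ [0.5, -0.3, 0.2])))
z = rng.binomial(1, p_treat)                                # observed (non-randomized) treatment
y = 1.0 * z + x @ [1.0, 0.5, -0.5] + rng.normal(size=n)     # true treatment effect = 1.0

# 1. Estimate the propensity score with logistic regression.
ps = sm.Logit(z, sm.add_constant(x)).fit(disp=0).predict(sm.add_constant(x))

# 2. Inverse probability of treatment weighting (ATE weights).
w = z / ps + (1 - z) / (1 - ps)
ate_iptw = np.average(y[z == 1], weights=w[z == 1]) - \
           np.average(y[z == 0], weights=w[z == 0])

# 3. Stratification on propensity-score quintiles.
strata = np.digitize(ps, np.quantile(ps, [0.2, 0.4, 0.6, 0.8]))
ate_strat = np.mean([y[(strata == s) & (z == 1)].mean() -
                     y[(strata == s) & (z == 0)].mean() for s in range(5)])

print(ate_iptw, ate_strat)                                  # both close to 1.0
```

Matching and covariate adjustment, the other two approaches named above, follow the same first step of estimating the propensity score but use it differently.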
Dan Chateau, University of Manitoba
Implementing a research program using data from multiple jurisdictions: The
Canadian Network for Observational Drug Effect Studies (CNODES)
Prescription drugs constitute the most common form of treatment used in clinical
practice. Despite the pre-approval randomized controlled trials, many questions
remain unanswered about their effects, whether unintended, harmful or beneficial.
In the absence of randomization, the epidemiologic approach lends itself to
the study of the real-life effects of these drugs in the natural setting of
clinical practice, often quite different from the experimental setting in
which they were developed. Individual initiatives in single jurisdictions
cannot always address new challenges that require larger databases: the study
of rare and serious adverse events, of drugs used for infrequent diseases,
and of medications early after they enter the market. To address these
challenges, the Canadian Network for Observational
Drug Effect Studies (CNODES) was created, with funding from the Drug Safety
and Effectiveness Network (DSEN). A pan-Canadian collaboration assembling
over 60 scientists from across the country, we use existing healthcare databases
on over 30 million people, along with powerful analytical methods, to rapidly
evaluate the risks and benefits of medications. The CNODES network's
structure has given rise to several methodological issues and challenges in
estimating the effects of medications from such a distributed network. The presentation
will provide an overview of the CNODES network and a typical study, present
results of several CNODES studies, describe methodological problems relevant
to the CNODES network, and outline ongoing methodological research and potential solutions.
Constantine Gatsonis, Brown University
Diagnostic test assessment using large databases
Modalities for diagnosis and prediction are used widely and account for
a significant fraction of health care expenditures. The study of the impact
of these modalities on subsequent process and outcomes is increasingly turning
to the use of registries and large administrative and clinical databases.
In this presentation we will discuss the experience from a large national
registry, the National Oncology PET Registry, and current research combining
information from registries and large administrative databases. In keeping
with the "big data" theme of the workshop, we will also discuss
current research on developing large "in silico" trials to assess
the performance of imaging modalities. These trials utilize simulated imaging
scans representing populations of interest, and modalities for image display
and interpretation.
David Henry, Dalla Lana School of
Public Health, University of Toronto, and Institute for Clinical Evaluative
Sciences
Being realistic about how Big Data can support health and health policy research
Much rhetoric has been directed at the transformative benefits
of Big Data for the evaluation and planning of health care delivery. Claims have ranged
from complete genome sequencing-guided ultra-precise medical treatments to
real-time evaluation and management of health system performance. In between
these are forecasts that we will see the demise of the randomised clinical
trial, replaced by bias-free analyses of treatment effects in large
population databases. Unquestionably, access to high-throughput multi-omics
analyses, linked population health datasets, and a range of novel data sources
is providing new and important insights, but the effects are incremental,
not transformative. New medical interventions may be guided by genomic analyses
but they must also be subjected to the tenets of evidence based medicine.
Propensity matching, instrumental variable analyses and a range of other analytical
approaches cannot be relied on to provide completely unbiased estimates of
efficacy. On the other hand, access to multiply linked biological and population
datasets, augmented by data from personal monitoring devices, opens up new
opportunities to define and link the phenome and genome for research on a
scale not previously contemplated. This creates challenges for methodologists
who have tended to train and work in silos and cannot easily span the full
range of analytical techniques that are used. What are the roles of the data
scientist of the future and how will we build the necessary capacity in training
programs at research universities? The speaker does not have answers to these
questions but would like to provoke a discussion of the underlying issues.
Patrick J. Heagerty, University
of Washington
Pragmatic Trials and the Learning Health Care System
Health care delivery, quality improvement, and clinical research should operate
in a coordinated fashion so that patient care and patient outcomes can improve.
Pragmatic trials are an important class of research designs that are conducted
within health care delivery systems. The context of the delivery system is
associated with unique issues regarding appropriate study design, data collection
and quality control, and statistical analysis. Statistical leadership is needed
so that high quality definitive studies are conducted. In this presentation
we review selected initiatives within the US and give an overview of key challenges associated
with the multilevel structure of the delivery system. Methods are needed to
provide large-scale monitoring of data quality, to design and analyze longitudinal
studies, and to provide impactful information to patients and providers. We
will use our recent experience as a demonstration project within the NIH Collaboratory
to illustrate key issues.
Xiaochun Li, Indiana University
EMR²: Evidence Mining Research in Electronic Medical Records Towards
Better Patient Care
The Indianapolis Network for Patient Care (INPC) was created in 1995 with
the goal of providing clinical information at the point of patient care. It
houses clinical data from over 80 hospitals, public health departments, local
laboratories and imaging centers, surgical centers, and a few large-group
practices closely tied to hospital systems, for approximately 13.4 million
unique patients. This wealth of data provides great opportunities for comparative
effectiveness and pharmaco-epidemiology research leading to knowledge discovery.
In this talk, I will present three representative projects with the ultimate
goal of better patient care:
-Record Linkage
This is the requisite step for better patient care and research. As our nation
moves into an era of electronic health record systems, the data in health
information exchanges (HIEs) are increasingly distributed across many sources.
But HIE data often come from independent databases without a common patient identifier, the lack of
which impedes data aggregation, causes waste (e.g., tests repeated unnecessarily),
affects patient care and hinders research.
-Benchmark an Electronic Medical Record database
In typical database studies to investigate drug-outcome associations, risk
measures are calculated after adjusting for an extended list of possible confounders,
and then the strength of drug-outcome association is obtained by comparing
the risk estimates against a theoretical null distribution. It has been recognized
that electronic health record (EHR) databases are created for routine clinical
care and administrative purposes, not for research, and thus may harbor
hidden biases. If a set of medications is known not to cause the outcome
under study, their associations with the outcome (or, more precisely, the lack thereof)
can be used to estimate an empirical null distribution. This estimated null distribution
is then used to calibrate the strength of the risk estimate, as sketched below.
-Predictive modeling for clinical decision support (CDS)
This research is directed towards the ultimate goal of meaningful use of EMRs to
achieve better clinical outcomes, improved population health outcomes and
increased transparency and efficiency.
I will discuss the statistical approaches, results and the challenges encountered.
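A minimal numerical sketch of the calibration idea described under the second project; the drugs, estimates and standard errors are made up for illustration and are not results from the INPC data.

```python
# Calibrating a drug-outcome risk estimate against an empirical null built from
# "negative control" medications believed not to cause the outcome.
import numpy as np
from scipy import stats

# Hypothetical log relative-risk estimates for negative-control drugs.
null_log_rr = np.array([0.10, -0.05, 0.22, 0.15, -0.12, 0.30, 0.08, 0.18, -0.02, 0.25])

# Empirical null: treat the negative-control estimates as draws from a normal
# distribution whose mean and spread capture residual (hidden) bias in the EHR data.
mu0, sd0 = null_log_rr.mean(), null_log_rr.std(ddof=1)

log_rr_study = 0.35        # hypothetical log relative risk for the drug under study
se_study = 0.10            # hypothetical standard error

# p-value against the theoretical null (mean 0) vs. the calibrated empirical null.
p_theoretical = 2 * stats.norm.sf(abs(log_rr_study) / se_study)
p_calibrated = 2 * stats.norm.sf(abs(log_rr_study - mu0) /
                                 np.sqrt(se_study ** 2 + sd0 ** 2))
print(p_theoretical, p_calibrated)
```

The calibrated p-value is less extreme because part of the apparent association is consistent with the systematic error seen across the negative controls.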
Erica Moodie, McGill University
Marginal Structural Models
In this workshop, we will consider the definition of a marginal structural
model and the assumptions required for its identification. Three approaches
to estimation will be presented: inverse probability weighting, g-computation,
and g-estimation. The workshop will consist of lecturing, demonstrations,
and in-class exercises. A laptop computer will be required for the exercises;
participants may wish to complete the exercises individually or in groups
of 2-3 people.
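As an illustration only (not the workshop exercises), the following Python sketch contrasts two of the three estimation approaches named above, inverse probability weighting and g-computation, in the simplest possible setting of a single binary treatment and one measured confounder; all data are simulated.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 20000
L = rng.normal(size=n)                          # measured confounder
A = rng.binomial(1, 1 / (1 + np.exp(-L)))       # treatment probability depends on L
Y = 2.0 * A + L + rng.normal(size=n)            # true marginal causal effect = 2.0

# Inverse probability weighting: estimate treatment weights, then fit the
# marginal structural model E[Y^a] = b0 + b1*a by weighted least squares.
ps = sm.Logit(A, sm.add_constant(L)).fit(disp=0).predict(sm.add_constant(L))
w = A / ps + (1 - A) / (1 - ps)
msm_ipw = sm.WLS(Y, np.column_stack([np.ones(n), A]), weights=w).fit()

# g-computation: model E[Y | A, L], then average predictions over the observed
# confounder distribution under A = 1 and A = 0.
out = sm.OLS(Y, np.column_stack([np.ones(n), A, L])).fit()
y1 = out.predict(np.column_stack([np.ones(n), np.ones(n), L]))
y0 = out.predict(np.column_stack([np.ones(n), np.zeros(n), L]))

print(msm_ipw.params[1], (y1 - y0).mean())      # both close to 2.0
```

g-estimation, the third approach, relies on structural nested models and additional machinery and is not sketched here.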
Jonas Peters, Max Planck Institute for Intelligent
Systems
Three Ideas for Causal Inference
In causal discovery, we are trying to infer the causal structure of the underlying
data generating process from some observational data. We review three different
ideas that aim at solving this problem: (i) additive noise models, which assume
that the involved functions are of a particularly simple form, (ii) constraint-based
methods, which relate conditional independences in the distribution to a
graphical criterion called d-separation, and (iii) invariant prediction, which
makes use of observing the data generating process in different "environments".
We discuss these methods in the context of a gene expression data set (yeast),
in which observational and interventional data are available. This talk is
meant as a short tutorial. It concentrates on ideas and concepts and does
not require any prior knowledge about causality.
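As a toy illustration of idea (i) only: under a bivariate additive noise model, one can regress each variable on the other and prefer the direction in which the residuals look independent of the input. The sketch below uses simulated data, kernel ridge regression and a hand-rolled HSIC dependence measure; these implementation choices are mine, not from the talk, and this is not the yeast analysis.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

def hsic(a, b):
    """Biased HSIC estimate with Gaussian kernels and median-heuristic bandwidths."""
    def gram(v):
        d2 = (v[:, None] - v[None, :]) ** 2
        return np.exp(-d2 / np.median(d2[d2 > 0]))
    n = len(a)
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(gram(a) @ H @ gram(b) @ H) / n ** 2

def residual_dependence(cause, effect):
    # Fit a nonparametric regression and measure dependence between input and residuals.
    fit = KernelRidge(kernel="rbf", alpha=0.1, gamma=1.0).fit(cause[:, None], effect)
    return hsic(cause, effect - fit.predict(cause[:, None]))

rng = np.random.default_rng(5)
x = rng.uniform(-2, 2, 300)
y = x ** 3 + 0.5 * rng.normal(size=300)          # true model: x -> y with additive noise

print("x->y:", residual_dependence(x, y))        # small: residuals roughly independent of x
print("y->x:", residual_dependence(y, x))        # larger: additive-noise fit fails backwards
```

The direction with the weaker dependence between input and residuals is the one preferred under the additive-noise assumption.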
Mark Smith, Manitoba Centre for Health Policy
Recent challenges (and opportunities) of working with Big Data
in health research.
Adding new data to a large existing population-based data repository can
present many challenges (in addition to new opportunities). I will explore
both sides of this equation using four big data case studies:
in-hospital pharmaceutical, justice, laboratory and EMR data. Challenges include
commitment and trust building with new organizations, size and complexity
of data, adequacy of documentation and the ongoing human resource challenges
for both uptake and provision of data. Opportunities include filling important
gaps in existing knowledge and asking new kinds of research questions, including
moving beyond questions of health care to the social determinants of health.
Mark Smith, Manitoba Centre for Health Policy,
Mahmoud Azimaee, Institute for Clinical Evaluative Sciences
Emerging data quality assessments of administrative data for use in research
Broadly defined, data quality means "fitness for use". Administrative
data were gathered for a particular purpose - running a program - and can
have qualities that are well-suited for that purpose. When the data are adapted
to a new purpose, such as research, issues of data quality become especially
salient. In broad terms we will outline a data quality framework applicable
to the use of administrative data in research and explore its implementation
at MCHP (Manitoba) and ICES (Ontario). Topics to be discussed include, among others,
completeness, correctness, internal and external validity, stability,
linkability, interpretability and automation.
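By way of a toy illustration only (hypothetical field names and values, not MCHP or ICES code), two of the simpler dimensions above, completeness and correctness, might be computed along these lines.

```python
import pandas as pd

claims = pd.DataFrame({
    "patient_id": ["A1", "A2", None, "A4"],
    "service_date": ["2015-03-01", "2015-13-40", "2015-04-22", None],
    "diagnosis_code": ["I21", "E11", "J45", "I10"],
})

# Completeness: share of non-missing values per field.
completeness = claims.notna().mean()

# Correctness (one simple check): share of service dates that parse as valid dates.
valid_dates = pd.to_datetime(claims["service_date"], errors="coerce").notna().mean()

print(completeness, valid_dates, sep="\n")
```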
Elizabeth Stuart, Johns Hopkins University
Using big data to estimate population treatment effects
With increasing attention being paid to the relevance of studies for real-world
practice (especially in comparative effectiveness research), there
is also growing interest in external validity and assessing whether the results
seen in randomized trials would hold in target populations. While randomized
trials yield unbiased estimates of the effects of interventions in the sample
of individuals (or physician practices or hospitals) in the trial, they do
not necessarily inform about what the effects would be in some other, potentially
somewhat different, population. While there has been increasing discussion
of this limitation of traditional trials, relatively little statistical work
has been done on developing methods to assess or enhance the external validity
of randomized trial results. In addition, new big data resources
offer the opportunity to utilize data on broad target populations. This talk
will discuss design and analysis methods for assessing and increasing external
validity, as well as general issues that need to be considered when thinking
about external validity. The primary analysis approach discussed will be a
reweighting approach that equates the sample and target population on a set
of observed characteristics. Underlying assumptions, performance in simulations,
and limitations will be discussed. Implications for how future studies should
be designed in order to enhance the ability to assess generalizability will
also be discussed, including particular considerations in big data.
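A minimal simulated sketch of the reweighting idea (all numbers and variable names are hypothetical; this is not the talk's analysis): model the probability of being in the trial versus the target population given observed characteristics, then weight trial participants by the inverse odds of trial membership.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n_trial, n_pop = 1000, 10000
x_trial = rng.normal(loc=1.0, size=n_trial)      # trial enrolls higher-x individuals
x_pop = rng.normal(loc=0.0, size=n_pop)          # covariate in the target population

# Randomized treatment in the trial; the effect varies with x, so the
# trial-sample effect differs from the target-population effect.
z = rng.binomial(1, 0.5, n_trial)
y = (1.0 + 0.5 * x_trial) * z + x_trial + rng.normal(size=n_trial)

# Model the probability of being in the trial (vs. the population) given x.
s = np.concatenate([np.ones(n_trial), np.zeros(n_pop)])
x_all = np.concatenate([x_trial, x_pop])
p_trial = sm.Logit(s, sm.add_constant(x_all)).fit(disp=0).predict(
    sm.add_constant(x_trial))

# Weight trial participants by the odds of being in the population vs. the trial.
w = (1 - p_trial) / p_trial
sate = y[z == 1].mean() - y[z == 0].mean()                   # trial-sample effect, ~1.5
pate = (np.average(y[z == 1], weights=w[z == 1]) -
        np.average(y[z == 0], weights=w[z == 0]))            # reweighted effect, ~1.0
print(sate, pate)
```

The gap between the unweighted and reweighted estimates illustrates how a trial result need not equal the effect in a target population when effects are heterogeneous across the characteristics that differ between the two.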
Michael Wolfson, University of Ottawa
Pretty Big Data and Analysis to Understand the Relative Importance of Health
Determinants - HealthPaths
While "health" is often associated with hospitals and other
high tech interventions, there is a long tradition in social epidemiology
of trying to understand the broader determinants of health - from proximal
risk factors like obesity and smoking, to distal factors, especially socio-economic
status - sometimes called "the causes of the causes". However, disentangling,
let alone quantifying, the various strands in this web of causality is a major
challenge. In this presentation, we describe an intensive analysis of Canada's
premier longitudinal data set for these questions, Statistics Canada's National
Population Health Survey (NPHS). A unique aspect of the statistical analysis
is that it has been tightly coupled with the development of a dynamic longitudinal
microsimulation model, HealthPaths. The statistical analysis of the NPHS uses
elastic-net and cross-validation methods, and generates millions of coefficients.
While these coefficients for the estimated dynamics of risk factors and health
status cannot be understood by inspection, their implications can be drawn
out and understood using microsimulation. One surprising result is the relatively
low importance of obesity, and the relatively high importance of psychological
factors like pain, in the overall burden of ill health among Canadians.
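For readers unfamiliar with the elastic-net and cross-validation step mentioned above, here is a minimal, self-contained sketch on toy data; it bears no relation to the NPHS variables or the HealthPaths model.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 50))            # hypothetical risk-factor / health-status predictors
beta = np.zeros(50)
beta[:5] = [1.0, -0.8, 0.6, 0.4, -0.3]    # only a few true effects
y = X @ beta + rng.normal(size=500)

# Cross-validation chooses the penalty strength; l1_ratio mixes lasso and ridge penalties.
model = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=5).fit(X, y)
print(model.alpha_, model.l1_ratio_, np.sum(model.coef_ != 0))
```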
Hau-Tieng Wu, University of Toronto
Online and adaptive analysis of dynamic periodicity and trend with heteroscedastic
and dependent errors -- with clinical applications
Periodicity and trend are features describing an observed sequence, and extracting
these features is an important issue in many scientific fields, for example,
epidemiology. However, it is not easy for existing methods to analyze dynamic
periodicity and trend simultaneously, and the adaptivity of the analysis to
such dynamics and its robustness to heteroscedastic, dependent errors are in
general not guaranteed. These tasks become even more challenging when multiple
periodic components are present.
We propose the "adaptive harmonic model" to integrate these features,
and propose a time-frequency analysis technique called the "synchrosqueezing
transform" (SST) to analyze the model in the presence of a trend and heteroscedastic,
dependent errors. The adaptivity and robustness properties of the SST and
relevant issues are theoretically justified, and real-time analysis and implementation
are achieved. Consequently, we obtain a new technique for decoupling the
trend, periodicity and heteroscedastic, dependent error process in a general
setup. In this talk, we will show its application to seasonality analysis
of varicella and herpes zoster. The data are obtained from the Taiwan National
Health Insurance Research Database. Several dynamical behaviors extracted
by SST will be reported.
Cory Zigler, Harvard University
Uncertainty and treatment effect heterogeneity in comparative effectiveness
research
Comparative effectiveness research depends heavily on the analysis of a rapidly
expanding universe of observational data made possible by the integration
of health care delivery, the availability of electronic medical records, and
the development of clinical registries. Despite extraordinary opportunities
for research aimed at improving value in health care, a critical barrier to
progress relates to the lack of sound statistical methods that can address
the multiple facets of estimating treatment effects in large, process-of-care
databases with little a priori knowledge about confounding and treatment
effect heterogeneity. When attempting to make causal inferences with such
large observational data, researchers are frequently confronted with decisions
regarding which of a high-dimensional covariate set are necessary to properly
adjust for confounding or define subgroups experiencing heterogeneous treatment
effects. To address these barriers, we discuss methods for estimating treatment
effects that account for uncertainty in: 1) which of a high-dimensional set
of observed covariates are confounders required to estimate causal effects;
2) which (if any) subgroups of the study population experience treatment effects
that are heterogeneous with respect to observed factors. We outline two methods
rooted in the tenets of Bayesian model averaging. The first prioritizes relevant
variables to include in a propensity score model for confounding adjustment
while acknowledging uncertainty in the propensity score specification. The
second characterizes heterogeneous treatment effects by estimating subgroup-specific
causal effects while accounting for uncertainty in the subgroup identification.
Causal effects are averaged across multiple model specifications according
to empirical support for confounding adjustment and existence of heterogeneous
effects. We illustrate with a comparative effectiveness investigation of treatment
strategies for brain tumor patients.
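The following is a deliberately simplified sketch of the general idea of averaging a treatment-effect estimate over candidate confounder-adjustment sets, with weights reflecting empirical support for each specification. It uses crude BIC-based weights over outcome-model specifications on simulated data; the presenters' methods are Bayesian model averaging over propensity-score and subgroup specifications, which this sketch does not reproduce.

```python
import itertools
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 2000
X = rng.normal(size=(n, 3))                       # candidate covariates; only X[:, 0] confounds
A = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))   # treatment assignment depends on X[:, 0]
Y = 1.0 * A + 2.0 * X[:, 0] + rng.normal(size=n)  # true treatment effect = 1.0

effects, bics = [], []
for k in range(X.shape[1] + 1):
    for subset in itertools.combinations(range(X.shape[1]), k):
        design = np.column_stack([np.ones(n), A] + [X[:, j] for j in subset])
        fit = sm.OLS(Y, design).fit()
        effects.append(fit.params[1])             # coefficient on treatment A
        bics.append(fit.bic)

# BIC-based weights as a crude stand-in for posterior model probabilities.
bics = np.array(bics)
weights = np.exp(-0.5 * (bics - bics.min()))
weights /= weights.sum()

print(np.dot(weights, effects))                   # model-averaged effect, close to 1.0
```

Specifications that omit the true confounder receive negligible weight, so the averaged estimate stays close to the unconfounded effect while still reflecting uncertainty across adjustment sets.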