Seminar Series
November 5, 1999
Topic: Causal Discovery from Non-Experimental Data
Speaker: David Heckerman, Microsoft Research
ABSTRACT:
Statisticians, in large part, make observations and use these observations
to make predictions. For example, based on a statistical study, one
can conclude that, if you smoke, then it is more likely that you will
get lung cancer than if you don't smoke. Unfortunately, this sort of
information is not all that useful to--say--health care professionals.
What they want to know is, if you CHANGE your behavior and start smoking,
will you increase your chances of getting lung cancer? It turns out
that the notion of cause and effect lies at the heart of such questions.
The tricky thing about causation is that it is not correlation. Statisticians
have been saying this for over a hundred years. So how do we discover
causal relationships? One method that has been used for almost a century
is the randomized trial. If we want to figure out whether or not smoking
causes lung cancer, we take--say--one hundred people, randomly assign
half of them to smoke and the other half not to smoke, and see how many
in each group get
lung cancer. Of course, we can't really do this because it's unethical.
But doctors, patients, and politicians are beginning to realize that
randomized trials to test drugs and new surgical procedures are just
about as unethical. After all, if you go to all the expense of testing
a new drug, you probably think it is better than whatever is available
already. Why should we take half the patients who need that drug and
prevent them from taking it? In my talk, I will discuss statistically
oriented methods for discovering cause and effect without the need for
randomized trials. These approaches are based on graphical models called
Bayes nets, which are represented as directed acyclic graphs (DAGs). I
will illustrate the methods on several real examples.