42 Introduction to Causal Inference

This chapter is under heavy development and may still undergo significant changes.

Causal inference is one of our favorite topics. Perhaps because it is so challenging (as we will see); yet, it is simultaneously at the very core of so much of our work as epidemiologists. Before getting into the specifics of causal inference, however, we want to briefly remind you about some of the topics we talked about earlier in the book.

When our goal is to make predictions, our analyses will focus on associations. As discussed in the [measures of association chapter][Measures of Association], association exists when the distribution of the thing we are measuring is different, on average, in two groups. Alternatively, we can say that knowing something about X tells us something (or helps us predict) about Y.

Predictions, especially good ones, can obviously be useful on their own. We may know that people of a certain race/ethnicity are most likely to get a particular form of cancer. Knowing that may allow us to concentrate screening efforts more effectively. We may know that older adults who begin to have trouble managing their finances are more likely to develop dementia. We may be able to use that information as an early indicator of important health problems to come.

However, we are often not content with predictions alone in epidemiology. It is extremely common for our questions, and subsequent studies, to either directly ask causal questions – either directly or indirectly. The reason we are often more interested in causal relationships than mere predictions can be found directly in our definition of epidemiology – we want to control health problems. Said another way, we want to know why – not just if – ”bad” things happen so that we can stop them from happening and/or why “good” things happen so that we can make them happen more often.

Notice that in the prediction examples we mentioned above (i.e., race/ethnicity predicts cancer, financial management predicts dementia) may be perfectly valid, but do they get us any closer to our ultimate goal of “controlling health problems?” We can’t change anyone’s race or ethnicity, can we? Even if we could, do race/ethnicity – social constructs – directly cause cancer? Or do race and ethnicity serve as proxies for the true unmeasured causes of cancer that disproportionately affect people of certain races and ethnicity. In other words, if we suddenly decided to start labeling people as members of a different race, while everything else about them remains the same, would we expect an impact on their cancer risk? Likewise, do we really believe that older people will no longer develop dementia if we just hire accountants to help them manage their finances better? Of course not! Again, to accomplish our goal of controlling health problems, we will often need to understand why things happen, not just if they will happen. And when we ask why questions, we are asking causal questions. On the one hand, these questions are usually incredibly exciting and have the potential to lead to real, tangible changes in population health. On the other hand, our two primary tools – data and statistics – are not sufficient to answer such questions in and of themselves.

Partially because we can’t rely on data and statistics alone to answer causal questions, generations of people were taught that answering causal questions is generally beyond the reach of science. Today, however, causal inference and causal effect estimation are extremely active areas of methodological research and there are several theories and methods we can call upon when we want to answer causal questions – as long as we are willing to make some assumptions. The theories and methods that we will consider in this part of the book include:

Counterfactuals, which are also called potential outcomes models.
Models of sufficient and component causes (SCC).
Directed Acyclic Graphs (DAGs), which are a type of graphical causal model.

In the chapters that follow, we will use R as a tool to better understand counterfactual, SCC models, and DAGs. We will also learn how to use R as a tool for creating SCC models and DAGs.