The Effect: An Introduction to Research Design and Causality
Nick Huntington-Klein
Chapman & Hall/CRC, 2022, xiv + 620 pages, $39.95, paperback. ISBN: 9781032125787
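This book is aimed at advanced undergraduate students and practitioners in statistics and economics. It introduces the core concepts and methods of causal inference through intuition rather than technical detail, helping readers judge when a causal claim can be taken seriously and apply appropriate methods to answer their research questions.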
Readership: Advanced undergraduate students in statistics and economics, beginning graduate students, and practitioners.
In brief, causality research has been around for several decades, starting with the Rubin Causal Model, or potential outcomes approach (Rubin, 1974). This model has been the mainstay framework for analysing causal problems. More specifically, researchers observe which treatment a given unit received and the outcome for that unit under that treatment. However, they do not observe the outcomes under the other levels of the treatment that the unit did not receive; consequently, researchers never directly observe the causal effects. Holland (1986) calls this the “fundamental problem of causal inference.” Subsequent research has delved deeply into the analytical details of how causality works. For a concise review of past research in causality, the reader should consult Athey & Imbens (2017).
The book takes the ideas from this early research and beyond and presents them in an applied way. Causal inference comes into play because there are often scenarios where a researcher cannot run a randomized experiment. The book introduces readers to the topic in a way that is high on intuition rather than analytical detail. Put another way, it explains how to figure out when one can take causal claims seriously and how to apply the appropriate methods to answer a research question. There is great interest in the use of causality, especially since the 2021 Nobel Prize in economics went in part to two researchers, Guido Imbens and Joshua Angrist, who studied causality in natural experiments. Their work has led others, including tech companies such as Microsoft, Twitter, Meta, and Lyft, to use these methods to guide their work in data science.
Imbens & Angrist (1994) as well as this book explain why causality is so important to understanding how the world works, without delving too far into the analytical details. In part 1, Huntington-Klein first discusses how to formulate an empirical research question. Then the book covers how a researcher might describe a variable and the relationships between different variables. The remaining chapters of part 1 delve into causal diagrams and how they can be used in research design. Throughout, Huntington-Klein avoids presenting the analytical details and instead provides the necessary intuition. This part concludes with how these causal effects can be measured, e.g., the average treatment effect. The author emphasizes what makes a “good” or “bad” causal analysis, which motivates the start of part 2 of the book.
Part 2 commences with a discussion of various tools that practitioners and researchers can use to draw causal conclusions from data. In chapter 12, a broad overview of regression is provided, including ordinary least squares, robust standard errors, GLMs, and the LASSO, a form of penalized regression. The topics presented in this book are similar to those in Angrist & Pischke (2009) and, to a lesser extent, Angrist & Pischke (2014), but with less emphasis on the analytical details. Unlike Angrist & Pischke (2009), Huntington-Klein does not cover quantile regression. Some readers may feel somewhat overwhelmed by the number of topics in any given chapter. The author then delves into several popular methods for measuring causal effects, with chapters on fixed effects, difference-in-differences, matching, instrumental variables, and regression discontinuity. For each of these methods, the author provides numerous examples to explain how it can be used.
It needs to be stressed that Huntington-Klein is not a statistician but an economist, so he presents examples from published papers on economics and public policy that also serve as references to the literature. A nice advantage of this book is that he provides code snippets in Stata, R, and Python for each of the examples. These snippets would certainly aid the student or researcher in carefully applying these methods to data. The final chapter in part 2 then briefly describes a variety of other recent methods, e.g., synthetic control and causal forests. The latter topics received much attention after Guido Imbens shared the Nobel Prize in Economics in 2021.
Given the conversational style and design of this book, it would make a great textbook for an undergraduate introductory data science course or social science methodology course, as well as a reference for beginning graduate students. It would also benefit researchers who are working with data but are not wholly clear about where to start when investigating causal relationships. To supplement the content of the text, the author has posted 60+ short video lectures, which could also help students learn the material. As in any textbook, the author provides homework assignments that emphasize both content and coding. The author does not rely on a single software package but supports the material with code in R, Python, and Stata. I perused the code presented in the text and ran some of it. The code works well and is succinct. As mentioned earlier, the author wrote this book in a conversational tone, avoiding analytical details and describing in plain English how the calculations are done. At times I did find the text difficult to follow because the plain English explanation needed some generalization before it could be applied to other cases in the book. For those who want fewer analytical details, this book is a good introduction to research design and causality and a good book to have on your bookshelf.