On the implied weights of linear regression for causal inference

Biometrika · 2022

Cited by 31

ABS 4

Ambarish Chattopadhyay
José R. Zubizarreta (corresponding author)

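Overview

By analysing the weights that linear regression methods implicitly place on individual observations, this paper shows how regression adjustment emulates key features of a randomized experiment, namely covariate balance, self-weighted sampling and representativeness, and it proposes regression diagnostics for causal inference.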

Abstract

A basic principle in the design of observational studies is to approximate the randomized experiment that would have been conducted under ideal circumstances. At present, linear regression models are commonly used to analyse observational data and estimate causal effects. How do linear regression adjustments in observational studies emulate key features of randomized experiments, such as covariate balance, self-weighted sampling and study representativeness? In this paper, we provide answers to this and related questions by analysing the implied individual-level data weights of various linear regression methods. We derive new closed-form expressions of these implied weights, and examine their properties in both finite and large samples. Among others, in finite samples we characterize the implied target population of linear regression, and in large samples demonstrate the multiply robust properties of regression estimators from the perspective of their implied weights. We show that the implied weights of general regression methods can be equivalently obtained by solving a convex optimization problem. This equivalence allows us to bridge ideas from the regression modelling and causal inference literatures. As a result, we propose novel regression diagnostics for causal inference that are part of the design stage of an observational study. We implement the weights and diagnostics in the new lmw package for R.
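
To make the idea of implied weights concrete, here is a minimal sketch in pure Python. It is not the paper's derivation and does not use the lmw package. It relies only on the standard Frisch-Waugh-Lovell result: in an ordinary least-squares fit of an outcome on an intercept, a binary treatment and a covariate, the treatment coefficient equals a weighted difference in outcome means between the two groups. The data, the single covariate and the helper names `ols_coefs` and `implied_weights` are all assumptions made for illustration.

```python
# Toy illustration of implied regression weights (made-up data, pure Python).

def ols_coefs(columns, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(columns)
    # Augmented matrix [X'X | X'y]
    A = [[sum(p * q for p, q in zip(columns[i], columns[j])) for j in range(k)]
         + [sum(p * q for p, q in zip(columns[i], y))] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda rr: abs(A[rr][c]))
        A[c], A[piv] = A[piv], A[c]
        for rr in range(k):
            if rr != c:
                f = A[rr][c] / A[c][c]
                A[rr] = [u - f * v for u, v in zip(A[rr], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def implied_weights(z, x):
    """Per-unit weights implied by the coefficient on z in OLS of y on (1, z, x)."""
    ones = [1.0] * len(z)
    a, b = ols_coefs([ones, x], z)                    # regress treatment on covariate
    r = [zi - (a + b * xi) for zi, xi in zip(z, x)]   # treatment residuals
    ssr = sum(ri * ri for ri in r)
    # Treated units get r_i / ssr, control units get -r_i / ssr.
    return [(ri if zi == 1 else -ri) / ssr for ri, zi in zip(r, z)]

# Hypothetical data: treatment indicator, one covariate, outcome.
z = [1, 1, 1, 0, 0, 0, 0, 1]
x = [2.0, 3.5, 1.0, 0.5, 2.5, 1.5, 3.0, 4.0]
y = [5.1, 7.9, 3.2, 1.0, 4.8, 2.9, 6.1, 9.0]

w = implied_weights(z, x)
treated = [i for i, zi in enumerate(z) if zi == 1]
control = [i for i, zi in enumerate(z) if zi == 0]

# Weighted difference in means versus the fitted treatment coefficient.
tau_weighted = (sum(w[i] * y[i] for i in treated)
                - sum(w[i] * y[i] for i in control))
tau_ols = ols_coefs([[1.0] * len(z), z, x], y)[1]

# Within each group the weights sum to one, and the weighted covariate
# means of the two groups coincide (exact mean balance on x).
sum_w_treated = sum(w[i] for i in treated)
sum_w_control = sum(w[i] for i in control)
xbar_treated = sum(w[i] * x[i] for i in treated)
xbar_control = sum(w[i] * x[i] for i in control)

print(tau_weighted, tau_ols)
print(sum_w_treated, sum_w_control)
print(xbar_treated, xbar_control)
```

In this sketch the weights depend only on the treatment and the covariate, never on the outcome, which is consistent with the abstract's framing of the diagnostics as part of the design stage. Weights built this way are not guaranteed to be positive, since a treatment residual can take either sign.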

Keywords: causal inference; linear regression; observational studies; regression diagnostics; econometrics
