Diagnostic Plots for Missing Data in Least Squares Regression

Journal of the American Statistical Association · 1986
Cited by: 4
ABS 4

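Summary

For a regression in which one independent variable has missing values, the paper derives the ranges of the least squares coefficient estimates and their t statistics. It uses these ranges to generate diagnostic plots that help researchers assess the potential effect of the missing points on the regression, without assuming that the data are missing at random.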

Abstract

The usual approach to handling missing data in a regression is to assume that the points are missing at random (MAR) and use either a fill-in method to replace the missing points or a method using maximally available pairs in the sample covariance matrix. We derive limits for the values of the least squares estimates of the coefficients (and their associated t statistics) when there are missing observations in one carrier. These limits are derived subject to a constraint on the relationship of the missing data to the present data. Calculating these limits while varying this constrained value results in a series of diagnostic plots that can be used to study the potential effect of the missing points on the regression (without assuming that the points are MAR). Simulations are performed to illustrate the use of the plots, and two real data sets are analyzed. The more general case of missing data in more than one carrier is also discussed.
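The idea of tracing coefficient limits as a constraint varies can be illustrated with a rough numerical sketch. This is not the paper's derivation: the abstract does not state the form of the constraint, so the code below assumes, purely for illustration, that each missing carrier value lies within `c` sample standard deviations of the observed mean, and it finds the limits by brute-force grid search rather than analytically. The function names `ols_fit` and `slope_ranges` and the synthetic data are hypothetical.

```python
import itertools
import numpy as np

def ols_fit(x, y):
    """Fit y = b0 + b1*x by least squares; return the slope and its t statistic."""
    X = np.column_stack([np.ones_like(x), x])
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)
    se = np.sqrt(sigma2 * np.diag(xtx_inv))
    return beta[1], beta[1] / se[1]

def slope_ranges(x, y, c_values, grid_size=9):
    """For each bound c, return (c, min slope, max slope, min t, max t).

    Assumed constraint (not from the paper): every missing x value lies
    within c sample standard deviations of the mean of the observed x.
    The search cost grows as grid_size ** (number of missing points).
    """
    missing = np.isnan(x)
    mean = x[~missing].mean()
    sd = x[~missing].std(ddof=1)
    rows = []
    for c in c_values:
        grid = mean + sd * np.linspace(-c, c, grid_size)
        slopes, tstats = [], []
        for fill in itertools.product(grid, repeat=int(missing.sum())):
            x_full = x.copy()
            x_full[missing] = fill  # one candidate completion of the carrier
            b, t = ols_fit(x_full, y)
            slopes.append(b)
            tstats.append(t)
        rows.append((c, min(slopes), max(slopes), min(tstats), max(tstats)))
    return rows

# Synthetic data with two carrier values deleted (illustrative only).
rng = np.random.default_rng(0)
x_true = rng.normal(size=20)
y = 1.0 + 2.0 * x_true + rng.normal(scale=0.5, size=20)
x_obs = x_true.copy()
x_obs[[3, 11]] = np.nan

rows = slope_ranges(x_obs, y, c_values=[0.0, 0.5, 1.0, 2.0])
for c, b_lo, b_hi, t_lo, t_hi in rows:
    print(f"c={c:.1f}  slope in [{b_lo:.3f}, {b_hi:.3f}]  t in [{t_lo:.2f}, {t_hi:.2f}]")
```

Plotting the lower and upper limits against `c` would give a picture in the spirit of the diagnostic plots described above: bands that stay narrow suggest the missing points cannot change the conclusion much, while bands that widen quickly suggest they can.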

Statistics · Regression analysis · Missing data handling