Inference on Regressions with Interval Data on a Regressor or Outcome
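This paper studies how to conduct inference on regression models when one variable is observed only as an interval (for example, an income bracket). Under monotonicity and mean-independence assumptions, it derives nonparametric bounds and proposes two estimation methods (modified maximum score and modified minimum distance). The methods are validated through Monte Carlo simulations and empirical case studies (Health and Retirement Study, Current Population Survey).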
This paper examines inference on regressions when interval data are available on one variable, the other variables being measured precisely. Let a population be characterized by a distribution P(y, x, v, v0, v1), where y ∈ R^1, x ∈ R^k, and the real variables (v, v0, v1) satisfy v0 ≤ v ≤ v1. Let a random sample be drawn from P and the realizations of (y, x, v0, v1) be observed, but not those of v. The problem of interest may be to infer E(y|x, v) or E(v|x). The analysis maintains the Interval (I), Monotonicity (M), and Mean Independence (MI) assumptions: (I) P(v0 ≤ v ≤ v1) = 1; (M) E(y|x, v) is monotone in v; (MI) E(y|x, v, v0, v1) = E(y|x, v). No restrictions are imposed on the distribution of the unobserved values of v within the observed intervals [v0, v1]. The IMMI assumptions alone are found to imply simple nonparametric bounds on E(y|x, v) and E(v|x). When y is binary, these assumptions combined with a semiparametric binary regression model yield an identification region for the parameters that may be estimated consistently by a modified maximum score (MMS) method. The IMMI assumptions combined with a parametric model for E(y|x, v) or E(v|x) yield an identification region that may be estimated consistently by a modified minimum-distance (MMD) method. Monte Carlo methods are used to characterize the finite-sample performance of these estimators. Empirical case studies are performed using interval wealth data in the Health and Retirement Study and interval income data in the Current Population Survey.
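As a small illustration of the simplest bound mentioned above, Assumption (I) by itself gives E(v0|x) ≤ E(v|x) ≤ E(v1|x), because v0 ≤ v ≤ v1 holds with probability one. The sketch below computes the sample analogue on simulated data. It is not the paper's MMS or MMD estimator; the function name `interval_mean_bounds`, the ten-unit brackets, and the uniform latent variable are all assumptions made only for this example, and conditioning on x is omitted (with a discrete x, the same calculation could be applied within each cell of x).

```python
import numpy as np


def interval_mean_bounds(v0, v1):
    """Sample bounds on E(v) when only the interval [v0, v1] is observed.

    Hypothetical helper for illustration: relies only on v0 <= v <= v1.
    """
    v0 = np.asarray(v0, dtype=float)
    v1 = np.asarray(v1, dtype=float)
    if np.any(v0 > v1):
        raise ValueError("each interval must satisfy v0 <= v1")
    # The mean of the lower endpoints bounds E(v) from below,
    # the mean of the upper endpoints bounds it from above.
    return v0.mean(), v1.mean()


# Simulated data (assumed design): v is latent, only its bracket is reported.
rng = np.random.default_rng(0)
v = rng.uniform(0.0, 100.0, size=1000)   # unobserved in practice
v0 = np.floor(v / 10.0) * 10.0           # lower endpoint of the bracket
v1 = v0 + 10.0                           # upper endpoint of the bracket

lower, upper = interval_mean_bounds(v0, v1)
print(f"E(v) lies in [{lower:.2f}, {upper:.2f}] given the interval data")
```

In this design the width of the bound equals the bracket width, which shows how coarser intervals translate directly into a wider identification region for E(v|x).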