“Data Monkeys”: A Procedural Model of Extrapolation from Partial Statistics

Review of Economic Studies · 2017
Cited by 15
Journal rankings: Renmin University of China A+ · FT50 · ABS 4*

Summary

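The paper proposes a behavioural model of a data analyst who extrapolates a complete probability distribution from statistical data sets that cover partially overlapping sets of variables. It analyses how this extrapolation distorts the correlation structure of the true data generating process, in particular when the distortion amounts to imposing a causal model on it.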

Abstract

I present a behavioural model of a “data analyst” who extrapolates a fully specified probability distribution over observable variables from a collection of statistical data sets that cover partially overlapping sets of variables. The analyst employs an iterative extrapolation procedure, whose individual rounds are akin to the stochastic regression method of imputing missing data. Users of the procedure’s output fail to distinguish between raw and imputed data, and it functions as their practical belief. I characterize the ways in which this belief distorts the correlation structure of the underlying data generating process—focusing on cases in which the distortion can be described as the imposition of a causal model (represented by a directed acyclic graph over observable variables) on the true distribution.
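
To make the idea of a stochastic-regression round concrete, the following is a minimal sketch and not the paper's own procedure. It assumes a hypothetical three-variable process in which x drives both y and z, one data set that records only (x, y), and another that records only (y, z). All function names, the linear-Gaussian setup, and the sample size are illustrative assumptions. The analyst fills in the missing z for the first data set by regressing z on y in the second and adding a random residual draw.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, residual_sd)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    resid_var = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    return a, b, resid_var ** 0.5


def stochastic_impute(predictor, a, b, sd, rng):
    """Stochastic regression imputation: fitted value plus a random residual."""
    return [a + b * p + rng.gauss(0.0, sd) for p in predictor]


def corr(us, vs):
    """Pearson correlation of two equal-length samples."""
    n = len(us)
    mu = sum(us) / n
    mv = sum(vs) / n
    suv = sum((u - mu) * (v - mv) for u, v in zip(us, vs))
    suu = sum((u - mu) ** 2 for u in us)
    svv = sum((v - mv) ** 2 for v in vs)
    return suv / (suu * svv) ** 0.5


def demo(n=20000, seed=0):
    """Return (true corr(x, z), corr(x, z) in the extrapolated data)."""
    rng = random.Random(seed)

    def draw():
        # Assumed true process (hypothetical): x affects y and z directly.
        x = rng.gauss(0.0, 1.0)
        y = x + rng.gauss(0.0, 1.0)
        z = x + rng.gauss(0.0, 1.0)
        return x, y, z

    full = [draw() for _ in range(n)]        # never seen by the analyst
    set_a = [draw()[:2] for _ in range(n)]   # records (x, y) only
    set_b = [draw()[1:] for _ in range(n)]   # records (y, z) only

    # One imputation round: learn z from y in set B, impute z into set A.
    a, b, sd = fit_line([y for y, _ in set_b], [z for _, z in set_b])
    xs = [x for x, _ in set_a]
    ys = [y for _, y in set_a]
    zs_imputed = stochastic_impute(ys, a, b, sd, rng)

    true_r = corr([t[0] for t in full], [t[2] for t in full])
    return true_r, corr(xs, zs_imputed)
```

Calling `demo()` returns the two correlations. Under these assumptions the population correlation between x and z is about 0.71, while in the extrapolated data it is about 0.35, because the imputed z depends on x only through y. A user who treats the imputed values as raw data would in effect believe in the chain x → y → z, which is one simple instance of the kind of imposed causal model the abstract describes.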

Keywords: data extrapolation · partial statistics · missing data imputation · causal models