Statistical modeling and inference in the era of Data Science and Graphical Causal modeling

Journal of Economic Surveys · 2021

Summary

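Compares four paradigm shifts in statistics since the 1920s, evaluating how effectively Fisher's model-based approach, nonparametric methods, Data Science, and Graphical Causal modeling support learning from data, with the emphasis on statistical adequacy rather than goodness-of-fit.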

Abstract

The paper discusses four paradigm shifts in statistics since the 1920s with a view to comparing their similarities and differences, and evaluating their effectiveness in giving rise to ‘learning from data’ about phenomena of interest. The first is Fisher's 1922 recasting of Karl Pearson's descriptive statistics into a model-based statistical induction that dominates current statistics (frequentist and Bayesian). A crucial departure was Fisher's replacement of the curve-fitting perspective, guided by goodness-of-fit measures, with a model-based perspective guided by statistical adequacy: the validity of the probabilistic assumptions comprising the statistical model. Statistical adequacy is pivotal in securing trustworthy evidence since it underwrites the reliability of inference. The second is the nonparametric turn in the 1970s, which aimed to broaden the statistical model by replacing its distribution assumption with weaker mathematical conditions relating to the unknown density function underlying it. The third is a two-pronged development initiated in Artificial Intelligence (AI) in the 1990s that gave rise to Data Science (DS) and Graphical Causal (GC) modeling. The primary objective of the paper is to compare and evaluate the other competing approaches against a refined/enhanced version of Fisher's model-based approach in terms of their effectiveness in giving rise to genuine “learning from data”; excellent goodness-of-fit/prediction is neither necessary nor sufficient for statistical adequacy, or so it is argued.

Keywords: statistical inference, model adequacy, Data Science, Graphical Causal modeling