

Generalized Data Thinning Using Sufficient Statistics

Journal of the American Statistical Association · 2024
Cited by 16 · top 4% among papers in the same journal and year
ABS 4

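Summary

Proposes a general strategy for decomposing a random variable into multiple independent random variables without losing information about the unknown parameters. This broadens the range of settings in which data thinning applies and unifies sample splitting and data thinning as instances of one method.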

Abstract

Our goal is to develop a general strategy to decompose a random variable $X$ into multiple independent random variables, without sacrificing any information about unknown parameters. A recent paper showed that for some well-known natural exponential families, $X$ can be thinned into independent random variables $X^{(1)}, \dots, X^{(K)}$, such that $X = \sum_{k=1}^{K} X^{(k)}$. These independent random variables can then be used for various model validation and inference tasks, including in contexts where traditional sample splitting fails. In this paper, we generalize their procedure by relaxing this summation requirement and simply asking that some known function of the independent random variables exactly reconstruct $X$. This generalization of the procedure serves two purposes. First, it greatly expands the families of distributions for which thinning can be performed. Second, it unifies sample splitting and data thinning, which on the surface seem to be very different, as applications of the same principle. This shared principle is sufficiency. We use this insight to perform generalized thinning operations for a diverse set of families.
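As an illustration of the additive thinning that the abstract takes as its starting point, the sketch below uses the standard Poisson case: if X ~ Poisson(λ) and, given X, the first part is drawn as Binomial(X, ε), then the two parts are independent Poisson(ελ) and Poisson((1 − ε)λ) variables that sum to X. This is a minimal sketch, not the authors' code; the function name `thin_poisson` and the values of `lam`, `eps`, and `n` are assumptions chosen for illustration, and the generalized (non-additive) constructions of the paper are not shown.

```python
import numpy as np


def thin_poisson(x, eps, rng):
    """Split Poisson counts x into two parts that sum back to x.

    Hypothetical helper for illustration: draws x1 | x ~ Binomial(x, eps)
    and sets x2 = x - x1.
    """
    x = np.asarray(x)
    x1 = rng.binomial(x, eps)  # binomial draw conditional on each count
    x2 = x - x1                # remainder, so x1 + x2 reconstructs x exactly
    return x1, x2


rng = np.random.default_rng(0)
lam, eps, n = 10.0, 0.3, 200_000  # illustrative values, not from the paper

x = rng.poisson(lam, size=n)
x1, x2 = thin_poisson(x, eps, rng)

# The two parts reconstruct x, and their sample correlation should be near 0.
reconstructs = bool(np.all(x1 + x2 == x))
corr = float(np.corrcoef(x1, x2)[0, 1])
mean1, mean2 = float(x1.mean()), float(x2.mean())
```

Here `x1` could serve as a training fold and `x2` as a validation fold even though each observation is a single count, which is a setting where ordinary sample splitting has nothing to split.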

Statistics · Data Science · Inferential Statistics · Model Validation