Bootstrapped Edge Count Tests for Nonparametric Two-Sample Inference Under Heterogeneity

Journal of Computational and Graphical Statistics · 2024
Cited by 0
ABS 3

Summary

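To detect differences between two samples when the data contain unknown heterogeneity, the paper proposes an edge count test with bootstrap-based calibration. Latent subpopulations are handled through a composite null hypothesis, and the method's effectiveness is validated in simulations and in an application to detecting aberrant behavior in online games.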

Abstract

Nonparametric two-sample testing is a classical problem in inferential statistics. While modern two-sample tests, such as the edge count test and its variants, can handle multivariate and non-Euclidean data, contemporary gargantuan datasets often exhibit heterogeneity due to the presence of latent subpopulations. Direct application of these tests, without accounting for such heterogeneity, may lead to incorrect statistical decisions. We develop a new nonparametric testing procedure that accurately detects differences between the two samples in the presence of unknown heterogeneity in the data generation process. Our framework handles this latent heterogeneity through a composite null that entertains the possibility that the two samples arise from a mixture distribution with identical component distributions but with possibly different mixing weights. In this regime, we study the asymptotic behavior of the weighted edge count test statistic and show that it can be effectively recalibrated to detect arbitrary deviations from the composite null. For practical implementation, we propose a Bootstrapped Weighted Edge Count test, which uses a bootstrap-based calibration procedure that can be easily implemented across a wide range of heterogeneous regimes. A comprehensive simulation study and an application to detecting aberrant user behaviors in online games demonstrate the excellent non-asymptotic performance of the proposed test. Supplementary materials for this article are available online.
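
As a rough illustration of the kind of statistic the abstract refers to, the sketch below computes a weighted edge count on a Euclidean minimum spanning tree of the pooled sample. It assumes the weighting commonly used for this statistic, R_w = q·R1 + p·R2 with p = (n1 - 1)/(N - 2) and q = (n2 - 1)/(N - 2), where R1 and R2 count tree edges joining two points from the same sample. The function names, the choice of graph, and the permutation calibration are all assumptions made for illustration. In particular, permuting labels calibrates against the simple null of identical distributions, which is the setting the abstract warns can fail under heterogeneity; the paper's bootstrap calibration against the composite null is not reproduced here.

```python
import math
import random


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (i, j) index pairs."""
    n = len(points)
    in_tree = [False] * n
    best_dist = [math.inf] * n   # cheapest known connection of each vertex to the tree
    best_from = [-1] * n         # tree vertex realising that connection
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the vertex outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges


def weighted_edge_count(edges, labels):
    """R_w = q*R1 + p*R2 (assumed weighting), with labels 0 for sample 1 and 1 for sample 2."""
    n_total = len(labels)
    n1 = sum(1 for g in labels if g == 0)
    n2 = n_total - n1
    r1 = sum(1 for i, j in edges if labels[i] == 0 and labels[j] == 0)
    r2 = sum(1 for i, j in edges if labels[i] == 1 and labels[j] == 1)
    p = (n1 - 1) / (n_total - 2)
    q = (n2 - 1) / (n_total - 2)
    return q * r1 + p * r2


def permutation_pvalue(sample_x, sample_y, n_perm=500, seed=0):
    """Permutation calibration under the SIMPLE null (not the paper's bootstrap)."""
    rng = random.Random(seed)
    points = list(sample_x) + list(sample_y)
    labels = [0] * len(sample_x) + [1] * len(sample_y)
    edges = mst_edges(points)            # the graph is built once on the pooled data
    observed = weighted_edge_count(edges, labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)              # relabel points, keep the graph fixed
        if weighted_edge_count(edges, labels) >= observed:
            exceed += 1
    # Large R_w (many within-sample edges) is evidence of a difference.
    return observed, (exceed + 1) / (n_perm + 1)
```

A call such as `permutation_pvalue(xs, ys)`, where `xs` and `ys` are lists of equal-length coordinate tuples, returns the observed statistic and a permutation p-value. When the two samples share components but differ in mixing weights, this simple-null calibration may reject even though the composite null holds, which is the gap the proposed test is designed to close.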

Nonparametric statistics · Two-sample testing · Heterogeneity · Edge count test · Bootstrap