Distributed inference for two‐sample U‐statistics in massive data analysis

Scandinavian Journal of Statistics · 2022

Cited by 7

ABS 3

Bingyao Huang
Yanyan Liu
Liuhua Peng

Summary

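For massive-data settings, the paper proposes two kinds of distributed two‐sample U‐statistics together with bootstrap procedures for inference on them. The methods reduce computational complexity and suit distributed computing platforms where the data are stored across multiple locations.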

Abstract

This paper considers distributed inference for two‐sample U‐statistics under the massive data setting. In order to reduce the computational complexity, this paper proposes distributed two‐sample U‐statistics and blockwise linear two‐sample U‐statistics. The blockwise linear two‐sample U‐statistic, which requires less communication cost, is more computationally efficient especially when the data are stored in different locations. The asymptotic properties of both types of distributed two‐sample U‐statistics are established. In addition, this paper proposes bootstrap algorithms to approximate the distributions of distributed two‐sample U‐statistics and blockwise linear two‐sample U‐statistics for both nondegenerate and degenerate cases. The distributed weighted bootstrap for the distributed two‐sample U‐statistic is new in the literature. The proposed bootstrap procedures are computationally efficient and are suitable for distributed computing platforms with theoretical guarantees. Extensive numerical studies illustrate that the proposed distributed approaches are feasible and effective.
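The divide-and-average idea behind the abstract can be illustrated with a small Python sketch. It is not the paper's exact definition of either statistic: the block pairing (block k of the first sample with block k of the second), the degree-(1, 1) Mann-Whitney kernel, and the names `two_sample_u`, `block_averaged_u`, and `mann_whitney_kernel` are all assumptions made for illustration.

```python
import numpy as np


def two_sample_u(x, y, kernel):
    """Full two-sample U-statistic of degree (1, 1).

    Averages kernel(x_i, y_j) over every pair (i, j), so it needs
    len(x) * len(y) kernel evaluations.
    """
    return float(kernel(x[:, None], y[None, :]).mean())


def block_averaged_u(x, y, kernel, n_blocks):
    """Illustrative block-averaged estimator (an assumed construction).

    Splits each sample into n_blocks pieces, pairs block k of x with
    block k of y, computes the U-statistic on each pair, and averages
    the per-block values. Only one number per block has to be sent
    back to the coordinator.
    """
    x_blocks = np.array_split(x, n_blocks)
    y_blocks = np.array_split(y, n_blocks)
    block_values = [
        two_sample_u(xb, yb, kernel) for xb, yb in zip(x_blocks, y_blocks)
    ]
    return float(np.mean(block_values))


def mann_whitney_kernel(a, b):
    """Kernel h(x, y) = 1{x < y}; its expectation is P(X < Y)."""
    return (a < b).astype(float)


# Simulated data, for illustration only.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=2000)
y = rng.normal(0.5, 1.0, size=2000)

full = two_sample_u(x, y, mann_whitney_kernel)
blocked = block_averaged_u(x, y, mann_whitney_kernel, n_blocks=10)
print(f"full U-statistic:       {full:.4f}")
print(f"block-averaged version: {blocked:.4f}")
```

With equal-sized blocks, this pairing evaluates the kernel on roughly 1/K of the pairs that the full statistic uses (K being the number of blocks), which is where the computational saving in this sketch comes from.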

Keywords: massive data analysis · distributed inference · two‐sample U‐statistics · bootstrap
