A new paradigm for high-dimensional data: Distance-based semiparametric feature aggregation framework via between-subject attributes

Scandinavian Journal of Statistics · 2023
Cited by 5
ABS 3

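Summary

The paper proposes a distance-based semiparametric regression framework that does not rely on the sparse-feature assumption. It aggregates high-dimensional variables through pairs of between-subject attributes, which avoids the information loss of feature selection and improves interpretability and computational feasibility. The framework is suited to ultra-high-dimensional data such as microbiome data.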

Abstract

This article proposes a distance-based framework, motivated by the paradigm shift towards feature aggregation for high-dimensional data, that relies on neither the sparse-feature assumption nor permutation-based inference. Focusing on distance-based outcomes that preserve information without truncating any features, we develop a class of semiparametric regression models that encapsulates multiple sources of high-dimensional variables using pairwise outcomes of between-subject attributes. Further, we propose a strategy to address the interlocking correlations among pairs via U-statistics-based estimating equations (UGEE), which correspond to their unique efficient influence function (EIF). Hence, the resulting semiparametric estimators are robust to distributional misspecification while enjoying root-n consistency and asymptotic optimality to facilitate inference. In essence, the proposed approach not only circumvents information loss due to feature selection but also improves the model's interpretability and computational feasibility. Simulation studies and applications to human microbiome and wearables data are provided, where the feature dimensions are in the tens of thousands.

High-dimensional data analysis · Semiparametric regression · Feature aggregation · Statistical inference