Multi-Dimensional Domain Generalization with Low-Rank Structures

Journal of the American Statistical Association · 2025

Cited by 1

ABS 4

Sai Li (corresponding author)
Linjun Zhang

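Summary

Statistical inference is difficult when some sub-populations are under-sampled in the training data. To address this, the paper proposes a new method that organizes the model parameters of all sub-populations into a tensor and achieves robust domain generalization through structured tensor completion. Its effectiveness is validated on diabetes prediction data.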

Abstract

Conventional statistical and machine learning methods typically assume that the test data follow the same distribution as the training data. However, this assumption does not always hold, especially in applications where the target population is not well represented in the training data. This is a notable issue in health-related studies, where specific ethnic populations may be underrepresented, posing a significant challenge for researchers aiming to make statistical inferences about these minority groups. In this work, we present a novel approach to addressing this challenge in linear regression models. We organize the model parameters for all the sub-populations into a tensor. By studying a structured tensor completion problem, we can achieve robust domain generalization, i.e., learning about sub-populations with limited or no available data. Our method leverages the structure of group labels in a novel way, and it can produce more reliable and interpretable generalization results. We establish rigorous theoretical guarantees for the proposed method and demonstrate its minimax optimality. To validate the effectiveness of our approach, we conduct extensive numerical experiments and a real data study focused on diabetes prediction for multiple subgroups, comparing our results with those obtained using other existing methods.
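The idea of stacking per-group regression coefficients into a tensor and filling in an unobserved group can be illustrated with a minimal sketch. This is not the authors' estimator. It assumes two hypothetical group factors (indexed by `a` and `b`), noise-free responses, exactly one unobserved cell, and a coefficient tensor whose unfolding along the first group factor has low rank. Under those assumptions the missing block follows from the standard identity M11 = M12 · pinv(M22) · M21, which holds when the observed block M22 has the same rank as the full matrix. All function and variable names are illustrative.

```python
import numpy as np

def fit_group_coefficients(data, shape):
    """Ordinary least squares per observed group; unobserved cells stay NaN."""
    theta = np.full(shape, np.nan)
    for (a, b), (X, y) in data.items():
        theta[a, b] = np.linalg.lstsq(X, y, rcond=None)[0]
    return theta

def complete_missing_group(theta, a0, b0, rcond=1e-8):
    """Fill cell (a0, b0), assuming the unfolding along the first factor is low rank."""
    A, B, p = theta.shape
    rows = [a for a in range(A) if a != a0]
    cols = [b for b in range(B) if b != b0]
    M12 = theta[a0, cols].reshape(1, -1)            # target row, observed columns
    M22 = theta[rows][:, cols].reshape(A - 1, -1)   # other rows, observed columns
    M21 = theta[rows, b0]                           # other rows, target columns
    # rcond truncates numerically zero singular values of the low-rank block.
    return (M12 @ np.linalg.pinv(M22, rcond=rcond) @ M21).ravel()

# Simulated setting (assumption): rank-2 structure along the first group factor.
rng = np.random.default_rng(0)
A, B, p, r = 4, 3, 2, 2
U = rng.normal(size=(A, r))
W = rng.normal(size=(r, B, p))
theta_true = np.einsum("ar,rbk->abk", U, W)

missing = (0, 0)  # the sub-population with no training data
data = {}
for a in range(A):
    for b in range(B):
        if (a, b) == missing:
            continue
        X = rng.normal(size=(30, p))
        data[(a, b)] = (X, X @ theta_true[a, b])  # noise-free responses

theta_hat = fit_group_coefficients(data, (A, B, p))
estimate = complete_missing_group(theta_hat, *missing)
```

With noisy responses the recovered coefficients would only approximate the truth, and the truncation level `rcond` would then act as a tuning parameter controlling the assumed rank.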

Keywords: machine learning, statistical inference, linear regression, domain generalization, health research
