高维数据下广义线性模型的在线学习与决策

Online Learning and Decision Making Under Generalized Linear Model with High-Dimensional Data

Management Science · 2024

被引 0

人大 A+FT50UTD24ABS 4*

Tao Yao · 上海交通大学
Xue Wang · 贝尔维尤医院中心
Mike Mingcheng Wei · 布法罗大学

中文导读

提出一种基于极小极大凹惩罚的广义线性模型多臂老虎机算法，在高维数据环境下实现最优累积遗憾，并通过两步加权Lasso方法估计参数，实验表明该算法优于基准方法。

Abstract

We propose a minimax concave penalized multiarmed bandit algorithm under the generalized linear model (G-MCP-Bandit) for decision-makers facing high-dimensional data in an online learning and decision-making environment. We demonstrate that in the data-rich regime, the G-MCP-Bandit algorithm attains the optimal cumulative regret in the sample size dimension and a tight bound in the covariate dimension and the significant covariate dimension. In the data-poor regime, the G-MCP-Bandit algorithm maintains a tight regret upper bound. In addition, we develop a local linear approximation method, the two-step weighted Lasso procedure, to identify the minimax concave penalty (MCP) estimator for the G-MCP-Bandit algorithm when samples are not independent and identically distributed. Under this procedure, the MCP estimator can match the oracle estimator with high probability and converge to the true parameters at the optimal convergence rate. Finally, through experiments based on both synthetic and real data sets, we show that the G-MCP-Bandit algorithm outperforms other benchmarking algorithms in terms of cumulative regret and that the benefits of the G-MCP-Bandit algorithm increase in the data’s sparsity level and the size of the decision set. This paper was accepted by J. George Shanthikumar, big data analytics. Funding: This work was supported by the National Natural Science Foundation of China (NSFC) [Grants W2441021, 72371172, 72342023, and 71929101] and the National Science Foundation (NSF) [Grant DMS 1820702]. Supplemental Material: The online appendix and data files are available at https://doi.org/10.1287/mnsc.2022.01557 .

广义线性模型高维数据在线学习多臂老虎机

阅读原文 ↗