Learning Coefficient Heterogeneity over Networks: A Distributed Spanning-Tree-Based Fused-Lasso Regression

Journal of the American Statistical Association · 2022
Cited by 25
ABS 4

Summary

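To address the problem of data being stored separately across the nodes of a distributed network, the authors propose a spanning-tree-based fused-lasso regression method that improves regression estimation efficiency while identifying the cluster membership of each node, and they solve it in parallel with a distributed algorithm.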

Abstract

Identifying the latent cluster structure based on model heterogeneity is a fundamental but challenging task that arises in many machine learning applications. In this article, we study the clustered coefficient regression problem in distributed network systems, where the data are locally collected and held by nodes. Our work aims to improve the regression estimation efficiency by aggregating the neighbors’ information while also identifying the cluster membership of the nodes. To achieve efficient estimation and clustering, we develop a distributed spanning-tree-based fused-lasso regression (DTFLR) approach. In particular, we propose an adaptive spanning-tree-based fusion penalty for low-complexity clustered coefficient regression. We show that our proposed estimator satisfies statistical oracle properties. Additionally, to solve the problem in parallel, we design a distributed generalized alternating direction method of multipliers (ADMM) algorithm, which has a simple node-based implementation scheme and enjoys a linear convergence rate. Collectively, our results contribute to the theories of low-complexity clustered coefficient regression and distributed optimization over networks. Thorough numerical experiments and real-world data analysis are conducted to verify our theoretical results; they show that our approach outperforms existing works in terms of estimation accuracy, computation speed, and communication costs. Supplementary materials for this article are available online.
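
For readers unfamiliar with fusion penalties, the idea of a tree-based fused lasso can be illustrated with a small sketch. The code below is not the paper's DTFLR algorithm: it is a centralized, non-adaptive simplification that assumes a hand-picked path graph as the spanning tree, squared-error loss at every node, and a plain scaled-form ADMM solver. The function name `tree_fused_lasso`, the parameters `lam`, `rho`, and `n_iter`, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of t * |.|."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def tree_fused_lasso(Xs, ys, edges, lam=5.0, rho=20.0, n_iter=1000):
    """Centralized ADMM sketch (assumption, not the paper's algorithm) for
        min_B  sum_i 0.5 * ||y_i - X_i b_i||^2 + lam * sum_{(i,j) in edges} ||b_i - b_j||_1
    where `edges` lists the edges of a spanning tree over the nodes.
    """
    K, p, m = len(Xs), Xs[0].shape[1], len(edges)

    # Edge-by-node incidence matrix: row e is +1 at node i and -1 at node j.
    D = np.zeros((m, K))
    for e, (i, j) in enumerate(edges):
        D[e, i], D[e, j] = 1.0, -1.0

    # Fixed left-hand side of the coefficient update:
    # blockdiag(X_i^T X_i) + rho * (D^T D kron I_p).
    H = np.zeros((K * p, K * p))
    for i, X in enumerate(Xs):
        H[i * p:(i + 1) * p, i * p:(i + 1) * p] = X.T @ X
    A = H + rho * np.kron(D.T @ D, np.eye(p))
    Xty = np.stack([X.T @ y for X, y in zip(Xs, ys)])  # shape (K, p)

    B = np.zeros((K, p))  # node coefficients
    Z = np.zeros((m, p))  # edge differences (split variable)
    U = np.zeros((m, p))  # scaled dual variable
    for _ in range(n_iter):
        rhs = Xty + rho * D.T @ (Z - U)
        B = np.linalg.solve(A, rhs.reshape(-1)).reshape(K, p)
        DB = D @ B
        Z = soft_threshold(DB + U, lam / rho)
        U = U + DB - Z
    return B, Z


# Simulated example: 6 nodes on a path, two clusters of 3 nodes each.
rng = np.random.default_rng(0)
truth = np.array([[1.0, -1.0]] * 3 + [[-1.0, 1.0]] * 3)
edges = [(i, i + 1) for i in range(5)]
Xs = [rng.normal(size=(50, 2)) for _ in range(6)]
ys = [X @ b + 0.1 * rng.normal(size=50) for X, b in zip(Xs, truth)]

B_hat, Z_hat = tree_fused_lasso(Xs, ys, edges)

# Edges whose estimated difference is clearly nonzero separate the clusters.
cut_edges = [e for e, z in zip(edges, Z_hat) if np.max(np.abs(z)) > 0.5]
```

Removing the edges in `cut_edges` from the tree leaves connected components that serve as the estimated clusters. Penalizing only the tree's edges keeps the number of fusion terms at one fewer than the number of nodes, which is the low-complexity aspect the abstract refers to; the adaptive weighting and the node-based distributed updates described there are not reproduced in this sketch.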

Machine learning · Distributed systems · Regression analysis · Cluster analysis · Network optimization