🌙

保持度分布的网络抽样:以关系学习为例

Degree Distribution Preserving Network Sampling: The Case of Relational Learning

INFORMS journal on computing · 2025
被引 0
人大 BUTD24ABS 3

中文导读

研究了从无向网络中抽样并保持其相对度分布的问题,将其转化为最小化原始网络与样本间K-S距离的优化问题,并开发了有效启发式算法,在关系学习分类任务中优于现有方法。

Abstract

Networks are relevant to many business applications, and the benefits of analyzing networks are recognized by researchers and practitioners alike. When networks become impractically large, analyzing samples drawn from them is the only way to get insights. The need for effective approaches to generate samples from large networks is also well documented as are the difficulties associated with doing so. The distribution associated with the degrees of nodes in a network is known to affect many phenomena of interest directly or indirectly and is recognized as one of the fundamental properties of networks. In this paper, we consider the problem of sampling from an undirected network and preserving its relative degree distribution. We formulate it as an optimization problem that minimizes the Kolmogorov–Smirnov distance between the distributions of the original network and the sample. We identify conditions involving the neighborhoods of nodes that result in a perfect sample. Whereas perfect samples are unlikely to exist in practice, these conditions provide intuition that we leverage to develop effective heuristics. We illustrate the effectiveness of the proposed heuristics through two categories of computational experiments. The first set of experiments shows that the proposed approaches preserve degree distributions better than state-of-the-art approaches available for sampling. In the second set of experiments, we evaluate the benefits of our approach on relational learning–based classification tasks. These experiments show that samples obtained using our approach outperform those derived using existing approaches and that our approach is scalable. History: Accepted Ram Ramesh, Area Editor for Data Science & Machine Learning. Supplemental Material: The software that supports the findings of this study is available within the paper and its Supplemental Information ( https://pubsonline.informs.org/doi/suppl/10.1287/ijoc.2024.1002 ) as well as from the IJOC GitHub software repository ( https://github.com/INFORMSJoC/2024.1002 ). The complete IJOC Software and Data Repository is available at https://informsjoc.github.io/ .

网络分析图抽样关系学习优化方法