Linking the Comparison and Graphical Approaches to Bipartite Matching

International Statistical Review · 2026
Cited by 0
ABS 3

Overview

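This paper examines the similarities between the Fellegi-Sunter model and the graphical model for bipartite record linkage. It finds that the two can be unified within a latent binary matrix framework and proposes a unified estimation method based on the classification expectation-maximization algorithm, which performs well on simulated and real data.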

Abstract

Bipartite record linkage has the goal of identifying observations referring to the same individual, called coreferent observations, across two distinct non‐duplicated datasets. The two main approaches to this task are the Fellegi–Sunter model, which relies on pairwise comparisons of observations, and the graphical record linkage model, which directly models the data and groups together coreferent observations. In this paper, we investigate the similarities between these two methods. We show that both models can be expressed in terms of a latent binary matrix indicating coreferent record pairs, that they can be framed as particular latent class analysis models, and that they admit a direct relationship between their parameters under a common data model. Moreover, we propose a unified estimation framework based on a classification expectation–maximization algorithm. The proposed estimation method properly incorporates the problem constraints while still allowing for a computationally efficient implementation. It also allows the same distributional assumptions on the linkage distribution to be used interchangeably between the two models. Empirical results using the proposed estimation method demonstrate satisfactory and mostly equivalent performance for the two models, both on simulations and on a real dataset commonly used as a benchmark for record linkage.
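
To make the latent binary matrix and its constraints concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each record pair already has a score (for instance a log-likelihood ratio under current parameter values) and shows one way a classification step could pick the 0/1 matrix that maximizes the total score while linking each record to at most one record in the other file. The function names `is_bipartite_matching` and `classification_step`, the example `weights`, and the brute-force search are all illustrative assumptions; the exhaustive search is only practical for very small files.

```python
from itertools import permutations


def is_bipartite_matching(delta):
    """Return True if delta is a 0/1 matrix with at most one 1 per row and column."""
    binary = all(v in (0, 1) for row in delta for v in row)
    rows_ok = all(sum(row) <= 1 for row in delta)
    cols_ok = all(sum(col) <= 1 for col in zip(*delta))
    return binary and rows_ok and cols_ok


def classification_step(weights):
    """Brute-force search for the matching matrix with the largest total weight.

    weights[i][j] is a hypothetical score for linking record i of the first
    file to record j of the second file. Pairs with non-positive weight are
    left unlinked, since linking them cannot increase the total.
    """
    n1, n2 = len(weights), len(weights[0])
    if n1 > n2:
        raise ValueError("this sketch assumes the first file is not larger")
    best_score, best_cols = float("-inf"), None
    # Enumerate every one-to-one assignment of rows to distinct columns.
    for cols in permutations(range(n2), n1):
        score = sum(max(weights[i][j], 0.0) for i, j in enumerate(cols))
        if score > best_score:
            best_score, best_cols = score, cols
    delta = [[0] * n2 for _ in range(n1)]
    for i, j in enumerate(best_cols):
        if weights[i][j] > 0:
            delta[i][j] = 1
    return delta


# Both rows score highest against column 0, so picking each row's best pair
# independently would break the one-to-one constraint.
weights = [[2.0, 1.5, -3.0],
           [1.8, -0.5, -2.0]]
delta = classification_step(weights)
print(delta)  # [[0, 1, 0], [1, 0, 0]]
```

In this toy example the constrained solution links row 0 to column 1 and row 1 to column 0 (total 3.3), which beats keeping row 0 on column 0 and leaving row 1 unlinked (total 2.0). For realistic file sizes, a polynomial-time assignment solver would replace the enumeration.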

Keywords: record linkage, bipartite graph matching, statistical models, data matching