Matching with semi-bandits

Econometrics Journal · 2022

Cited by 4

Renmin University BABS 3

Maximilian Kasy · University of Oxford · corresponding author
Alexander Teytelboym · University of Oxford · corresponding author

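Reading guide

The paper studies algorithms that learn unknown returns in repeated matching, proposes a modified Thompson sampling method, gives a finite-sample regret bound, and validates the method in refugee resettlement simulations.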

Abstract

We consider an experimental setting in which a matching of resources to participants has to be chosen repeatedly and returns from the individual chosen matches are unknown, but can be learned. Our setting covers two-sided and one-sided matching with (potentially complex) capacity constraints, such as refugee resettlement, social housing allocation, and foster care. We propose a variant of the Thompson sampling algorithm to solve such adaptive combinatorial allocation problems. We give a tight, prior-independent, finite-sample bound on the expected regret for this algorithm. Although the number of allocations grows exponentially in the number of matches, our bound does not. In simulations based on refugee resettlement data using a Bayesian hierarchical model, we find that the algorithm achieves half of the employment gains (relative to the status quo) that could be obtained in an optimal matching based on perfect knowledge of employment probabilities.
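
As a rough illustration of the setting described in the abstract, the sketch below runs textbook Beta-Bernoulli Thompson sampling over one-to-one matchings with semi-bandit feedback, meaning the outcome of every chosen match is observed each round. It is not the paper's variant: it has no capacity constraints and no hierarchical model, and the function name `thompson_matching`, the matrix `true_probs`, and the brute-force search over permutations are all assumptions made for this example.

```python
import itertools
import random


def thompson_matching(true_probs, rounds, seed=0):
    """Beta-Bernoulli Thompson sampling over one-to-one matchings.

    true_probs[i][j] is a hypothetical success probability for matching
    participant i to resource j. The learner never reads it directly;
    it is used only to simulate outcomes.
    """
    rng = random.Random(seed)
    n = len(true_probs)
    # Beta(1, 1) prior on every participant-resource pair.
    successes = [[1] * n for _ in range(n)]
    failures = [[1] * n for _ in range(n)]
    total_reward = 0
    for _ in range(rounds):
        # Sample one success probability per pair from the current posterior.
        draw = [[rng.betavariate(successes[i][j], failures[i][j])
                 for j in range(n)] for i in range(n)]
        # Pick the matching that is optimal for the sampled probabilities.
        # Brute force over permutations is only feasible for very small n.
        best = max(itertools.permutations(range(n)),
                   key=lambda perm: sum(draw[i][perm[i]] for i in range(n)))
        # Semi-bandit feedback: observe each chosen match separately.
        for i, j in enumerate(best):
            outcome = rng.random() < true_probs[i][j]
            successes[i][j] += outcome
            failures[i][j] += 1 - outcome
            total_reward += outcome
    return total_reward, successes, failures


# Two participants, two resources; the diagonal matching is best.
probs = [[0.9, 0.1], [0.1, 0.9]]
reward, s, f = thompson_matching(probs, rounds=200)
print(reward)
```

Because feedback arrives per match rather than per allocation, the posterior is kept for each participant-resource pair, so the number of learned parameters grows with the number of possible matches rather than with the number of allocations. For realistic problem sizes the permutation search would need to be replaced by an assignment solver.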

Matching theory · Online learning · Resource allocation · Bayesian methods · Experimental design
