Always Valid Inference: Continuous Monitoring of A/B Tests

Operations Research · 2021

Cited by 49 · Top 8% among papers in the same journal and year

Renmin University A · FT50 · UTD24 · ABS 4*

Ramesh Johari · Stanford University
Pete Koomen
Leonid Pekelis
David Walsh

Summary

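Conventional inference becomes unreliable when users continuously monitor A/B tests and choose the sample size themselves. To address this, the paper proposes always valid p-values and confidence intervals, which let users draw valid statistical inferences from the data at any time. The method has been deployed on a large-scale commercial platform.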

Abstract

A/B tests are typically analyzed via frequentist p-values and confidence intervals, but these inferences are wholly unreliable if users endogenously choose sample sizes by continuously monitoring their tests. We define always valid p-values and confidence intervals that let users try to take advantage of data as fast as it becomes available, providing valid statistical inference whenever they make their decision. Always valid inference can be interpreted as a natural interface for a sequential hypothesis test, which empowers users to implement a modified test tailored to them. In particular, we show in an appropriate sense that the measures we develop trade off sample size and power efficiently, despite a lack of prior knowledge of the user’s relative preference between these two goals. We also use always valid p-values to obtain multiple hypothesis testing control in the sequential context. Our methodology has been implemented in a large-scale commercial A/B testing platform to analyze hundreds of thousands of experiments to date.
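
The abstract does not spell out how an always valid p-value is computed. As a rough illustration only, the sketch below uses a mixture sequential probability ratio test under simplifying assumptions that are not taken from the abstract: a single stream of i.i.d. normal observations with known standard deviation `sigma`, a null hypothesis that the mean equals `theta0`, and a normal mixing prior with standard deviation `tau`. The function name and all parameter names are hypothetical.

```python
import math


def always_valid_p_values(xs, theta0=0.0, sigma=1.0, tau=1.0):
    """Return the running always valid p-value after each observation.

    Assumptions (illustrative, not from the abstract): observations are
    i.i.d. Normal(theta, sigma^2) with known sigma, the null hypothesis is
    theta == theta0, and the mixing prior is Normal(theta0, tau^2).
    """
    var, mix = sigma ** 2, tau ** 2
    p, total, out = 1.0, 0.0, []
    for n, x in enumerate(xs, start=1):
        total += x
        mean = total / n
        # Log of the mixture likelihood ratio Lambda_n after n observations.
        log_lam = 0.5 * math.log(var / (var + n * mix)) + (
            n ** 2 * mix * (mean - theta0) ** 2
        ) / (2.0 * var * (var + n * mix))
        # The p-value is the running minimum of 1 / Lambda_n, capped at 1.
        p = min(p, math.exp(-log_lam))
        out.append(p)
    return out
```

Because the returned sequence never increases, a user who stops the first time the value drops below a chosen level, or at any other data-dependent time, reads off the same kind of p-value. Under the stated assumptions, that is the property that makes continuous monitoring safe.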

Statistical inference · A/B testing · Hypothesis testing · Machine learning · Data mining
