Comparing Sequential Forecasters
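Proposes flexible methods for sequential inference on the difference in mean scores between two forecasters. The methods construct confidence sequences and e-processes that allow continuous monitoring and stopping the experiment at any time, require no stationarity assumptions, and apply to comparisons of sports and weather forecasts.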
Anytime Valid Comparison of Sequential Forecasters
How do we compare forecasters that each make probabilistic predictions on a sequence of events (e.g., weather and sports)? In “Comparing Sequential Forecasters,” Choe and Ramdas propose flexible approaches to sequential inference on the mean score difference between any two forecasters. To estimate this time-varying quantity, the authors propose a sequential analog of confidence intervals, called confidence sequences (CSs). These CSs correctly cover the score difference under continuous monitoring, so the evaluator can freely peek at the scores and stop the experiment at any time (“anytime valid”). The authors further develop a complementary anytime valid approach called e-processes, which quantify the evidence against the claim that one forecaster is never better than the other in mean scores. The validity of these methods does not depend on stationarity or any other assumption about how the score differences evolve over time. In their paper, the authors showcase CSs and e-processes for comparing real-world baseball and weather forecasters.
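To make the two ideas concrete, here is a minimal Python sketch. It is not the authors' construction: it swaps in a simpler Hoeffding-style bound with a Gaussian mixture (Robbins' normal mixture), which only assumes that each score difference lies in [-1, 1], as is the case for a difference of two Brier losses. The function names `brier_loss`, `hoeffding_mixture_cs`, and `one_sided_e_process`, and the tuning parameters `rho` and `lam`, are hypothetical and chosen for illustration. The sign convention (a positive difference means the first forecaster did better) is also an assumption.

```python
import math


def brier_loss(prob, outcome):
    """Brier loss of a probability forecast for a binary outcome (lower is better)."""
    return (prob - outcome) ** 2


def hoeffding_mixture_cs(score_diffs, alpha=0.05, rho=1.0):
    """Two-sided (1 - alpha) confidence sequence for the running average of
    the conditional mean score differences.

    Assumption: every score difference lies in [-1, 1], so each centered
    difference is sub-Gaussian with variance proxy 1 (Hoeffding's lemma).
    Mixing the resulting supermartingales over a N(0, rho^2) prior and
    applying Ville's inequality gives a radius that holds at all times
    simultaneously. No stationarity is required.
    """
    total = 0.0
    bounds = []
    for t, diff in enumerate(score_diffs, start=1):
        total += diff
        v = t * rho ** 2 + 1.0
        radius = math.sqrt(
            2.0 * v / (t ** 2 * rho ** 2) * math.log(math.sqrt(v) / alpha)
        )
        mean = total / t
        # Clip to [-1, 1], the known range of the target quantity.
        bounds.append((max(-1.0, mean - radius), min(1.0, mean + radius)))
    return bounds


def one_sided_e_process(score_diffs, lam=0.5):
    """Evidence against the null that the first forecaster is never better
    on average (running average of conditional means <= 0 at every time).

    Under that null, exp(lam * sum(diffs) - t * lam^2 / 2) is bounded above
    by a nonnegative supermartingale started at 1, so exceeding 1 / alpha
    at any data-dependent stopping time has probability at most alpha.
    """
    log_e = 0.0
    values = []
    for diff in score_diffs:
        log_e += lam * diff - lam ** 2 / 2.0
        values.append(math.exp(log_e))
    return values
```

In use, one would compute `brier_loss(q_i, y_i) - brier_loss(p_i, y_i)` for each event, feed the running list of differences to both functions, and stop whenever the interval excludes zero or the e-process exceeds `1 / alpha`. Bounds that adapt to the observed variance, such as the empirical Bernstein type, are typically tighter than this Hoeffding-style sketch.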