Reproducible Hyperparameter Optimization
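Hyperparameter optimization in machine learning often gives widely varying, hard-to-reproduce results because model training is random. The proposed approach takes the mean prediction error over multiple training runs as the objective and combines hypothesis testing with a sequential testing framework. At equal computation it reduces the variation in outcomes by up to 90% without sacrificing prediction accuracy.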
A key issue in machine learning research is the lack of reproducibility. We illustrate the role hyperparameter search plays in this problem and show how standard hyperparameter search methods can produce widely varying outcomes because model training during hyperparameter optimization is nondeterministic. This variation is a problem both for the reproducibility of the hyperparameter search itself and for comparisons between methods that are each tuned by hyperparameter search. In addition, suboptimal hyperparameter settings returned by a search can affect other studies, since hyperparameter settings are often copied from previously published research. To remedy this issue, we define the mean prediction error across model training runs as the objective of the hyperparameter search. We then propose a hypothesis testing procedure that draws inferences about the mean performance of each hyperparameter setting and yields an equivalence class of hyperparameter settings whose performance cannot be distinguished. We further embed this procedure in a group sequential testing framework to improve efficiency by reducing the average number of model training replicates required. Empirical results on machine learning benchmarks show that, at equal computation, the proposed method reduces the variation in hyperparameter search outcomes by up to 90% while achieving equal or lower mean prediction errors compared with standard random search and Bayesian optimization. Moreover, the sequential testing framework reduces computation while preserving the performance of the method. The code to reproduce the results is available online and in the supplementary materials.
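To make the idea of an equivalence class concrete, here is a minimal sketch. It is not the authors' procedure. It assumes a one-sided large-sample z-test (normal approximation) with a Bonferroni correction in place of whatever test the paper uses. It also replaces the paper's group sequential boundaries with a simplified staged loop that splits alpha evenly across stages. The names `equivalence_class`, `staged_search` and the `train(setting, seed=...)` callback are hypothetical.

```python
from statistics import NormalDist, fmean, variance


def equivalence_class(errors, alpha=0.05):
    """Keep settings whose mean error is not significantly worse than the best.

    errors maps a setting label to the prediction errors of repeated training
    runs (e.g. different random seeds); each list needs at least two runs.
    Assumption: one-sided z-test per setting, Bonferroni over the comparisons.
    """
    means = {name: fmean(runs) for name, runs in errors.items()}
    best = min(means, key=means.get)
    others = [name for name in errors if name != best]
    level = alpha / max(len(others), 1)  # Bonferroni-corrected test level
    n_best, var_best = len(errors[best]), variance(errors[best])
    keep = [best]
    for name in others:
        runs = errors[name]
        diff = means[name] - means[best]  # non-negative by choice of best
        se = (variance(runs) / len(runs) + var_best / n_best) ** 0.5
        if se == 0.0:
            if diff == 0.0:
                keep.append(name)  # identical, noise-free results
            continue
        # H0: mean(name) <= mean(best); reject when the gap is too large.
        p_value = 1.0 - NormalDist().cdf(diff / se)
        if p_value > level:
            keep.append(name)  # not distinguishable from the best
    return keep


def staged_search(train, settings, stages=3, runs_per_stage=2, alpha=0.05):
    """Simplified staged elimination (NOT a proper group sequential design).

    Each stage adds runs_per_stage replicates per surviving setting and drops
    settings already distinguishable from the best, using alpha / stages.
    train(setting, seed=...) is a hypothetical callback returning an error.
    """
    errors = {s: [] for s in settings}
    active = list(settings)
    for stage in range(stages):
        for s in active:
            errors[s].extend(
                train(s, seed=stage * runs_per_stage + r)
                for r in range(runs_per_stage)
            )
        active = equivalence_class({s: errors[s] for s in active}, alpha / stages)
        if len(active) == 1:
            break  # a single setting remains; stop spending compute
    return active


errors = {
    "lr=0.1": [0.20, 0.22, 0.21, 0.23],
    "lr=0.01": [0.21, 0.20, 0.22, 0.22],
    "lr=0.001": [0.35, 0.36, 0.34, 0.37],
}
print(equivalence_class(errors))  # ['lr=0.01', 'lr=0.1']
```

Here `lr=0.01` and `lr=0.1` come back together because four replicates cannot separate their means. `lr=0.001` is excluded. Reporting the whole class instead of a single arg-min is what keeps the outcome stable across reruns. The staged loop only shows how settings that are clearly worse can stop consuming training runs early. A faithful implementation would use proper group sequential stopping boundaries.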