Critical Values Robust to P-hacking
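Motivated by the prevalence of p-hacking in practice, the paper builds a hypothesis-testing model that incorporates p-hacking and derives robust critical values that keep spurious significant results from occurring more often than intended. These critical values are larger than classical ones; in the model calibrated to medical science, the robust critical value is the classical critical value with the significance level reduced to one fifth.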
Abstract
P-hacking is prevalent in reality but absent from classical hypothesis-testing theory. We therefore build a model of hypothesis testing that accounts for p-hacking. From the model, we derive critical values such that, if they are used to determine significance and p-hacking adjusts to the new significance standards, spurious significant results do not occur more often than intended. Because of p-hacking, these robust critical values are larger than classical critical values. In the model calibrated to medical science, the robust critical value equals the classical critical value for the same test statistic evaluated at one fifth of the significance level.
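To make the calibrated rule concrete, the sketch below computes both critical values. It assumes a two-sided z-test whose statistic is standard normal under the null; the abstract does not name a specific test, and the function names are hypothetical. The factor of 5 is the medical-science calibration stated above; for other test statistics (for example a t statistic) one would use that statistic's own quantiles instead of the normal ones.

```python
from statistics import NormalDist


def classical_critical_value(alpha: float) -> float:
    """Two-sided z-test critical value: the (1 - alpha/2) quantile of N(0, 1)."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def robust_critical_value(alpha: float, factor: float = 5.0) -> float:
    """Classical critical value evaluated at significance level alpha / factor.

    factor = 5 reflects the medical-science calibration described in the
    abstract; using it for other fields is an assumption.
    """
    return classical_critical_value(alpha / factor)


alpha = 0.05
print(f"classical: {classical_critical_value(alpha):.3f}")  # classical: 1.960
print(f"robust:    {robust_critical_value(alpha):.3f}")     # robust:    2.576
```

Under these assumptions, a result that clears the usual 5% threshold (|z| > 1.96) would count as significant under the robust standard only if |z| > 2.576, the classical 1% threshold.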