Channeling Fisher: Randomization Tests and the Statistical Insignificance of Seemingly Significant Experimental Results
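Using randomization-based statistical inference, this paper tests the treatment effects reported in 53 experimental papers from the journals of the American Economic Association and finds that randomization tests report fewer significant results than the authors' own methods, with a larger gap in joint tests.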
I follow R. A. Fisher's The Design of Experiments (1935), using randomization statistical inference to test the null hypothesis of no treatment effects in a comprehensive sample of 53 experimental papers drawn from the journals of the American Economic Association. In the average paper, randomization tests of the significance of individual treatment effects find 13% to 22% fewer significant results than are found using authors’ methods. In joint tests of multiple treatment effects appearing together in tables, randomization tests yield 33% to 49% fewer statistically significant results than conventional tests. Bootstrap and jackknife methods support and confirm the randomization results.
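
To make the idea of a randomization test concrete, the sketch below computes a two-sided p-value for the sharp null hypothesis of no treatment effect. It is an illustration under stated assumptions, not the procedure used in the paper: the function name `randomization_p_value`, the difference-in-means statistic, the complete randomization with a fixed number of treated units, and the toy data are all hypothetical choices made for this example.

```python
import random
from statistics import mean


def randomization_p_value(outcomes, treated, draws=2000, seed=0):
    """Two-sided Monte Carlo randomization p-value for the sharp null
    of no treatment effect (hypothetical helper, difference in means)."""
    rng = random.Random(seed)

    def diff_in_means(assignment):
        t = [y for y, d in zip(outcomes, assignment) if d]
        c = [y for y, d in zip(outcomes, assignment) if not d]
        return mean(t) - mean(c)

    observed = abs(diff_in_means(treated))
    assignment = list(treated)
    extreme = 0
    for _ in range(draws):
        # Under the sharp null, outcomes are fixed and only the assignment
        # is random, so re-draw it while keeping the number treated fixed.
        rng.shuffle(assignment)
        # Small tolerance guards against floating-point ties.
        if abs(diff_in_means(assignment)) >= observed - 1e-12:
            extreme += 1
    # Count the realized assignment too, so the p-value is never exactly 0.
    return (extreme + 1) / (draws + 1)


# Toy data (assumed for illustration): 4 treated units, 4 control units.
outcomes = [5.1, 6.0, 5.8, 6.3, 4.2, 4.0, 4.5, 3.9]
treated = [1, 1, 1, 1, 0, 0, 0, 0]
p_value = randomization_p_value(outcomes, treated)
print(f"randomization p-value: {p_value:.3f}")
```

The p-value here is the share of re-drawn assignments whose statistic is at least as extreme as the observed one, so its validity rests on the experiment's randomization rather than on an assumed sampling distribution for the errors.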