The Power of Tests for Detecting p-Hacking
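Summary: The paper theoretically studies how common forms of p-hacking affect the distribution of p-values and analyzes the power of tests for detecting p-hacking. Power depends on the p-hacking strategy and the distribution of true effects; combined tests and continuity tests have the highest power.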
Abstract A flourishing empirical literature investigates the prevalence of p-hacking based on the distribution of p-values across studies. Interpreting results in this literature requires a careful understanding of the power of methods for detecting p-hacking. We theoretically study the implications of likely forms of p-hacking on the distribution of p-values to understand the power of tests for detecting it. Power can be low and depends crucially on the p-hacking strategy and the distribution of true effects. Combined tests for upper bounds and monotonicity and tests for continuity of the p-curve tend to have the highest power for detecting p-hacking.
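To make the dependence on the p-hacking strategy concrete, the following is a minimal simulation sketch, not the paper's model. It assumes every true effect is zero, that a researcher has `k_specs` independent specifications (so each p-value is uniform on [0, 1]), and two hypothetical reporting rules named here for illustration: "minimum" (always report the smallest p-value) and "threshold" (switch from the first specification to the smallest p-value only when the first is insignificant and the smallest is significant). The function names, parameters, and bin width are all assumptions introduced for this example.

```python
import random


def simulate_reported_p(n_studies, k_specs, strategy, alpha=0.05, seed=0):
    """Simulate reported p-values when every true effect is zero (assumed)."""
    rng = random.Random(seed)
    reported = []
    for _ in range(n_studies):
        # Under the null, each specification's p-value is Uniform(0, 1).
        # Independence across specifications is a simplifying assumption.
        ps = [rng.random() for _ in range(k_specs)]
        smallest = min(ps)
        if strategy == "none":
            p = ps[0]  # no p-hacking: report the first specification
        elif strategy == "minimum":
            p = smallest  # always report the smallest p-value
        elif strategy == "threshold":
            # Keep the first result unless it is insignificant and some
            # other specification is significant.
            p = smallest if ps[0] > alpha and smallest <= alpha else ps[0]
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        reported.append(p)
    return reported


def share_significant(p_values, alpha=0.05):
    """Fraction of reported p-values at or below alpha."""
    return sum(p <= alpha for p in p_values) / len(p_values)


def bin_ratio(p_values, alpha=0.05, width=0.01):
    """Count in [alpha - width, alpha) divided by count in [alpha, alpha + width)."""
    below = sum(alpha - width <= p < alpha for p in p_values)
    above = sum(alpha <= p < alpha + width for p in p_values)
    return below / above


if __name__ == "__main__":
    for strategy in ("none", "minimum", "threshold"):
        ps = simulate_reported_p(100_000, 5, strategy)
        print(strategy, share_significant(ps), bin_ratio(ps))
```

In this sketch both p-hacking rules raise the share of significant results by the same amount, but only the "threshold" rule produces a sharp jump in the p-curve at 0.05. Under the "minimum" rule the reported p-values have density k(1 - p)^(k - 1) for k specifications, which is smooth and decreasing, so a check that looks for a discontinuity or a violation of monotonicity would have little to detect. This is one simple way to see why power can be low for some strategies.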