
A Warning on the Use of Chi-Squared Statistics With Frequency Tables With Small Expected Cell Counts

Journal of the American Statistical Association · 1988
Cited by 19
ABS 4

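Summary

The study finds that when the expected cell counts of a frequency table are small, the Pearson chi-squared test may be asymptotically inconsistent. Even when the chi-squared approximation is good under the null hypothesis, the power of the test may fall below the significance level, especially when the number of cells is large and the expected counts vary widely.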

Abstract

When applied to frequency tables with small expected cell counts, Pearson chi-squared test statistics may be asymptotically inconsistent even in cases in which a satisfactory chi-squared approximation exists for the distribution under the null hypothesis. This problem is particularly important in cases in which the number of cells is large and the expected cell counts are quite variable. To illustrate this bias of the chi-squared test, this article considers the Pearson chi-squared test of the hypothesis that the cell probabilities for a multinomial frequency table have specified values. In this case, the expected value and variance of the Pearson chi-square may be evaluated under both the null and alternative hypotheses. When the number of cells is large, normal approximations and discrete Edgeworth expansions may also be used to assess the size and power of the Pearson chi-squared test. These analyses show that unless all cell probabilities are equal, it is possible to select a significance level and cell probabilities under the alternative hypothesis such that the power is less than the size of the test. As shown by exact calculations, the difference may be substantial even in cases in which all expected cell sizes are at least 5 under the null hypothesis. The use of moments shows that given any minimum expected cell size under the null hypothesis and given any significance level, it is possible to make the power arbitrarily close to 0 by the selection of a large enough number of cells in the table and suitable cell probabilities for the null and alternative hypotheses. The normal approximations for the distribution of the Pearson chi-squared statistic permit the size of this bias to be assessed in less-extreme cases involving tables with many cells. These results imply that caution must be exercised in the application of Pearson chi-squared statistics to sparse contingency tables with many cells.
An alternative to the Pearson chi-square, proposed by Zelterman (1986), avoids some of the problems. Exact calculation, however, shows that the alternative statistic does not eliminate all problems of bias. The problems described in this article clearly extend to more general applications of the Pearson chi-squared statistic.

Key Words: Asymptotic bias; Consistency
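
The statement that the expected value of the Pearson chi-square can be evaluated under both hypotheses can be illustrated with a short calculation. The sketch below is not taken from the article: the function name `pearson_mean`, the sample size, and the two probability vectors are hypothetical choices made for illustration. It uses the standard multinomial moment identity E[(N_j - n p_j)^2] = n q_j (1 - q_j) + n^2 (q_j - p_j)^2, where p is the null and q the true cell-probability vector.

```python
def pearson_mean(n, p, q):
    """Exact mean of Pearson's X^2 = sum_j (N_j - n*p_j)^2 / (n*p_j)
    when the counts N are Multinomial(n, q) and p is the null vector."""
    total = 0.0
    for pj, qj in zip(p, q):
        variance = n * qj * (1.0 - qj)      # Var(N_j) under q
        shift_sq = (n * (qj - pj)) ** 2     # (E[N_j] - n*p_j)^2
        total += (variance + shift_sq) / (n * pj)
    return total


# Hypothetical two-cell example; expected null counts are 5 and 45.
n = 50
p_null = [0.1, 0.9]
q_alt = [0.09, 0.91]   # assumed alternative, shifting mass out of the small cell

mean_null = pearson_mean(n, p_null, p_null)  # equals k - 1 under the null
mean_alt = pearson_mean(n, p_null, q_alt)
print(mean_null, mean_alt)
```

In this example the mean under the alternative is smaller than the null mean of k - 1, although both expected counts are at least 5. A lower mean alone does not establish that the power is below the size, since that also depends on the variance and on the significance level, but it suggests how such bias can arise when cell probabilities are unequal.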

Statistics · Hypothesis testing · Chi-squared test · Contingency table analysis