An Examination of Power and Type I Errors for Two Differential Item Functioning Indices Using the Graded Response Model

ORGANIZATIONAL RESEARCH METHODS · 2011
Cited by 21
Renmin University A-ABS 4

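Summary

Monte Carlo simulations compared the Type I error rates and statistical power of two methods for detecting differential item functioning: DFIT and the likelihood ratio test (LRT). DFIT produced too many Type I errors when sample sizes were unequal, whereas the LRT showed higher detection power overall.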

Abstract

This study examined two methods for detecting differential item functioning (DIF): Raju, van der Linden, and Fleer’s 1995 differential functioning of items and tests (DFIT) procedure and Thissen, Steinberg, and Wainer’s 1988 likelihood ratio test (LRT). The major research questions concerned which test provides the better balance of Type I errors and power and whether the tests differ in their ability to detect different types of DIF. Monte Carlo simulations were conducted to address these questions. Equal and unequal sample size conditions were fully crossed with test lengths of 10 and 20 items. In addition, α and β parameters were manipulated in order to simulate DIF. Findings indicate that DFIT and LRT both had acceptable Type I error rates when sample sizes were equal but that DFIT produced too many Type I errors when sample sizes were unequal. Overall, the LRT exhibited greater power to detect both α and β parameter DIF than did DFIT. However, DFIT was more powerful than LRT when the last two β parameters had DIF as opposed to when the extreme β parameters had DIF.
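
To make the simulation design concrete, the following is a minimal sketch of how graded response model data with DIF might be generated for a reference and a focal group. It is not the authors' code. The function name `simulate_grm`, the parameter distributions, the sample sizes (1000 and 500), the five response categories, and the 0.5 threshold shift are all illustrative assumptions; `a` and `b` stand for the abstract's α (discrimination) and β (threshold) parameters.

```python
import numpy as np

def simulate_grm(theta, a, b, rng):
    """Draw graded responses (0..K-1) under Samejima's graded response model.

    theta: (n,) latent traits; a: (J,) discriminations;
    b: (J, K-1) thresholds, ascending within each item.
    """
    # Boundary probabilities P(X >= k | theta), shape (n, J, K-1).
    z = a[None, :, None] * (theta[:, None, None] - b[None, :, :])
    p_star = 1.0 / (1.0 + np.exp(-z))
    # One uniform draw per person-item pair; the response is the number of
    # boundaries passed, which reproduces the GRM category probabilities.
    u = rng.random((theta.size, a.size, 1))
    return (u < p_star).sum(axis=2)

rng = np.random.default_rng(0)
n_items, n_thresholds = 10, 4          # assumed: 10 items, 5 categories

# Reference-group item parameters (assumed distributions).
a_ref = rng.uniform(1.0, 2.0, n_items)
b_ref = np.sort(rng.normal(0.0, 1.0, (n_items, n_thresholds)), axis=1)

# Focal group: identical except item 0, whose thresholds are shifted
# to induce beta-parameter DIF (assumed magnitude 0.5).
a_foc, b_foc = a_ref.copy(), b_ref.copy()
b_foc[0] += 0.5

# Unequal sample size condition (assumed sizes).
n_ref, n_foc = 1000, 500
ref = simulate_grm(rng.normal(size=n_ref), a_ref, b_ref, rng)
foc = simulate_grm(rng.normal(size=n_foc), a_foc, b_foc, rng)
```

Repeating this generation many times, with and without the shift, and applying a DIF test to each replication would give empirical power and Type I error rates respectively.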

Psychometrics · Item response theory · Statistical testing · Educational measurement