Reliability Evidence for AI-Based Scores in Organizational Contexts: Applying Lessons Learned From Psychometrics
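This article critically examines the problem of estimating the reliability of AI scores in organizational research, distinguishes reliability evidence from validity evidence, and provides methods suited to estimating the reliability of AI scores, offering important guidance for organizational researchers and practitioners.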
Machine learning and artificial intelligence (AI) are increasingly used within organizational research and practice to generate scores representing constructs (e.g., social effectiveness) or behaviors/events (e.g., turnover probability). Ensuring the reliability of AI scores is critical in these contexts, yet reliability estimates are reported inconsistently, if at all. The current article critically examines reliability estimation for AI scores. We describe different uses of AI scores and how each use informs the data and model needed for estimating reliability. Additionally, we distinguish between reliability and validity evidence within this context. We also highlight that treating correlations between AI scores and established measures as an index of reliability requires the parallel test assumption, an assumption that is frequently violated. We then provide methods for estimating the reliability of AI scores that are sensitive to the generalizations one aims to make. In conclusion, we assert that AI reliability estimation is a challenging task that requires a thorough understanding of the issues presented, but one that is essential to responsible AI work in organizational contexts.
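The point about the parallel test assumption can be illustrated with a small simulation. The sketch below is not taken from the article: it assumes a simple classical test theory setup (observed score = true score + independent normal error, true-score variance fixed at 1), and the function names `pearson` and `simulate_correlation` and all numeric settings are hypothetical choices made for illustration. Under these assumptions, the correlation between an AI score and an established measure approximates the AI score's reliability only when the two measures have equal error variance; otherwise it approximates the square root of the product of the two reliabilities.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_correlation(rel_established, rel_ai, n=20000, seed=0):
    """Correlate a simulated established measure with a simulated AI score.

    Assumption (illustrative only): both measures share the same true score
    with variance 1, so reliability = 1 / (1 + error variance).
    """
    rng = random.Random(seed)
    # Error standard deviations that yield the requested reliabilities.
    sd_established = math.sqrt(1 / rel_established - 1)
    sd_ai = math.sqrt(1 / rel_ai - 1)
    true_scores = [rng.gauss(0, 1) for _ in range(n)]
    established = [t + rng.gauss(0, sd_established) for t in true_scores]
    ai_scores = [t + rng.gauss(0, sd_ai) for t in true_scores]
    return pearson(established, ai_scores)


# Parallel case: equal error variances, so the correlation should land
# near the common reliability of 0.70.
parallel = simulate_correlation(rel_established=0.7, rel_ai=0.7)

# Non-parallel case: the established measure is more reliable (0.90) than
# the AI score (0.50). Theory gives sqrt(0.9 * 0.5), roughly 0.67, which
# overstates the AI score's reliability of 0.50.
nonparallel = simulate_correlation(rel_established=0.9, rel_ai=0.5)

print(f"parallel correlation:     {parallel:.3f}")
print(f"non-parallel correlation: {nonparallel:.3f}")
```

In the non-parallel case, reading the correlation as the AI score's reliability would be misleading, which is the kind of violation the abstract warns about. Real AI scores may also differ from the established measure in the true score itself, a complication this sketch deliberately leaves out.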