Taking It Easy: Off-the-Shelf Versus Fine-Tuned Supervised Modeling of Performance Appraisal Text

ORGANIZATIONAL RESEARCH METHODS · 2024

Cited by 16

Renmin University of China journal list: A · ABS: 4

Andrew B. Speer (corresponding author)
James Perrotta · Wayne State University
Tobias L. Kordsmeyer · University of Göttingen

Overview

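The study compares the validity of supervised NLP models and off-the-shelf large language models (ChatGPT-3.5/4) for scoring performance appraisal text. It finds that the off-the-shelf models perform slightly worse but show similar psychometric properties, and it provides guidelines for their use.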

Abstract

When assessing text, supervised natural language processing (NLP) models have traditionally been used to measure targeted constructs in the organizational sciences. However, these models require significant resources to develop. Emerging “off-the-shelf” large language models (LLMs) offer a way to evaluate organizational constructs without building customized models. However, it is unclear whether off-the-shelf LLMs accurately score organizational constructs and what evidence is necessary to infer validity. In this study, we compared the validity of supervised NLP models to that of off-the-shelf LLMs (ChatGPT-3.5 and ChatGPT-4). Across six organizational datasets and thousands of comments, we found that scores produced by supervised NLP models were more reliable than those of human coders. Moreover, even though they were not specifically developed for this purpose, off-the-shelf LLMs produced scores with psychometric properties similar to, though slightly less favorable than, those of the supervised models. We connect these findings to broader validation considerations and present a decision chart to guide researchers and practitioners on how they can use off-the-shelf LLMs to score targeted constructs, including guidance on how psychometric evidence can be “transported” to new contexts.

Keywords: organizational science · natural language processing · psychometrics · performance appraisal
