Unpacking the Validity of Open-Ended Personality Assessments Using Fine-Tuned Large Language Models

ORGANIZATIONAL RESEARCH METHODS · 2026

Journal rankings: Renmin University of China A · ABS 4

Andrew B. Speer (corresponding author)
Angie Y. Delacruz · Wayne State University
Takudzwa Chawota
James Perrotta · Wayne State University
Cort W. Rudolph · Wayne State University

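Summary

Using a large sample of open-ended assessment responses, the study finds that fine-tuned large language models yield closer agreement between personality scores and self-reports than zero-shot models, and that validity rises as writing prompts become more trait-specific and the underlying model more sophisticated.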

Abstract

Alternative approaches to personality measurement, such as open-ended narrative-based assessments, have potential advantages for organizational research and practice. In this research, we investigate factors that affect valid application of natural language processing (NLP) for scoring open-ended personality assessments and when, how, and why such assessments capture personality-related variance. Using a large sample of responses to open-ended assessments, convergence between NLP scores and self-report target scores increased as the degree of customization and the sophistication of the underlying model increased, with the worst psychometric performance occurring for zero-shot large language model (LLM) scores and the best for fine-tuned LLM scores. However, all scoring methods exhibited evidence of validity. Additionally, when trained to predict direct evaluations of the narrative responses, correlations with target scores were large (M = .83). NLP scores also exhibited discriminant and criterion-related validity evidence. However, validity was contingent upon the methodological rigor employed in developing writing prompts. Prompts designed to elicit trait-relevant information outperformed generic prompts, and this occurred because trait-specific prompts increased the amount of trait-relevant information (i.e., narrative units), which was associated with enhanced convergence with target scores.
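As a rough illustration of what "convergence" between NLP scores and self-report target scores means, the sketch below computes a Pearson correlation between two lists of scores. It assumes convergence is summarized as a simple correlation, which the abstract implies but does not spell out. The function name `pearson_r` and all numeric values are hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from each mean.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy values for illustration only; NOT data from the study.
self_report = [3.2, 4.1, 2.5, 3.8, 4.6]  # self-report target scores for one trait
nlp_scores = [3.0, 4.3, 2.9, 3.5, 4.4]   # scores an NLP model assigned to the narratives

convergence = pearson_r(self_report, nlp_scores)
print(f"convergent correlation: {convergence:.2f}")
```

Under this reading, comparing scoring methods (for example zero-shot versus fine-tuned) would amount to computing this correlation separately for each method's scores against the same self-report targets.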

Keywords: personality measurement · natural language processing · organizational research · psychometrics