Do Humans and GAI See Eye to Eye? Implications of LLM Scoring Volatility in Supplier Evaluations

JOURNAL OF BUSINESS LOGISTICS · 2026

Cited by 0 · Top 8% among articles in the same journal and year

Renmin University of China A-ABS 3

Finnegan A. McKinley · Colorado State University
Anne E. Dohmen · Michigan State University (corresponding author)
Vincent E. Castillo · The Ohio State University

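Summary

The study compares generative AI with human experts in government supplier evaluation. It finds that AI is stable and agrees with humans when assessing compliance, but its scores fluctuate widely when assessing competitive signals, so human involvement is needed there.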

Abstract

This study compares Generative Artificial Intelligence (GAI) to human procurement professionals on supplier evaluation tasks. Using Structural Topic Modeling (STM) on 123 government supplier bids from 31 projects solicited by the State of Ohio between January 2023 and December 2024, we compare evaluations from three reasoning models (o3, Grok-3-Mini, DeepSeek R1-0528) against human evaluators. Adopting a signaling theory perspective, we find asymmetry in signal processing between GAI and human evaluators. GAI demonstrates high consistency and strong human alignment when evaluating compliance signals (e.g., technical specifications), which makes it suitable for qualification screening. However, GAI exhibits high scoring volatility with competitive signals (e.g., value-add propositions), indicating that human judgment remains critical for assessing differentiation. We also find that the number of bidders influences signal composition, with compliance signals more prevalent in less competitive solicitations. The findings suggest a two-stage evaluation framework where GAI handles compliance screening and humans focus on competitive assessment. GAI scoring volatility serves as a canary-in-the-mine to identify when human oversight is necessary.

Procurement · Government tendering · AI applications · Supplier evaluation