Is AI Ground Truth Really True? The Dangers of Training and Evaluating AI Tools Based on Experts’ Know-What

MIS Quarterly · 2021

Cited by 49

Renmin University A+ · FT50 · UTD24 · ABS 4*

Natalia Levina · New York University
Sarah Lebovitz · University of Virginia
Hila Lifshitz-Assaf · New York University

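Summary

Through a field study at a U.S. hospital, the authors find that AI tools trained on expert-labeled "ground truth" performed poorly in practice because they overlooked experts' tacit, practice-based knowledge. The study warns managers about the pitfalls of AI evaluation.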

Abstract

Organizational decision-makers need to evaluate AI tools in light of increasing claims that such tools outperform human experts. Yet, measuring the quality of knowledge work is challenging, raising the question of how to evaluate AI performance in such contexts. We investigate this question through a field study of a major U.S. hospital, observing how managers evaluated five different machine-learning (ML) based AI tools. Each tool reported high performance according to standard AI accuracy measures, which were based on ground truth labels provided by qualified experts. Trying these tools out in practice, however, revealed that none of them met expectations. Searching for explanations, managers began confronting the high uncertainty of experts’ know-what knowledge captured in ground truth labels used to train and validate ML models. In practice, experts address this uncertainty by drawing on rich know-how practices, which were not incorporated into these ML-based tools. Discovering the disconnect between AI’s know-what and experts’ know-how enabled managers to better understand the risks and benefits of each tool. This study shows dangers of treating ground truth labels used in ML models objectively when the underlying knowledge is uncertain. We outline implications of our study for developing, training, and evaluating AI for knowledge work.

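Artificial intelligence · Knowledge management · Organizational decision-making · Machine learning evaluation · Medical informatics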
