Managing Trust in AI Systems: Evidence From Controlled Evaluation of Explanation Stability Under Data Imbalance

IEEE Transactions on Engineering Management · 2026
Cited by: 0
ABS 3

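Summary

The study proposes an explainable AI framework that uses controlled sampling and stability assessment to show how class imbalance affects the reliability of explanation methods such as SHAP and LIME, helping managers judge how far AI outputs can be trusted.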

Abstract

As advanced AI models are increasingly embedded in information systems to support organizational decision-making, the demand for explainable and trustworthy systems has grown significantly. While post-hoc explanation methods such as SHAP and LIME are widely adopted to enhance interpretability and foster user trust, they often suffer from limitations such as instability in feature attributions, sensitivity to data sampling, and inconsistent outputs across iterations. This study proposes an explainable AI (XAI) artifact, guided by the design science research (DSR) approach, that systematically investigates the impact of class imbalance and sampling artifacts on the stability of feature explanations. Drawing on data from 2013–2023 across four integrated social media sources (BoxOfficeMojo, YouTube reviews, movie budget records, and metadata), we develop and evaluate a controlled sampling framework. The artifact incorporates (1) targeted class sampling to evaluate imbalance effects, (2) stability assessment using sequential rank agreement (SRA) and coefficient of variation (CV), and (3) robustness checks through the variable stability index (VSI) and coefficient stability index (CSI). Our findings highlight key factors that influence the reliability of SHAP and LIME outputs, based on 1,600 controlled simulation runs, multi-metric robustness checks (SRA, CV, VSI, CSI), and a design science–driven evaluation framework that explicitly isolates the effects of class imbalance. The study also advances the development of responsible AI by enhancing the methodological rigor of explainability in socio-technical systems.
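To make the stability assessment in step (2) more concrete, the following is a minimal Python sketch of how a coefficient of variation and a sequential rank agreement score could be computed over repeated explanation runs. It is not the authors' implementation. The function names, the input layout (one list of per-feature attribution scores per run), and the simplified SRA formulation (mean across-run variance of ranks for features that reach the top `depth` in at least one run) are all assumptions made for illustration.

```python
from statistics import mean, pstdev, variance


def ranks_from_scores(scores):
    """Rank features by descending absolute attribution (1 = most important).

    Ties are broken by feature index, which is an illustrative choice.
    """
    order = sorted(range(len(scores)), key=lambda i: -abs(scores[i]))
    ranks = [0] * len(scores)
    for position, feature in enumerate(order, start=1):
        ranks[feature] = position
    return ranks


def coefficient_of_variation(runs):
    """Per-feature CV (population std / mean) of absolute attributions across runs."""
    n_features = len(runs[0])
    cvs = []
    for j in range(n_features):
        values = [abs(run[j]) for run in runs]
        mu = mean(values)
        # CV is undefined when the mean attribution is zero.
        cvs.append(pstdev(values) / mu if mu > 0 else float("nan"))
    return cvs


def sequential_rank_agreement(runs, depth):
    """Simplified SRA at a given depth (assumed formulation, see lead-in).

    Lower values mean the runs agree more closely on the feature ranking.
    Requires at least two runs, because it uses the sample variance.
    """
    rank_lists = [ranks_from_scores(run) for run in runs]
    n_features = len(runs[0])
    # Features that appear in the top `depth` of at least one run.
    selected = [
        j for j in range(n_features)
        if any(ranks[j] <= depth for ranks in rank_lists)
    ]
    per_feature_variance = [
        variance([ranks[j] for ranks in rank_lists]) for j in selected
    ]
    return mean(per_feature_variance)


# Hypothetical attribution scores for three features over three runs.
runs = [
    [0.50, 0.30, 0.10],
    [0.50, 0.10, 0.30],
    [0.50, 0.30, 0.10],
]
print(coefficient_of_variation(runs))
print(sequential_rank_agreement(runs, depth=2))
```

In this toy input the first feature receives the same score in every run, so its CV is zero, while the second and third features swap ranks in one run, which raises both their CV and the SRA value at depth 2. In a setup like the one the abstract describes, such scores would presumably be computed separately for each class-imbalance condition and compared across conditions.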

Artificial intelligence · Explainability · Data imbalance · Trust management · Design science research