An interpretable machine learning methodology to generate interaction effect hypotheses from complex datasets
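Proposes a new method called SIFT that helps explain machine learning models by identifying interaction effects between variables, improving model transparency and understanding of the data-generation process.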
Abstract
Machine learning (ML) models are increasingly used in decision-making, but most of them are black boxes: their inner workings are not transparent. This makes it challenging to interpret their results and to understand the underlying data-generation process (DGP). In this article, we propose a novel methodology, the Simple Interaction Finding Technique (SIFT), to make ML models more interpretable. SIFT is a data- and model-agnostic approach for identifying interaction effects between variables in a dataset. Identifying these interactions can improve our understanding of the DGP and make ML models more transparent and explainable to a wider audience. We evaluate the methodology under a range of conditions (ML model complexity, dataset noise, spurious variables, and variable distributions) to assess its effectiveness and weaknesses. We show that the methodology is robust to many potential problems in both the underlying dataset and the ML algorithm.
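The abstract does not describe how SIFT itself works, so the sketch below is not SIFT. To make the idea of a model-agnostic interaction check concrete, it swaps in a different, well-known technique: Friedman and Popescu's pairwise H-statistic, computed by brute force from partial dependence. The model is treated purely as a black-box `predict` callable. The function names, the synthetic data-generation process, and the sample size are hypothetical choices made for illustration.

```python
import numpy as np


def partial_dependence(predict, X, features):
    """Centered partial dependence of `predict` on the columns in `features`,
    evaluated at every row of X (brute force, O(n^2) predictions)."""
    n = X.shape[0]
    pd_values = np.empty(n)
    for i in range(n):
        X_mod = X.copy()
        X_mod[:, features] = X[i, features]  # fix chosen features to row i's values
        pd_values[i] = predict(X_mod).mean()  # average out all other features
    return pd_values - pd_values.mean()       # center to zero mean


def h_statistic(predict, X, j, k):
    """Friedman & Popescu's pairwise H^2: share of the joint partial-dependence
    variance of (j, k) not explained by their separate main effects."""
    pd_jk = partial_dependence(predict, X, [j, k])
    pd_j = partial_dependence(predict, X, [j])
    pd_k = partial_dependence(predict, X, [k])
    return np.sum((pd_jk - pd_j - pd_k) ** 2) / np.sum(pd_jk ** 2)


# Hypothetical DGP: x1 and x2 interact, x0 is purely additive, x3 is spurious.
def true_model(X):
    return X[:, 0] + X[:, 1] + 2.0 * X[:, 1] * X[:, 2]


rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 4))

for j, k in [(1, 2), (0, 1), (0, 3)]:
    print(f"H^2(x{j}, x{k}) = {h_statistic(true_model, X, j, k):.3f}")
```

A value near 0 means the pair contributes additively; larger values indicate a stronger interaction. Here the known DGP stands in for a fitted model; in practice `true_model` would be replaced by the trained model's prediction function, which is what makes the check model-agnostic.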