Interpretable Machine Learning Using Partial Linear Models*
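To address the lack of interpretability of black-box models such as random forests and gradient boosting, this work proposes partial linear models that combine parametric and non-parametric functions to capture linear and non-linear relationships, uses variable selection to control overfitting, and demonstrates both predictive performance and interpretability on a regression problem.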
Abstract
Despite their high predictive performance, random forests and gradient boosting are often regarded as black boxes, which has raised concerns among practitioners and regulators. As an alternative, we suggest using partial linear models, which are inherently interpretable. Specifically, we propose to combine parametric and non‐parametric functions to accurately capture the linear and non‐linear relationships between the dependent and explanatory variables, together with a variable selection procedure to control overfitting. Estimation relies on a two‐step procedure that builds on the double residual method. We illustrate the predictive performance and interpretability of our approach on a regression problem.
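The two-step double residual idea can be sketched for a partial linear model written, as an assumption about notation, y = Xβ + g(z) + ε, with X entering linearly and z entering through an unknown function g. Step one removes the non-parametric part by smoothing both y and each column of X on z and keeping the residuals; step two estimates β by ordinary least squares of the y residuals on the X residuals. This is a minimal illustration, not the authors' implementation: it assumes a single non-linear covariate z, a Gaussian-kernel Nadaraya-Watson smoother with a fixed bandwidth of 0.3, and simulated data. The function names `nadaraya_watson` and `double_residual` are hypothetical, and the paper's variable selection procedure is not reproduced.

```python
import numpy as np


def nadaraya_watson(z, target, bandwidth):
    """Gaussian-kernel Nadaraya-Watson estimate of E[target | z] at each sample point.

    Assumption: z is a single (1-D) non-linear covariate of shape (n,);
    target has shape (n,) or (n, k).
    """
    diffs = (z[:, None] - z[None, :]) / bandwidth  # (n, n) scaled pairwise distances
    weights = np.exp(-0.5 * diffs ** 2)            # Gaussian kernel weights
    weights /= weights.sum(axis=1, keepdims=True)  # each row sums to 1
    return weights @ target


def double_residual(y, X, z, bandwidth=0.3):
    """Two-step double residual estimate of beta in y = X @ beta + g(z) + eps."""
    # Step 1: partial z out of y and out of every column of X non-parametrically.
    y_res = y - nadaraya_watson(z, y, bandwidth)
    X_res = X - nadaraya_watson(z, X, bandwidth)
    # Step 2: OLS of y residuals on X residuals (no intercept; g(z) absorbs it).
    beta, *_ = np.linalg.lstsq(X_res, y_res, rcond=None)
    # Recover g(z) by smoothing the partial residual y - X @ beta on z.
    g_hat = nadaraya_watson(z, y - X @ beta, bandwidth)
    return beta, g_hat


# Hypothetical simulated data: X is correlated with z, g(z) = sin(2z).
rng = np.random.default_rng(0)
n = 500
z = rng.uniform(-2.0, 2.0, n)
X = np.column_stack([rng.normal(size=n) + 0.5 * z, rng.normal(size=n)])
y = X @ np.array([1.5, -2.0]) + np.sin(2 * z) + 0.1 * rng.normal(size=n)

beta, g_hat = double_residual(y, X, z)
print(np.round(beta, 2))  # linear coefficients, directly interpretable
```

Residualizing X as well as y is what keeps the estimate of β consistent when the linear regressors are correlated with z; regressing y on X alone would let part of g(z) leak into β. The interpretable output is the coefficient vector β together with the fitted curve g_hat, which can be plotted against z.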