Dimension Reduction Forests: Local Variable Importance Using Structured Random Forests

Journal of Computational and Graphical Statistics · 2022
Cited by 6
ABS 3

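Summary

The paper proposes a new nonparametric estimator that combines the random forest kernel with local sufficient dimension reduction to estimate a directional local variable importance measure at each prediction point. The estimator is more accurate than existing methods on simulated and real regression problems, and it is applied to an analysis of particulate matter concentration data from Beijing.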

Abstract

Random forests are one of the most popular machine learning methods due to their accuracy and variable importance assessment. However, random forests only provide variable importance in a global sense. There is an increasing need for such assessments at a local level, motivated by applications in personalized medicine, policy-making, and bioinformatics. We propose a new nonparametric estimator that pairs the flexible random forest kernel with local sufficient dimension reduction to adapt to a regression function’s local structure. This allows us to estimate a meaningful directional local variable importance measure at each prediction point. We develop a computationally efficient fitting procedure and provide sufficient conditions for the recovery of the splitting directions. We demonstrate significant accuracy gains of our proposed estimator over competing methods on simulated and real regression problems. Finally, we apply the proposed method to seasonal particulate matter concentration data collected in Beijing, China, which yields meaningful local importance measures. The methods presented here are available in the drforest Python package. Supplementary materials for this article are available online.

Keywords: random forests, nonparametric statistics, dimension reduction, variable importance, machine learning