A hybrid clustering and boosting tree feature selection (CBTFS) method for credit risk assessment with high-dimensionality
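Proposes a hybrid clustering and boosting tree feature selection method: an improved minimum spanning tree first removes redundant features, then Random Forest, XGBoost, and AdaBoost further rank the remaining features. Validation on real-world credit datasets shows that the method outperforms classic feature selection methods.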
To address the high-dimensionality problem in credit risk assessment, a hybrid clustering and boosting tree feature selection (CBTFS) method is proposed. In this hybrid methodology, an improved minimum spanning tree model is first used to remove redundant and irrelevant features. Three embedded feature selection approaches (Random Forest, XGBoost, and AdaBoost) are then used to rank the remaining features more efficiently, and the optimal features are applied to obtain better prediction performance. For verification, two real-world credit datasets are used to demonstrate the effectiveness of the proposed CBTFS methodology. Experimental results demonstrate that the proposed method is superior to other classic feature selection methods. This indicates that the CBTFS method is a promising tool for handling high-dimensional data in credit risk assessment. First published online 12 February 2025
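The first stage described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the minimum spanning tree is "improved", so the code below is only a generic stand-in: it builds an ordinary minimum spanning tree over the features using the distance 1 - |Pearson correlation|, cuts long edges to form clusters of redundant features, and keeps the most target-relevant feature from each cluster. The function names and the `cut` and `min_relevance` thresholds are hypothetical, chosen for illustration. The second stage (ranking the surviving features with Random Forest, XGBoost, and AdaBoost importances) is not shown.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def mst_edges(dist):
    """Prim's algorithm on a dense distance matrix; returns (i, j, weight) edges."""
    n = len(dist)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Cheapest edge leaving the current tree.
        i, j = min(
            ((a, b) for a in in_tree for b in range(n) if b not in in_tree),
            key=lambda e: dist[e[0]][e[1]],
        )
        edges.append((i, j, dist[i][j]))
        in_tree.add(j)
    return edges


def select_features(columns, target, cut=0.3, min_relevance=0.1):
    """Return sorted indices of kept features (assumed thresholds, not from the paper)."""
    n = len(columns)
    relevance = [abs(pearson(c, target)) for c in columns]
    dist = [[1.0 - abs(pearson(columns[i], columns[j])) for j in range(n)]
            for i in range(n)]

    # Union-find over MST edges shorter than `cut`: each component is a
    # cluster of mutually redundant features.
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for i, j, w in mst_edges(dist):
        if w <= cut:
            parent[find(i)] = find(j)

    clusters = {}
    for f in range(n):
        clusters.setdefault(find(f), []).append(f)

    kept = []
    for members in clusters.values():
        best = max(members, key=lambda f: relevance[f])  # cluster representative
        if relevance[best] >= min_relevance:             # drop irrelevant clusters
            kept.append(best)
    return sorted(kept)


# Toy data: features 0 and 1 are collinear, feature 2 is unrelated to the
# target, feature 3 is relevant and only moderately correlated with 0 and 1.
target = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [1, 2, 3, 4, 5, 6, 7, 8],
    [2, 4, 6, 8, 10, 12, 14, 16],
    [1, -1, -1, 1, 1, -1, -1, 1],
    [3, 2, 1, 0, 7, 6, 5, 4],
]
kept = select_features(columns, target)
print(kept)
```

On this toy data the sketch keeps one of the two collinear features together with feature 3 and drops the irrelevant feature 2. In a full pipeline, the kept columns would then be passed to the tree-based rankers.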