An interpretable labeling model for reject inference based on multi-level sub-model migration in the credit risk assessment scenario
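To address the selection bias caused by the lack of labels on samples that lending institutions did not approve, the LM-IMM model is proposed. It labels rejected samples through multi-level interpretable sub-model migration, achieving high labeling accuracy while preserving interpretability and significantly improving the performance of downstream credit risk assessment models.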
'Rejected loan' samples, which the lending institution has not approved, often contain a substantial amount of credit risk information. Because they lack labels, however, they cannot be fully used in credit risk assessment, which leads to sample selection bias. At the same time, most representative reject inference methods to date are machine learning models with pronounced 'black box' characteristics, which makes it difficult for them to satisfy regulators' interpretability requirements. In this paper, we develop the LM-IMM model, which labels 'rejected loan' samples using interpretable features and a multi-level sub-model transfer mechanism. The LM-IMM model first builds heterogeneous interpretable sub-models at multiple levels to identify the pivotal risk characteristics of the labeled samples. It then selects the most suitable sub-model for labeling each 'rejected loan' sample according to the sample's resemblance to the clusters of labeled samples. We first demonstrate that the LM-IMM model has good interpretability; the empirical research then indicates that the proposed model achieves high labeling accuracy. Furthermore, 10 representative credit risk assessment models built on datasets that integrate the newly labeled samples show significant performance improvements. Finally, the LM-IMM model remains robust across different numbers of clusters and on datasets with different class-balance characteristics.
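The two-step mechanism described in the abstract (fit interpretable sub-models on clusters of labeled samples, then label each rejected sample with the sub-model of the cluster it most resembles) can be illustrated with a minimal sketch. This is not the LM-IMM implementation: the abstract does not specify the clustering method, the sub-model family, the similarity measure, or how the multiple levels are organized. The sketch assumes a single level of k-means clustering, one-rule decision stumps as the interpretable sub-models, and Euclidean distance to cluster centroids as the resemblance measure. All function names and the toy data (features read as income and debt ratio) are hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda c: dist(point, centroids[c]))

def kmeans(points, k, iters=10):
    """Plain k-means with a deterministic, evenly spaced initialisation."""
    centroids = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for i, p in enumerate(points):
            groups[nearest(p, centroids)].append(i)
        for c, members in enumerate(groups):
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(points[i][d] for i in members) / len(members)
                    for d in range(len(points[0]))
                )
    return centroids

def predict(stump, x):
    """Apply a one-rule sub-model: (feature index, threshold, polarity)."""
    feature, threshold, polarity = stump
    return 1 if polarity * (x[feature] - threshold) > 0 else 0

def fit_stump(X, y):
    """Find the single-threshold rule with the fewest training errors."""
    ones = sum(y)
    # Fallback: a constant rule that always predicts the majority class.
    best = (0, -math.inf if ones * 2 > len(y) else math.inf, 1)
    best_err = min(ones, len(y) - ones)
    for f in range(len(X[0])):
        values = sorted(set(x[f] for x in X))
        for lo, hi in zip(values, values[1:]):
            for polarity in (1, -1):
                stump = (f, (lo + hi) / 2, polarity)
                err = sum(predict(stump, x) != t for x, t in zip(X, y))
                if err < best_err:
                    best, best_err = stump, err
    return best

def label_rejected(labeled_X, labeled_y, rejected_X, k=2):
    """Label each rejected sample with the sub-model of its nearest cluster."""
    centroids = kmeans(labeled_X, k)
    assignment = [nearest(x, centroids) for x in labeled_X]
    stumps = []
    for c in range(k):
        Xc = [x for x, a in zip(labeled_X, assignment) if a == c]
        yc = [t for t, a in zip(labeled_y, assignment) if a == c]
        # An empty cluster falls back to a rule fitted on all labeled data.
        stumps.append(fit_stump(Xc, yc) if Xc else fit_stump(labeled_X, labeled_y))
    labels = [predict(stumps[nearest(x, centroids)], x) for x in rejected_X]
    return labels, stumps

# Toy data (hypothetical): (income, debt ratio); label 1 means default.
labeled_X = [(1.0, 0.2), (1.2, 0.3), (1.1, 0.6), (0.9, 0.7),
             (8.0, 0.5), (8.5, 0.75), (9.0, 0.9), (8.2, 0.95)]
labeled_y = [0, 0, 1, 1, 0, 0, 1, 1]
rejected_X = [(1.0, 0.65), (8.6, 0.65)]

labels, stumps = label_rejected(labeled_X, labeled_y, rejected_X)
for x, label in zip(rejected_X, labels):
    print(x, "->", label)
```

In this toy setting the two rejected samples share the same debt ratio but fall into different clusters, so they are labeled by different rules. Each rule can be read directly as "feature above threshold implies default", which is the kind of interpretability a stump offers; a real application would likely need feature scaling before clustering and richer sub-models.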