Denoising ESG: uncertainty-aware scoring through probabilistic imputation of missing data

Quantitative Finance · 2025
Cited by: 2
Renmin University of China (RUC) BABS: 3

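Summary

To address the inconsistent scores caused by missing ESG data, the paper proposes a random-forest-based multiple imputation method (RF-MICE) that preserves uncertainty during imputation, and uses a bank case study to show that it can improve the objectivity of ESG scoring.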

Abstract

Environmental, Social and Governance (ESG) datasets are frequently plagued by significant data gaps, leading to inconsistencies in ESG ratings because imputation methods vary. This study addresses missing data in ESG datasets and highlights the importance of capturing the uncertainty that missing data introduces and propagating it when calculating ESG scores. Multiple imputation with the Random Forest-based MICE (RF-MICE) algorithm, coupled with Predictive Mean Matching, was identified as a valuable choice for imputing missing data points while keeping track of their uncertainty. RF-MICE was tested on Intesa Sanpaolo Bank ESG data as a case study, using an ad hoc testing method in which a Gradient Boosting (GB) classifier was developed to simulate the missing data mechanism observed in the test subsample, thus reproducing realistic test conditions. Besides improving imputation accuracy, RF-MICE provides an understanding of the impact of missing data uncertainty on ESG scores, accounting for both the abundance of missing features and the quality of available data. The imputation model output sheds light on cases where missing data could prevent analysts from objectively differentiating companies' performance in terms of ESG scoring. The GB model was also tested for predicting the likelihood of missing data points, providing a proof of concept of how it could be used to raise warnings about anomalous missing data that could be due to lack of disclosure.
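
The idea of propagating imputation uncertainty into a score can be illustrated with a minimal sketch. It is not the paper's implementation: the weighted-average score, the equal weights, the toy numbers, the array layout, and the two-standard-deviation overlap rule are all assumptions made here for illustration. The sketch takes several completed (already imputed) copies of a dataset, computes a score on each copy, and reports the mean and spread per company.

```python
import numpy as np

def score_with_uncertainty(imputed_sets, weights):
    """Score every completed dataset and summarise the spread per company.

    imputed_sets: shape (m, n_companies, n_features), one completed dataset
                  per imputation (hypothetical layout).
    weights:      shape (n_features,), illustrative indicator weights.
    """
    imputed_sets = np.asarray(imputed_sets, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()      # normalise so weights sum to 1
    scores = imputed_sets @ weights        # shape (m, n_companies)
    mean = scores.mean(axis=0)             # point estimate per company
    std = scores.std(axis=0, ddof=1)       # between-imputation spread
    return mean, std

def intervals_overlap(mean, std, i, j, k=2.0):
    """Assumed rule: companies i and j are not clearly separable when their
    mean +/- k*std intervals overlap."""
    return abs(mean[i] - mean[j]) <= k * (std[i] + std[j])

# Toy data (made up): company 0 is fully observed, company 1 has its third
# indicator missing and imputed three times as 0.4, 0.7 and 1.0.
imputed_sets = [
    [[0.8, 0.6, 0.7], [0.7, 0.7, 0.4]],
    [[0.8, 0.6, 0.7], [0.7, 0.7, 0.7]],
    [[0.8, 0.6, 0.7], [0.7, 0.7, 1.0]],
]
mean, std = score_with_uncertainty(imputed_sets, weights=[1.0, 1.0, 1.0])
print(mean, std, intervals_overlap(mean, std, 0, 1))
```

In this toy case both companies have the same mean score, but only the company with an imputed indicator carries a non-zero spread, which is the kind of signal that shows when missing data prevents a clear ranking.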

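Keywords: environmental, social and governance (ESG); missing data handling; machine learning; financial data; scoring methods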