Using Machine Learning to Detect Financial Distress From Sustainability Reports
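This study examines the incremental value of sustainability report text for predicting corporate financial distress. It extracts features with NLP and compares several machine learning models, finding that adding textual information significantly improves predictive performance and that the importance of ESG issues differs across industries.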
ABSTRACT This study examines the incremental predictive value of sustainability reports in forecasting corporate financial distress. We first construct a unique sample of 1220 sustainability reports produced by 244 firms in the S&P 500 index between 2018 and 2022. We then employ natural language processing (NLP) techniques to extract key features from the textual content of these reports and introduce them as a novel input to financial distress prediction models. A suite of machine learning algorithms is applied to assess predictive performance. Our results show that incorporating textual sustainability disclosures significantly improves model performance relative to using quantitative variables alone. Because the reports outline corporate sustainability strategies, they provide additional insights that enhance the prediction of financial distress. Among the tested models, Random Forest and XGBoost regressors exhibit superior performance. We also find that the materiality of specific ESG issues in predicting financial distress varies across sectors. Overall, this study offers a framework for integrating sustainability reports and ensemble learning into corporate credit risk assessment.
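
The feature-construction step described in the abstract, in which text-derived features are appended to quantitative variables before modelling, can be illustrated with a minimal sketch. The abstract does not say which NLP features are used, so the TF-IDF weighting here is an assumption chosen for illustration, not the authors' method. The function names, the two example report snippets, and the two financial ratios are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, keep purely alphabetic tokens."""
    return [tok for tok in text.lower().split() if tok.isalpha()]


def tfidf_features(documents):
    """Return (vocabulary, rows), one TF-IDF row per document.

    Assumption: TF-IDF stands in for the paper's unspecified NLP features.
    """
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(tok for doc in tokenized for tok in set(doc))
    # Smoothed inverse document frequency.
    idf = {
        tok: math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1.0
        for tok in vocabulary
    }
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        rows.append([counts[tok] / total * idf[tok] for tok in vocabulary])
    return vocabulary, rows


def combine(quantitative_rows, text_rows):
    """Append each document's text features to its quantitative features."""
    return [q + t for q, t in zip(quantitative_rows, text_rows)]


# Hypothetical inputs: two report snippets and two ratios per firm-year.
reports = [
    "emissions reduction targets and renewable energy investment",
    "supply chain labor standards and emissions disclosure",
]
ratios = [[0.42, 1.8], [0.77, 0.9]]  # e.g. leverage, current ratio (made up)

vocabulary, text_rows = tfidf_features(reports)
features = combine(ratios, text_rows)
```

Each row of `features` could then be passed to any tabular learner, such as the Random Forest or XGBoost models named in the abstract. Comparing a model trained on `ratios` alone with one trained on `features` mirrors the incremental-value comparison the study reports.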