Is machine learning really effective in detecting corporate fraud?
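The study evaluates how well machine learning detects accounting fraud among Chinese listed companies and finds that, although some models score better on AUC and F1, precision stays below 0.10, which limits their practical value.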
Purpose: This study evaluates the effectiveness of machine learning (ML) in detecting accounting fraud among Chinese-listed firms from 2007 to 2022, and asks whether advanced ML techniques outperform traditional logistic regression models in accuracy, precision and practical applicability for fraud detection.
Design/methodology/approach: The research analyzes a dataset of 48,746 firm-year observations, including 6,790 instances of fraud. It employs nine ML models (e.g. Random Forest, RUSBoost and LightGBM) alongside traditional logistic regression, using SAS Visualization and Python for variable selection and model construction. Model performance is evaluated with metrics such as AUC, precision, recall, F1 score and net benefit under different data processing scenarios.
Findings: Results are mixed. While the Random Forest, LightGBM and RUSBoost models exhibit superior AUC and F1 scores, none achieves a precision above 0.10, so at least 90% of the firm-years they flag are false positives. This low precision significantly limits their practical value for regulators, investors and professionals such as analysts and auditors. Logistic regression and support vector machine models often achieve higher recall, suggesting that traditional approaches remain competitive in identifying fraudulent firms.
Originality/value: This study highlights the limited practical utility of ML for corporate fraud detection, which stems from low precision and the resulting volume of false positives. It contributes a nuanced understanding of ML’s role in accounting research, emphasizing the need to integrate qualitative data and improve model precision for real-world application. Furthermore, it offers new insights into using SAS Visualization and random data-splitting methods in fraud prediction.
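To make the gap between recall and precision concrete, the following is a minimal sketch of how those metrics are derived from a confusion matrix. The counts are hypothetical and chosen only for illustration; they are not taken from the study. The `net_benefit` helper assumes the decision-curve definition (true positives minus threshold-weighted false positives, per observation), which may differ from the definition the study uses.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return precision, recall, F1 and false-positive rate from raw counts."""
    flagged = tp + fp            # firm-years the model labels as fraud
    actual_fraud = tp + fn       # firm-years that really are fraud
    actual_clean = fp + tn       # firm-years that really are not fraud
    precision = tp / flagged if flagged else 0.0
    recall = tp / actual_fraud if actual_fraud else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    fpr = fp / actual_clean if actual_clean else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "fpr": fpr}


def net_benefit(tp, fp, n, threshold):
    """Decision-curve net benefit (assumed definition, see lead-in).

    `threshold` is the probability cut-off at which a firm-year is flagged;
    the odds threshold / (1 - threshold) weight each false positive.
    """
    return tp / n - (fp / n) * (threshold / (1 - threshold))


# Hypothetical confusion matrix for 10,000 firm-years (illustrative only).
counts = {"tp": 80, "fp": 920, "fn": 20, "tn": 8980}
metrics = classification_metrics(**counts)
n_total = sum(counts.values())
nb = net_benefit(counts["tp"], counts["fp"], n_total, threshold=0.10)

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
print(f"net benefit at threshold 0.10: {nb:.4f}")
```

In this illustration the model catches 80% of the fraud cases (high recall) yet only 8% of its flags are correct (low precision), and the net benefit at a 0.10 threshold is negative. A model can therefore look strong on recall while still generating more follow-up work than regulators or auditors could justify.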