When methods matter: how implementation choices shape topic discovery in financial text

Accounting and Business Research · 2026
Cited by: 0
Journal rankings: Renmin University of China list B · ABS 3

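Summary

This study examines the application of LDA topic models to risk disclosures in the annual reports of FTSE350 firms. It finds that implementation choices such as preprocessing, multiword expressions and labelling strategies significantly affect topic representations and the inferences drawn from them, and it proposes an implementation checklist.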

Abstract

This paper examines the application of LDA topic modelling to risk disclosures in FTSE350 firms’ annual reports. We show that LDA implementation choices significantly impact topic representations and subsequent inferences. Using a corpus of FTSE350 annual reports, we show that preprocessing decisions, multiword expressions and labelling strategies materially affect topic interpretability and granularity. Our analysis reveals that while risk reporting addresses key business risks at an aggregate level, the degree of firm-specific commentary is sensitive to topic granularity. Hierarchical linear modelling suggests that 27% of topic variation is within firms for broad topics, increasing to 75% for granular topics. We leverage GPT to enhance topic labelling, showcasing the potential of LLMs in financial text analysis. We also compare LDA to modern embedding-based topic models, finding that while they often generate more coherent topics, they introduce a new set of critical implementation choices and do not eliminate the need for researcher discretion. These findings challenge the claims of LDA objectivity and highlight the importance of domain expertise. We propose a practical checklist for LDA implementation in accounting and finance research emphasising transparency and robustness checks.
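
The within-firm share of topic variation reported in the abstract can be illustrated with a much simpler calculation than the paper's hierarchical linear model. The sketch below is not the authors' method: it assumes a plain sum-of-squares decomposition (within-firm variation divided by total variation), and the function name `within_firm_share` and the firm-year topic weights are invented purely for illustration.

```python
from statistics import mean


def within_firm_share(topic_weights_by_firm):
    """Return the fraction of total sum of squares that lies within firms.

    topic_weights_by_firm maps a firm identifier to a list of topic
    weights, one per annual report (hypothetical input format).
    """
    all_values = [w for ws in topic_weights_by_firm.values() for w in ws]
    grand_mean = mean(all_values)
    total_ss = sum((w - grand_mean) ** 2 for w in all_values)
    # Deviations of each report from its own firm's mean weight.
    within_ss = sum(
        (w - mean(ws)) ** 2
        for ws in topic_weights_by_firm.values()
        for w in ws
    )
    return within_ss / total_ss if total_ss else 0.0


# Invented data: a broad topic that differs mainly between firms,
# and a granular topic that moves mainly from year to year within firms.
broad = {"firm_a": [0.30, 0.32, 0.31], "firm_b": [0.10, 0.12, 0.11]}
granular = {"firm_a": [0.05, 0.20, 0.10], "firm_b": [0.15, 0.02, 0.12]}

print(round(within_firm_share(broad), 3))
print(round(within_firm_share(granular), 3))
```

In this toy data the granular topic has the larger within-firm share, which mirrors the direction of the paper's finding but says nothing about its magnitudes. A multilevel model would additionally estimate variance components with uncertainty and allow covariates.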

Keywords: financial text analysis · topic models · risk disclosure · accounting research