FinBERT: A Large Language Model for Extracting Information from Financial Text

Contemporary Accounting Research · 2022
Cited by 518 · Top 1% among papers in the same journal and year
Renmin University (RUC) list A- · FT50 · ABS 4

Summary

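The authors develop FinBERT, a large language model for the finance domain. In sentiment classification of analyst reports and earnings conference call text, it substantially outperforms traditional dictionaries and a range of machine learning algorithms. The advantage is most pronounced when the training sample is small and when the text is dense in finance-specific vocabulary.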

Abstract

We develop FinBERT, a state‐of‐the‐art large language model that adapts to the finance domain. We show that FinBERT incorporates finance knowledge and can better summarize contextual information in financial texts. Using a sample of researcher‐labeled sentences from analyst reports, we document that FinBERT substantially outperforms the Loughran and McDonald dictionary and other machine learning algorithms, including naïve Bayes, support vector machine, random forest, convolutional neural network, and long short‐term memory, in sentiment classification. Our results show that FinBERT excels in identifying the positive or negative sentiment of sentences that other algorithms mislabel as neutral, likely because it uses contextual information in financial text. We find that FinBERT's advantage over other algorithms, and Google's original bidirectional encoder representations from transformers model, is especially salient when the training sample size is small and in texts containing financial words not frequently used in general texts. FinBERT also outperforms other models in identifying discussions related to environment, social, and governance issues. Last, we show that other approaches underestimate the textual informativeness of earnings conference calls by at least 18% compared to FinBERT. Our results have implications for academic researchers, investment professionals, and financial market regulators.

FinBERT · financial text · sentiment classification · large language model · contextual information