SEntFiN 1.0: Entity‐aware sentiment analysis for financial news
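This study releases SEntFiN 1.0, a human-annotated sentiment dataset of 10,753 news headlines, of which 2,847 contain multiple entities, often with conflicting sentiments. It also proposes a framework that extracts sentiment with a feature-based rather than an expression-based approach. In the experiments, RoBERTa and finBERT reach the highest average accuracy, 94.29%.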
Abstract: Fine‐grained financial sentiment analysis on news headlines is a challenging task that requires human‐annotated datasets to achieve high performance. Few studies have addressed sentiment extraction in a setting where multiple entities are present in a single news headline. To further research in this area, we make publicly available SEntFiN 1.0, a human‐annotated dataset of 10,753 news headlines with entity‐sentiment annotations, of which 2,847 headlines contain multiple entities, often with conflicting sentiments. We augment our dataset with a database of over 1,000 financial entities and their various representations in news media, amounting to over 5,000 phrases. We propose a framework that enables the extraction of entity‐relevant sentiments using a feature‐based approach rather than an expression‐based approach. For sentiment extraction, we evaluate 12 different learning schemes that use lexicon‐based and pretrained sentence representations and five classification approaches. Our experiments indicate that lexicon‐based N‐gram ensembles perform on par with or better than pretrained word embedding schemes such as GloVe. Overall, RoBERTa and finBERT (domain‐specific BERT) achieve the highest average accuracy of 94.29% and F1‐score of 93.27%. Further, using over 210,000 entity‐sentiment predictions, we validate the economic effect of sentiments on aggregate market movements over a long duration.
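To make the notion of entity-sentiment annotation concrete, the following minimal Python sketch shows one way a headline with several entities and conflicting sentiments could be represented and counted. The record layout, the field names (`headline`, `entities`), the label strings, and the three sample headlines are all assumptions made for illustration; they are not taken from SEntFiN 1.0 and do not describe its released file format.

```python
from typing import Dict, List, TypedDict


class AnnotatedHeadline(TypedDict):
    """Hypothetical record: one headline plus a sentiment label per entity."""
    headline: str
    entities: Dict[str, str]  # entity name -> "positive" | "negative" | "neutral"


def is_multi_entity(record: AnnotatedHeadline) -> bool:
    """True when the headline is annotated with more than one entity."""
    return len(record["entities"]) > 1


def has_conflicting_sentiment(record: AnnotatedHeadline) -> bool:
    """True when the annotated entities do not all share the same label."""
    return len(set(record["entities"].values())) > 1


# Invented examples for illustration only; not drawn from the dataset.
samples: List[AnnotatedHeadline] = [
    {"headline": "Bank A shares rally after strong quarterly results",
     "entities": {"Bank A": "positive"}},
    {"headline": "Company B gains while Company C slips on weak guidance",
     "entities": {"Company B": "positive", "Company C": "negative"}},
    {"headline": "Company D and Company E both fall as crude prices rise",
     "entities": {"Company D": "negative", "Company E": "negative"}},
]

multi_entity_count = sum(is_multi_entity(r) for r in samples)
conflict_count = sum(has_conflicting_sentiment(r) for r in samples)
```

In this toy sample, two of the three headlines mention more than one entity, and one of those carries conflicting labels. A single headline-level sentiment label cannot represent that last case, which is why the annotations are attached to entities rather than to whole headlines.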