Mitigating Bias in Hate Speech Detection With a Small Number of Expert Annotations: A Prompt-Based Learning Approach

MIS Quarterly · 2025
Cited by: 1
Journal rankings: Renmin University of China A+ · FT50 · UTD24 · ABS 4*

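Summary

To address the systematic bias introduced by general annotators, the paper proposes a weakly supervised method that combines contrastive learning with prompt-based learning and detects hate speech effectively using only a small number of expert annotations. Its superior performance is validated on data from African American English speakers and from the LGBTQ+ community.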

Abstract

Hate speech is a major problem on social media platforms. Automatic hate speech detection methods relying on machine learning models, which learn from manually labeled datasets, have been proposed in both academia and industry. However, there is increasing evidence that hate speech detection datasets labeled by general annotators (e.g., amateurs or MTurk workers) contain systematic bias, as they cannot effectively consider language use differences among different speakers. When such biased datasets are used to train machine learning models, the resulting models will also be biased. Unlike general annotators, experts can produce much less biased annotations. However, expert annotations cannot be efficiently obtained in large quantities. This paper bridges the gap by adopting a weakly supervised learning method for hate speech detection using a small number of expert annotations. We propose a novel design that uses contrastive learning and prompt-based learning based on large language models, incorporating a group estimator, a pair generator, and knowledge injection. Using real-world Twitter posts written by African American English speakers and other racial groups as an example, extensive experiments were conducted to demonstrate the superior performance of the proposed method. The proposed approach was also evaluated on data in the LGBTQ+ community and achieved consistent results. The study has important academic and practical implications for hate speech detection and large language models.

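Keywords: hate speech detection · bias mitigation · weakly supervised learning · prompt-based learning · social media analytics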