Leveraging Large Language Models for Hate Speech Detection: Multi-Agent, Information-Theoretic Prompt Learning for Enhancing Contextual Understanding

Journal of Management Information Systems · 2025

Cited by 2

Journal rankings: Renmin University A · FT50 · ABS 4

Kyuhan Lee · University of Arizona (corresponding author)
Sudha Ram · University of Arizona

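Summary

The paper proposes a multi-agent prompt-learning framework that uses information theory to select effective prompts and combines this with motivation-aware instruction tuning. The framework significantly improves hate speech detection performance, with particular attention to the social and psychological complexity of hate speech.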

Abstract

An essential component in combating hate speech is the development of effective computational algorithms. While prior research has proposed a range of methods for hate speech detection, they often fall short in addressing the complex nature of hate speech, which is characterized by its nuanced nature, the diversity of its forms, and the heterogeneous motivations behind it. To address these limitations, we introduce a novel prompt-learning framework for hate speech detection. Our approach offers several key innovations: (i) prompt generation is delegated to multiple language model agents, drawing upon the theory of questioning as a guiding principle; (ii) we employ an information-theoretic selection mechanism to identify the most effective prompts from a pool of candidates; and (iii) we incorporate motivation-aware instruction tuning to improve the model’s capacity to capture the diverse motivational drivers of hate speech. Our empirical evaluation, which includes comparisons with state-of-the-art benchmarks and multiple robustness checks, demonstrates significant performance gains achieved by our framework. These findings highlight the promise of prompt-learning based methods in hate speech detection, particularly when designed with attention to the social and psychological complexities that characterize online hate speech.
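The abstract does not say which information-theoretic criterion is used to select prompts. As an illustration only, the sketch below assumes one common choice: score each candidate prompt by the estimated mutual information between the input texts and the label distribution the model predicts under that prompt, then keep the highest-scoring prompt. The prompt names, the probabilities, and the helper functions are hypothetical and are not taken from the paper.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)


def mutual_information(prob_rows):
    """Estimate I(X; Y) as H(mean_x p(y|x)) - mean_x H(p(y|x)).

    prob_rows holds one predicted label distribution per input text,
    all obtained under the same prompt.
    """
    n = len(prob_rows)
    num_labels = len(prob_rows[0])
    # Marginal label distribution, averaged over the inputs.
    marginal = [sum(row[k] for row in prob_rows) / n for k in range(num_labels)]
    # Average per-input (conditional) entropy.
    conditional = sum(entropy(row) for row in prob_rows) / n
    return entropy(marginal) - conditional


def select_prompt(candidates):
    """Return the name of the candidate with the highest estimated MI."""
    return max(candidates, key=lambda name: mutual_information(candidates[name]))


# Hypothetical distributions [p(hate), p(not hate)] for four texts.
# These numbers are made up for illustration, not results from the paper.
candidates = {
    "prompt_a": [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8]],
    "prompt_b": [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5], [0.5, 0.5]],
}
best = select_prompt(candidates)
```

Under this assumed criterion, a prompt that yields confident and varied predictions across inputs scores higher than one that leaves the model uniformly uncertain, so `prompt_a` is preferred over `prompt_b` here. The criterion needs no gold labels, which is one reason it is a plausible fit for choosing among generated prompts.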

Keywords: hate speech detection · large language models · prompt learning · natural language processing
