Stylometric characteristics of code-switched offensive language in social media

INFORMATION & MANAGEMENT · 2025
Cited by: 0
Journal ranking: RUC (Renmin University of China) A · ABS 3

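Summary

This study identifies and validates the distinct stylometric characteristics of code-switched offensive language in social media, constructs the first dataset of such content, and finds that these characteristics significantly improve the performance of offensive language detection models.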

Abstract

Offensive language is a significant detriment to social media environments. Existing research predominantly assumes monolingual expression, overlooking the prevalent behavior of code-switching (CS). To address this critical knowledge gap, this study identifies and empirically validates the distinct stylometric characteristics of code-switched (CSed) offensive language. Additionally, we developed methods to construct the first social media dataset specifically for CSed offensive content. Our analysis of this dataset reveals that CSed offensive language exhibits unique stylometric characteristics; moreover, these characteristics vary between the language segments involved in the CS. Furthermore, incorporating these features significantly enhances the performance of offensive language detection models. These findings offer significant research and practical implications for social media researchers, platforms, moderators, and users.
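
As a rough illustration of what per-segment stylometric features for code-switched text could look like, the following Python sketch splits a post into language segments and computes a few surface features for each. The abstract does not name the language pair or the feature set, so everything here is an assumption: the Chinese-English pair, the script-based segmentation, the chosen features, and the hypothetical helper names `script_of`, `split_segments`, and `stylometric_features`.

```python
import unicodedata


def script_of(ch):
    """Coarse script label for one character: 'cjk', 'latin', or None."""
    if "\u4e00" <= ch <= "\u9fff":  # CJK Unified Ideographs block
        return "cjk"
    if ch.isascii() and ch.isalpha():
        return "latin"
    return None  # spaces, digits, punctuation, other scripts


def split_segments(text):
    """Group consecutive characters of the same script into segments.

    Script-neutral characters are attached to the segment that precedes
    them (an assumption made to keep the sketch simple).
    """
    segments = []  # list of (script, segment_text)
    current_script, buf = None, []
    for ch in text:
        s = script_of(ch)
        if s is not None and current_script is not None and s != current_script:
            segments.append((current_script, "".join(buf)))
            buf = []
        if s is not None:
            current_script = s
        buf.append(ch)
    if buf and current_script is not None:
        segments.append((current_script, "".join(buf)))
    return segments


def stylometric_features(text):
    """Illustrative per-language surface features (not the paper's feature set)."""
    segments = split_segments(text)
    feats = {}
    for script in ("latin", "cjk"):
        chunk = "".join(seg for s, seg in segments if s == script)
        feats[f"{script}_chars"] = sum(script_of(c) == script for c in chunk)
        feats[f"{script}_punct"] = sum(
            unicodedata.category(c).startswith("P") for c in chunk
        )
    latin_letters = [c for c in text if script_of(c) == "latin"]
    upper = sum(c.isupper() for c in latin_letters)
    feats["latin_upper_ratio"] = upper / len(latin_letters) if latin_letters else 0.0
    # Number of switch points between language segments.
    feats["switch_count"] = max(len(segments) - 1, 0)
    return feats


if __name__ == "__main__":
    print(stylometric_features("你真是个 IDIOT!!! 滚开"))
```

A feature dictionary like this could be concatenated with a text classifier's input representation, which is one plausible reading of "incorporating these features" into a detection model; the paper's actual integration method is not described in the abstract.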

Keywords: natural language processing; social media analysis; offensive language detection; code-switching