Ontology-Based Information Extraction for Labeling Radical Online Content Using Distant Supervision

Information Systems Research · 2023
Cited by: 8
Renmin University list: A · FT50 · UTD24 · ABS 4*

Summary

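Proposes a method that uses an ontology and deep learning to automatically track radical ideologies. From only a few examples it generates high-precision detection models, lowering the cost of adapting content-labeling models to new ideologies.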

Abstract

Social media companies dedicate significant resources to creating machine-learning models that label harmful content on their platforms, including content promoting violent, extremist beliefs. These models have to evolve over time to keep up with a dynamic threat landscape: as new violent ideologies emerge, existing models will fail to detect them. Training fresh models for the task is risky (there are new model biases to understand), time-consuming (many examples must be seen before new examples can be predicted), and cost-ineffective. We propose an approach that prioritizes the evolution and representation of radical ideas by creating a computer program to explicitly keep track of ideologies. We show how this program uses state-of-the-art deep-learning models to create human- and machine-readable representations of radical ideologies by automatically consuming content symbolic of those ideologies. Our approach validates the notion that violent ideologies differ in content but are homogeneous in structure. With just a few examples of content, the program creates powerful representations that can be used to automatically detect additional content with surprising accuracy. This process greatly reduces the time and resources necessary to adapt existing content-labeling models to the changing ideological and rhetorical landscape.
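The abstract describes the pipeline only at a high level. As a rough illustration of the general distant-supervision idea, and not the authors' implementation (which builds ontology-based representations with deep-learning models), the sketch below uses a hypothetical toy ontology of concept terms to assign weak labels to unlabeled posts, then builds bag-of-words centroids from those weak labels so that new posts can be labeled even when they contain no ontology terms. `ONTOLOGY`, `distant_label`, `train_centroids`, `classify`, the thresholds, and the example sentences are all illustrative assumptions; a simple centroid classifier stands in for the deep-learning models.

```python
import math
import re
from collections import Counter

# Hypothetical toy ontology: each placeholder ideology maps to seed concept terms.
# This is NOT the ontology from the paper; it only illustrates the mechanism.
ONTOLOGY = {
    "ideology_a": {"purity", "uprising", "homeland"},
    "ideology_b": {"collapse", "accelerate", "system"},
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def distant_label(text, ontology, min_hits=2):
    """Weak label from ontology term matches; None if too few hits or a tie."""
    tokens = tokenize(text)
    scores = {name: sum(t in terms for t in tokens) for name, terms in ontology.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best, hits = ranked[0]
    if hits < min_hits or (len(ranked) > 1 and ranked[1][1] == hits):
        return None
    return best


def train_centroids(texts, ontology, min_hits=2):
    """Aggregate word counts of weakly labeled texts into one centroid per label."""
    centroids = {}
    for text in texts:
        label = distant_label(text, ontology, min_hits)
        if label is not None:
            centroids.setdefault(label, Counter()).update(tokenize(text))
    return centroids


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def classify(text, centroids, min_sim=0.1):
    """Assign the most similar centroid's label, or None below min_sim."""
    vec = Counter(tokenize(text))
    sims = {label: cosine(vec, c) for label, c in centroids.items()}
    if not sims:
        return None
    best = max(sims, key=sims.get)
    return best if sims[best] >= min_sim else None


# Hypothetical unlabeled corpus used only for this illustration.
CORPUS = [
    "the uprising will restore purity to the homeland",
    "accelerate the collapse of the system now",
    "weather is nice today",
]

if __name__ == "__main__":
    centroids = train_centroids(CORPUS, ONTOLOGY)
    post = "we will restore what was lost"
    print(distant_label(post, ONTOLOGY))  # None: no ontology terms present
    print(classify(post, centroids))      # ideology_a, via co-occurring words
```

The centroid step is what lets labels extend beyond the seed vocabulary: "restore" is not an ontology term, but it co-occurs with `ideology_a`'s terms in the weakly labeled text, so the new post is still matched.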

Keywords: social media · machine learning · content moderation · ideology · natural language processing