An Online Semantic-Enhanced Graphical Model for Evolving Short Text Stream Clustering

IEEE Transactions on Cybernetics · 2021
Cited by 12
ABS 3

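Summary

The paper proposes an online semantic-enhanced graphical model that uses word co-occurrence semantic information to dynamically maintain evolving active topics. It addresses the term ambiguity problem in short text streams and handles large-scale data streams efficiently without relying on external resources.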

Abstract

Due to the popularity of social media and online fora, such as Twitter, Reddit, Facebook, and WeChat, short text stream clustering has gained significant attention in recent years. However, most existing short text stream clustering approaches work on static data and tend to suffer from a "term ambiguity" problem caused by the sparse word representation. Moreover, they often process short text streams in batches and have difficulty finding evolving topics in term-changing subspaces. In this article, we propose an online semantic-enhanced graphical model for evolving short text stream clustering (OSGM), which exploits the word-occurrence semantic information and dynamically maintains evolving active topics in term-changing subspaces in an online way. Compared to existing approaches, our online model does not require determining an optimal batch size and lends itself to handling large-scale data streams efficiently. It is also able to handle the "term ambiguity" problem without incorporating features from external resources. More importantly, to the best of our knowledge, it is the first work to extract evolving topics in term-changing subspaces automatically in an online way. Extensive experiments demonstrate that the proposed model outperforms many state-of-the-art algorithms on both synthetic and real-world datasets.

Keywords: short text clustering; data stream mining; online learning; semantic enhancement; evolving topic detection