Uncertainty-Oriented Incremental Erasable Pattern Mining Over Data Streams
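Proposes a list-based algorithm that mines erasable patterns from uncertain data streams in real time, helping manufacturing factories identify low-profit product lines and raise overall profit; it runs up to seven times faster than existing algorithms.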
In a manufacturing factory, each product line is composed of several constituents and carries a profit value, i.e., the income from its products. Erasable patterns are less profitable patterns whose gain, i.e., the sum of the associated product profits, does not exceed a user-defined threshold. Mining erasable patterns gives users who want to increase profits the information they need to decide which less profitable patterns to erase. A method is therefore needed that efficiently manages uncertain databases in incremental environments and identifies erasable patterns while accounting for uncertainty. Moreover, accumulated stream data must be handled efficiently so that new useful patterns can be identified across both the newly added data and the existing data. In this article, an algorithm based on a list structure is proposed to extract erasable patterns containing valuable knowledge from uncertain databases efficiently and in real time. Because the technique uses a list structure, it finds erasable patterns in incremental databases more efficiently: to derive erasable patterns from continuously accumulated stream databases, the structure retains and manages the information gathered from the previous database. Extensive performance and pattern quality evaluations were conducted using real and synthetic datasets. The results show that the algorithm runs up to seven times faster than state-of-the-art erasable pattern mining algorithms on real datasets and scales well on synthetic datasets, while delivering reliable and significant result patterns.
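To make the notions of gain and incremental update concrete, the following is a minimal Python sketch, not the algorithm proposed in the article. It assumes the common definition in which the gain of a pattern is the sum of the profits of all products containing at least one of its items, and it treats the threshold as a fraction of the total profit. The class `ErasableIndex`, its methods, and the parameters `threshold_ratio` and `max_len` are hypothetical names chosen for illustration. The uncertainty model is left out because the abstract does not specify it, and candidate patterns are enumerated naively rather than pruned.

```python
from collections import defaultdict
from itertools import combinations


class ErasableIndex:
    """Hypothetical per-item lists of (product_id, profit), updated incrementally."""

    def __init__(self):
        self.item_lists = defaultdict(list)  # item -> [(product_id, profit)]
        self.total_profit = 0.0
        self._next_id = 0

    def add_batch(self, products):
        """Append new (items, profit) products without rescanning earlier data."""
        for items, profit in products:
            pid = self._next_id
            self._next_id += 1
            self.total_profit += profit
            for item in set(items):
                self.item_lists[item].append((pid, profit))

    def gain(self, pattern):
        """Sum the profits of products containing at least one item of the pattern."""
        seen = {}  # product_id -> profit, so each product is counted once
        for item in pattern:
            for pid, profit in self.item_lists.get(item, []):
                seen[pid] = profit
        return sum(seen.values())

    def erasable_patterns(self, threshold_ratio, max_len=2):
        """Return {pattern: gain} for patterns with gain <= ratio * total profit."""
        limit = threshold_ratio * self.total_profit
        items = sorted(self.item_lists)
        result = {}
        for k in range(1, max_len + 1):
            for pattern in combinations(items, k):
                g = self.gain(pattern)
                if g <= limit:
                    result[pattern] = g
        return result


index = ErasableIndex()
index.add_batch([({"a", "b"}, 100), ({"b", "c"}, 50), ({"c", "d"}, 30)])
before = index.erasable_patterns(threshold_ratio=0.5)

# A later batch arrives on the stream; only the per-item lists are extended.
index.add_batch([({"d"}, 200)])
after = index.erasable_patterns(threshold_ratio=0.5)
```

In this toy run, item `d` is erasable after the first batch but not after the second, because a newly arrived high-profit product uses it. This is why the set of erasable patterns has to be re-derived as the stream grows. With non-negative profits, adding an item to a pattern can never lower its gain, so a practical miner could skip every superset of a non-erasable pattern instead of enumerating all combinations as this sketch does.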