A Highly-Accurate Three-Way Decision-Incorporated Online Sparse Streaming Features Selection Model
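To address the uncertainty caused by data sparsity in online streaming feature selection, this work proposes a model that combines three-way decision with latent factor analysis; it outperforms seven existing methods on 12 real-world datasets.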
An online streaming feature selection (OSFS) model is highly efficient at processing high-dimensional streaming features. In practical big-data applications, streaming features are often highly incomplete for unpredictable reasons such as privacy protection, which gives rise to the problem of online sparse streaming feature selection (OS2FS). Incomplete streaming features make the relationship between labels and sparse features uncertain during feature selection, yet existing OSFS and OS2FS models consider only certain relationships and therefore lose accuracy by selecting features improperly. To address this critical issue, this article presents a three-way decision-incorporated OS2FS (3WDO) model built on two ideas: 1) using latent factor analysis (LFA) to pre-estimate the missing data of the sparse streaming features concerned, and 2) integrating three-way decision (3WD) into the streaming feature selection process to appropriately model the uncertainty in label-feature interactions. In this way, the uncertain relationships between labels and sparse features are characterized with more information and a looser tolerance, thereby minimizing the decision risk of feature selection. Experimental results on twelve real-world datasets demonstrate that the proposed 3WDO model significantly outperforms seven state-of-the-art OSFS and OS2FS models, which strongly supports its ability to address practical issues.
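The two ideas can be illustrated with a minimal sketch. It is not the 3WDO algorithm itself, whose details the abstract does not give. It assumes a plain SGD-trained matrix factorization as the LFA step, absolute Pearson correlation as the label-feature relevance score, and fixed thresholds for the three-way decision. The function names (`lfa_impute`, `relevance`, `three_way_decision`) and all hyperparameters, including `alpha` and `beta`, are hypothetical choices made for illustration.

```python
import numpy as np


def lfa_impute(X, rank=2, lr=0.01, reg=0.05, epochs=500, seed=0):
    """Fill the NaN entries of X with a low-rank latent factor estimate.

    Assumption: factors are trained by SGD on the observed entries only,
    with L2 regularization; observed entries are returned unchanged.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = X.shape
    P = rng.normal(scale=0.1, size=(n_rows, rank))  # instance factors
    Q = rng.normal(scale=0.1, size=(n_cols, rank))  # feature factors
    rows, cols = np.where(~np.isnan(X))
    for _ in range(epochs):
        for i, j in zip(rows, cols):
            err = X[i, j] - P[i] @ Q[j]
            p_old = P[i].copy()
            P[i] += lr * (err * Q[j] - reg * P[i])
            Q[j] += lr * (err * p_old - reg * Q[j])
    estimate = P @ Q.T
    return np.where(np.isnan(X), estimate, X)


def relevance(feature, labels):
    """Absolute Pearson correlation, an assumed stand-in relevance score."""
    return abs(np.corrcoef(feature, labels)[0, 1])


def three_way_decision(score, alpha=0.6, beta=0.3):
    """Three-way decision on one feature with hypothetical thresholds.

    score >= alpha: accept; score <= beta: reject; otherwise defer the
    feature to a boundary region instead of forcing a binary choice.
    """
    if score >= alpha:
        return "accept"
    if score <= beta:
        return "reject"
    return "defer"
```

In a streaming setting, each arriving sparse feature would first be completed by the imputation step, then scored against the labels and routed to one of the three regions. Deferred features could be re-examined once more features have arrived, which is one way to read the "looser tolerance" mentioned above.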