Relative Fuzzy Rough Approximations for Feature Selection and Classification
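For data distributions whose class densities differ greatly, a relative distance measure is proposed and a relative fuzzy rough set model is built on it. A relative fuzzy dependency is defined to evaluate feature importance, from which a feature selection algorithm and a classifier based on the maximal positive region are designed. Experiments show that the method is effective and outperforms classical algorithms.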
Fuzzy rough set (FRS) theory is widely used to measure the uncertainty of data. However, classical FRSs do not work well when the class densities of a data distribution differ greatly. In this work, a relative distance measure is first proposed to fit such distributions. Based on this measure, a relative FRS model is introduced to remedy this shortcoming of classical FRSs. The positive region, negative region, and boundary region are then defined to measure the uncertainty of data under the relative FRS model. In addition, a relative fuzzy dependency is defined to evaluate the importance of features with respect to the decision. Using this feature evaluation measure, we propose a feature selection algorithm and design a classifier based on the maximal positive region: an unlabeled sample is assigned to the class for which its degree of membership in the positive region is largest. Experimental results show that the relative fuzzy dependency is an effective and efficient measure for evaluating features, and that the proposed feature selection algorithm performs better than some classical algorithms. The results also show that the proposed classifier achieves slightly better performance than the KNN classifier, which demonstrates that the maximal positive region-based classifier is effective and feasible.
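
To make the classification principle concrete, the following minimal sketch shows a maximal positive region rule and a dependency score in the classical FRS setting. It is an illustration under stated assumptions, not the method of this work: the relative distance measure is not specified in this abstract, so a plain similarity (one minus the mean absolute difference, for features scaled to [0, 1]) is used as a stand-in. All function names (similarity, positive_degree, classify, dependency) and the toy data are hypothetical.

```python
# Sketch only: classical fuzzy rough lower approximation with a stand-in
# similarity. The paper's relative distance is NOT implemented here.

def similarity(a, b):
    """Assumed fuzzy similarity in [0, 1] for features scaled to [0, 1]."""
    return 1.0 - sum(abs(p - q) for p, q in zip(a, b)) / len(a)

def positive_degree(x, samples, labels, label):
    """Degree to which x belongs to the positive region of class `label`.

    For crisp classes, the classical lower approximation reduces to the
    infimum of 1 - R(x, z) over training samples z outside the class.
    """
    rivals = [s for s, l in zip(samples, labels) if l != label]
    if not rivals:  # no sample from another class: full membership
        return 1.0
    return min(1.0 - similarity(x, r) for r in rivals)

def classify(x, samples, labels):
    """Assign x to the class with the maximal positive-region degree."""
    classes = sorted(set(labels))
    return max(classes, key=lambda c: positive_degree(x, samples, labels, c))

def dependency(samples, labels):
    """Mean positive-region degree of each training sample for its own class."""
    degrees = [positive_degree(s, samples, labels, l)
               for s, l in zip(samples, labels)]
    return sum(degrees) / len(degrees)

# Toy data (made up for illustration): two well-separated classes.
train_x = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.9, 1.0)]
train_y = [0, 0, 1, 1]

pred_a = classify((0.05, 0.1), train_x, train_y)
pred_b = classify((0.95, 0.9), train_x, train_y)
dep = dependency(train_x, train_y)
```

In a feature selection loop, a score such as dependency would be recomputed on candidate feature subsets and the subset with the larger score preferred. How the relative model changes these quantities depends on the relative distance, which would replace similarity in this sketch.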