Detecting Overlapping Areas in Unbalanced High-Dimensional Data Using Neighborhood Rough Set and Genetic Programming

IEEE Transactions on Evolutionary Computation · 2022

Cited by 13

ABS 4

Wenbin Pei
Lin Shang
Bing Xue
Mengjie Zhang

Summary

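The paper proposes a new cost-sensitive genetic programming method that uses rough set theory to detect overlapping areas before training, with the aim of improving classification performance on unbalanced high-dimensional data. In experiments, the method performs better than 46 popular classification methods in almost all cases.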

Abstract

Unbalanced classification has attracted widespread interest because of its broad applications. However, mainly because of the uneven class distribution, constructed classifiers are usually biased toward the majority class and therefore perform poorly on the minority class. Unfortunately, the minority class is often the class of interest in real-world applications. High dimensionality often degrades classification performance further, making the class imbalance issue harder to address. Genetic programming (GP) has been applied to construct classifiers that simultaneously select good-quality features to improve classification performance. To handle class imbalance, cost-sensitive GP classifiers treat the minority class as more important than the majority class, but this may reduce accuracy in overlapping areas, where the prior probabilities of the two classes are almost the same. To date, most work on cost-sensitive classification has not specifically investigated how the impact of overlapping areas on cost-sensitive classifiers can be avoided. In this study, we propose a new cost-sensitive GP method in which rough set theory is employed to detect overlapping areas before cost-sensitive classifiers are trained on unbalanced high-dimensional data. The proposed method is compared with 46 popular classification methods (10 GP methods and 36 non-GP methods) on 14 datasets that are unbalanced and high dimensional. The experimental results indicate that the proposed method performs better than the compared methods in almost all cases.
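
To make the idea of an overlapping area concrete, the following is a minimal sketch of one common neighborhood rough set notion, the boundary region: a sample is flagged when its neighborhood contains instances of more than one class. This is an illustration under stated assumptions, not the procedure from the paper. The function name `neighborhood_boundary`, the radius parameter `delta`, the Euclidean distance, and the toy data are all hypothetical choices made for this example.

```python
import math


def neighborhood_boundary(X, y, delta):
    """Return indices of samples whose delta-neighborhood mixes classes.

    Assumption: the neighborhood of a sample is every sample (itself
    included) within Euclidean distance `delta`. A sample lies in the
    boundary region when that neighborhood holds more than one label.
    """
    boundary = []
    for i, xi in enumerate(X):
        # Collect the labels of all samples inside the neighborhood of xi.
        labels = {y[j] for j, xj in enumerate(X) if math.dist(xi, xj) <= delta}
        if len(labels) > 1:
            boundary.append(i)
    return boundary


# Toy data (hypothetical): two well-separated clusters plus one mixed pair.
X = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (0.5, 0.5), (0.55, 0.5)]
y = [0, 0, 1, 1, 0, 1]

overlap = neighborhood_boundary(X, y, delta=0.2)
print(overlap)  # indices of samples in the mixed-class region
```

In this toy setting only the last two samples sit close to an instance of the other class, so they are the ones flagged. How the flagged samples are then treated during cost-sensitive training is specific to the paper and is not reproduced here.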

Machine learning · Data mining · Unbalanced classification · Genetic programming · Rough set
