

Evolutionary Hypergraph-assisted Hybrid Sampling for Imbalanced Data Classification

IEEE Transactions on Evolutionary Computation · 2026
Cited by: 0
ABS 4

Overview

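The paper proposes a hybrid sampling method that combines hypergraph-based modeling of higher-order relationships with a genetic algorithm. The hypergraph captures complex interactions among samples and guides the genetic algorithm in selecting high-quality instances. On 18 KEEL datasets, the method significantly improves the performance of four classifiers on imbalanced classification.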
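The overview does not say how the hypergraph is built. One common construction, assumed here purely for illustration, is the k-nearest-neighbour hypergraph: every instance spawns one hyperedge containing itself and its k nearest neighbours. The sketch below builds such hyperedges, forms the vertex-by-hyperedge incidence matrix H, and derives a hypothetical per-instance "purity" score (the average fraction of same-label members across all hyperedges that contain the instance). The names `knn_hyperedges`, `incidence_matrix`, and `purity_scores` are illustrative and are not taken from the paper.

```python
import math


def knn_hyperedges(X, k):
    """Assumed construction: one hyperedge per instance = the instance plus its k nearest neighbours."""
    edges = []
    for i, xi in enumerate(X):
        # Sort the other instances by Euclidean distance (ties broken by index).
        dists = sorted((math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i)
        edges.append(frozenset([i] + [j for _, j in dists[:k]]))
    return edges


def incidence_matrix(n, edges):
    """H[v][e] = 1 if vertex v belongs to hyperedge e, else 0."""
    H = [[0] * len(edges) for _ in range(n)]
    for e, members in enumerate(edges):
        for v in members:
            H[v][e] = 1
    return H


def purity_scores(y, edges):
    """Hypothetical score: mean same-label fraction over the hyperedges containing each instance."""
    n = len(y)
    totals = [0.0] * n
    counts = [0] * n
    for members in edges:
        for v in members:
            same = sum(1 for u in members if y[u] == y[v]) / len(members)
            totals[v] += same
            counts[v] += 1
    # Every instance lies in at least its own hyperedge, so counts[v] >= 1.
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]
```

An instance sitting among members of the other class (for example, a noisy minority point inside a majority cluster) receives a low score, which is the kind of higher-order signal a selection step could exploit.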

Abstract

In classification with imbalanced data, sampling methods have proven effective at rebalancing data distributions. However, most existing methods do not adequately consider higher-order relationships (i.e., complex interactions) among instances. As a result, potentially informative instances are often discarded during undersampling, which undermines the classifier's ability to distinguish between classes. To overcome this drawback, we propose a novel evolutionary hybrid sampling method that combines hypergraph-based modeling of higher-order relationships with a genetic algorithm to select good-quality instances. In the proposed method, the imbalanced data are first oversampled, and hypergraphs are then used to model the complex interactions among the oversampled instances. This interaction information guides the genetic algorithm in selecting informative and representative instances that reflect the core characteristics of the data and are essential for a classifier to distinguish between classes. Experiments were conducted on 18 imbalanced datasets from the KEEL repository, covering diverse application domains such as biomedical classification, material analysis, and food quality assessment. The results show that the proposed method outperforms baseline sampling methods in improving the imbalanced-classification performance of four types of classifiers: Random Forest, Decision Tree, Gradient Boosting Decision Tree, and k-Nearest Neighbors. Statistical significance tests (Friedman, Bonferroni-Dunn, and Wilcoxon tests at a significance level of 0.05) confirm that the improvements are statistically significant.
The proposed method is applicable to real-world scenarios involving class imbalance issues, such as medical diagnosis and industrial defect detection.
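The abstract states that hypergraph-derived interaction information guides a genetic algorithm, but not how. The following is a minimal sketch under stated assumptions: oversampling has already been done upstream, each chromosome is a binary mask over the oversampled instances, per-instance hypergraph scores are supplied as input, and the fitness is a hypothetical stand-in (not the paper's) that rewards instances scoring above a threshold while penalising class imbalance among the selected instances. The GA itself is a plain generational design with tournament selection, uniform crossover, bit-flip mutation, and elitism; `fitness`, `tournament`, `ga_select`, and all their parameters are hypothetical.

```python
import random


def fitness(mask, scores, y, threshold=0.5, balance_weight=1.0):
    """Hypothetical fitness: reward selected instances whose score exceeds
    `threshold`, and penalise label imbalance within the selected subset."""
    gain = sum(s - threshold for m, s in zip(mask, scores) if m)
    pos = sum(1 for m, label in zip(mask, y) if m and label == 1)
    neg = sum(1 for m, label in zip(mask, y) if m and label == 0)
    return gain - balance_weight * abs(pos - neg)


def tournament(pop, fits, rng, size=2):
    """Return the fittest of `size` randomly drawn individuals."""
    best = max(rng.sample(range(len(pop)), size), key=lambda i: fits[i])
    return pop[best]


def ga_select(scores, y, pop_size=30, generations=60, p_mut=0.05, seed=0):
    """Evolve a binary instance-selection mask with a simple generational GA."""
    rng = random.Random(seed)
    n = len(scores)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        fits = [fitness(ind, scores, y) for ind in pop]
        elite = pop[max(range(pop_size), key=lambda i: fits[i])]
        children = [elite[:]]  # elitism: carry the best mask over unchanged
        while len(children) < pop_size:
            a = tournament(pop, fits, rng)
            b = tournament(pop, fits, rng)
            # Uniform crossover: each gene comes from either parent with p = 0.5.
            child = [ga if rng.random() < 0.5 else gb for ga, gb in zip(a, b)]
            # Bit-flip mutation with per-gene probability p_mut.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
    fits = [fitness(ind, scores, y) for ind in pop]
    return pop[max(range(pop_size), key=lambda i: fits[i])]
```

Because the best mask is always copied into the next generation, the best fitness never decreases. A fitness based on a classifier's validation performance (for example, G-mean or AUC on held-out data) would be another plausible choice; this sketch omits it to stay self-contained.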

Keywords: imbalanced classification · hybrid sampling · hypergraph modeling · evolutionary algorithm · data resampling