
RBKWOT: Rough-Allowance Borderline K-Mode Clustering, Weighted Oversampling Technique for Class Imbalance Handling

IEEE Transactions on Systems, Man, and Cybernetics: Systems · 2026
Cited by 0
ABS 3

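Summary

To address the class imbalance problem, a weighted oversampling technique based on rough-allowance borderline K-mode clustering is proposed. It reduces the imbalance ratio without loss of information and applies to categorical, continuous, and mixed data.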

Abstract

Class imbalance degrades the performance of machine learning (ML) algorithms. Although several techniques have been developed to address this issue, the handling of categorical or mixed imbalanced data remains relatively underexplored. Moreover, existing class imbalance handling techniques may not preserve the original characteristics of the data. To address these issues, a new technique, namely, the rough-allowance borderline K-mode clustering-based weighted oversampling technique (RBKWOT), is developed. It consists of four steps: 1) extraction of rough-allowance borderline minority samples; 2) generation of optimum K-mode clusters; 3) assignment of sampling weights to these clusters; and 4) generation of rules for resampling instances corresponding to each cluster based on its sampling weight. RBKWOT reduces the imbalance ratio without incurring any loss of information. It also handles the uncertainty in decision-making for samples belonging to the boundary region. The efficacy of RBKWOT over some state-of-the-art techniques is demonstrated using 21 categorical, continuous, and mixed datasets acquired from the UCI Machine Learning Repository. Experimental results reveal the superiority of RBKWOT.
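
The clustering and weighting steps listed in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not specify how cluster weights or resampling rules are derived, so the code below assumes a plain K-modes pass with Hamming distance, an inverse-cluster-size weighting, and resampling by copying existing minority rows. The rough-allowance borderline extraction (step 1) is omitted entirely. All function names (`k_modes`, `weighted_oversample`, `imbalance_ratio`) are hypothetical.

```python
import random
from collections import Counter


def imbalance_ratio(n_majority, n_minority):
    """Ratio of majority to minority class size (1.0 means balanced)."""
    return n_majority / n_minority


def hamming(a, b):
    """Number of attribute positions where two categorical rows differ."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(rows, k, iters=10, seed=0):
    """Basic K-modes: assign rows to the nearest mode, then recompute modes."""
    rng = random.Random(seed)
    modes = rng.sample(rows, k)  # initial modes drawn from the data
    labels = [0] * len(rows)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: hamming(r, modes[c])) for r in rows]
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                # New mode: most frequent value in each attribute column.
                modes[c] = tuple(Counter(col).most_common(1)[0][0]
                                 for col in zip(*members))
    return labels, modes


def weighted_oversample(minority, n_majority, k=2, seed=0):
    """Grow the minority class to n_majority rows by weighted cluster resampling.

    ASSUMPTION: cluster weight is proportional to 1 / cluster size, so sparse
    clusters are sampled more often. The paper's actual weighting may differ.
    """
    rng = random.Random(seed)
    labels, _ = k_modes(minority, k, seed=seed)
    sizes = Counter(labels)
    raw = {c: 1.0 / n for c, n in sizes.items()}
    total = sum(raw.values())
    weights = {c: w / total for c, w in raw.items()}

    clusters = list(weights)
    by_cluster = {c: [r for r, lab in zip(minority, labels) if lab == c]
                  for c in clusters}
    n_needed = max(0, n_majority - len(minority))
    picks = rng.choices(clusters, weights=[weights[c] for c in clusters],
                        k=n_needed)
    # Copy existing rows only; original samples are kept unchanged.
    new_rows = [rng.choice(by_cluster[c]) for c in picks]
    return minority + new_rows, weights


if __name__ == "__main__":
    minority = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y"), ("a", "x")]
    balanced, weights = weighted_oversample(minority, n_majority=12, k=2, seed=1)
    print(len(balanced), weights)
    print(imbalance_ratio(12, len(balanced)))
```

Because the sketch only appends copies of existing rows, every original sample survives unchanged, which loosely mirrors the abstract's claim of reducing the imbalance ratio without discarding information. Mixed or continuous attributes would need a different distance measure than the Hamming distance assumed here.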

Machine learning · Class imbalance · Oversampling · Cluster analysis · Categorical variables