RBKWOT: Rough-Allowance Borderline K-Mode Clustering-Based Weighted Oversampling Technique for Class Imbalance Handling
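To address class imbalance, a weighted oversampling technique based on rough-allowance borderline K-mode clustering is proposed. It reduces the imbalance ratio without loss of information and applies to categorical, continuous, and mixed data.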
Class imbalance degrades the performance of machine learning (ML) algorithms. Although several techniques have been developed to address it, the handling of categorical or mixed imbalanced data remains largely unexplored. Moreover, existing class imbalance handling techniques may not preserve the original characteristics of the data. To address these issues, a new technique, the rough-allowance borderline K-mode clustering-based weighted oversampling technique (RBKWOT), is developed. It consists of four steps: 1) extraction of rough-allowance borderline minority samples; 2) generation of optimum K-mode clusters; 3) assignment of sampling weights to these clusters; and 4) generation of rules for resampling instances corresponding to each cluster based on its sampling weight. RBKWOT reduces the imbalance ratio without incurring any loss of information. It also handles the uncertainty in decision-making for samples belonging to the boundary region. The efficacy of RBKWOT relative to some state-of-the-art techniques is demonstrated on 21 categorical, continuous, and mixed datasets acquired from the UCI Machine Learning Repository. The experimental results show that RBKWOT outperforms these techniques.
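The four steps can be pictured with a small sketch for purely categorical data. This is not the authors' implementation, and every rule in it is an assumption made for illustration: the rough-allowance criterion is replaced by a simple neighbour-fraction test (`allowance` is a hypothetical parameter), the number of clusters is fixed rather than optimized, the sampling weight of a cluster is taken to be its share of the borderline samples, and the resampling rules are replaced by drawing each attribute from the values observed in the cluster. All function names are hypothetical.

```python
import random
from collections import Counter

def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))

def borderline_minority(X, y, minority, k=3, allowance=0.5):
    """Step 1 (assumed rule): keep minority rows whose k nearest neighbours
    have a majority fraction in [allowance, 1). Not the paper's definition."""
    keep = []
    for i, row in enumerate(X):
        if y[i] != minority:
            continue
        nearest = sorted((j for j in range(len(X)) if j != i),
                         key=lambda j: hamming(row, X[j]))[:k]
        frac = sum(y[j] != minority for j in nearest) / k
        if allowance <= frac < 1.0:
            keep.append(row)
    return keep

def mode_of(rows):
    """Attribute-wise mode of a non-empty list of records."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))

def k_modes(rows, n_clusters, rng, n_iter=10):
    """Step 2: plain K-modes with a fixed cluster count (no search for an optimum)."""
    modes = rng.sample(rows, n_clusters)
    clusters = []
    for _ in range(n_iter):
        clusters = [[] for _ in modes]
        for row in rows:
            best = min(range(len(modes)), key=lambda c: hamming(row, modes[c]))
            clusters[best].append(row)
        # Empty clusters keep their previous mode.
        modes = [mode_of(c) if c else m for c, m in zip(clusters, modes)]
    return [c for c in clusters if c]

def oversample(X, y, minority, n_clusters=2, seed=0):
    """Steps 1-4 chained. Original rows are kept; synthetic rows are appended."""
    rng = random.Random(seed)
    n_min = sum(label == minority for label in y)
    deficit = (len(y) - n_min) - n_min
    border = borderline_minority(X, y, minority)
    if deficit <= 0 or not border:
        return list(X), list(y)
    clusters = k_modes(border, min(n_clusters, len(border)), rng)
    total = sum(len(c) for c in clusters)
    synthetic = []
    for idx, members in enumerate(clusters):
        # Step 3 (assumed weight): cluster size share; the last cluster takes the remainder.
        last = idx == len(clusters) - 1
        quota = deficit - len(synthetic) if last else deficit * len(members) // total
        # Step 4 (assumed rule): draw each attribute from values seen in the cluster.
        for _ in range(quota):
            synthetic.append(tuple(rng.choice(col) for col in zip(*members)))
    return list(X) + synthetic, list(y) + [minority] * len(synthetic)

# Toy data: six majority rows followed by two minority rows.
X = [("r", "s", "l"), ("g", "s", "l"), ("r", "t", "m"),
     ("g", "t", "m"), ("r", "t", "l"), ("g", "t", "l"),
     ("b", "s", "l"), ("b", "s", "n")]
y = ["maj"] * 6 + ["min"] * 2
Xb, yb = oversample(X, y, minority="min")
print(Counter(yb))
```

Because the sketch only appends rows, the original records stay intact, which mirrors the claim that the imbalance ratio is reduced without discarding data. Continuous or mixed attributes would need a different distance and a different generation rule than the Hamming distance and attribute-wise draw assumed here.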