A New Oversampling Method Based on Triangulation of Sample Space
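This paper proposes FE-SMOTE, an improved SMOTE method based on the finite element idea. It uses triangulation to extend synthetic samples from lines to space, generating samples that better match the density distribution of the original minority class, and it outperforms 16 common oversampling methods on 22 benchmark datasets and MNIST.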
Coping with imbalanced data is a challenging task in practical classification problems. One effective way to address class imbalance is to oversample the minority class. The synthetic minority oversampling technique (SMOTE) is a classical oversampling method. However, it has two disadvantages: linear generation and overgeneralization. In this article, an improved SMOTE method, FE-SMOTE, is proposed based on the idea of the finite element method. FE-SMOTE not only overcomes these two disadvantages of SMOTE but also generates samples that are more in line with the density distribution of the original minority class than those generated by existing SMOTE variants. The originality of the proposed method stems from constructing a simplex for every minority sample and then triangulating it to expand the region of synthetic samples from lines to space. A new definition of the relative size of triangular elements not only helps determine the number of synthetic samples but also weakens the adverse impact of outliers. Samples generated by FE-SMOTE can effectively reflect the local potential distribution structure around every minority sample. Compared with 16 commonly studied oversampling methods, FE-SMOTE produces promising results in terms of $G$-mean, AUC, $F$-measure, and accuracy on 22 benchmark imbalanced datasets and the large MNIST dataset.
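The contrast between generating samples on lines and generating them in space can be illustrated with a minimal sketch. This is not the FE-SMOTE algorithm: the abstract does not specify how the simplex is built, how it is triangulated, or how the relative element size is defined, so the sketch below only compares classical SMOTE interpolation with uniform sampling inside a single simplex. The function names `smote_line_sample` and `simplex_sample`, the toy coordinates, and the use of Dirichlet-distributed barycentric weights are all assumptions made for illustration.

```python
import numpy as np


def smote_line_sample(x, neighbor, rng):
    """Classical SMOTE step: a point on the segment between x and one neighbor."""
    lam = rng.uniform(0.0, 1.0)
    return x + lam * (neighbor - x)


def simplex_sample(vertices, rng):
    """Draw one point uniformly from the simplex spanned by the rows of `vertices`.

    Dirichlet(1, ..., 1) weights are uniformly distributed barycentric
    coordinates, so the weighted sum of the vertices is uniform in the simplex.
    (Illustrative assumption; the paper's own sampling rule may differ.)
    """
    weights = rng.dirichlet(np.ones(len(vertices)))
    return weights @ vertices


rng = np.random.default_rng(0)

# Toy minority sample and two of its neighbors in 2-D (hypothetical data).
x = np.array([0.0, 0.0])
neighbors = np.array([[1.0, 0.0], [0.0, 1.0]])

# SMOTE: the synthetic point is confined to a line segment.
line_point = smote_line_sample(x, neighbors[0], rng)

# Simplex-based: the synthetic point can fall anywhere in the triangle.
triangle = np.vstack([x, neighbors])
area_point = simplex_sample(triangle, rng)
```

In this toy setting, repeated calls to `smote_line_sample` only ever cover the segment from `x` to the chosen neighbor, whereas repeated calls to `simplex_sample` fill the whole triangle, which is the sense in which the synthetic region expands from lines to space.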