Differential Evolution With Duplication Analysis for Feature Selection in Classification
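This paper proposes a niching-based differential evolution method for feature selection that uses duplication analysis and subset repair to find multiple optimal feature subsets with similar classification accuracy, and it outperforms the compared algorithms on 18 datasets.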
By selecting a small subset of relevant features, feature selection can reduce the dimensionality of a problem while maintaining or improving the discriminating ability of the data. However, many existing feature selection approaches ignore the fact that a feature selection problem can have multiple optimal solutions: feature subsets that contain different features can achieve the same or very similar classification accuracy. To search for multiple optimal feature subsets, a niching-based differential evolution (DE) method with duplication analysis is proposed. First, duplicated feature subsets in the population are modified by the proposed subset repairing scheme, which produces unique feature subsets. Second, the mutation operator in DE is improved to use both niche and global information to produce promising feature subsets. Third, a new selection method that considers the diversity among feature subsets is adopted to form the population for the next generation. In the experiments, the proposed method is compared with seven evolutionary feature selection algorithms and two typical feature selection methods on 18 datasets. The results show that the proposed algorithm achieves higher classification accuracy than the compared methods on most of the datasets. Furthermore, the proposed method can find different feature subsets with the same or very similar classification accuracy.
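To make the idea of duplication analysis concrete, the following is a minimal sketch of detecting and repairing duplicated feature subsets in a population. It assumes that each feature subset is encoded as a binary mask and that a duplicate is repaired by flipping randomly chosen bits until the mask is unique. The function name `repair_duplicates`, the `max_tries` parameter, and the bit-flip rule are all illustrative assumptions; the abstract does not specify the actual repairing scheme, which may differ.

```python
import random


def repair_duplicates(population, rng=None, max_tries=100):
    """Return a copy of `population` with duplicated feature masks altered.

    Assumption (not the paper's scheme): a duplicated mask is repaired by
    flipping one randomly chosen bit at a time until the mask has not been
    seen before, or until `max_tries` flips have been made.
    """
    rng = rng or random.Random(0)
    seen = set()      # masks already present in the repaired population
    repaired = []
    for mask in population:
        candidate = tuple(mask)
        tries = 0
        while candidate in seen and tries < max_tries:
            flipped = list(candidate)
            i = rng.randrange(len(flipped))
            flipped[i] = 1 - flipped[i]  # select or deselect feature i
            candidate = tuple(flipped)
            tries += 1
        seen.add(candidate)
        repaired.append(list(candidate))
    return repaired


# Hypothetical population of four subsets over four features.
population = [
    [1, 0, 1, 0],
    [1, 0, 1, 0],  # duplicate of the first subset
    [0, 1, 1, 0],
    [1, 0, 1, 0],  # another duplicate
]
repaired = repair_duplicates(population)
```

In this sketch the first occurrence of each subset is kept unchanged and only later copies are altered, so subsets that have already been evaluated are preserved. A fitness-aware repair, for example one that prefers adding relevant features, would be a natural refinement but is outside what the abstract describes.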