A Novel Multiobjective Genetic Programming Approach to High-Dimensional Data Classification
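Proposes PS-MOGP, a problem-specific multiobjective genetic programming framework that uses recursive feature elimination, a progressive domination Pareto archive evolution strategy, and a balanced binary decomposition method to address the challenges of solution diversity, multiclass imbalance, and large feature space in high-dimensional data classification; it outperforms existing methods on benchmark and real-world datasets.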
Advances in data sensing technology have generated vast amounts of high-dimensional data, posing great challenges for machine learning models. Although genetic programming (GP) has proven effective for data classification over the past decades, it still faces three major challenges when dealing with high-dimensional data: 1) solution diversity; 2) multiclass imbalance; and 3) large feature space. In this article, we develop a problem-specific multiobjective GP framework (PS-MOGP) for classification tasks with high-dimensional data. To reduce the large solution space caused by high dimensionality, we incorporate a recursive feature elimination strategy based on mining the archive of evolved GP solutions. We propose a progressive domination Pareto archive evolution strategy (PD-PAES), which optimizes the objectives in a specific order, to evaluate the GP individuals and maintain better solution diversity. In addition, to address the severe class imbalance caused by the traditional one-versus-rest (OVR) binary decomposition (BD) for multiclass classification problems, we design a method named BD with a similar positive and negative class size (BD-SPNCS) to generate a set of auxiliary classifiers. Experimental results on benchmark and real-world datasets demonstrate that the proposed PS-MOGP outperforms state-of-the-art traditional and evolutionary classification methods in high-dimensional data classification.
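
To make the OVR imbalance concrete, the following minimal Python sketch counts the positive and negative samples that each OVR subproblem sees, and contrasts them with a two-group split of the classes whose sizes are kept close. The function names `ovr_imbalance` and `balanced_split`, the toy label counts, and the greedy grouping heuristic are all assumptions made for illustration; the greedy heuristic is not the BD-SPNCS procedure itself, whose details are not given in this abstract.

```python
from collections import Counter


def ovr_imbalance(labels):
    """Return {class: (positives, negatives)} for each one-versus-rest subproblem."""
    counts = Counter(labels)
    total = len(labels)
    return {c: (n, total - n) for c, n in counts.items()}


def balanced_split(labels):
    """Greedily place classes into two groups of similar total size.

    Hypothetical heuristic for illustration only (not the paper's BD-SPNCS):
    classes are visited from largest to smallest, and each one joins
    whichever group is currently smaller.
    """
    counts = Counter(labels)
    pos, neg = [], []
    pos_size = neg_size = 0
    for c, n in sorted(counts.items(), key=lambda kv: (-kv[1], str(kv[0]))):
        if pos_size <= neg_size:
            pos.append(c)
            pos_size += n
        else:
            neg.append(c)
            neg_size += n
    return (pos, pos_size), (neg, neg_size)


# Toy four-class label vector (assumed data, not from the paper).
labels = ["a"] * 50 + ["b"] * 30 + ["c"] * 15 + ["d"] * 5

print(ovr_imbalance(labels))   # class "d" faces 5 positives vs 95 negatives
print(balanced_split(labels))  # two groups of 50 samples each
```

In this toy case the smallest class yields a 5-to-95 OVR subproblem, whereas grouping classes gives a 50-to-50 binary problem, which is the kind of similar positive and negative class size that the abstract describes as the goal.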