A Novel Multiobjective Genetic Programming Approach to High-Dimensional Data Classification

IEEE Transactions on Cybernetics · 2024
Cited by 18
ABS 3

Summary

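The paper proposes PS-MOGP, a problem-specific multiobjective genetic programming framework. It combines recursive feature elimination, a progressive domination Pareto archive evolution strategy, and a balanced binary decomposition method to address three challenges in high-dimensional data classification: solution diversity, multiclass imbalance, and a large feature space. It outperforms existing methods on benchmark and real-world datasets.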

Abstract

The development of data sensing technology has generated vast amounts of high-dimensional data, posing great challenges for machine learning models. Although genetic programming (GP) has proven effective for data classification over the past decades, it still faces three major challenges when dealing with high-dimensional data: 1) solution diversity; 2) multiclass imbalance; and 3) large feature space. In this article, we develop a problem-specific multiobjective GP framework (PS-MOGP) for classification tasks on high-dimensional data. To reduce the large solution space caused by high dimensionality, we incorporate a recursive feature elimination strategy based on mining the archive of evolved GP solutions. To evaluate the GP individuals and maintain better solution diversity, we propose a progressive domination Pareto archive evolution strategy (PD-PAES), which optimizes the objectives in a specific order. In addition, traditional one-versus-rest (OVR) binary decomposition (BD) of multiclass classification problems produces severely imbalanced classes; to address this, we design a method named BD with a similar positive and negative class size (BD-SPNCS) to generate a set of auxiliary classifiers. Experimental results on benchmark and real-world datasets demonstrate that the proposed PS-MOGP outperforms state-of-the-art traditional and evolutionary classification methods on high-dimensional data classification.
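The imbalance that motivates BD-SPNCS can be made concrete with a small sketch. The abstract does not describe how BD-SPNCS forms its groups, so the code below is not the authors' method: `ovr_ratios` and `balanced_split` are hypothetical helpers, the greedy grouping rule is an assumption chosen only for illustration, and the label counts are made up. It shows how lopsided each one-versus-rest subproblem becomes, and how assigning whole classes to two groups can give positive and negative sides of similar size.

```python
from collections import Counter


def ovr_ratios(labels):
    """Negative-to-positive size ratio of each one-versus-rest subproblem."""
    counts = Counter(labels)
    total = len(labels)
    return {c: (total - n) / n for c, n in counts.items()}


def balanced_split(labels):
    """Greedily place whole classes into a positive or a negative group so the
    two groups end up with similar sample counts.

    Assumption: an illustrative heuristic, not the BD-SPNCS procedure.
    """
    counts = Counter(labels)
    pos, neg = [], []
    pos_size = neg_size = 0
    # Largest classes first; each class goes to the currently smaller group.
    for c, n in sorted(counts.items(), key=lambda kv: (-kv[1], str(kv[0]))):
        if pos_size <= neg_size:
            pos.append(c)
            pos_size += n
        else:
            neg.append(c)
            neg_size += n
    return pos, neg, pos_size, neg_size


# Made-up four-class label set with skewed class sizes.
labels = ["a"] * 50 + ["b"] * 30 + ["c"] * 15 + ["d"] * 5

print(ovr_ratios(labels))      # the smallest class faces a 19:1 imbalance
print(balanced_split(labels))  # (['a'], ['b', 'c', 'd'], 50, 50)
```

With these toy counts, the one-versus-rest problem for class `d` has 95 negatives for 5 positives, while the grouped split yields 50 samples on each side. A single grouped split cannot separate all classes on its own, which is consistent with the abstract's mention of a set of auxiliary classifiers.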

Genetic programming · High-dimensional data classification · Multiobjective optimization · Machine learning · Data mining