Large-Scale Data Visualization with Missing Values

Technological and Economic Development of Economy · 2006
Cited by: 1
Renmin University of China (人大) rating: A-

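Summary

The paper proposes a modified learning procedure for autoassociative neural networks that handles missing values directly. It is intended for dimensionality reduction when visualizing large-scale data, and it avoids the bias that conventional methods introduce when many values are missing.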

Abstract

Visualization of large-scale data inherently requires dimensionality reduction to a 1D, 2D, or 3D space. Autoassociative neural networks with a bottleneck layer are commonly used as a nonlinear dimensionality reduction technique. However, many real-world problems involve incomplete data sets, i.e., some values are missing. Common methods for dealing with missing data include deleting all cases with missing values from the data set, or replacing the missing values with the mean or a “normal” value of the corresponding variable. Such methods are appropriate when only a few values are missing. When a substantial portion of the data is missing, however, they can significantly bias the modeling results. To overcome this difficulty, we propose a modified learning procedure for the autoassociative neural network that directly takes the missing values into account. The outputs of the trained network may then be used to substitute the missing values in the original data set.
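
The abstract does not spell out the modified learning procedure, so the following is only a minimal sketch of one plausible reading of it, not the authors' algorithm. It assumes that missing inputs are fed to the network as 0 and that a binary mask excludes them from the reconstruction error, so that only observed values drive the weight updates. The function names `train_masked_autoencoder` and `embed_and_impute`, the single tanh bottleneck layer, the learning rate, and the synthetic data are all illustrative assumptions.

```python
import numpy as np


def train_masked_autoencoder(X, n_hidden=2, epochs=2000, lr=0.02, seed=0):
    """Fit a one-hidden-layer autoassociative network to data containing NaNs.

    Assumptions (not taken from the paper): missing inputs are fed as 0,
    and a binary mask removes them from the reconstruction error.
    """
    rng = np.random.default_rng(seed)
    mask = ~np.isnan(X)                    # True where a value is observed
    X0 = np.where(mask, X, 0.0)            # placeholder 0 for missing inputs
    n_obs = mask.sum()
    d = X.shape[1]
    W1 = rng.normal(0.0, 0.1, (d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X0 @ W1 + b1)          # bottleneck activations
        Y = H @ W2 + b2                    # linear output layer
        E = (Y - X0) * mask                # error only at observed entries
        losses.append(float((E ** 2).sum() / n_obs))
        dY = 2.0 * E / n_obs               # gradient of the masked MSE
        dH = (dY @ W2.T) * (1.0 - H ** 2)  # backpropagate through tanh
        W2 -= lr * (H.T @ dY)
        b2 -= lr * dY.sum(axis=0)
        W1 -= lr * (X0.T @ dH)
        b1 -= lr * dH.sum(axis=0)
    return (W1, b1, W2, b2), losses


def embed_and_impute(X, params):
    """Return bottleneck coordinates and X with NaNs filled by network outputs."""
    W1, b1, W2, b2 = params
    mask = ~np.isnan(X)
    H = np.tanh(np.where(mask, X, 0.0) @ W1 + b1)
    Y = H @ W2 + b2
    return H, np.where(mask, X, Y)


# Synthetic demo: 5-D data built from a 2-D latent structure, about 20% missing.
rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 2))
X_full = latent @ (0.5 * rng.normal(size=(2, 5))) + 0.05 * rng.normal(size=(200, 5))
X_miss = X_full.copy()
X_miss[rng.random(X_full.shape) < 0.2] = np.nan

params, losses = train_masked_autoencoder(X_miss)
coords, X_imputed = embed_and_impute(X_miss, params)  # coords: 2-D points to plot
```

Here `coords` holds the 2D bottleneck coordinates that would be plotted, and `X_imputed` keeps every observed value while filling each gap with the corresponding network output, in the spirit of the substitution step the abstract mentions. The sketch assumes columns on a comparable, roughly unit scale; real data would likely need rescaling and a tuned learning rate.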

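Large-scale data visualization · Missing-value handling · Autoassociative neural networks · Nonlinear dimensionality reduction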