Identification of Outliers in Multivariate Data

Journal of the American Statistical Association · 1996

Cited by 85

ABS 4

David M. Rocke
David L. Woodruff

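Summary

This paper explains why multivariate outlier detection is difficult and why the difficulty grows with the dimension of the data. It proposes a hybrid method, evaluates its detection capability under varying conditions through simulation experiments, and is accompanied by a usable software implementation.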

Abstract

New insights are given into why the problem of detecting multivariate outliers can be difficult and why the difficulty increases with the dimension of the data. Significant improvements in methods for detecting outliers are described, and extensive simulation experiments demonstrate that a hybrid method extends the practical boundaries of outlier detection capabilities. Based on simulation results and examples from the literature, the question of what levels of contamination can be detected by this algorithm as a function of dimension, computation time, sample size, contamination fraction, and distance of the contamination from the main body of data is investigated. Software to implement the methods is available from the authors and STATLIB.

Key Words: Heuristic search; M estimation; Minimum covariance determinant; S estimation

Anomaly detection · Multivariate statistics · Data mining · Machine learning
