High-Breakdown Linear Discriminant Analysis

Journal of the American Statistical Association · 1997
Cited by 32
ABS 4

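Summary

The sample mean and covariance matrix used in linear discriminant analysis are easily contaminated by outliers. The paper proposes a high-breakdown estimation method that yields robust classification rules unaffected by a minority of severe outliers, intended to supplement rather than replace the conventional analysis.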

Abstract

The classification rules of linear discriminant analysis are defined by the true mean vectors and the common covariance matrix of the populations from which the data come. Because these true parameters are generally unknown, they are commonly estimated by the sample mean vector and covariance matrix of the data in a training sample randomly drawn from each population. However, these sample statistics are notoriously susceptible to contamination by outliers, a problem compounded by the fact that the outliers may be invisible to conventional diagnostics. High-breakdown estimation is a procedure designed to remove this cause for concern by producing estimates that are immune to serious distortion by a minority of outliers, regardless of their severity. In this article we motivate and develop a high-breakdown criterion for linear discriminant analysis and give an algorithm for its implementation. The procedure is intended to supplement rather than replace the usual sample-moment methodology of discriminant analysis either by providing indications that the dataset is not seriously affected by outliers (supporting the usual analysis) or by identifying apparently aberrant points and giving resistant estimators that are not affected by them.
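The sensitivity described in the abstract can be illustrated with a minimal univariate sketch. It is not the article's criterion or algorithm, which the abstract does not spell out. It assumes two classes with equal priors and a common variance, so the linear discriminant rule reduces to assigning a point to the nearer class center, and it uses the sample median as a simple high-breakdown stand-in for the class center. The training values and the helper names `lda_threshold` and `classify` are hypothetical.

```python
from statistics import mean, median


def lda_threshold(center_a, center_b):
    """Cut point of the 1-D two-class rule (equal priors, common variance):
    the midpoint between the two estimated class centers."""
    return (center_a + center_b) / 2.0


def classify(x, center_a, center_b):
    """Assign x to the class whose estimated center is nearer."""
    return "A" if abs(x - center_a) <= abs(x - center_b) else "B"


# Hypothetical training samples (illustrative values only).
clean_a = [0.8, 0.9, 1.0, 1.1, 1.2, 0.95, 1.05]
train_b = [2.8, 2.9, 3.0, 3.1, 3.2, 2.95, 3.05]

# Contaminate class A with two gross outliers (2 of 9 points, a minority).
contaminated_a = clean_a + [50.0, 60.0]

# Plug-in rule based on sample means: the outliers drag the class A center
# far past the class B center, so the rule is badly distorted.
mean_cut = lda_threshold(mean(contaminated_a), mean(train_b))

# Same rule based on medians: the center of class A barely moves.
median_cut = lda_threshold(median(contaminated_a), median(train_b))

x_new = 1.5  # a point that lies close to the bulk of class A
label_from_means = classify(x_new, mean(contaminated_a), mean(train_b))
label_from_medians = classify(x_new, median(contaminated_a), median(train_b))

print("cut point from means:  ", mean_cut)
print("cut point from medians:", median_cut)
print("label from means:  ", label_from_means)
print("label from medians:", label_from_medians)
```

Comparing the two cut points also mirrors the supplementary role described in the abstract: when the classical and resistant estimates agree, the usual analysis is supported, and when they disagree, the points responsible deserve a closer look. In the multivariate setting the median would have to be replaced by a high-breakdown estimator of location and scatter, which this sketch does not attempt.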

Linear discriminant analysis · High-breakdown estimation · Outlier handling · Classification rules