When Tukey Meets Chauvenet: A New Boxplot Criterion for Outlier Detection

Journal of Computational and Graphical Statistics · 2025
Cited by 8 · Top 1% of papers in the same journal and year
ABS 3

Summary

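Combining Tukey's boxplot with Chauvenet's criterion, the paper proposes a new boxplot that adapts to the sample size. Its fence coefficient is determined by exact control of the outside rate per observation, so the method is both simple and robust. It performs very well in simulations and in a real-data example on the civil service pay adjustment in Hong Kong.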

Abstract

The box-and-whisker plot, introduced by Tukey (1977), is one of the most popular graphical methods in descriptive statistics. However, the fences of Tukey’s boxplot do not depend on the sample size, yielding the so-called “one-size-fits-all” fences for outlier detection. Although sample-size-adjusted boxplots do exist in the literature, most of them are either not easy to implement or lack justification. As another common rule for outlier detection, Chauvenet’s criterion uses the sample mean and standard deviation to perform the test, but it is often sensitive to the included outliers and hence is not robust. In this paper, by combining Tukey’s boxplot and Chauvenet’s criterion, we introduce a new boxplot, namely the Chauvenet-type boxplot, with the fence coefficient determined by an exact control of the outside rate per observation. Our new outlier criterion not only maintains the simplicity of the boxplot from a practical perspective, but also serves as a robust Chauvenet’s criterion. A simulation study and a real data analysis on the civil service pay adjustment in Hong Kong demonstrate that the Chauvenet-type boxplot performs extremely well regardless of the sample size, and can therefore be highly recommended for practical use to replace both Tukey’s boxplot and Chauvenet’s criterion. Lastly, to increase the visibility of the work, a user-friendly R package named ‘ChauBoxplot’ has also been officially released on CRAN.
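
To make the idea of a sample-size-dependent fence concrete, here is a minimal Python sketch. It is not the authors' implementation and not the API of the ChauBoxplot package. It assumes normally distributed data and population quartiles, takes the classical Chauvenet cutoff (two-sided tail probability 1/(2n)), and solves for the coefficient k at which the boxplot fence Q3 + k·IQR coincides with that cutoff. The function names `chauvenet_z`, `fence_coefficient`, and `flag_outliers` are hypothetical, and the exact coefficient derived in the paper may differ from this approximation.

```python
from statistics import NormalDist, quantiles

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def chauvenet_z(n):
    """Classical Chauvenet cutoff in sd units: P(|Z| > z) = 1 / (2n)."""
    return _STD_NORMAL.inv_cdf(1 - 1 / (4 * n))


def fence_coefficient(n):
    """Illustrative k with Q3 + k * IQR = mu + z * sigma under normality.

    Assumption: population quartiles of a normal distribution are used,
    so this is only a large-sample approximation, not the paper's formula.
    """
    q3 = _STD_NORMAL.inv_cdf(0.75)  # about 0.6745
    iqr = 2 * q3                    # about 1.349
    return (chauvenet_z(n) - q3) / iqr


def flag_outliers(data):
    """Return the points outside the sample-size-adjusted fences."""
    n = len(data)
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    k = fence_coefficient(n)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < lower or x > upper]
```

Under these assumptions the coefficient grows with the sample size: it is below Tukey's fixed 1.5 for small samples and exceeds it once n is roughly in the low seventies, which is the kind of behavior a "one-size-fits-all" fence cannot provide.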

Statistics · Outlier Detection · Econometrics · Data Science