A Rank Correlation Coefficient Resistant to Outliers
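This article defines a nonparametric correlation coefficient, Rg, based on the principle of maximum deviations. It is easy to compute by hand, is more robust than the Pearson, Spearman, and Kendall correlation coefficients in the presence of biased outliers, and an analysis of real data shows that it measures association in a distinctive way.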
In this article, a nonparametric correlation coefficient is defined that is based on the principle of maximum deviations. This new correlation coefficient, Rg, is easy to compute by hand for small to medium sample sizes. When compared with existing correlation coefficients, it was found to be superior in a sampling situation that we call "biased outliers," and hence appears to be more resistant to outliers than the Pearson, Spearman, and Kendall correlation coefficients. In a correlational study, not included in this article, of social data consisting of five variables for each of 51 observations, Rg was compared with the other three correlation coefficients. The four coefficients agreed on 8 of the 10 possible correlations, but in one case Rg was significant when the others were not, and in another case Rg was not significant when the others were. A further analysis of this data set indicated that three to six data points were anomalies that had a severe effect on the other correlations but not on Rg. Apparently, the statistic Rg measures association in a unique fashion. This different measure of association for real data is extended to a population interpretation and expressed in terms of the copula function.
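
The abstract does not state the formula for Rg, so the following is only a minimal sketch under an assumed definition, the form usually given for a greatest-deviation rank statistic. Let p be the ranks of y listed in order of increasing x, let d_i(p) be the number of the first i entries of p that exceed i, and let e(p) be the reversed ranking n + 1 - p_j. The sketch then takes Rg = (max_i d_i(e(p)) - max_i d_i(p)) / floor(n/2). The function name `greatest_deviation`, the helper `max_deviation`, and the no-ties assumption are illustrative choices, not taken from the article.

```python
def greatest_deviation(x, y):
    """Sketch of a greatest-deviation rank correlation (assumed definition).

    Assumes x and y have equal length n >= 2 and contain no ties.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")

    # Reorder y so that the corresponding x values are increasing.
    order = sorted(range(n), key=lambda i: x[i])
    y_by_x = [y[i] for i in order]

    # p[j] is the rank (1..n) of the j-th y value in that ordering.
    p = [0] * n
    for rank, j in enumerate(sorted(range(n), key=lambda j: y_by_x[j]), start=1):
        p[j] = rank

    def max_deviation(perm):
        # d_i = number of the first i entries of perm that exceed i.
        return max(
            sum(1 for j in range(i) if perm[j] > i)
            for i in range(1, n + 1)
        )

    reversed_p = [n + 1 - r for r in p]
    return (max_deviation(reversed_p) - max_deviation(p)) / (n // 2)


if __name__ == "__main__":
    print(greatest_deviation([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))  # 1.0
    print(greatest_deviation([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))  # -1.0
```

Because each d_i is a simple count over a prefix of the ranks, the statistic can be worked out by hand for small n, which is consistent with the abstract's remark about hand computation.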