Best-Practice Recommendations for Defining, Identifying, and Handling Outliers

ORGANIZATIONAL RESEARCH METHODS · 2013
Cited by 1277 · top 5% of articles in the same journal and year
RUC (Renmin University of China) A-ABS 4

Summary

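Reviews 46 methodological sources and 232 organizational science articles, finds that outliers are defined, identified, and handled inconsistently and with little transparency, and offers guidelines, including decision-making trees, to improve practice.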

Abstract

The presence of outliers, which are data points that deviate markedly from others, is one of the most enduring and pervasive methodological challenges in organizational science research. We provide evidence that different ways of defining, identifying, and handling outliers alter substantive research conclusions. Then, we report results of a literature review of 46 methodological sources (i.e., journal articles, book chapters, and books) addressing the topic of outliers, as well as 232 organizational science journal articles mentioning issues about outliers. Our literature review uncovered (a) 14 unique and mutually exclusive outlier definitions, 39 outlier identification techniques, and 20 different ways of handling outliers; (b) inconsistencies in how outliers are defined, identified, and handled in various methodological sources; and (c) confusion and lack of transparency in how outliers are addressed by substantive researchers. We offer guidelines, including decision-making trees, that researchers can follow to define, identify, and handle error, interesting, and influential (i.e., model fit and prediction) outliers. Although our emphasis is on regression, structural equation modeling, and multilevel modeling, our general framework forms the basis for a research agenda regarding outliers in the context of other data-analytic approaches. Our recommendations can be used by authors as well as journal editors and reviewers to improve the consistency and transparency of practices regarding the treatment of outliers in organizational science research.
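The abstract's focus on influential outliers in regression can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' procedure: a single predictor fitted by ordinary least squares, three textbook case diagnostics (leverage, studentized deleted residuals, and Cook's distance), and the common 4/n rule-of-thumb cutoff for Cook's distance. The function names `fit_ols`, `influence_diagnostics`, and `flag_influential` and the made-up data are hypothetical. Loosely, a large studentized deleted residual signals a case the model fits poorly, while a large Cook's distance signals a case that shifts the estimates; treating these as stand-ins for the paper's model fit and prediction outliers is an assumption of this sketch.

```python
"""Illustrative sketch (not the authors' procedure): case diagnostics for
influential outliers in a one-predictor OLS regression.

Assumptions: a single predictor, ordinary least squares, and the common
4/n rule-of-thumb cutoff for Cook's distance. Function names are hypothetical.
"""
from math import sqrt


def fit_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def influence_diagnostics(x, y):
    """Per case: (leverage h_i, studentized deleted residual t_i, Cook's D_i)."""
    n, p = len(x), 2  # p = estimated coefficients (intercept + slope)
    a, b = fit_ols(x, y)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in resid)
    mse = sse / (n - p)
    rows = []
    for xi, e in zip(x, resid):
        h = 1 / n + (xi - x_bar) ** 2 / sxx  # leverage: distance of x_i from x_bar
        # Residual scaled by the error variance estimated WITHOUT case i,
        # using SSE_(i) = SSE - e^2 / (1 - h).
        t = e * sqrt((n - p - 1) / (sse * (1 - h) - e ** 2))
        # Cook's D combines residual size and leverage into one influence score.
        d = (e ** 2 / (p * mse)) * h / (1 - h) ** 2
        rows.append((h, t, d))
    return rows


def flag_influential(x, y, cutoff=None):
    """Indices of cases whose Cook's D exceeds cutoff (default: 4 / n)."""
    cutoff = 4 / len(x) if cutoff is None else cutoff
    return [i for i, (_, _, d) in enumerate(influence_diagnostics(x, y)) if d > cutoff]


if __name__ == "__main__":
    # Made-up data: y is roughly 2x except for the last case.
    x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.9, 16.2, 18.1, 40.0]
    flagged = flag_influential(x, y)
    print("flagged cases:", flagged)
    # Report the estimate with and without the flagged cases, for transparency.
    kept = [i for i in range(len(x)) if i not in flagged]
    print("slope, all cases:", round(fit_ols(x, y)[1], 3))
    print("slope, flagged removed:",
          round(fit_ols([x[i] for i in kept], [y[i] for i in kept])[1], 3))
```

Printing the slope with and without the flagged cases echoes the abstract's point that handling decisions can change substantive conclusions; reporting both estimates is one simple way to make such a decision transparent.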

Keywords: organizational science · research methods · data analysis · outlier handling