Practical issues to consider when working with big data

Review of Accounting Studies · 2022
Cited by: 5
Journal rankings: Renmin University of China A · FT50 · ABS 4

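Summary

The article argues that researchers should weigh the costs and benefits of big data before using it, and highlights four practical issues: big data may not differ conceptually from traditional data, the available sample may be limited, processing is costly while construct validity remains low, and the pursuit of novel data may come at the expense of the research question.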

Abstract

Increasing access to alternative or “big data” sources has given rise to an explosion in the use of these data in economics-based research. However, in our enthusiasm to use the newest and greatest data, we as researchers may jump to use big data sources before thoroughly considering the costs and benefits of a particular dataset. This article highlights four practical issues that researchers should consider before working with a given source of big data. First, big data may not be conceptually different from traditional data. Second, big data may only be available for a limited sample of individuals, especially when aggregated to the unit of interest. Third, the sheer volume of data coupled with high levels of noise can make big data costly to process while still producing measures with low construct validity. Last, papers using big data may focus on the novelty of the data at the expense of the research question. I urge researchers, in particular PhD students, to carefully consider these issues before investing time and resources into acquiring and using big data.

Keywords: big data, data costs, data quality, research design