What Doesn’t Get Measured Does Exist: Improving the Accuracy of Computer-Aided Text Analysis
指出计算机辅助文本分析存在三种测量误差(瞬时误差、特定因子误差和算法误差),并演示如何计算和减少这些误差,以提升研究结论的准确性。
Computer-aided text analysis (CATA) is a form of content analysis that enables the measurement of constructs by processing text into quantitative data based on the frequency of words. CATA has been proposed as a useful measurement approach with the potential to lead to important theoretical advancements. Ironically, while CATA has been offered to overcome some of the known deficiencies in existing measurement approaches, we have lagged behind in regard to assessing the technique’s measurement rigor. Our article addresses this knowledge gap and describes important implications for past as well as future research using CATA. First, we describe three sources of measurement error variance that are particularly relevant to studies using CATA: transient error, specific factor error, and algorithm error. Second, we describe and demonstrate how to calculate measurement error variance with the entrepreneurial orientation, market orientation, and organizational ambidexterity constructs, offering evidence that past substantive conclusions have been underestimated. Third, we offer best-practice recommendations and demonstrate how to reduce measurement error variance by refining existing CATA measures. In short, we demonstrate that although measurement error variance in CATA has not been measured thus far, it does exist and it affects substantive conclusions. Consequently, our article has implications for theory and practice, as well as how to assess and minimize measurement error in future CATA research with the goal of improving the accuracy of substantive conclusions.