Textual Similarity in Organizational Research: Review of Applications, Consistency of Methods, and Best Practice Recommendations

ORGANIZATIONAL RESEARCH METHODS · 2026
Cited by: 0
Journal rankings: Renmin University of China list A; ABS 4

Overview

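This article reviews how textual similarity is applied in organizational and psychological research. It finds that similarity measures produced by different NLP methods are labeled inconsistently, tests the consistency of these measures within and across methods, and offers best practice recommendations to help researchers choose appropriate methods.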

Abstract

Organizational research increasingly uses natural language processing (NLP) to measure textual similarity. Despite common usage, the meaning and consistency of similarity measures (e.g., cosine similarity and Euclidean distance) across common NLP methods (e.g., n-grams and document embeddings) are unclear. This risks misalignment between theoretical constructs and textual measures, undermining the comparability of findings across studies. To address this gap, we review studies using textual similarity in organizational and psychological research, finding a jingle-jangle fallacy: identical labels are used for similarity estimates from different NLP methods, and different labels are used for the same method. Additionally, we examine the consistency of similarity measures across and within NLP methods. Similarity results from different transformer-based embeddings are interchangeable. However, n-grams yield distinct, inconsistent results and are less appropriate for estimating similarity with distance measures. When applied to multi-word inputs, dictionaries and word embeddings return similar results that reflect linguistic style. We provide best practice recommendations and example code for operationalizing textual similarity, including clarifying which NLP methods correspond to content similarity, linguistic style similarity, and semantic similarity at the word, sentence, and document levels of analysis.
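The abstract contrasts cosine similarity with Euclidean distance and reports that n-grams are less appropriate for distance measures. The sketch below is not the authors' example code. It illustrates one well-known property of raw n-gram count vectors that may contribute to this: Euclidean distance grows with text length, whereas cosine similarity normalizes it away. It assumes lowercase whitespace tokenization and unweighted counts, and the helper names `ngrams`, `cosine_similarity`, and `euclidean_distance` are hypothetical, built only on the Python standard library.

```python
import math
from collections import Counter


def ngrams(text: str, n: int = 1) -> Counter:
    """Count word n-grams (hypothetical helper; lowercase + whitespace split)."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors; ignores vector length."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def euclidean_distance(a: Counter, b: Counter) -> float:
    """Straight-line distance between two sparse count vectors; sensitive to length."""
    keys = a.keys() | b.keys()
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in keys))


# Illustrative texts: B changes two words of A; AA repeats A's wording twice.
text_a = "the team shared a clear vision"
text_b = "the team shared a vague plan"
text_aa = " ".join([text_a] * 2)

a, b, aa = ngrams(text_a), ngrams(text_b), ngrams(text_aa)
print(f"cosine  A-B: {cosine_similarity(a, b):.3f}  A-AA: {cosine_similarity(a, aa):.3f}")
# → cosine  A-B: 0.667  A-AA: 1.000
print(f"euclid  A-B: {euclidean_distance(a, b):.3f}  A-AA: {euclidean_distance(a, aa):.3f}")
# → euclid  A-B: 2.000  A-AA: 2.449
```

With raw counts, Euclidean distance ranks the text with different content (B) as closer to A than a text with identical wording that is merely twice as long (AA), while cosine similarity treats AA as identical to A. Scaling each count vector to unit length before computing Euclidean distance removes this length effect, because for unit vectors the squared distance equals 2 minus twice the cosine similarity.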

Keywords: organizational research; natural language processing; textual similarity; research methods