Multidimensional scholarly citations: Characterizing and understanding scholars' citation behaviors
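This study treats scholarly citations as multidimensional. It constructs 15 interpretable features from 30 million PubMed articles and uses Random Forest classification experiments to show that citation proximity has the greatest influence on scholars' citation decisions. The citation decisions of top scholars (top 1% by h-index), however, are influenced more by citation inertia, at nearly three times the strength observed for ordinary scholars.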
Abstract
This study investigates scholars' citation behaviors from a fine-grained perspective. Specifically, each scholarly citation is considered multidimensional rather than logically unidimensional (i.e., present or absent). Thirty million articles from PubMed were accessed for use in empirical research, in which a total of 15 interpretable features of scholarly citations were constructed and grouped into three main categories. Each category corresponds to one aspect of the reasons and motivations behind scholars' citation decision-making during academic writing. Using about 500,000 pairs of actual and randomly generated scholarly citations, a series of Random Forest-based classification experiments were conducted to quantitatively evaluate the correlation between each constructed citation feature and citation decisions made by scholars. Our experimental results indicate that citation proximity is the category most relevant to scholars' citation decision-making, followed by citation authority and citation inertia. However, big-name scholars whose h-indexes rank among the top 1% exhibit a unique pattern of citation behaviors: their citation decision-making correlates most closely with citation inertia, with the correlation nearly three times as strong as that of their ordinary counterparts. Hopefully, the empirical findings presented in this paper can bring us closer to characterizing and understanding the complex process of generating scholarly citations in academia.
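To make the experimental setup more concrete, the following is a minimal sketch of how actual citations might be paired with randomly generated ones to form a labeled dataset for a classifier. The abstract does not describe the sampling procedure, so everything here is an assumption for illustration: the function name `build_citation_pairs`, the one-random-pair-per-actual-citation ratio, and the uniform sampling over article IDs are all hypothetical.

```python
import random


def build_citation_pairs(citations, all_ids, seed=0):
    """Pair each actual citation with one randomly generated citation.

    citations: iterable of (citing_id, cited_id) tuples (actual citations).
    all_ids:   iterable of every article ID that may be sampled.
    Returns a list of (citing_id, cited_id, label) triples, where label is
    1 for an actual citation and 0 for a randomly generated one.

    Assumption: uniform sampling at a 1:1 ratio; the paper's own procedure
    may differ.
    """
    rng = random.Random(seed)
    actual = set(citations)
    ids = sorted(all_ids)
    pairs = []
    for citing, cited in sorted(actual):
        pairs.append((citing, cited, 1))
        # Candidates exclude self-citation and anything actually cited.
        candidates = [
            other for other in ids
            if other != citing and (citing, other) not in actual
        ]
        if candidates:  # skip if no valid random target exists
            pairs.append((citing, rng.choice(candidates), 0))
    return pairs


# Toy example with hypothetical article IDs.
toy_citations = [("A", "B"), ("A", "C"), ("B", "C")]
toy_ids = {"A", "B", "C", "D", "E"}
toy_pairs = build_citation_pairs(toy_citations, toy_ids)
```

Each triple could then be turned into a feature vector (for instance, the 15 citation features mentioned above) and passed to a Random Forest classifier, whose feature importances give one way to compare the feature categories.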