Bayesian and frequentist statistical models to predict publishing output and article processing charge totals
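This paper presents Bayesian and frequentist statistical models that use the past year's counts of peer-reviewed articles to predict future article counts and total costs, giving academic libraries, institutions, and publishers more accurate forecasts for evaluating publishing agreements.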
Abstract
Academic libraries, institutions, and publishers are interested in predicting future publishing output to help evaluate publishing agreements. Current predictive models are overly simplistic and provide inaccurate predictions. This paper presents Bayesian and frequentist statistical models to predict future article counts and costs. These models use the past year's counts of corresponding‐authored, peer‐reviewed articles to predict the distribution of the number of articles in a future year. Article counts for each journal and year are modeled as a log‐linear function of year with journal‐specific coefficients. Journal‐specific predictions are summed to predict the distribution of the total article count and combined with journal‐specific costs to predict the distribution of total cost. We fit models to three datasets: 366 Wiley journals from 2016 to 2020, 376 Springer‐Nature journals from 2017 to 2021, and 313 Wiley journals from 2017 to 2021. For each dataset, we compared predictions for the subsequent year to actual counts. For two datasets, the model predicts output better than either the annual mean count or a linear trend regression. For the third, no method predicts output well. The Bayesian model provides prediction uncertainties that account for all modeled sources of uncertainty. Better estimates of future publishing activity and costs provide critical, independent information for open publishing negotiations.
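The modeling steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a Poisson likelihood for the counts, fits each journal separately by maximum likelihood, and simulates future counts from the fitted mean only, so it ignores the parameter uncertainty that the Bayesian model is said to account for. The function names, journal names, counts, and charges are all hypothetical.

```python
import numpy as np


def fit_loglinear_poisson(years, counts, n_iter=50):
    """Fit log(mu) = a + b * (year - mean year) by Newton-Raphson.

    Assumes a Poisson likelihood (an assumption of this sketch) and that
    the journal has at least one article, so the fit is well defined.
    """
    t = np.asarray(years, dtype=float)
    y = np.asarray(counts, dtype=float)
    center = t.mean()  # centering the year keeps the fit numerically stable
    X = np.column_stack([np.ones_like(t), t - center])
    beta = np.array([np.log(y.mean() + 0.5), 0.0])  # start from a flat trend
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        grad = X.T @ (y - mu)           # score vector
        hess = X.T @ (X * mu[:, None])  # Fisher information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta, center


def predict_mean(beta, center, year):
    """Expected article count for a given year under the fitted model."""
    return float(np.exp(beta[0] + beta[1] * (year - center)))


def simulate_totals(years, counts_by_journal, apc_by_journal, future_year,
                    n_draws=10000, seed=0):
    """Simulate total article count and total cost for a future year.

    Plug-in simulation: draws come from Poisson(fitted mean), so the spread
    reflects sampling variation only, not uncertainty in the coefficients.
    """
    rng = np.random.default_rng(seed)
    total_count = np.zeros(n_draws)
    total_cost = np.zeros(n_draws)
    for journal, counts in counts_by_journal.items():
        beta, center = fit_loglinear_poisson(years, counts)
        mu = predict_mean(beta, center, future_year)
        draws = rng.poisson(mu, size=n_draws)
        total_count += draws                         # sum over journals
        total_cost += draws * apc_by_journal[journal]  # journal-specific cost
    return total_count, total_cost


# Hypothetical data: two journals, five years of counts, per-article charges.
years = [2016, 2017, 2018, 2019, 2020]
counts_by_journal = {"Journal A": [10, 12, 15, 18, 22],
                     "Journal B": [5, 4, 6, 5, 5]}
apc_by_journal = {"Journal A": 3000.0, "Journal B": 2500.0}

count_draws, cost_draws = simulate_totals(
    years, counts_by_journal, apc_by_journal, future_year=2021)
print("total count, 2.5% / 50% / 97.5%:",
      np.percentile(count_draws, [2.5, 50, 97.5]))
print("total cost,  2.5% / 50% / 97.5%:",
      np.percentile(cost_draws, [2.5, 50, 97.5]))
```

Summing simulated draws journal by journal, rather than summing point predictions, is what yields a distribution for the total count and total cost instead of a single number.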