Economies of scope in data aggregation: Evidence from health data
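By merging health and socioeconomic variables to predict health outcomes, the study finds that adding variables improves prediction quality and that economies of scope grow with variable complementarity. Returns, however, first increase and then decrease. These results help explain how data aggregation can create barriers to entry and can inform data-sharing policy.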
Economies of scope in data aggregation (ESDA) are generated by the combination of complementary datasets involving the same observations. We estimate ESDA by progressively and randomly adding health and socioeconomic variables (predictors) to the machine-learning models we use to predict health outcomes. Holding the number of observations constant, we find a positive effect of the number of variables on prediction quality. We observe a positive relationship between variable complementarity and ESDA. ESDA show signs of increasing returns followed by decreasing returns. We further observe a long tail of highly contributing predictors in our data. These findings indicate that the nature of returns to scope in data aggregation may depend on the distribution of the predictors' information content. This underscores the importance of variable characteristics in determining ESDA's potential to create data barriers to entry. These results can help policymakers design data-sharing initiatives such as the European Union's Common European Data Spaces.

• Economies of scope in data aggregation (ESDA) exist if adding variables increases the quality of a prediction.
• We corroborate their presence by merging health and socioeconomic data to predict health outcomes.
• We observe a positive relation between variable complementarity and ESDA.
• ESDA do not exhibit decreasing returns.
• The nature of returns to ESDA seems to depend on the distribution of the information content of the predictors.
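The estimation idea described above can be sketched as follows: randomly order the predictors, add them one at a time to a model fitted on a fixed set of observations, and track out-of-sample prediction quality. This is a minimal illustration under stated assumptions, not the authors' implementation. Ordinary least squares stands in for their machine-learning models, holdout R² stands in for their prediction-quality metric, and the data are synthetic, with a hypothetical long-tailed vector of predictor weights. The names `scope_curve`, `fit_predict_ols`, and `r2_score` are hypothetical.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Out-of-sample coefficient of determination (assumed quality metric)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_predict_ols(X_train, y_train, X_test):
    """Fit OLS with an intercept; a stand-in for the paper's ML models."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return A_test @ coef


def scope_curve(X, y, n_orderings=20, train_frac=0.7, seed=0):
    """Average holdout R^2 after adding k = 1..p randomly ordered predictors.

    The observations (rows) and the train/test split are fixed, so changes
    in R^2 along the curve come only from adding variables (columns).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    idx = rng.permutation(n)
    n_train = int(train_frac * n)
    train, test = idx[:n_train], idx[n_train:]
    curve = np.zeros(p)
    for _ in range(n_orderings):
        order = rng.permutation(p)  # random order in which predictors join
        for k in range(1, p + 1):
            cols = order[:k]
            pred = fit_predict_ols(X[train][:, cols], y[train], X[test][:, cols])
            curve[k - 1] += r2_score(y[test], pred)
    return curve / n_orderings


# Synthetic stand-in for merged health and socioeconomic data (hypothetical).
rng = np.random.default_rng(42)
n, p = 2000, 10
X = rng.normal(size=(n, p))
# Hypothetical long-tailed information content: a few predictors matter a lot.
weights = np.array([3.0, 2.0, 1.0, 0.5, 0.25, 0.1, 0.0, 0.0, 0.0, 0.0])
y = X @ weights + rng.normal(size=n)

curve = scope_curve(X, y)
marginal = np.diff(curve)  # gain in R^2 from each additional predictor
print(np.round(curve, 3))
print(np.round(marginal, 3))
```

Inspecting `marginal` shows whether the gains from scope rise, stay flat, or fall as predictors are added. In this toy setup the predictors are independent and enter additively, so averaging over random orderings makes the expected gain per added predictor roughly constant. Increasing or decreasing returns would have to come from features this sketch does not model, such as complementarity (interactions) or correlation among predictors.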