How small is big enough? Open labeled datasets and the development of deep learning
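Examines the key role of open labeled datasets (such as CIFAR-10) in Deep Learning as a technoscientific field, using qualitative and quantitative analyses to show how dataset size, number of instances, and number of categories influenced technological progress and citations in the early literature.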
Abstract: We investigate the emergence of Deep Learning as a technoscientific field, emphasizing the role of open labeled datasets. Through qualitative and quantitative analyses, we evaluate the role of datasets such as CIFAR-10 (Canadian Institute for Advanced Research, 10 classes) in advancing computer vision and object recognition, which are central to the Deep Learning revolution. Our findings highlight CIFAR-10’s crucial role in the field and its enduring influence, as well as its importance in teaching machine learning (ML) techniques. The results also indicate that dataset characteristics such as size, number of instances, and number of categories were key factors. Econometric analysis confirms that CIFAR-10, a small but sufficiently large open dataset, played a significant and lasting role in technological advancements and in the development of the early scientific literature, as shown by citation metrics.