Quantifying the Academic Quality of Children’s Videos Using Machine Comprehension
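This study uses machine comprehension models to automatically assess the academic quality of YouTube Kids videos with questions from children’s textbooks, and ranks channels accordingly to help surface high-quality educational content.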
YouTube Kids (YTK) is one of the most popular children’s applications, used by millions of children daily. However, various studies have raised concerns about the platform, such as the overabundance of entertainment and commercial content in its videos. At the same time, such video-hosting platforms contain many high-quality videos that, if appropriately ranked, could give children access to quality educational content. Finding and ranking videos based on their educational potential is nevertheless a nontrivial task. To find high-quality educational videos, this research focuses on content that is taught in schools and proposes a way to rank children’s videos using the answers to questions in children’s textbooks. Using a new data set of questions and answers from 1,000 children’s videos, we first show that machine comprehension (MC) models can automate finding answers to textbook questions based on video content. We then use another large data set of school textbook questions and an augmented MC model that uses both language and visual information to rank the top children’s channels on YTK, which together comprise 48,956 videos. We quantify the academic quality of these channels by the number of children’s textbook questions that the MC model can correctly answer using their videos. The analysis allows us to compare channels based on their academic content and to find topics that are underrepresented in the existing videos. Our research thus provides an automated way to retrieve and rank quality educational content that is useful for academic learning on large video-hosting platforms.
History: This paper has been accepted by Kaushik Dutta for the Special Issue on the Responsible AI and Data Science for Social Good.
Supplemental Material: The software that supports the findings of this study is available within the paper and its Supplemental Information (https://pubsonline.informs.org/doi/suppl/10.1287/ijoc.2023.0502) as well as from the IJOC GitHub software repository (https://github.com/INFORMSJoC/2023.0502). The complete IJOC Software and Data Repository is available at https://informsjoc.github.io/.
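The channel-ranking idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the MC model is replaced here by precomputed correct/incorrect flags, and the function name `rank_channels`, the tuple layout, the sample channels, and the choice to count each question at most once per channel are all assumptions made for illustration.

```python
from collections import defaultdict


def rank_channels(results):
    """Rank channels by how many distinct questions were answered correctly.

    `results` is an iterable of (channel, question_id, is_correct) tuples.
    In a real pipeline, `is_correct` would come from comparing an MC model's
    answer with the textbook answer; here it is supplied directly (assumption).
    """
    answered = defaultdict(set)
    for channel, question_id, is_correct in results:
        answered[channel]  # touch the key so zero-score channels still appear
        if is_correct:
            answered[channel].add(question_id)
    scores = {channel: len(qs) for channel, qs in answered.items()}
    # Highest score first; ties are broken alphabetically by channel name.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sample data, not taken from the paper.
sample = [
    ("ChannelA", "q1", True),
    ("ChannelA", "q1", True),   # same question answered by a second video
    ("ChannelA", "q2", False),
    ("ChannelB", "q1", True),
    ("ChannelB", "q2", True),
    ("ChannelC", "q3", False),
]
ranking = rank_channels(sample)
print(ranking)
```

Counting distinct questions rather than raw correct answers keeps a channel from scoring higher simply because many of its videos cover the same question; whether the paper normalizes in this way is not stated in the abstract.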