Genre analysis of movies using a topic model of plot summaries
该研究通过无监督主题模型分析大量电影情节摘要,探讨了电影类型的可识别性、构成、典型性及随时间的变化,发现许多类型可通过词汇特征预测,且西部片、科幻片等类型构成随时间显著变化。
Abstract Genre plays an important role in the description, navigation, and discovery of movies, but it is rarely studied at large scale using quantitative methods. This allows an analysis of how genre labels are applied, how genres are composed and how these ingredients change, and how genres compare. We apply unsupervised topic modeling to a large collection of textual movie summaries and then use the model's topic proportions to investigate key questions in genre, including recognizability, mapping, canonicity, and change over time. We find that many genres can be quite easily predicted by their lexical signatures and this defines their position on the genre landscape. We find significant genre composition changes between periods for westerns, science fiction and road movies, reflecting changes in production and consumption values. We show that in terms of canonicity, canonical examples are often at the high end of the topic distribution profile for the genre rather than central as might be predicted by categorization theory.