From Large-Scale Text to Model Development: A Novel Approach to Algorithm Integration in Information Systems Research
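Proposes the SWG-AMD method, which combines semi-supervised topic modelling with abductive reasoning to develop research models from large-scale text, and designs the SW-Gen algorithm to optimise seed word generation. The method's effectiveness is validated on the COVIDSafe dataset.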
We introduce “Seed Word Generation–Abductive Model Development” (SWG-AMD), a method that draws on advances in semi-supervised topic modelling and abductive reasoning to use large-scale text in research model development. Because the efficacy of SWG-AMD depends on high-quality seed words for seeded topic modelling, we present a novel Seed Word Generation (SW-Gen) algorithm to optimise this critical input. As an illustrative application, we apply SWG-AMD to Australia’s COVIDSafe dataset to build a research model and then validate it externally using two additional datasets. We conclude by discussing SWG-AMD’s utility for researchers seeking to develop context-sensitive research models in high-volume, real-time data settings.