A Flexible Framework for Synthesizing Categorical Sequences with Application to Human Activity Patterns
提出配对马尔可夫链方法,用于合成智能手机和传感器记录的多日连续分类数据,在人类活动和身体活动强度数据上优于现有方法。
The ability to synthesize realistic data in a parametrizable way is valuable for a number of reasons, including privacy, missing data imputation, and evaluating the performance of statistical and computational methods. When the underlying data generating process is complex, data synthesis requires approaches that balance realism and simplicity. In this paper, we address the problem of synthesizing sequential categorical data of the type that is increasingly available from mobile applications and sensors that record participant status continuously over the course of multiple days and weeks. We propose the paired Markov Chain (paired-MC) method, a flexible framework that produces sequences that closely mimic real data while providing a straightforward mechanism for modifying characteristics of the synthesized sequences. We demonstrate the paired-MC method on two datasets, one reflecting daily human activity (time use) patterns collected via a smartphone application, and one encoding the intensities of physical activity measured by wearable accelerometers. In both settings, sequences synthesized by paired-MC better capture key characteristics of the real data than alternative approaches. Supplemental materials for this article are available online.