Machine learning approach to synthetic data generation: Uncertainty generative model with neural attention
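Proposes the UGMNA model, which combines attentive neural processes, the Heston stochastic volatility model, and stochastic differential equations to generate synthetic data within a continuous-time latent framework, explicitly modeling aleatoric and epistemic uncertainty and improving predictive accuracy and model calibration in data-scarce settings.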
Abstract: Data scarcity undermines the precision of empirical and analytical research by limiting sample sizes and reducing statistical power. In domains such as business operations, financial management, and information systems, failure data often arise from rare events, introducing substantial aleatoric and epistemic uncertainty. Existing synthetic data generation methods, including interpolation‐based oversampling and generative models, face persistent challenges. They often fail to capture rare events, preserve temporal dependencies, or model multiple sources of uncertainty, leading to unrealistic samples and degraded performance in downstream tasks. This study introduces the uncertainty generative model with neural attention (UGMNA), a synthetic data generation approach that integrates attentive neural processes, the Heston stochastic volatility model, and stochastic differential equations within a continuous‐time latent framework. UGMNA addresses data scarcity by generating synthetic samples that emulate the distributional characteristics of original datasets while explicitly modeling both aleatoric and epistemic uncertainty. Its design enhances statistical power by augmenting limited datasets and ensures that synthetic data reflect key patterns, temporal dynamics, and complex distributions encountered in real‐world scenarios. Experimental results across multiple case studies demonstrate that UGMNA reduces both types of uncertainty while preserving essential data patterns. Compared with conventional baselines and state‐of‐the‐art generators, UGMNA consistently improves predictive accuracy, ranking performance, and model calibration in data‐scarce, high‐variance environments. These findings establish UGMNA as a robust framework for generating reliable synthetic data, offering practical utility for research and decision‐making in contexts where data scarcity and uncertainty hinder model development.
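For readers unfamiliar with the Heston component named in the abstract, the following is a minimal sketch of the standard Heston stochastic volatility model on its own, simulated with a full-truncation Euler scheme. It is not the UGMNA implementation: the abstract does not specify how the Heston dynamics are coupled to the attentive neural process or the latent SDE, so the function name `simulate_heston`, the discretization scheme, and all parameter values below are illustrative assumptions.

```python
import math
import random


def simulate_heston(s0, v0, mu, kappa, theta, xi, rho, dt, n_steps, rng):
    """Simulate one path of the standard Heston model (hypothetical helper).

    Uses a full-truncation Euler scheme: the variance state may dip below
    zero, but only its positive part enters the drift and diffusion terms.
    Returns (levels, variances), each a list of length n_steps + 1.
    """
    levels, variances = [s0], [max(v0, 0.0)]
    s, v = s0, v0
    for _ in range(n_steps):
        # Two standard normal shocks with correlation rho.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        v_pos = max(v, 0.0)
        # Log-Euler step for the observed level keeps it strictly positive.
        s = s * math.exp((mu - 0.5 * v_pos) * dt + math.sqrt(v_pos * dt) * z1)
        # Mean-reverting variance step (rate kappa, long-run level theta,
        # volatility of variance xi).
        v = v + kappa * (theta - v_pos) * dt + xi * math.sqrt(v_pos * dt) * z2
        levels.append(s)
        variances.append(max(v, 0.0))  # record the truncated variance
    return levels, variances


# Illustrative parameters only; none of these values come from the paper.
rng = random.Random(0)
levels, variances = simulate_heston(
    s0=100.0, v0=0.04, mu=0.05, kappa=2.0, theta=0.04, xi=0.3, rho=-0.7,
    dt=1.0 / 252, n_steps=252, rng=rng,
)
```

In this sketch the time-varying variance path plays the role of data-level (aleatoric) noise. How UGMNA represents epistemic uncertainty is not described in the abstract and is not modeled here.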