Simulation-Based, Finite-Sample Inference for Privatized Data
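Because mechanisms such as differential privacy inject noise that makes sampling distributions complex, this work proposes a simulation-based “repro sample” method that produces statistically valid confidence intervals and hypothesis tests, with improved coverage and Type I error.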
Privacy protection methods, such as differentially private mechanisms, introduce noise into the resulting statistics, which often produces complex and intractable sampling distributions. In this article, we propose a simulation-based “repro sample” approach, building on the work of Xie and Wang, that produces statistically valid confidence intervals and hypothesis tests. We show that this methodology applies to a wide variety of private inference problems, appropriately accounts for biases introduced by privacy mechanisms (such as clamping), and improves on other state-of-the-art inference methods, such as the parametric bootstrap, in terms of the coverage and Type I error of the private inference. We also develop significant improvements to and extensions of the repro sample methodology for general models (not necessarily related to privacy), including (a) modifying the procedure to guarantee coverage and Type I error control, even accounting for Monte Carlo error, and (b) proposing efficient numerical algorithms to implement the confidence intervals and p-values. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.
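To make the simulation-based idea concrete, here is a minimal Python sketch. It is not the authors' repro sample procedure. It is a simplified Monte Carlo test inversion in the same spirit, under assumptions chosen only for illustration: i.i.d. N(theta, 1) data, a mean clamped to [-B, B] and released with the Laplace mechanism (sensitivity 2B/n for replace-one neighbors), and a grid search over theta. The function names `privatize_mean`, `monte_carlo_pvalue`, and `confidence_set` are hypothetical. Because the simulated statistics pass through the same clamping step as the observed one, clamping bias is built into the null distribution instead of being ignored. The "+1" in each p-value count is one standard way to keep a Monte Carlo test valid for any finite number of draws R; we do not claim it is the modification the authors use.

```python
import numpy as np


def privatize_mean(x, bound, epsilon, noise):
    """Clamp x to [-bound, bound] along the last axis, average, add Laplace noise.

    Assumption: `noise` holds standard Laplace draws (scale 1); scaling them by
    the clamped mean's sensitivity 2*bound/n divided by epsilon gives the
    Laplace mechanism.
    """
    n = x.shape[-1]
    clamped_mean = np.clip(x, -bound, bound).mean(axis=-1)
    return clamped_mean + (2.0 * bound / (n * epsilon)) * noise


def monte_carlo_pvalue(s_obs, theta0, z, w, bound, epsilon):
    """Two-sided Monte Carlo p-value for H0: theta = theta0.

    z: (R, n) standard normal draws; w: (R,) standard Laplace draws.
    Under H0 the R simulated statistics and s_obs are exchangeable, so the
    "+1" counts make each one-sided p-value valid for any finite R; doubling
    the smaller one (Bonferroni) keeps the two-sided p-value valid.
    """
    sims = privatize_mean(theta0 + z, bound, epsilon, w)
    r = sims.shape[0]
    lower = (1 + np.sum(sims <= s_obs)) / (r + 1)
    upper = (1 + np.sum(sims >= s_obs)) / (r + 1)
    return min(1.0, 2.0 * min(lower, upper))


def confidence_set(s_obs, grid, z, w, bound, epsilon, alpha=0.05):
    """Keep every grid value of theta not rejected at level alpha; return its hull."""
    kept = [t for t in grid
            if monte_carlo_pvalue(s_obs, t, z, w, bound, epsilon) > alpha]
    return (min(kept), max(kept)) if kept else None


# Usage on one hypothetical privatized dataset.
rng = np.random.default_rng(0)
n, bound, epsilon, R = 100, 3.0, 1.0, 999
theta_true = 1.0
x = theta_true + rng.standard_normal(n)
s_obs = privatize_mean(x[None, :], bound, epsilon, rng.laplace(size=1))[0]

# The same simulation draws are reused for every candidate theta on the grid.
z = rng.standard_normal((R, n))
w = rng.laplace(size=R)
grid = np.linspace(-1.0, 3.0, 401)
print(confidence_set(s_obs, grid, z, w, bound, epsilon))
```

Reusing the same draws `z` and `w` for every candidate theta keeps each test valid at its own theta while reducing Monte Carlo noise from one grid point to the next. The grid search, and reporting the hull of accepted grid points, is only a crude stand-in for the efficient numerical algorithms the abstract mentions: it assumes the accepted set is an interval and can miss its true endpoints by up to one grid step.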