An Enhanced Data Perturbation Approach for Small Data Sets
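To address the problem that existing perturbation methods are unsuitable for small data sets, this work improves the General Additive Data Perturbation technique so that the perturbed data lower disclosure risk while ensuring that the results of common statistical analyses remain consistent with those from the original data.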
ABSTRACT As modern organizations gather, analyze, and share large quantities of data, issues of privacy and confidentiality are becoming increasingly important. Perturbation methods are used to protect confidentiality when confidential numerical data are shared or disseminated for analysis. Unfortunately, existing perturbation methods are not suitable for protecting small data sets. With small data sets, sampling error weakens the protection that existing perturbation methods provide against disclosure. Sampling error may also cause analyses of the perturbed data to yield results that differ from those obtained on the original data, reducing data utility. In this study, we develop an enhancement of an existing perturbation technique, General Additive Data Perturbation (GADP), that can be used to effectively mask both large and small data sets. The proposed enhancement minimizes the risk of disclosure while ensuring that the results of commonly performed statistical analyses are identical for the original and the perturbed data.
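To make the sampling-error issue concrete, the following is a minimal sketch, not the enhancement developed in this study. It assumes that "identical results" can be illustrated by forcing the sample mean vector and sample covariance matrix of the perturbed data to match those of the original data exactly, even for a small sample. The function name `moment_matched_perturbation` and the use of NumPy are illustrative assumptions. The sketch ignores non-confidential variables and does not control disclosure risk.

```python
import numpy as np


def moment_matched_perturbation(x, rng):
    """Return random values whose sample mean and covariance equal those of x.

    Hypothetical illustration only: it matches first and second sample
    moments exactly, and requires more rows than columns (n > p).
    """
    x = np.asarray(x, dtype=float)
    n, p = x.shape
    mean_x = x.mean(axis=0)
    cov_x = np.cov(x, rowvar=False).reshape(p, p)

    # Draw independent noise; its sample moments are off due to sampling error.
    z = rng.standard_normal((n, p))
    z -= z.mean(axis=0)  # force the sample mean to exactly zero
    cov_z = np.cov(z, rowvar=False).reshape(p, p)

    # Whiten the noise so its sample covariance is exactly the identity.
    l_z = np.linalg.cholesky(cov_z)
    w = np.linalg.solve(l_z, z.T).T

    # Recolor so the sample covariance equals cov_x, then restore the mean.
    l_x = np.linalg.cholesky(cov_x)
    return mean_x + w @ l_x.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    original = rng.normal(loc=50.0, scale=10.0, size=(10, 2))  # small data set
    perturbed = moment_matched_perturbation(original, rng)
    print(np.allclose(original.mean(axis=0), perturbed.mean(axis=0)))
    print(np.allclose(np.cov(original, rowvar=False),
                      np.cov(perturbed, rowvar=False)))
```

Because the adjustment is applied to the sample moments rather than the population parameters, analyses that depend only on means and covariances (for example, linear regression coefficients) would give the same results on both data sets under these assumptions, regardless of sample size.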