Data Shuffling—A New Masking Approach for Numerical Data
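Proposes a new method called data shuffling, which protects confidentiality by shuffling the values of variables in numerical data while preserving data utility, and which applies to both small and large data sets.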
This study discusses a new procedure for masking confidential numerical data, called data shuffling, in which the values of the confidential variables are “shuffled” among observations. The shuffled data provide a high level of data utility while minimizing the risk of disclosure. From a practical perspective, data shuffling overcomes reservations about using perturbed or modified confidential data: the released values are the original values, reassigned among observations, yet the method retains all the desirable properties of perturbation methods and performs better than other masking techniques on both data utility and disclosure risk. In addition, data shuffling can be implemented using only rank-order data, and thus provides a nonparametric method for masking. We illustrate the applicability of data shuffling for small and large data sets.
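
To make the rank-order idea concrete, the following is a minimal sketch in Python and not the procedure as specified in the study. It assumes one confidential variable and one nonconfidential variable. The function name `shuffle_confidential`, the parameter `noise_sd`, and the noisy rank score used to choose the new ordering are all hypothetical stand-ins for the perturbation step. Only the final step reflects the shuffling described above: the original confidential values are reassigned among observations according to rank.

```python
import random


def shuffle_confidential(confidential, nonconfidential, noise_sd=0.5, seed=0):
    """Reassign confidential values among records using ranks only.

    Illustrative assumption: the target ordering comes from the rank of the
    nonconfidential variable plus Gaussian noise. This is a simple stand-in
    for a perturbation model, not the method defined in the study.
    """
    if len(confidential) != len(nonconfidential):
        raise ValueError("both variables must have one value per record")

    rng = random.Random(seed)
    n = len(confidential)

    # Rank each record on the nonconfidential variable, scale the rank
    # to (0, 1), and add noise to obtain a hypothetical perturbed score.
    by_nonconfidential = sorted(range(n), key=lambda i: nonconfidential[i])
    score = [0.0] * n
    for rank, i in enumerate(by_nonconfidential):
        score[i] = (rank + 0.5) / n + rng.gauss(0.0, noise_sd)

    # Shuffling step: the record with the k-th smallest score receives the
    # k-th smallest original confidential value.
    by_score = sorted(range(n), key=lambda i: score[i])
    sorted_values = sorted(confidential)
    shuffled = [None] * n
    for k, i in enumerate(by_score):
        shuffled[i] = sorted_values[k]
    return shuffled


if __name__ == "__main__":
    income = [52000, 48000, 91000, 67000, 30000]  # confidential (made-up values)
    age = [34, 29, 55, 41, 23]                    # nonconfidential (made-up values)
    print(shuffle_confidential(income, age, seed=1))
```

Because the output is a permutation of the input, the marginal distribution of the confidential variable is unchanged. In this sketch, a smaller `noise_sd` preserves more of the rank relationship with the nonconfidential variable, and a larger one breaks the link between a record and its original value more thoroughly.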