Data Shuffling—A New Masking Approach for Numerical Data

Management Science · 2006
Cited by 136
Journal rankings: Renmin University of China A+, FT50, UTD24, ABS 4*

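Summary

The paper proposes a new method called data shuffling, which protects confidentiality by shuffling the values of confidential numerical variables among observations while preserving data utility. The method applies to both small and large data sets.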

Abstract

This study discusses a new procedure for masking confidential numerical data—a procedure called data shuffling—in which the values of the confidential variables are “shuffled” among observations. The shuffled data provides a high level of data utility and minimizes the risk of disclosure. From a practical perspective, data shuffling overcomes reservations about using perturbed or modified confidential data because it retains all the desirable properties of perturbation methods and performs better than other masking techniques in both data utility and disclosure risk. In addition, data shuffling can be implemented using only rank-order data, and thus provides a nonparametric method for masking. We illustrate the applicability of data shuffling for small and large data sets.
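The rank-based idea in the abstract can be illustrated with a minimal sketch. This is not the authors' exact procedure, only a simplified single-variable illustration under stated assumptions: one confidential variable, one non-confidential variable, and a normal-score model for their dependence. The function names (`ranks`, `normal_scores`, `shuffle_confidential`) and the `seed` parameter are hypothetical. The sketch generates synthetic scores that depend only on the non-confidential ranks plus noise, then hands out the original confidential values according to the ranks of those synthetic scores, so every released value is a real value that has been moved to a different observation.

```python
import math
import random
from statistics import NormalDist


def ranks(values):
    """Return 0-based ranks of the values (ties broken by position)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


def normal_scores(values):
    """Map each value to a standard-normal score using only its rank."""
    n = len(values)
    nd = NormalDist()
    return [nd.inv_cdf((r + 0.5) / n) for r in ranks(values)]


def shuffle_confidential(confidential, nonconfidential, seed=0):
    """Simplified, assumed sketch of rank-based shuffling for one variable."""
    rng = random.Random(seed)
    zx = normal_scores(confidential)
    zs = normal_scores(nonconfidential)

    # Correlation of the normal scores (their means are zero by symmetry).
    denom = math.sqrt(sum(a * a for a in zx) * sum(b * b for b in zs))
    rho = sum(a * b for a, b in zip(zx, zs)) / denom if denom > 0 else 0.0
    rho = max(-1.0, min(1.0, rho))  # guard against floating-point drift

    # Synthetic scores: depend on the non-confidential variable plus noise.
    noise_sd = math.sqrt(max(0.0, 1.0 - rho * rho))
    synthetic = [rho * s + noise_sd * rng.gauss(0.0, 1.0) for s in zs]

    # Reverse mapping: the k-th smallest synthetic score receives the
    # k-th smallest original confidential value.
    sorted_original = sorted(confidential)
    return [sorted_original[r] for r in ranks(synthetic)]
```

Because the output is a permutation of the original values, the marginal distribution of the confidential variable is preserved exactly, while its rank relationship with the non-confidential variable is preserved only approximately.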

Keywords: data shuffling, numerical data masking, disclosure risk, data utility