Reidentification Risk in Panel Data: Protecting for k-Anonymity
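The study finds that, across 15 fast-moving consumer goods categories, 17% to 94% of market research panelists face a high risk of being reidentified through their purchase histories, and it proposes a new privacy protection method that lowers this risk while preserving the usefulness of the data.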
Market research companies collect extensive data on consumers' purchasing, travel, and app and media usage behaviors, on prescriptions written by physicians, and so forth. Although the companies assure study participants of anonymity, there is significant concern about the vulnerability of these data. Could a motivated intruder match a pattern of purchases with the name and other personal and potentially sensitive details of an individual? We find that 17% to 94% of market research panelists in 15 frequently bought consumer goods categories are subject to a high risk of reidentification through a potential record linkage attack based on their unique purchasing histories, even when their identities are anonymized. We also demonstrate that the risk of reidentification in such data is vastly understated by the conventional measure, unicity, and propose a new measure, termed “sno-unicity.” To protect the privacy of panelists, we consider the well-known privacy notion of k-anonymity and develop a new approach called “graph-based minimum movement k-anonymization” that is designed especially for retaining the usefulness of panel data. We show that our approach works well in protecting participants’ privacy without substantially altering the information that marketers need for sound marketing decisions.
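As a rough illustration of the two notions the abstract relies on, the sketch below computes the share of panelists whose purchase history is unique in a panel and checks whether the panel satisfies k-anonymity. It is a simplified proxy under stated assumptions, not the paper's unicity or sno-unicity measure and not its graph-based anonymization method: each history is assumed to be a tuple of purchase counts per brand, and the panel, the panelist IDs, and the function names are all hypothetical.

```python
from collections import Counter


def equivalence_class_sizes(histories):
    """For each panelist, count how many panelists share the same history."""
    if not histories:
        raise ValueError("panel must contain at least one panelist")
    counts = Counter(histories.values())
    return {pid: counts[history] for pid, history in histories.items()}


def unique_share(histories):
    """Share of panelists whose history matches no other panelist's."""
    sizes = equivalence_class_sizes(histories)
    return sum(1 for size in sizes.values() if size == 1) / len(sizes)


def is_k_anonymous(histories, k):
    """True if every history is shared by at least k panelists."""
    return min(equivalence_class_sizes(histories).values()) >= k


# Hypothetical toy panel: purchase counts for three brands per panelist.
panel = {
    "p1": (2, 0, 1),
    "p2": (2, 0, 1),
    "p3": (0, 3, 0),  # the only panelist with this history
    "p4": (1, 1, 1),
    "p5": (1, 1, 1),
}

print(unique_share(panel))       # share of uniquely identifiable panelists
print(is_k_anonymous(panel, 2))  # fails because p3 stands alone
```

In this toy panel, one panelist out of five has a history nobody else shares, so the panel is not 2-anonymous. An anonymization step would have to alter or suppress that record, which is the kind of change an approach aimed at preserving data usefulness tries to keep small.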