De-identification

While a person can usually be readily identified from a picture taken directly of them, the task of identifying them on the basis of limited data is harder, yet sometimes possible.

De-identification is the process used to prevent someone's personal identity from being revealed. For example, data produced during human subject research might be de-identified to preserve the privacy of research participants. Biological data may be de-identified in order to comply with HIPAA regulations that define and stipulate patient privacy laws. [1]

When applied to metadata or general data about identification, the process is also known as data anonymization. Common strategies include deleting or masking personal identifiers, such as personal name, and suppressing or generalizing quasi-identifiers, such as date of birth. The reverse process of using de-identified data to identify individuals is known as data re-identification. Successful re-identifications[2][3][4][5] cast doubt on de-identification's effectiveness. A systematic review of fourteen distinct re-identification attacks found "a high re-identification rate […] dominated by small-scale studies on data that was not de-identified according to existing standards".[6]

De-identification is adopted as one of the main approaches toward data privacy protection.[7] It is commonly used in fields of communications, multimedia, biometrics, big data, cloud computing, data mining, internet, social networks, and audio–video surveillance.[8]

  1. ^ Rights (OCR), Office for Civil (2012-09-07). "Methods for De-identification of PHI". HHS.gov. Retrieved 2020-11-08.
  2. ^ Sweeney, L. (2000). "Simple Demographics Often Identify People Uniquely". Data Privacy Working Paper. 3.
  3. ^ de Montjoye, Y.-A. (2013). "Unique in the crowd: The privacy bounds of human mobility". Scientific Reports. 3: 1376. Bibcode:2013NatSR...3E1376D. doi:10.1038/srep01376. PMC 3607247. PMID 23524645.
  4. ^ de Montjoye, Y.-A.; Radaelli, L.; Singh, V. K.; Pentland, A. S. (29 January 2015). "Unique in the shopping mall: On the reidentifiability of credit card metadata". Science. 347 (6221): 536–539. Bibcode:2015Sci...347..536D. doi:10.1126/science.1256297. hdl:1721.1/96321. PMID 25635097.
  5. ^ Narayanan, A. (2006). "How to break anonymity of the netflix prize dataset". arXiv:cs/0610105.
  6. ^ El Emam, Khaled (2011). "A Systematic Review of Re-Identification Attacks on Health Data". PLOS ONE. 10 (4): e28071. Bibcode:2011PLoSO...628071E. doi:10.1371/journal.pone.0028071. PMC 3229505. PMID 22164229.
  7. ^ Simson., Garfinkel. De-identification of personal information : recommendation for transitioning the use of cryptographic algorithms and key lengths. OCLC 933741839.
  8. ^ Ribaric, Slobodan; Ariyaeeinia, Aladdin; Pavesic, Nikola (September 2016). "De-identification for privacy protection in multimedia content: A survey". Signal Processing: Image Communication. 47: 131–151. doi:10.1016/j.image.2016.05.020. hdl:2299/19652.